slides from first quarter of class.
Research Design in Counseling
Psychology
Fall, 2014
Tuesdays, 1:00 to 3:50; 142 HEDCO Building
Instructor: Elizabeth A. Skowron, Ph.D.,
257 HEDCO Building
541-346-0913
eskowron@uoregon.edu
office hrs: after class & by appt.
1
Course Overview
• Scientific Methods
• Ethical research practice
• Sampling, measurement, and methods of data collection
• Research designs
o Experimental
o Quasi-experimental
o Correlational
o Longitudinal
• Types of validity & plausible threats
• Culturally competent research
• Randomized clinical trials
• Implementation (taking effective interventions to scale)
2
Course Overview
Scheduled date    Weight (in %)    Activity
All term          15               Class Preparation & Participation
Week 3            5                CITI Certification
Week 5            25               Exam I
Week 8            30               Exam II
All term          25               In-class/Homework Activities (5 @ 10 pts each)
3
Introductions
Three Programs
• Counseling Psychology
• Couples & Family Therapy
• Prevention Science
Introductions
Name __________________
Program ____________________
Interests ____________________
RATE YOUR
Research self-efficacy (1 2 3 4) (a little – a lot)
Research skill/experience (1 2 3 4) (a little – a lot)
4
Ways of ‘Knowing’
• Method of tenacity
o The beliefs I firmly adhere to are ‘true’
• Method of authority
o If noted authorities (e.g., my father, the president, my therapist, my pastor) say it is so, then it is ‘truth’
• ‘A priori’ method
o What makes sense is ‘true’
• Scientific method
o What is discovered through empiricism is ‘true’
o Empiricism = the approach of collecting data and using it to develop, support, or challenge a theory
5
Science
A dynamic view regards science as an activity
• Make discoveries
• Learn facts
• Advance knowledge
o Establish general laws & connect knowledge of separately known events, make reliable predictions of events yet unknown
• Improve quality of life
The basic aim of science is discovery that leads to theory
Theory
1. A set of interrelated constructs, definitions, and propositions
2. that presents a systematic view of phenomena by specifying relations among variables,
3. with the purpose of explaining and predicting phenomena
6
The Research Process
(theory building & testing ~ inextricably related)
7
Theory Building & Theory Testing in
Research
8
Fundamentals of Scientific Exploration
1. Describe
o What is happening? How does it occur?
o Identify and understand phenomena (special, meaningful events whose cause is in question) in order to reveal their underlying regularities
o Enables us to build models and construct theory to account for those regularities
2. Explain
o Why is it happening? How are things interrelated?
o Involves revealing the nature and structure of phenomena and their operation in specific conditions
• Empirical pattern identification
• Theory testing
3. Predict
o Speculate or test what will happen in the future, based on our (theoretical/empirical) models for what happens and why it happens
4. (Influence)
9
The Scientific Method
• Make observations
• Ask questions about the observations (i.e., frequency, association, causal)
• Form a hypothesis
• Design research study appropriate to test the hypothesis
• Collect data
• Analyze data
• Accept or reject the hypothesis
o Accept ~ confirm theory
o Reject ~ reject or revise theory
10
Three Kinds of Conclusions To Draw from Research
• Frequency Claims
o Describe a particular rate or level of something
o Typically focus on 1 variable
o Variable is measured, not manipulated
• “More than 2 million U.S. teens depressed”
• “Half of Americans struggle to stay happy”
• “Almost 1 million children were abused or neglected last year”
• Associational Claims
o Argue that 2+ variables are related (+/-)
o Involve at least 2 variables
o Variables are measured, not manipulated
• “Belly fat linked to dementia”
• “Laptop computer use linked to poor sperm quality”
• “Poor nutrition associated with school failure”
• Causal Claims
o Argue that one variable causes another
o Study must meet the 3 criteria of covariation, temporal precedence, and internal validity
• “Music lessons enhance IQ”
• “PCIT reduces child maltreatment recidivism”
• “Debt stress causes health problems”
11
Three Rules for Causation
In order to make the claim that one variable CAUSES another variable, the following 3 conditions must apply:
1. Covariation
o Two variables are associated. As A changes, B changes
(e.g., A = public service ad on parenting, B = child abuse)
2. Temporal Precedence
o Cause precedes effect. A appears and then B follows; changes in A precede changes in B
(e.g., public service ads appear on TV, then child abuse rates drop)
3. Internal Validity
o Plausible alternative explanations for the results (i.e., 3rd variable threats) are ruled out
o There are no likely alternative explanations for the change in B; A is the only thing that changed
12
Instructions for Registering and Completing
CITI RCR training
• Go to the CITI website: https://www.citiprogram.org/
• New Users: Click on the “New Users Register Here” link.
o From the “Participating Institutions” drop-down menu, select “University of Oregon” as your institution.
o Create your username, password and security question and answer.
o Enter your contact information.
• To complete the CITI course, you must complete all required modules and quizzes, achieving a minimum passing score of 80%. A quiz can be taken more than once to achieve this minimum score. You are not required to complete the course in one sitting. Your progress will be saved if you choose to stop the course and return at a later time.
• When you complete all required modules successfully, please print or download your completion report. A copy will be sent automatically to Research Compliance Services. Send a copy of your completion report to Dr. Skowron at eskowron@uoregon.edu with the subject line “CITI training completed”. You can return to the CITI site at any time to obtain a copy of your completion report.
13
Next Week
• Research Continuum
• Variables & methods of measurement
• Developing a research study
_________________
• Ethical conduct in research
• CITI training review
14
Research Design in Counseling
Psychology
Class 2
1
Fundamentals of Scientific Exploration aka “the course of scientific progress”
• Description
o What is happening? How does it occur?
o Identify and understand phenomena (special, meaningful events whose cause is in question) in order to reveal their underlying regularities
o Enables us to build models and construct theory to account for those regularities
• Explanation
o Why is it happening? How are things interrelated?
o Involves revealing the nature and structure of phenomena and their operation in specific conditions
• Empirical pattern identification
• Theory testing
• Prediction
o Speculate or test what will happen in the future, based on our (theoretical/empirical) models for what happens and why it happens
• (Influencing)
2
Theory—Data Feedback Loop
• The basic aim of science is to understand/explain natural phenomena
• These explanations are called ‘Theories’
• Instead of trying to explain each and every separate behavior of children, we seek general explanations that encompass and link together many kinds of (similar) behavior
• We formulate hypotheses based on our theory
• We collect data to test the hypotheses
• Data inform the accuracy of our theory and lead to revisions & modifications of the theory
• We formulate new hypotheses
• We collect data to test the hypotheses
• AND SO ON…
3
The Contact-Comfort Theory
(Another example of Theory-Data Cycle)
Example: Theory-Data Loop
• Repairing ruptures in the therapeutic alliance in psychotherapy (Safran et al., 2011)
• Roughly 50% of psychotherapy cases experience an alliance rupture
• Theory of how to repair alliance ruptures was constructed
• Data collected via studies of psychotherapy process during sessions: what rupture/repair processes lead to positive outcomes?
• Alliance ruptures & repairs defined & measured from client, therapist, & observer perspectives
• Rupture = disagreements about the tasks of therapy, goals of treatment, or strains in the client-therapist bond
• Pattern of rupture repairs linked with good outcomes, refined, retested…
• Common rupture-repair interventions:
• Therapist acknowledges rupture,
• explores it with client,
• clarifies misunderstandings,
• takes responsibility for his/her contribution,
• explores relational themes (in client’s life) associated with rupture,
• links therapy rupture to common patterns in client’s life,
• facilitates new experience
5
Research Continuum
Basic—Translational—Applied
• Basic: Pure research that advances fundamental knowledge about the human world. Focuses on refuting or supporting theories. The source of most new scientific ideas and ways of thinking about the world. It can be descriptive or explanatory.
• Translational: Research that applies findings from basic science to practical applications that enhance human health and well-being. Applying knowledge from basic research is a major stumbling block in science, partially due to compartmentalization of work based on expertise.
• Applied: Form of research involving the practical application of science.
6
Developing a Research Project
• Identify a topic or area of interest
• Formulate research ‘problem’
• Specify in terms of question re: relationship between 2+ variables
• Translate question into a testable hypothesis
o Is it falsifiable?
• Design study to test your hypothesis
7
Identifying Research Topics
• Personal interests/experience
• Read journals
• Study theory
____________________________________
• Science must operate at the level of observation, and gather data to test hypotheses
o Requires us to move from the construct level to the observational level
• e.g., ‘early deprivation’ and ‘learning problems’
o We have to define our constructs clearly enough so that observations are possible
8
Operationalizing Research Topics
• Constructs: concepts that cannot be directly observed
• Variable: a symbol to which numbers or values are assigned; can take on any set of values; can range from dichotomous to continuous
o When operationally defined, variables are observable
• Operational definitions: assign meaning to a construct/variable by spelling out what the investigator must do to measure it
o (1) measured: describes how the variable will be measured
o (2) experimental: spells out the details of the investigator’s manipulation of a variable
• Reinforcement schedule
• Intervention type & dosage
o No operational definition can ever reflect all of a variable…
9
Types of Variables
1. Independent and Dependent variables
o We are trying to explain the DV or predict the DV
o In correlational and/or experimental studies, we look for variation in the IV to predict the DV
o In experiments we manipulate the IV and look for effects on the DV
o Causal claims: IV is presumed cause of DV; IV is antecedent & DV is consequent
o Association claims: Variables may be called ‘predictor’ (IV) and ‘criterion’ (DV)
2. Active and Attribute variables
o Active variables are manipulated (e.g., dose of prevention; experimentally-induced stressor)
o Attribute variables cannot be manipulated, can only be measured
• (e.g., most human characteristics: ethnicity, age, sex…)
o Some attribute variables may also be active, depending on your design
• (e.g., anxiety…)
3. Continuous and Categorical variables
o Continuous variables take on an ordered set of values (rank, interval, ratio scale)
o Categorical variables belong to a nominal scale of measurement: cases are sorted into two or more subsets (e.g., political party membership, sex, college alma mater, religion, etc.)
10
Methods of Measurement
1. Self-report
o Participant makes an observation or report on self
o + : easy to administer, economical, accesses private thoughts, feelings, behavior not accessible to investigators
o -- : vulnerable to distortion, presumes client insight/understanding about construct being measured
2. Other-report (parents, therapist, teacher, etc.)
o Respondents rate the participant on some dimension(s)
o + : easy to administer, economical
o -- : potential systematic bias (e.g., cultural competence of rater in a cross-cultural child development study)
3. Behavioral observations
o Measures of overt behavior by trained observers using coding system
o + : direct and objective
o -- : presumption that observed behavior is representative; costly; feasibility?
4. Neurobiological indices
5. Interviews
o + : flexible, high completion rate
o -- : costly; feasibility?
6. Unobtrusive measures
o Assessment conducted without participants’ awareness
o + : eliminates reactivity to measurement
o -- : expensive?; some types are unethical
11
Writing Research Problems and Hypotheses
1. Work with your table group to brainstorm a list of interesting research topics (tables).
2. Work with a partner to identify two ‘topics’ of interest to you from the list of topics. State each of these ‘topics’ as a question about the relationship between 2 variables (2 person groups).
3. Write a definition for each of your variables.
a. Identify IVs and DVs; active vs. attribute variables
b. What methods can be used to measure each of your variables? (e.g., self-report, observation, performance, other’s report such as teacher/parent/spouse, other)
4. Discuss in class
12
Goal of Ethical Research
to create new knowledge (beneficence) while preserving the dignity and welfare of participants (non-maleficence & autonomy)
13
Ethical scholarship
As researchers, we have responsibility to seek and share accurate information in our scholarly endeavors, in:
1. Executing a research study
o Respect Ss’ rights, conduct study carefully, minimize bias in methods & measures, & ensure both data & analyses are error-free
o Maintain raw data for 5+ years post-publication
2. Reporting our results
o Accurately, honestly, note limitations…& guard against misuse of results
o “the facts are always friendly” (Carl Rogers)
3. In presentations & publications
o Avoid duplicate/piecemeal publication
o Clearly identify multiple publications from same data set
4. Giving accurate publication credit
Major contributions = authorship
• formulate research question/hypotheses, design study, conduct analyses, write manuscript
Minor contributions = footnote (e.g., editing, collecting data, coding data, clerical work)
14
Publication Credit
In case of student thesis or dissertation, APA guidelines state that “except under exceptional circumstances, a student is listed as principal author on any multiple-authored article that is substantially based on the student’s doctoral dissertation”
Plagiarism
1. Omitting necessary citations
2. Failing to cite relevant work
3. Verbatim copying of another’s writing
FIX: Give credit where/when it is due
15
History of unethical treatment of research participants
• Nazi prison camp experiments
• Nuremberg Code
o Basis for first guidelines regarding ethical treatment of research participants
• Tuskegee Syphilis Study (1932-1972)
o Whistle-blower ends study
• 1974: Code of Federal Regulations implemented Public Law 93-348 (rev. 1983)
o establishing Institutional Review Boards (IRBs) to protect human participants in biomedical & behavioral research
16
Ethical Violations
• Tuskegee Syphilis Study
• Researchers lied
o Told men they were being treated, but none were
o Conducted painful spinal taps to track disease progression, but told men it was a “special free treatment”
• Withheld information
o Men who contracted the disease were not informed
o 1947: penicillin established as a cure for syphilis, but this fact was not shared with participants
• Actively interfered with men’s efforts to get treatment
• Acts prevented men from serving in armed forces and benefiting from GI bill and benefits
• 1969: PHS employee blows whistle, but no action is taken; in 1972 the story breaks via the Associated Press
• 1972: Study ends
17
In 1932, the U.S. Public Health Service, in cooperation with the Tuskegee Institute, began a 40-year study of 600 Black men to understand the effects of syphilis on health over time
• 400 already infected
• 200 were not
The Belmont Report: Each Principle Has an Application
• Respect for persons
– Informed consent
– Protection of vulnerable populations
• Beneficence
– Cost-benefit analysis for participants
– Cost-benefit for society
• Justice
– How are participants selected? Do they represent the people who will benefit from the study?
Beneficence: Cost-Benefit Balance
                           Low risk          High risk
Low benefit to society     Do the study?     Don’t do the study
High benefit to society    Do the study      Do the study?
(Columns: risk to participants. Rows: benefit to society.)
APA Guidelines for ethical research practice
Guiding principles
1. Non-maleficence
o First, do no harm
2. Beneficence
o Do good & give back to the community
3. Justice
o Fairness, including rewards for one’s labor
4. Autonomy
o Right to voluntarily participate or decline to
o Underpins ‘informed consent’
5. Fidelity
o Faithfulness, loyalty, keeping promises to maintain confidentiality, etc.
20
Respect for persons
IRB Guidelines for Ethical Treatment of Research Participants
1. Risks and Benefits
o ID risks & work to eliminate or minimize these; protect Ss from harm
o ID potential benefits to Ss; clarify benefits for whom?
o Weigh the balance of risks-benefits
o Pilot all new procedures, measures
2. Informed Consent
o Give Ss a fair, clear, explicit summary including risks & benefits, then seek consent to participate
o Obtain assent from children
o Consider ability to provide consent – mental competence, etc.
o Voluntariness: consent must be free of any coercion (e.g., students, institutionalized persons, client status, etc.)
o Document
3. Deception & Debriefing
o Involves deliberate withholding of info or providing misinformation to Ss (e.g., Cole et al.’s Disappointment task)
o Additional responsibilities & safeguards are required with use of deception
21
IRB Guidelines for Ethical Treatment of
Research Participants
4. Confidentiality & Privacy
o Protect any information that a participant shares during the study
o Concern for well-being may necessitate exceptions to confidentiality
• Any exceptions are clearly stated (e.g., harm to self/others)
o Anonymity = no identifiers can link you to your data
5. Treatment issues
o (withholding effective treatment, deception)
o Great concern when withholding a treatment known to be effective
• Strategies: wait-list & delayed treatment groups; contrast with treatment as usual
22
Instructions for Registering and Completing
CITI RCR training
• Go to the CITI website: https://www.citiprogram.org/
• New Users: Click on the “Register” link under Create an Account.
o Start typing “University of Oregon” as your organization and click the option when it appears.
o Enter your contact information
o Create your username, password and security question and answer.
o The next step involves optional collection of demographic information. Answer as you prefer and continue to the next step.
o Answer “No” regarding professional continuing education requirements (Not applicable to RCR users)
o Complete required questions in the next step, regarding institutional e-mail address, gender, etc.
o Skip the Human Subjects Research question and move on to the Responsible Conduct of Research (RCR) training question
o Select the RCR course most appropriate to your research discipline (i.e., social and behavioral sciences) and your status at the University (undergraduate student, graduate student, or postdoctoral researcher). If you have any questions regarding which course you should take, please contact me.
o The remaining courses do not apply to the RCR training. Click the “Complete Registration” button at the end.
• To complete the CITI course, you must complete all required modules and quizzes, achieving a minimum passing score of 80%. A quiz can be taken more than once to achieve this minimum score. You are not required to complete the course in one sitting. Your progress will be saved if you choose to stop the course and return at a later time.
• When you complete all required modules successfully, please print or download your completion report. A copy will be sent automatically to Research Compliance Services. Send a copy of your completion report to Dr. Skowron at eskowron@uoregon.edu with the subject line “CITI training completed”. You can return to the CITI site at any time to obtain a copy of your completion report.
23
In-Class Activity 1
Ethical Concerns in Human Subjects Research
1. A prevention science researcher applies to an IRB, proposing to
observe children ages 2 to 10 eating their meals and playing in the
local McDonald’s play area. Because the area is public, the
researcher does not plan to ask for informed consent from the
children’s parents.
• What ethical concerns exist for this study?
• What questions might an IRB ask?
2. A psychologist plans to hand out surveys in her 300-level
undergraduate class. The survey asks about student study habits and
substance use. The psychologist does not ask the students to put
their names on the survey; instead, students will put completed
surveys into a large box at the back of the room. Because of the low
risk involved in participation and the anonymous nature of the survey,
the researcher requests to be exempted from formal informed consent
procedures.
• What ethical concerns exist for this study?
• What questions might an IRB ask?
3. Discuss in class & submit for grading
24
25
Technical function of good research design = to control variance (attend to the 4 validities)
MAXMINCON (Kerlinger, 1973, 1986)
Maximize systematic variance
Maximize variance of the variables in your substantive research hypothesis
Experimental variable: make conditions as different as possible
Associational variable: seek as wide a range of scores/levels as possible
Minimize error variance
Reduce the errors in measurement of your constructs and increase the reliability of your measures
Control extraneous variance
Control variance of extraneous or unwanted variables that may affect or relate to your variables of interest
3 ways to control these
26
MAX ‘Maximize systematic variance’
27
[Figure: dependent variable, emotion dysregulation, plotted against violence exposure]
MIN ‘Minimize error variance’
Give the systematic variance (the stuff you’re interested in)
a chance to show itself
1. Sources of error variance (errors in measurement):
Guessing, fatigue over time, momentary inattention, variation in responses from trial to trial
Solutions:
2. (Un)reliability of measures:
Consistency in measurement across items, raters, time, etc.
28
MIN
• Reliability of your measures will constrain the strength of association you can observe between the variables of interest (e.g., Ghiselli et al., 1981)
• $r_{yy}$ = reliability of the y scores
• $r_{ox,oy}$ = observed correlation between x and y
• $r_{tx,ty}$ = true correlation between x and y
29
Correction for attenuation
MIN
[Figure: MacCoun, 2006]
30
MIN
If our dependent variable measure is unreliable, the observed correlation will drastically underestimate the true x–y relationship.
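A numerical illustration may help here. The sketch below assumes the classical correction-for-attenuation relation, $r_{ox,oy} = r_{tx,ty}\sqrt{r_{xx}\,r_{yy}}$; the formula on the slide itself is not reproduced in this text, so the relation, the symbol $r_{xx}$ (reliability of the x scores), the function names, and the example values are all assumptions for illustration. Setting $r_{xx} = 1$ gives the case where only the dependent measure is unreliable.

```python
import math


def observed_correlation(true_r, rel_x=1.0, rel_y=1.0):
    """Expected observed correlation, given the true correlation and
    the reliabilities of the x and y measures (classical test theory)."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must be in (0, 1]")
    return true_r * math.sqrt(rel_x * rel_y)


def disattenuated_correlation(observed_r, rel_x=1.0, rel_y=1.0):
    """Estimate of the true correlation after correcting for attenuation."""
    if not (0.0 < rel_x <= 1.0 and 0.0 < rel_y <= 1.0):
        raise ValueError("reliabilities must be in (0, 1]")
    return observed_r / math.sqrt(rel_x * rel_y)


if __name__ == "__main__":
    # Hypothetical values: true r = .50, outcome measure reliability = .64
    print(observed_correlation(0.50, rel_y=0.64))
    print(disattenuated_correlation(0.40, rel_y=0.64))
```

With these hypothetical values, a true correlation of .50 shrinks as the reliability of the outcome measure drops, which is the MIN argument for investing in reliable measures.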
31
CON “Control Extraneous Variance”
Identify plausible ‘3rd’ variables and control their influence on your study variables of interest in 1 of 3 ways
Principle 1: To eliminate the effect of a possible influential ‘3rd’ variable on a dependent variable, choose participants so that they are as homogeneous as possible on that ‘3rd’ variable
Principle 2: Whenever possible, randomly assign participants to experimental groups and conditions
Principle 3: Control the effects of a ‘3rd’ variable by building it into the research design as an attribute variable that is measured and then statistically controlled
Principle 3a: Match participants across conditions or groups by splitting a variable into 2 or more parts, then randomize within each level
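Principles 2 and 3a can be sketched in code. This is a minimal illustration of randomizing within levels of a ‘3rd’ variable (often called blocked or stratified random assignment), not a procedure taken from the course materials; the function name, the participant fields (`id`, `age_group`), and the condition labels are hypothetical.

```python
import random
from collections import defaultdict


def blocked_random_assignment(participants, block_key, conditions, seed=None):
    """Randomly assign participants to conditions within each level
    (block) of a '3rd' variable, so conditions are balanced on it."""
    rng = random.Random(seed)

    # Split participants into blocks defined by the 3rd variable.
    blocks = defaultdict(list)
    for person in participants:
        blocks[block_key(person)].append(person)

    # Shuffle within each block, then deal conditions out in rotation.
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        for i, person in enumerate(members):
            assignment[person["id"]] = conditions[i % len(conditions)]
    return assignment


if __name__ == "__main__":
    # Hypothetical sample: 8 children split by age group
    sample = [{"id": i, "age_group": "younger" if i < 4 else "older"}
              for i in range(8)]
    print(blocked_random_assignment(
        sample, lambda p: p["age_group"], ["treatment", "control"], seed=1))
```

Because assignment is random within each age group, age cannot differ systematically between the treatment and control conditions.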
32
CON • Extraneous ‘3rd’ variables to control for…?
33
[Figure: violence exposure and the dependent variable, emotion dysregulation, with candidate ‘3rd’ variables: child age, caregiving, SES]
Next week (Morling Ch. 3, 14; CITI)
____________________________
3 Claims
Research Designs
4 Validities
34
Research Design in Counseling
Psychology
Class 3
1
Week 3
________________
3 Claims
4 Validities
________________________________
Next Week: Research Designs
2
3
Validity Issues in Research Design
To draw valid conclusions about research
questions, we must design studies to
minimize the potential for alternative
explanations of the results
4
Three Claims
• Frequency claims
• Association claims (types of associations)
• Causal claims
Practice Identifying Claims
a. Worry may make women’s brains work overtime.
b. High “normal” blood sugar may still harm brain.
c. Want a higher GPA? Go to a private college.
d. Those with ADHD do one month’s less work a year.
e. When moms criticize, dads back off baby care.
f. Report: 16% of teens have considered suicide.
g. MMR shot does not cause autism, large study says.
h. Breastfeeding may boost children’s IQ.
i. Breastfeeding rates hit new high in United States.
j. Smiling may lower your heart rate.
k. OMG! Texting and IM-ing doesn’t affect spelling!
l. Facebook users get worse grades in college.
m. Mother’s heartburn means a hairy newborn.
Practice Identifying Claims
a. Indicate if the claim is frequency, association, or
cause.
b. For each claim, identify the variable(s).
c. For each variable, is it manipulated or measured?
d. State each variable at the conceptual level.
e. State each variable in terms of its operational
definition: How might it have been operationalized?
Interrogating the Three Claims
Using the Four Big Validities
Four (Big) Validities
Statistical Conclusion Validity
Are the variables actually statistically related?
Is the statistical test able to detect small associations/small differences (i.e., small effects)?
Internal Validity (most relevant in studies that test for causal relations)
To what extent are observed changes in a DV attributable to/caused by an IV?
What 3 conditions need to be met to establish a causal relationship? (REVIEW)
Construct Validity
Do the measured variables reflect the actual constructs of interest?
Are all important aspects of the constructs represented in the study variables?
External Validity
Are the study results applicable (i.e., generalizable) to other groups, settings, time-frames?
12
Threats to Validity
13
Threats to Statistical Conclusion Validity
• Low statistical power
• Violated assumptions of your statistical tests
• “Fishing” and error rate problems
• Unreliability of measures/treatment implementation (MIN)
• Restriction of range (MAX)
• Extraneous variance (3rd variable threats) (CON)
14
Threats to Statistical Conclusion Validity
• Low statistical power
o Statistical power = probability of finding a relationship or effect when it really exists (i.e., power to find a true effect)
o Type II Error = risk of failing to find a relationship (significant effect) that really exists
• Steps to increase statistical power
1. Use a larger sample size
2. Increase the effect size (MAX your systematic variance)
3. Decrease noise (MIN your error variance)
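The three steps can be seen in a rough power calculation. The sketch below uses a normal approximation for a two-sided comparison of two equal-sized groups; the function name, the standardized effect size `effect_size_d`, and the example numbers are assumptions for illustration, and exact t-test power will differ slightly in small samples.

```python
import math
from statistics import NormalDist


def approx_power(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison
    (normal approximation) for a standardized mean difference d."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    # Expected test statistic: grows with effect size and sample size.
    shift = effect_size_d * math.sqrt(n_per_group / 2)
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Hypothetical scenarios: larger n or larger d both raise power.
    for d, n in [(0.5, 20), (0.5, 64), (0.8, 20)]:
        print(f"d = {d}, n per group = {n}: power = {approx_power(d, n):.2f}")
```

Reducing error variance (step 3) enters through `effect_size_d`: the same raw difference divided by a smaller standard deviation is a larger standardized effect.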
15
Threats to Statistical Conclusion Validity
• “Fishing” and error rate problems
• Conducting lots of analyses on a data set and treating each as independent
• In stats analyses, we use the p < .05 level of significance
• The result we obtain in our study is expected to occur by chance in only 5 out of every 100 times we run the analysis
• Odds are 5 out of 100 that we will see a relationship (i.e., significant effect) even if none exists
o Query: What are the chances of finding a significant effect if you:
• conduct 10 separate tests with your data? p = ______
• 20 separate tests with your data? p = ______
• Solution:
o Adjust the error rate (i.e., p-value, significance level) to reflect the number of analyses you plan to conduct
o ‘Experiment-wise’ p = .05 / N of tests
N of tests = 4, experiment-wise p = .0125
N of tests = 6, experiment-wise p = .008
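One way to work the query and check the adjusted values is sketched below. It assumes the tests are independent, so the chance of at least one false positive across k tests is 1 − (1 − α)^k, and it uses the Bonferroni adjustment (α divided by the number of tests) for the solution; the function names are hypothetical.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Chance of at least one false positive across n independent tests."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_alpha(n_tests, alpha=0.05):
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    for k in (1, 4, 6, 10, 20):
        print(f"{k:>2} tests: chance of a false positive = "
              f"{familywise_error_rate(k):.3f}, "
              f"adjusted per-test p = {bonferroni_alpha(k):.4f}")
```

Under the independence assumption, 10 tests give roughly a 40% chance of at least one spurious significant effect, and 20 tests roughly 64%.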
16
Threats to Statistical Conclusion Validity
17
• Unreliability of measures/treatment implementation (MIN)
If measurement of the variables is unreliable, the observed correlation will drastically underestimate the true x–y relationship.
MIN
[Figure: MacCoun, 2006]
18
Threats to Statistical Conclusion Validity
• Restriction of range (MAX)
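A small simulation can show why restriction of range matters. This is a sketch under stated assumptions: x and y are simulated as standard normal scores with a chosen true correlation, and "restriction" means keeping only cases with x above a cutoff; the function names, seed, cutoff, and sample size are all hypothetical.

```python
import math
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate_restriction(true_r=0.6, n=5000, cutoff=1.0, seed=2014):
    """Return (full-sample r, r among cases with x above the cutoff)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]
    noise_sd = math.sqrt(1 - true_r ** 2)
    ys = [true_r * x + noise_sd * rng.gauss(0, 1) for x in xs]
    kept = [(x, y) for x, y in zip(xs, ys) if x > cutoff]
    restricted_r = pearson_r([x for x, _ in kept], [y for _, y in kept])
    return pearson_r(xs, ys), restricted_r


if __name__ == "__main__":
    full_r, restricted_r = simulate_restriction()
    print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
```

Sampling only high scorers on x shrinks the systematic variance, so the same underlying relationship yields a noticeably smaller observed correlation, which is the MAX principle in reverse.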
19
Threats to Statistical Conclusion Validity
Extraneous variance
3rd variable threats must be identified (CON)
• Control their influence on your study via…
Principle 1: To eliminate the effect of a possible influential ‘3rd’ variable on a dependent variable, choose participants so that they are as homogeneous as possible on that ‘3rd’ variable
Principle 2: Whenever possible, randomly assign participants to experimental groups and conditions
Principle 3: Control the effects of a ‘3rd’ variable by building it into the research design as an attribute variable that is measured and then statistically controlled
Principle 3a: Match participants across conditions or groups by splitting a variable into 2 or more parts, then randomize within each level
20
Threats to Internal Validity
Compromise our confidence in assertions that a relationship/effect exists between the independent and dependent variables.
• History
• Maturation
• Statistical regression (law of initial values)
• Selection
• (Differential) Attrition
• Testing
• Instrumentation
• Compensatory equalization of treatments
• Resentful demoralization
• Treatment diffusion
21
Threats to Internal Validity
History: Did some unanticipated event occur while the experiment was in progress, and did that event affect the dependent variable?
A threat for the one-group design, but not for two-group designs
In the one-group pre-test post-test design, the treatment effect is inferred from the difference between the pre- and post-test scores. This difference may be due to the treatment or to history.
22
Threats to Internal Validity
History:
Not a threat for two-group designs (i.e., treatment/experimental group vs.
comparison/control group).
If the history threat occurs for both groups, the difference between the
two groups will not be due to the history event.
23
Threats to Internal Validity
Maturation: were changes in the dependent
variable due to normal developmental processes
operating within the participant as a function of
time?
Is a threat for the one-group design.
Is not a threat for the two-group design, assuming that participants in
both groups change (‘mature’) at the same rate.
24
Examples: Threats to Internal Validity
History: In a short intervention designed to
investigate the effect of computer-based self-control
instruction, participants missed some instruction
because of a power failure at the school.
Maturation: the performance of 1st graders in a
learning experiment begins decreasing after 45
minutes due to fatigue
25
Threats to Internal Validity
Statistical regression: An effect that is the result
of a tendency for participants selected on the
basis of extreme scores to regress towards the
mean on subsequent tests.
When measurement of the dependent variable is not perfectly reliable,
there is a tendency for extreme scores to regress or move toward the
mean over time.
The amount of regression to the mean is inversely related to the
reliability of the test.
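The inverse link between reliability and regression to the mean can be illustrated with a small calculation. This sketch assumes the classical true-score model (the expected retest score shrinks toward the group mean by a factor equal to the reliability); the function name and the numbers are hypothetical.

```python
def expected_retest_score(observed, group_mean, reliability):
    """Expected score on retest under the classical true-score model.

    The expected retest score shrinks toward the mean by a factor equal
    to the test's reliability (0 = pure noise, 1 = perfectly reliable).
    """
    return group_mean + reliability * (observed - group_mean)

# Hypothetical anxiety scale: mean 50, child selected for an extreme pre-test score of 90.
for reliability in (0.9, 0.8, 0.5):
    retest = expected_retest_score(90, 50, reliability)
    print(f"reliability {reliability}: expected retest = {retest:.1f}, "
          f"regression = {90 - retest:.1f} points")
```

The lower the reliability, the further the extreme score is expected to drift back toward the mean, even with no treatment at all.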
26
Examples: Threats to Internal Validity
Statistical regression:
In a study of family therapy, participating children grouped
because of high anxiety scores show considerably greater
reductions in anxiety than do the groups who scored average
and low on anxiety at the pre-test.
27
Threats to Internal Validity
Selection: Refers to selecting participants for the
various groups in the study. Are the groups
equivalent at the beginning of the study?
This is not a threat in studies that employ random sampling and random
assignment. All participants have an equal chance of being in the
treatment or comparison groups and the groups are equivalent.
Were participants self-selecting into experimental and comparison
groups? This could compromise the internal validity of the study.
Selection is not a threat for the one-group design but is a threat for the
two-group design.
28
Threats to Internal Validity
Differential Attrition: Differential loss of
participants across groups.
Did some participants drop out? Did this affect the results?
Did about the same number of participants make it through the entire
study in both experimental and comparison groups?
This is a threat for any design with more than one group.
29
Threats to Internal Validity
• Testing: Did the pre-test affect scores on the post-test?
o A pre-test may sensitize participants in unanticipated ways and their
performance on the post-test may be due to the pre-test, not to the
treatment, or more likely, an interaction of the pre-test and treatment.
o This is a threat for one-group designs.
o Not a threat for two-group designs. Both groups are exposed to the pre-test
and so the difference between groups will not be due to testing.
30
Examples: Threats to Internal Validity
Selection: The experimental group in a study of self-control consisted of a high-ability class, while the comparison group was an average-ability class.
(Differential) Attrition: In a health-promotion intervention designed to test the effect of various exercises, those participants who disliked exercise most stopped participating.
Testing: In an experiment with logical reasoning performance as the dependent variable, a pre-test familiarizes the participants with the post-test and how to perform well
31
Threats to Internal Validity
• Instrumentation: Did any change occur during the
study in the way that the dependent variable was
measured?
o Is a threat for one-group designs, not for the two-group designs.
o Why? _________________________________
• Treatment diffusion: Did the comparison group
know or find out about the experimental/
intervention group and what transpired?
o A threat for two-group designs.
32
Examples: Threats to Internal Validity
• Instrumentation: Two research assistants for a
self-control experiment with preschoolers
administered the post-test with different
instructions and procedures.
• Treatment diffusion: In an intervention study to
enhance college student adjustment, students in
the treatment and placebo control groups ‘compare
notes’ about what they are learning in sessions.
33
Threats to Internal/Construct Validity
• Compensatory equalization or rivalry: These simply weaken or strengthen the effect sizes associated with the intervention.
• Resentful demoralization: If participants learn that their group receives less desirable goods or services, they may feel resentful and demoralized, and perform particularly poorly on the dependent variable.
o What effect would this have on treatment vs. control group differences _____________________________?
May increase magnitude of group differences, leading to an overestimate of the effect
34
Threats to Construct Validity
• Inadequate explication of the constructs
• Construct confounding
• Mono-operation bias
• Mono-method bias
35
Threats to Construct Validity
• Are all important aspects of the constructs represented in the independent and dependent variables?
o If yes, good
o If not, the constructs are underrepresented
• Do the independent and dependent variables also represent constructs that are not of interest in the study?
o If yes, there are surplus construct irrelevancies
o If no, good
• Inadequate explication of the constructs
• Construct confounding
36
Threats to Construct Validity
Campbell and Fiske (1959) proposed two kinds of construct-validation evidence:
1. evidence of convergent validity
o evidenced by achieving similar results (convergence) across different measures of the same construct or different manipulations of the same construct. In other words, your measure of binge drinking would be expected to correlate with other existing measures of binge drinking and similar constructs, such as ____________ and __________________.
2. evidence of discriminant validity
o evidenced by observing no associations between your measure of ___________ and measures of other, unrelated constructs, such as ___________________ and ____________________. For example, we would expect that your measure of binge drinking would not be correlated with unrelated constructs, such as __________ or _____________.
• Mono-operation bias
• Mono-method bias
• http://www.youtube.com/watch?v=1Y3v5dgWlWM
37
External Validity
External validity refers to the degree to which the results of
an empirical investigation can be generalized to and across
individuals, settings, and times
External validity can be divided into
Population validity
Ecological validity
38
External Validity
Population Validity:
How representative is the sample of the population?
The more representative, the more confident we can
be in generalizing from the sample to the population.
How widely does the finding apply? Generalizing
across populations occurs when a particular research
finding works across many different kinds of people,
even those not represented in the sample.
39
External Validity
Ecological Validity is present to the degree that a result
generalizes across settings. Types include:
Interaction effect of testing
Interaction effects of selection biases and
experimental treatment
Reactive effects of experimental arrangements
Multiple-treatment interference
Experimenter effects.
40
Threats to External Validity
• Interaction of selection and treatment
o A characteristic of the treated group that interacts with the treatment
o Randomization would correct
• Example:
o An experimental evaluation of a new teaching method is conducted in a sample of low achieving students.
o Results will not generalize to a sample of students with heterogeneous abilities/achievement levels
41
Summary: External Validity
It’s a population, not the population
External validity comes from how, not how many.
Just because a sample comes from a population doesn’t
mean it generalizes to that population.
To Be Important, Must a Study Have
External Validity?
• Generalizing to other participants
• Generalizing to other settings
• Does a study have to be generalizable to many people?
• Does a study have to take place in a real-world setting?
Does a Study Need to Be
Generalizable to Many People?
Generalization mode
– Frequency claims
– Goal is to make a claim
about a population
– Real-world matters
External validity is essential!
Theory-testing mode
– Association and causal claims
– Goal is to test a theory rigorously, isolate variables
– Prioritize internal validity
– Artificial situations may be required
– Real world comes later
External validity is not the priority!
Does a Study Have to Take Place in a
Real-World Setting?
Theory-testing mode often requires
artificial settings.
Even laboratory settings can feel
emotionally real.
– Experimental realism
Prioritizing Validities
• Which validity is appropriate to interrogate for every study?
• Which validities are not always relevant for a study?
• Why can’t researchers achieve all four validities in a single study?
• Which two validities are most often in trade-off?
• Which validity is most under the researcher’s control?
That study’s just not valid!
In-Class Activity 2
Return to 2 of your research topics of interest.
a. For 1 of your topics of interest, construct a research question that is framed as:
• A frequency claim
• An associational claim
• A causal claim
b. For each research question (3 total) prepare an operational definition of your constructs (i.e., measurable variables) and specify the method you will use to measure the construct
• Identify your IVs and DVs, and note whether your variables are Active or Attribute variables, and Continuous or Categorical variables
c. Restate each of your research questions as directional relationships between your measured constructs
d. Identify at least 3 possible threats to validity that may be relevant to the studies you design to test the associational and causal claim questions. (Select at least 2 threats to internal validity.) How could interpretation of your findings be impacted by each of these potential threats?
47
To Be Important, a Study
Must Be Replicable
Replication Studies
• Direct replication
• Conceptual replication
• Replication-plus-extension
• Meta-analysis
Replication Studies
Direct replication: Same variables, same operationalizations
Conceptual replication: Same variables, different operationalizations
Replication-plus-extension: Same variables, plus some new variables
How Meaningful Is That Effect Size?
Say this:
• How’s the construct validity?
• Is external validity relevant here?
• Can the study support a causal claim?
Not that:
• The question is, is the study valid?
• That is not a valid study.
Week 4
Research Designs ________________________________
o Pre-experimental
o Experimental
o Quasi-experimental
56
Research Design in Counseling
Psychology
Class 4
Instructor: Elizabeth A. Skowron, Ph.D.,
257 HEDCO Building
541-346-0913
eskowron@uoregon.edu
office hrs: Tues 12-1 pm
1
Review In class activity #2
2
In-Class Activity 2
Return to 2 of your research topics of interest.
a. For 1 of your topics of interest, construct a research question that is framed as:
• A frequency claim
• An associational claim
• A causal claim
b. For each research question (3 total) prepare an operational definition of your constructs (i.e., measurable variables) and specify the method you will use to measure the construct
• Identify your IVs and DVs, and note whether your variables are Active or Attribute variables, and Continuous or Categorical variables
c. Restate each of your research questions as directional relationships between your measured constructs
d. Identify at least 3 possible threats to (internal) validity that may be relevant to the studies you design to test the associational and causal claim questions. Why might these potential threats be an issue with tests of your research question?
3
Research Designs
• Pre-Experimental
• Experimental
• Quasi-Experimental
• Conducted in the lab vs. in the field
• Making associative or causal claims
4
Three Criteria for Causation
• Covariance
• Temporal precedence
• Internal validity
Research Design
6
[Flowchart: the yes/no questions “One or more IVs are manipulated?”, “Random assignment used?”, and “IV manipulated?” sort a study into one of three design types: Experiment, Quasi-experimental, or Pre-experimental.]
In-Class Activity 3
List your research questions that frame
• An associational claim
• A causal claim
a. Use an experimental design to test your causal research question
• Identify one IV and one DV
• Select and describe your design choice
o Which threats to internal validity does it control and why?
o List 1 threat to external validity that exists and why.
o Explain how each threat would impact interpretation of your findings.
b. Use a quasi-experimental design to test your research question
• Identify one IV and one DV
• Select and describe your design choice
o Which threats to internal validity does it control and why?
o Which 2-3 threats to internal validity does it NOT control and why?
o Explain how each threat would impact interpretation of your findings.
7
8
Pre-Experimental Designs
9
• Heppner, Kivlighan, & Wampold (2008) refer to these three designs as “uninterpretable”
• Multiple threats to internal validity of these studies
• One shot case study: No way to infer that any change has taken place; maturation & history can’t be ruled out because no control group was used.
• One group pretest-posttest study: Better than one-shot case study, because we can determine if change occurred. Cause of change remains ambiguous.
• Static group comparison study: Great difficulty attributing results to the intervention. Groups could differ in many different ways beyond treatment effects. Can’t discern those possible differences.
10
11
Experimental Designs
13
Pretest-Posttest Control Group Design
R O1 X O2
R O1   O2
Randomize participants to 2+ groups (1 treatment & 1 no-tx, i.e., control). Both groups get a pre- and post-test. Enables test of X on O2, reflected in the differences observed across groups.
Pretest: helps clarify source of diff attrition, strengthens stat test by controlling for pre-tx differences in the DV; assist in testing moderation effects
Posttest–Only Control Group Design
R X O2a
R   O2b
Randomize participants to 2+ groups (1 treatment & 1 no-tx, i.e., control). Both groups get a post-test. Enables test of X on O2a. Less time, expense, & avoid repeated testing.
Solomon Four-Group Design
R O1 X O2
R O1   O2
R    X O2
R      O2
Used when there are concerns about the effect of a pretest on participants. Added value is ability to examine effects of pretest. Controls for most threats to internal validity. Is costly in time & resources.
Key
R randomization
O1 pretest
O2 posttest
X intervention
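The "added value" of the Solomon four-group design can be sketched as simple contrasts among its four post-test (O2) means. This is a descriptive illustration only, not a significance test; the function name and the means are hypothetical.

```python
def solomon_effects(post_means):
    """Summarize a Solomon four-group design from its four post-test (O2) means.

    post_means maps (pretested, treated) -> mean post-test score.
    """
    effect_with_pretest = post_means[(True, True)] - post_means[(True, False)]
    effect_without_pretest = post_means[(False, True)] - post_means[(False, False)]
    return {
        # Treatment effect among groups that took the pretest (rows 1 and 2).
        "treatment_effect_pretested": effect_with_pretest,
        # Treatment effect among groups that did not (rows 3 and 4).
        "treatment_effect_unpretested": effect_without_pretest,
        # Effect of the pretest alone, in the untreated groups (row 2 vs. row 4).
        "pretest_effect": post_means[(True, False)] - post_means[(False, False)],
        # A nonzero value suggests the pretest changed how the treatment worked.
        "pretest_by_treatment_interaction": effect_with_pretest - effect_without_pretest,
    }

# Hypothetical post-test means, invented for illustration only.
means = {
    (True, True): 30.0,    # R O1 X O2
    (True, False): 22.0,   # R O1   O2
    (False, True): 27.0,   # R    X O2
    (False, False): 21.0,  # R      O2
}
effects = solomon_effects(means)
print(effects)
```

In a real study each contrast would be tested statistically (for example, in a 2 x 2 analysis of the post-test scores); the point here is only which group comparisons the design makes possible.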
Experimental Designs
• Pretest-posttest control group design
• Posttest only control group design
• Solomon four-group design
14
15
Quasi-Experimental Designs
• No randomization
• One or more IVs are experimentally-manipulated
4 reasons to select these over a true experimental design
1. Cost
2. Sample selection
3. Ethical considerations
4. (un)Availability of suitable control groups
17
Quasi-Experimental Designs
• Three good non-equivalent
groups designs
18
Quasi-Experimental Designs
• Pretest-posttest nonequivalent groups: Nonrandom assignment to groups. Pretest enables us to assess for similarity of participants on the DV (though groups won’t be similar on other 3rd variables). Selection may still be a threat. Less time, expense, & avoid repeated testing.
• Time series designs: Enables us to clarify and control for maturation effects. Must deal with the autocorrelations in data when they are analyzed.
• Nonequivalent before-after design: Strengthens 1st design by adding another pretest. Clarify whether maturation is different across groups.
19
20
Technical function of good research design = To control variance (attend to the 4 validities)
MAXMINCON (Kerlinger, 1973, 1986)
Maximize systematic variance
Maximize variance of the variables in your substantive research hypothesis
Experimental variable: make conditions as different as possible
Associational variable: seek as wide a range of scores/levels as possible
Minimize error variance
Reduce the errors in measurement of your constructs and increase the reliability of your measures
Control extraneous variance
Control variance of extraneous or unwanted variables that may affect or relate to your variables of interest
3 ways to control these
21
22
Exam 1
• Due Tuesday, October 28th, 2014 by 4:30 PM
o Located in Blackboard Research Design, in Assignments, named “Exam 1”
o Exam goes live Wed 8:00 am and closes following Tues 4:30 PM
• Multiple choice
• Short answer / essay questions (prepare in paragraph format using APA
style)
• “No backtracking” option enabled
23
Review In class activity #3
24
Research Design in Counseling
Psychology
Class 6
1
Instructor: Elizabeth A. Skowron, Ph.D.,
257 HEDCO Building
541-346-0913
eskowron@uoregon.edu
Measurement
Data Collection
Sampling
2
3
Operationalizing Study Variables
Measurement
Constructs: are concepts that cannot be directly observed
Variable: is a symbol to which numbers or values are assigned; can take on any set of values; can be dichotomous to continuous
– When operationally-defined, they are observable
Operational definitions: assign meaning to a construct/variable by spelling out what the investigator must do to measure it
– (1) measured: describes how the variable will be measured
– (2) experimental: spells out the details of the investigator’s manipulation of a variable
  – Reinforcement schedule
  – Intervention type & dosage
4
Construct definition & operationalization
Each construct has only one Conceptual definition (i.e., researcher’s definition of the variable at an abstract level)
Each construct may have multiple Operational definitions (i.e., representing the researcher’s specific decision about how to measure or manipulate the variable)
5
Conceptualizing Race, Culture, & Ethnicity
• Race
– ‘The presumed classification of all human groups on the basis of visible physical traits or phenotype and behavioral differences’ (Robert Carter, 1995)
– Not a biological reality
– A social construct used to categorize people
– Referenced to perpetuate power differences and social inequalities
• Ethnicity
– One’s national origin, religious affiliation, or other type of socially or geographically-defined group (Carter, 1995)
• Culture
– The values, beliefs, language, rituals, traditions, and other behaviors that are passed down from one generation to another within any social group (Helms & Cook, 1999).
6
Methods of Measurement
1. Self-report
– Participant makes an observation or report on self
– + : easy to administer, economical, accesses private thoughts, feelings, behavior not accessible to investigators
– -- : vulnerable to distortion, presume client insight/understanding about construct being measured
2. Other-report (parents, therapist, teacher, etc.)
– Respondents rate the participant on some dimension(s)
– + : easy to administer, economical
– -- : potential systematic bias (e.g., cultural competence of rater – cross-cultural child development study)
3. Behavioral observations
– Measures of overt behavior by trained observers using coding system
– + : direct and objective
– -- : presumption that observed behavior is representative; costly; feasibility?
4. Neurobiological indices
5. Interviews
– + : flexible, high completion rate
– -- : costly; feasibility?
6. Unobtrusive measures
– Assessment conducted without participants’ awareness
– + : eliminates reactivity to measurement
– -- : expensive?; some types are unethical
7
Operationalizing the Independent Variable
Those you can manipulate (i.e., active IVs)
1. Determining conditions of the IV
– Referred to as levels of the IV, groups, categories, and treatments (interchangeable terms)
– These are often categorical variables (but don’t have to b…)
– Conditions of the IV are determined by YOU, the researcher…bc they are manipulated
2. Adequately reflecting the constructs of interest
– Your IV must be well-defined and operationalized
– See ‘psychometrics’ section below
8
Operationalizing the Independent Variable
3. Limiting differences between conditions
– Try to make sure that the different conditions of the IV differ only on the dimension of interest (e.g., math problem difficulty groups: easy, moderate, hard) and not other dimensions (how much tutoring was available, etc.)
4. Establishing the salience of differences in conditions ______________________
Manipulation checks
– Used to verify that the conditions of the IV
• differed as intended
• didn’t differ on other dimensions
• and that treatments were implemented in the intended fashion
9
Operationalizing the Independent Variable
Those you cannot manipulate, i.e., attribute IVs, aka ‘status’ variables in HWK
• Statistical tests with these variables are used to detect associations
• FYI: Stats used in tests of associational and causal claims are basically the same, but it is more difficult to draw causal inferences with status variables because they are not manipulated
• IT IS THE RESEARCH (STUDY) DESIGN, NOT THE STATS ANALYSIS USED, THAT DETERMINES THE INFERENCE STATUS OF THE STUDY
– i.e., associational claims vs. causal claims, etc.
10
Operationalizing the Dependent Variable
• Have a rationale for why you selected the DVs of choice, and not others, and why you operationalized the DV in the manner you did
• e.g., Webster-Stratton (1988) parent-report of child behavior is a function of parent psychopathology
• Orlinsky et al. (1994) psychotherapy outcome ratings differ per therapist, client, and observer ratings
1. Ensure measure used to operationalize the DV is psychometrically-strong (i.e., good reliability & validity)
2. Consider role of reactivity in DV assessment
3. Consider other procedural issues with DV assessment
– Administration time
– Order of presentation
– Reading level
11
Scales of Measurement
Categorical scales
• Nominal (i.e., categorical)
– A scale with numerical values that represent categories of an attribute or "name" the attribute uniquely
– e.g., sex or ethnicity
– NOTE: You cannot subject nominal scale measures to the same statistical tests as the other three scales
Quantitative scales
• Ordinal
– Measurement of attributes that can be rank-ordered
– e.g., years of schooling completed
• Interval
– Measurement that is rank-ordered AND the distance between locations on the scale has meaning
– e.g., measurement of temperature in Fahrenheit or Celsius (intervals are equal, but there is no true zero, so 40 degrees is not twice as hot as 20 degrees)
• Ratio
– Measurement that is rank-ordered AND the distance between locations on the scale has meaning and there is an absolute zero that is meaningful
– e.g., number of study participants who re-abused their children following treatment
12
13
Measurement Activities
1. Classify each operational variable below as categorical or quantitative. If the variable is quantitative, further classify it as ordinal, interval, or ratio.
a) Number of books a person owns
b) A book’s sales rank on amazon.com
c) Location of a person’s hometown (urban, rural, or suburban)
d) Nationality of the participants in a cross-cultural study of Canadian, Ghanaian, and French students
e) A student’s grade in school
Psychometrics
• Reliability of measures
• Validity of measures
• Relationship between R & V
14
Reliability
• Internal consistency: the extent to which items within a test are similar or ‘hang together’
  • Use a single instrument administered to a group of people on one occasion
  • Compute Cronbach's Alpha: an index of intercorrelations between all items on test
  • Reliability estimates = .70 or higher indicate very good reliability
• Inter-rater: degree to which different raters/coders give consistent ratings/scores of the same phenomenon
  • Two or more raters code same phenomenon
  • Categorical measures:
    – Calculate the percent of agreement between the raters
    – Adjust this for ‘chance agreement’ using kappa coefficient
  • Continuous measures:
    – Calculate the correlation between the ratings of the two observers
• Test-retest: consistency of a measure from one time to another
  • Use a single instrument administered to a group of people on two+ occasions
  • Calculate a correlation
  • Is this best used for measures of constructs that are State or Trait-like? Stable or shifting over time? Why…?
  • Shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation
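The two indices named above, Cronbach's alpha and the kappa coefficient, can be computed by hand in a few lines. This is a minimal sketch using their standard textbook formulas; the function names and all data are hypothetical, and in practice a statistics package would normally be used.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores holds one list of item responses per respondent."""
    k = len(item_scores[0])                       # number of items
    items = list(zip(*item_scores))               # regroup scores by item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)

# Hypothetical questionnaire: 4 respondents x 3 items.
responses = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 4, 5]]
print(f"alpha = {cronbach_alpha(responses):.2f}")

# Hypothetical codes from two raters for the same four observations.
coder_1 = ["on", "on", "off", "off"]
coder_2 = ["on", "on", "off", "on"]
print(f"kappa = {cohens_kappa(coder_1, coder_2):.2f}")
```

Alpha rises as items covary more strongly relative to their individual variances; kappa discounts the raw percent agreement by the agreement expected from the raters' base rates alone.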
15
16
2. For each measure below, indicate which kinds of reliability would be appropriate to evaluate.
a) Researchers place unobtrusive video recording devices in the living rooms of 20 children. Later, coders view tapes of the living areas and code how many minutes each child spends playing video games.
b) Clinical psychologists have developed a seven-item self-report measure to quickly identify people who are at risk for post-traumatic stress disorder.
c) Psychologists measure how long it takes a mouse to learn an eye-blink response. For 60 trials, they present a mouse with a distinctive blue light followed immediately by a puff of air. The 5th, 10th, and 15th trials are test trials, in which they present the blue light alone (without the air puff). The mouse is said to have learned the eye-blink response if observers record that it blinked its eyes in response to a blue light test trial. The earlier in the 60 trials the mouse shows the eye-blink response, the faster it has learned the response.
d) A restaurant owner uses a response card with four items in order to evaluate how satisfied customers are with the food, service, ambience, and overall experience. Each item is scaled from one to four stars.
e) Educational psychologists use teacher ratings of classroom shyness (on a nine-point scale, where 1 = “not at all shy in class” and 9 = “very shy in class”) to measure children’s temperament.
Validity (of measures)
• Physical science is fortunate to have standard measurements
– e.g., Platinum-iridium bar kept at U.S. NIST – international standard for length of 1 meter
– I can compare my 1 meter ruler to this standard and know if it measures what it’s supposed to measure
– No such luck in the social sciences…our constructs are typically not directly observable (i.e., anxiety, happiness, self-regulation)
– No way to directly measure these constructs
– We work with estimations (via self report, observed behavior, neurobiological measures, others’ reports, etc.)
• Construct validity = to what extent is our measure of X really tapping into it?
– Definition: to what extent does this test/measure (i.e., an operationalization) accurately reflect the construct it’s intended to measure?
17
18
Measurement (construct) Validity: Does it measure what you intend to measure?
Two subjective ways to assess validity
• FACE VALIDITY: It looks like what you want to measure
• CONTENT VALIDITY: The measure contains all parts that your theory says it should contain
Four Empirical Ways to Assess Validity
• Predictive validity: Your measure is correlated with a relevant outcome in the future
• Concurrent validity: Your measure is correlated with a relevant outcome now, in the present
• Convergent validity: Measure is more strongly associated with measures of similar constructs
• Discriminant validity: Measure is less strongly associated with measures of dissimilar constructs
Reliability: Do you get consistent scores every time?
• TEST-RETEST RELIABILITY: People get consistent scores every time they take the test
• INTERNAL CONSISTENCY RELIABILITY: People give consistent scores on every item on a questionnaire
• INTERRATER RELIABILITY: Two coders’ ratings of a behavior are consistent with each other
Morling, 2012
Relationship between Reliability & Validity
19
Reliability is a necessary but not sufficient condition for validity
Relationship between Reliability & Validity
20
Concurrent & Predictive Validity
• Both evaluate whether scores on your measure are related to scores on other concrete outcomes that they should be related to
• e.g., measure of clinical skills/aptitude or graduate school aptitude
Concurrent Validity
– Does your measure correlate with a relevant ‘outcome’ right now, in the present
– e.g., correlate scores on your measure of clinical skill with outcome (client ratings of therapeutic alliance; ______________)
Predictive validity
– Does your measure correlate with a relevant ‘outcome’ measured in the future
– e.g., correlate scores on your measure of clinical skill assessed now, with an outcome measured in the future (client improvement in therapy; ____________)
• Can calculate via a correlation coefficient, r
21
Convergent & Discriminant Validity
• Does the test show a meaningful pattern of associations with other measures
• Your measure should:
  • Correlate more strongly with other measures of similar constructs, and
  • Correlate less strongly with measures of other, different constructs
Convergent Validity
– Your measure correlates more strongly with other measures of similar constructs
  • e.g., Differentiation of self scores should correlate with: __________________
  • ______________________________________________________________
Discriminant Validity
– Your measure correlates less strongly with measures of other, different constructs
  • e.g., Differentiation of self scores should NOT correlate with: ______________
  • ______________________________________________________________
• Can also calculate via a correlation coefficient, r
• No absolute level of correlation indicates convergent or discriminant validity evidence…look to the pattern of findings across the nomological net
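Looking at the pattern of correlations can be sketched as follows. The scores are invented for illustration and the variable names are hypothetical; the sketch simply compares r for a similar construct with r for an unrelated one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient, r, for two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for six participants (invented for illustration).
new_measure = [10, 12, 15, 18, 20, 25]
similar_construct = [11, 13, 14, 19, 21, 24]   # should converge
unrelated_construct = [7, 3, 9, 2, 8, 5]       # should not

convergent_r = pearson_r(new_measure, similar_construct)
discriminant_r = pearson_r(new_measure, unrelated_construct)
print(f"convergent r = {convergent_r:.2f}, discriminant r = {discriminant_r:.2f}")
```

What counts as evidence is the contrast between the two coefficients, a strong association with the similar construct and a weak one with the unrelated construct, rather than either value crossing a fixed cut-off.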
22
Cultural Validity
“…is concerned with the construct, concurrent, and predictive validity of theories and models across cultures, i.e., culturally different individuals” (Leong & Brown, 1995, p. 144)
• Planning your study
  • Use MC theories to conceptualize the research; consult with cultural communities
  • Translate demographics into salient psychological characteristics (e.g., ethnic identity development, experience of micro-aggressions)
• Selection of measures
  • Use multiple measures to represent each construct
  • Pilot test measures with your target population
  • Use culturally congruent measures in your study
  • Create or adapt ethnocentric measures
• Recruiting participants
  • Representative of your target population
  • Use procedures congruent for this cultural group
  • Recruit to represent underlying psychological characteristics of interest
• Analyzing your data
  • Evaluate cultural hypotheses & rival, competing hypotheses
  • Examine moderator effects of cultural variables
• Interpreting results
  • Design your study to benefit participants directly
  • Represent participants’ voices authentically when interpreting data
  • Integrate service into community as way of ‘giving back’
  • Engage participants in interpretation of data and share findings
23
Using Factorial Designs to Study External Validity
• Factorial designs are comprised of at least two independent variables, and each IV has 2+ levels – IV-1: intervention (treatment, control group: 2 levels) – IV-2: status variable (i.e., demographic or individual difference
variable) (e.g., gender: male, female: 2 levels)
[2 × 2 table: Independent Variable (1. Treatment, 2. Control) crossed with Gender (1. Male, 2. Female)]
• Enables us to learn whether the treatment works, or works better for one level of the status variable than another (via ‘interaction’ effects)
Recommendations for conducting culturally-valid quantitative research
– Identify demographic variables that serve as proxy variables & measure those social-psychological variables directly
• Ethnic & racial group status as a proxy for socio-economic status • Racial group status as proxy for stage of racial identity development
– Evaluate external validity of studies, not solely based on demographic
characteristics of a sample, but on salient psychological characteristics & a strong theoretical rationale
e.g., potential generalization of research on racial identity development from African-American samples to other stigmatized ethno-cultural populations
– Benefits – Conceptual generalization promotes better theory building, and – Use of social-psychological characteristics (rather than simple demographics)…
• …may limit use of inappropriate generalizations to an entire population • …would enable focus on psychological antecedents for psychological
outcomes • …could divert efforts away from token sampling of ethno-cultural groups that include only highly acculturated members who fail to represent the important psychological characteristics of the larger population
25
Recommendations continued…
• Improve construct validity in measurement via evidence of cultural equivalence of tests/measures
– Linguistic equivalence: do translated items carry the same meaning in the target language as they do in their source language?
– Functional equivalence: does the phenomenon have similar functions across cultures? (e.g., assertiveness as ‘adaptive’)
– Conceptual equivalence: does the concept have an equivalent in other cultures? (e.g., defining IQ…)
– Psychometric equivalence: are the ways in which the concept is quantified equivalent across cultural groups? (e.g., timed components of IQ test…)
• Involve indigenous experts in formulating theory, study hypotheses, research procedures, & interpretation of results
• Strengthen cultural validity of your research study
26
Face & Content Validity (most subjective)
Face Validity
– Weakest way to try to demonstrate construct validity – To what extent does this measure appear "on its face" to
be a good translation of the construct – Is essentially a subjective judgment call
Content validity – Involves a subjective check of the operationalization against
the relevant content domain for the construct. – Often involves surveying ‘experts’ in the content domain to
evaluate content capture of your measure
27
Concluding Notes re: Measurement of Constructs
1. A single operationalization (i.e., single scale or instrument) will almost always poorly represent a construct
2. The correlation between two constructs is attenuated (i.e., weakened) by unreliability in measurement
3. Unreliability always makes it more difficult to detect true effects (should any be present) because of reduced statistical power.
4. The correlation between two measures using the same method is inflated by (shared) method variance.
5. If possible, multiple measures using multiple methods should be used to operationalize a construct.
6. Typically, interpretations of relationship should be made at the construct level, for seldom are we interested in the measures per se. Awareness of the effects of unreliability and method variance is critical for drawing proper conclusions.
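Point 2 above can be made concrete with the classical test theory relation in which the expected observed correlation equals the true-score correlation multiplied by the square root of the product of the two reliabilities. The sketch below is only an illustration: the function names and the example values (.50 and .70) are hypothetical and do not come from the slides.

```python
import math

def attenuated_r(true_r, reliability_x, reliability_y):
    """Expected observed correlation, given the true-score correlation
    and the reliability of each measure (classical test theory)."""
    return true_r * math.sqrt(reliability_x * reliability_y)

def disattenuated_r(observed_r, reliability_x, reliability_y):
    """Estimate of the true-score correlation from an observed correlation."""
    return observed_r / math.sqrt(reliability_x * reliability_y)

# Hypothetical example: two constructs correlate .50 at the true-score level,
# and each scale has a reliability of .70.
observed = attenuated_r(0.50, 0.70, 0.70)
print(round(observed, 2))  # 0.35
```

With these assumed values the observable correlation drops from .50 to .35, which is why unreliable measures make true effects harder to detect.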
28
SAMPLING
Sampling • When we consider external validity, we ask whether the results of a particular
study can be generalized to other people in the population, or to the kinds of settings we’re interested in.
• To interrogate the external validity of a frequency or causal claim, we ask for example: – Do clients who rated this therapist’s warmth adequately represent all of the
therapist’s former clients? – Can we predict the results of the presidential election from the results of this
poll taken from these 1,500 people?
• Sample: portion of the population, e.g., one potato chip • Population: all, e.g., the whole bag of chips
• You don’t need to study the whole population. You just need to ensure
that the sample you study adequately reflects the population
30
Sampling
– Define your population of interest – Now you can assess how well your sample represents it
• Bias – Samples are biased when they are unrepresentative of the population – Biased samples lead you to draw the wrong conclusions about the
population
• e.g., your 1 potato chip is burnt (biased sample) • This would lead you to conclude something wrong about the whole bag of chips
• e.g., Presidential election poll • Biased sample would include too many of the most unusual (not typical) people
• e.g., Therapist ratings • Clients who rate their therapist on a website may tend to be ones who are angry
or disgruntled, and may not represent the rest of the therapist’s clients very well
31
Sources of Biased Samples
• Sampling only those people who are easy to
contact
• Sampling only those who you can contact
• Sampling only those who self-select (i.e., invite themselves)
32
Getting a Representative Sample Probability sampling
– Draw the sample at random from that population – Every member of the population has an equal chance of being in the sample
1. Simple random sampling
– Most basic form of probability sampling, but difficult and time-consuming
– Assign a number to every person in the population
– Use a table of random numbers to select a sample from the population
2. Cluster sampling
– Start with a list of clusters, take a random sample of clusters from that list, and include every person from each of the selected clusters
– e.g., to randomly sample school districts in OR: start with a list of districts (clusters) in the area, randomly select 4 of those districts (clusters), and include every child from each cluster in your sample
3. Multistage sampling
– Similar to #2, but you select two random samples
– Start with a list of clusters and take a random sample of clusters from that list, then take a random sample of children from each selected cluster
4. Stratified random sampling
– Select particular demographic characteristics on purpose, then randomly select individuals within each of the categories
– e.g., in a study of self-regulation development, stratify on child age to obtain at least X kids at age 3, age 4, and age 5 in the study
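The four probability sampling methods above can be sketched in a few lines of Python. Everything in this example is hypothetical: the population of 6 districts with 50 children each, the sample sizes, and the variable names are assumptions chosen only to show how the selection steps differ.

```python
import random

rng = random.Random(2014)  # fixed seed so the sketch is repeatable

# Hypothetical population: 6 districts (clusters) with 50 children each
population = [
    {"id": d * 100 + c, "district": d, "age": 3 + c % 3}
    for d in range(6) for c in range(50)
]
districts = sorted({child["district"] for child in population})

# 1. Simple random sampling: draw directly from the full list
simple = rng.sample(population, 30)

# 2. Cluster sampling: randomly pick clusters, keep everyone in them
chosen = set(rng.sample(districts, 2))
cluster = [child for child in population if child["district"] in chosen]

# 3. Multistage sampling: random clusters, then a random sample within each
multistage = []
for d in sorted(chosen):
    members = [child for child in population if child["district"] == d]
    multistage.extend(rng.sample(members, 10))

# 4. Stratified random sampling: a fixed number drawn at random per age group
stratified = []
for age in (3, 4, 5):
    stratum = [child for child in population if child["age"] == age]
    stratified.extend(rng.sample(stratum, 10))

print(len(simple), len(cluster), len(multistage), len(stratified))
```

Drawing more than 10 children from one age stratum, while keeping the others at 10, would turn the last step into oversampling.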
33
Getting a Representative Sample
Probability sampling
– Draw the sample at random from that population – Every member of the population has an equal chance
of being in the sample
4. Stratified Random Sampling cont. – Oversampling
• Is a variant of stratified random sampling • Use stratified random sampling and deliberately include
more of one group, usually when that group is difficult to engage in research or is present in low numbers in your population. – e.g., oversampling for physically-abused children helps to ensure
there are adequate numbers of participants in the sample
34
Random Sampling vs.
Random Assignment
• Random sampling (i.e., probability sampling) – Get a sample using some random method so that each member of the
population of interest has equal chance of being in the sample
– Enhances ___________ validity
• Random assignment (used only in experimental designs) – Assign members of the sample at random to the groups or conditions
of the IV, for example, by flipping a coin
– Enhances ____________ validity
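The two steps happen at different points in a study, which a short sketch can make visible. This is a hypothetical illustration: the sampling frame of 1,000 people and the group sizes are assumptions, and it shuffles and splits the sample (rather than flipping a coin per person) so the two groups end up the same size.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical sampling frame for the population of interest
population = [f"person_{i}" for i in range(1000)]

# Random sampling: decides who gets INTO the study
sample = rng.sample(population, 40)

# Random assignment: decides which condition each sampled person receives
shuffled = sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:20], shuffled[20:]

print(len(treatment), len(control))
```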
35
Non-Representative Sampling Methods
• Convenience sampling
– Samples chosen on the basis of who is easy to access
• Purposive sampling – Choosing a sample of only certain kinds of people you want to
study
• Snowball sampling – A variation of purposive sampling used to find rare individuals
for a research study, or when the sample is otherwise hard to obtain – Each participant in the study is asked to recommend a few
acquaintances to the study
36
Research Design in Counseling
Psychology
Class 7
Three conditions for determining
causality
1. Co-variation (i.e., correlation)
2. Temporal precedence
3. Ruling out alternative explanations (due to extraneous
3rd variable threats…i.e., internal validity)
2
Tools for Testing “Associational”
Hypotheses
Kinds of studies that lead to associational claims
• Correlational research (i.e., ex post facto) o 2+ measured variables (regardless of the stats used) make a study correlational
o Prioritize construct validity & statistical conclusion validity, & external validity
o Avoid temptation to make causal inferences from these kinds of studies
Kinds of graphs & statistics used to describe associations
• Bivariate correlations o Positive, negative, zero, & curvilinear
o Graph association between scores on 2 variables using Scatterplot and a correlation coefficient, r
Designing and evaluating studies that make an associational claim (via the 4 big validities)
3
Testing “Associational” Hypotheses
• Bivariate correlations
o Positive, negative, zero, & curvilinear
o Graph association between scores on 2 variables using scatterplot
o Calculate strength of correlation coefficient, r
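As a rough illustration of the last step, the correlation coefficient r can be computed directly from its definition (the sum of cross-products of deviations divided by the square root of the product of the two sums of squares). The absences and GPA values below are made up for this sketch and are not data from the course.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for 8 students
absences = [0, 1, 2, 4, 5, 7, 8, 10]
gpa = [3.9, 3.7, 3.6, 3.2, 3.0, 2.8, 2.5, 2.1]

r = pearson_r(absences, gpa)
print(round(r, 2))  # strongly negative for these made-up scores
```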
4
Testing “Associational” Hypotheses
Bivariate
correlations
5
Correlation coefficients (r)
Effect size conventions:
Type     Small   Medium   Large
r        .10     .30      .50
d/g      .20     .50      .80
ratio    1.50    2.50     4.25
Associations:
• between 2 continuous variables: correlation coefficient, r
• when 1 variable is categorical: t test (or a point-biserial correlation)
• when both variables are categories: phi coefficient
Ascertain strength of associations: Cohen’s conventions…
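A small sketch of how these pieces fit together: a helper that labels an r value using the .10 / .30 / .50 conventions listed for r, plus the standard phi coefficient for a 2 × 2 table of counts. The function names and the example counts are hypothetical.

```python
import math

def label_r(r):
    """Label the size of a correlation using the .10/.30/.50 conventions."""
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "below small"

def phi_coefficient(a, b, c, d):
    """Phi for a 2 x 2 table of counts laid out as [[a, b], [c, d]]."""
    return (a * d - b * c) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))

# Hypothetical 2 x 2 table: both variables are categories
phi = phi_coefficient(30, 10, 10, 30)
print(phi, label_r(phi))  # 0.5 large
```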
Testing Associational Hypotheses
• Designing and evaluating studies that make an
associational claim (via the 4 big validities)
• Statistical conclusion validity 1. Effect size?
2. Correlation statistically significant?
3. Are there subgroups?
7
Testing Associational Hypotheses
• Statistical conclusion validity o Are there subgroups?
• Interpret scatterplot below…
• Consider subgroups (class standing)
[Scatterplot: axes are # Absences and GPA]
Testing Associational Hypotheses
• Statistical conclusion validity o Are there subgroups?
• Now consider subgroups (class standing) and interpret scatterplot…
[Scatterplot: axes are # Absences and GPA; points are grouped by class standing (Freshmen, Juniors, Seniors)]
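To see why subgroups matter, consider a made-up data set in which the overall association is strong but disappears inside each class-standing group. The numbers below are hypothetical and were constructed so that the within-group correlations are exactly zero; they are not the data behind the scatterplot.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical (absences, GPA) scores by class standing
groups = {
    "Freshmen": ([8, 9, 10, 11], [2.4, 2.6, 2.6, 2.4]),
    "Juniors": ([4, 5, 6, 7], [3.0, 3.2, 3.2, 3.0]),
    "Seniors": ([0, 1, 2, 3], [3.6, 3.8, 3.8, 3.6]),
}

# Correlation inside each subgroup
within = {name: pearson_r(x, y) for name, (x, y) in groups.items()}

# Correlation when everyone is pooled together
all_absences = [v for x, _ in groups.values() for v in x]
all_gpa = [v for _, y in groups.values() for v in y]
overall = pearson_r(all_absences, all_gpa)

print(round(overall, 2), {k: round(v, 2) for k, v in within.items()})
```

In this constructed case the pooled correlation reflects differences between the groups rather than any relation between absences and GPA within a group.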
Testing Associational Hypotheses
• Statistical conclusion validity o Could outliers (extreme scores) be affecting the relationship between
variables?
o More likely with smaller samples
[Scatterplot: axes are # Absences and GPA]
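The influence of a single extreme score in a small sample can be checked by computing r with and without it. The scores below are hypothetical and constructed so that the five ordinary cases show no association at all.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical small sample with no association between x and y
x = [1, 2, 3, 4, 5]
y = [3, 2, 3, 2, 3]
r_without = pearson_r(x, y)

# The same sample plus one extreme case (an outlier on both variables)
r_with = pearson_r(x + [20], y + [10])

print(round(r_without, 2), round(r_with, 2))
```

One added case moves the correlation from zero to a very large positive value, which is less likely to happen when the sample is large.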
Testing Associational Hypotheses
• Construct validity o How well were our variables measured?
o Good Reliability?
o Does each measure what it’s intended to measure (Validity)?
11
Testing Associational Hypotheses
• External validity o To whom can we generalize?
o To whom do we wish to generalize?
o Which population(s) did we sample from?
o What methods did we employ to sample?
o Moderating variables
• In what subgroups does the association exist?
• Goal: to learn whether the association is different within different levels of
the moderator (e.g., at low SES, moderate SES, or high SES)
12
Three conditions for determining
causality
1. Co-variation (i.e., correlation)
2. Temporal precedence
3. Ruling out alternative explanations (due to extraneous
3rd variable threats…i.e., internal validity)
13
Establishing Temporal Precedence
• Longitudinal designs: enable us to examine evidence for
temporal precedence in the relation between our 2
variables of interest o Useful for other reasons as well
o There are many variables that we cannot manipulate, or it would be unethical to
do so (e.g., exposure to violent TV shows; smoking)
o Thus useful when experiments are not practical
• How to: o Measure same variables in same people over two+ different time points
14
Longitudinal designs
• Testing temporal associations between watching violent TV shows and
aggression
[Figure: cross-lagged panel diagram linking TV Violence (3rd grade), TV Violence (13th grade), Aggression (3rd grade), and Aggression (13th grade)]
1. Cross-sectional correlations
2. Autocorrelations
3. Cross-lagged correlations
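The three kinds of correlations can be read as follows: cross-sectional correlations relate the two variables measured at the same time point, autocorrelations relate each variable to itself across time, and cross-lagged correlations relate the earlier measure of one variable to the later measure of the other. A minimal sketch, using hypothetical scores for 8 children (not data from the study on the slide):

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores at time 1 (3rd grade) and time 2 (13th grade)
tv_t1 = [1, 2, 3, 4, 5, 6, 7, 8]
agg_t1 = [2, 1, 4, 3, 5, 7, 6, 8]
tv_t2 = [2, 1, 3, 5, 4, 6, 8, 7]
agg_t2 = [1, 3, 2, 4, 6, 5, 8, 7]

correlations = {
    # 1. Cross-sectional: both variables at the same time point
    "cross-sectional t1": pearson_r(tv_t1, agg_t1),
    "cross-sectional t2": pearson_r(tv_t2, agg_t2),
    # 2. Autocorrelations: the same variable across time
    "auto TV": pearson_r(tv_t1, tv_t2),
    "auto aggression": pearson_r(agg_t1, agg_t2),
    # 3. Cross-lagged: earlier measure of one variable, later measure of the other
    "TV t1 -> aggression t2": pearson_r(tv_t1, agg_t2),
    "aggression t1 -> TV t2": pearson_r(agg_t1, tv_t2),
}

for name, value in correlations.items():
    print(f"{name}: {value:.2f}")
```

Comparing the two cross-lagged values is what speaks to temporal precedence; the other four provide context.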
Longitudinal designs (intensive repeated measures)
• Temporal associations between maternal physiology & harsh parenting
[Figure: cross-lagged panel diagram linking Hostile control, Hostile control (30” later), Physiological arousal, and Physiological arousal (30” later)]
1. Cross-sectional correlations
2. Autocorrelations
3. Cross-lagged correlations
Longitudinal Designs
• Interrupted Time Series
17
Longitudinal Designs
• Stable Baseline Designs o Assess baseline via multiple assessments over time in an extended fashion to
establish consistent scores, then introduce the intervention/experimental
condition and continue with over time assessments post-intervention
• Multiple Baseline Designs o Introduction of intervention components is staggered across time, contexts, or
situations (e.g., 3 problem behaviors in classroom identified—introduce
intervention for each one in staggered fashion—continue to assess all behaviors)
• Reversal Designs
• Best used in situations when the intervention would not cause lasting
change (i.e., to test a therapy or educational intervention)
• Some ethical concerns with ‘withdrawing’ a treatment
18
Longitudinal Designs
• Stable Baseline Designs o Assess baseline via multiple assessments over time in an extended fashion to
establish consistent scores, then introduce the intervention/experimental
condition and continue with over time assessments post-intervention
[Figure: on-task behavior plotted across assessments 1 to 7; a baseline phase is followed by the intervention and then a post-intervention phase]
Longitudinal Designs
• Multiple Baseline Designs o Introduction of intervention components is staggered across time, contexts, or
situations (e.g., 3 problem behaviors in classroom identified—introduce
intervention for each one in staggered fashion—continue to assess all behaviors)
[Figure: multiple baseline graph across sessions, with BASELINE and INTERVENTION phases shown for three behaviors: poking neighbor, grabbing objects, not raising hand]
Three conditions for determining
causality
1. Co-variation (i.e., correlation)
2. Temporal precedence
3. Ruling out alternative explanations (due to
extraneous 3rd variable threats…i.e., internal validity)
21
Bivariate correlations show covariance. _______
• But not temporal precedence: we are not sure which variable came first. Solution: cross-lagged panel designs (longitudinal designs)
• And not internal validity: there is no control for third variables. Solution: multiple regression
Ruling Out Third Variables with
Multiple-Regression Designs
• Measuring more than two variables
• Regression results indicate if a third variable affects the
relationship
• Adding more predictors to a regression
• Regression does not establish causation
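With exactly two predictors, the standardized regression coefficients can be written in terms of the three pairwise correlations, which makes it easy to see what "controlling for" a third variable does. The sketch below uses that two-predictor formula with hypothetical scores, constructed so that a third variable z fully accounts for the x–y association; the variable names are placeholders, not variables from the slides.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def standardized_betas(y, x1, x2):
    """Standardized coefficients for y regressed on two predictors."""
    r_y1, r_y2, r_12 = pearson_r(y, x1), pearson_r(y, x2), pearson_r(x1, x2)
    beta_1 = (r_y1 - r_y2 * r_12) / (1 - r_12 ** 2)
    beta_2 = (r_y2 - r_y1 * r_12) / (1 - r_12 ** 2)
    return beta_1, beta_2

# Hypothetical data: z influences both x and y
z = [-1, -1, -1, -1, 1, 1, 1, 1]
x = [-1.5, -0.5, -1.5, -0.5, 0.5, 1.5, 0.5, 1.5]
y = [-1.5, -1.5, -0.5, -0.5, 0.5, 0.5, 1.5, 1.5]

r_xy = pearson_r(x, y)                      # bivariate association
beta_x, beta_z = standardized_betas(y, x, z)  # x and z as joint predictors

print(round(r_xy, 2), round(beta_x, 2), round(beta_z, 2))
```

Here the bivariate correlation is substantial, but the coefficient for x drops to zero once z is in the model. This only works for third variables that were actually measured.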
The Third Variable Problem
Multiple Regression Helps with the
Third Variable Problem
Adding More Predictors
Review: Are multiple regression studies able to show causation?
– Temporal precedence? (maybe not)
– Internal validity? (You can only control for variables that you thought to measure.)
Good experiments are still the best.
Multiple Regression and the Third Variable Problem
Multiple Regression Helps with the Third Variable Problem
Regression Does Not (Definitively)
Establish Causation
Getting at Causality
Start with an association between two variables:
(IV) RECESS and (DV) BEHAVIOR PROBLEMS (link C).
Mediation hypotheses propose a mechanism for a bivariate relationship. Why are these
two variables correlated? (i.e., Recess affects Physical Activity which then impacts
Behavior Problems)
Mediation hypotheses are causal statements.
Mediators specify a time sequence for the three variables (temporal precedence).
Mediators also specify the mechanism (IV affects DV through the mediator).
Mediation
Steps in Testing Mediation
1. Test path c
2. Test path a
3. Test path b
4. Regression (test path c’): DV is behavior problems; IVs are physical activity and recess. Does the ‘recess – behavior problems’ link (path c) get smaller when physical activity is controlled/accounted for? If YES, then physical activity is a mediator.
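The steps can be sketched with standardized coefficients, using the two-predictor regression formula for paths b and c’. All scores below are hypothetical and were constructed so that physical activity completely accounts for the recess effect; real data would rarely be this clean, and this sketch does not include significance tests.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores for 8 children
recess = [0, 0, 0, 0, 1, 1, 1, 1]      # IV: no recess (0) vs. recess (1)
activity = [1, 2, 1, 2, 3, 4, 3, 4]    # proposed mediator: physical activity
behavior = [4, 3, 5, 4, 2, 1, 3, 2]    # DV: behavior problems

r_xm = pearson_r(recess, activity)
r_xy = pearson_r(recess, behavior)
r_my = pearson_r(activity, behavior)

path_c = r_xy                                        # step 1: IV -> DV
path_a = r_xm                                        # step 2: IV -> mediator
path_b = (r_my - r_xy * r_xm) / (1 - r_xm ** 2)      # step 3: mediator -> DV, IV controlled
path_c_prime = (r_xy - r_my * r_xm) / (1 - r_xm ** 2)  # step 4: IV -> DV, mediator controlled

print(round(path_c, 2), round(path_a, 2), round(path_b, 2), round(path_c_prime, 2))
```

In these made-up data path c is strongly negative while c’ is zero, the pattern the slide describes as evidence that physical activity is a mediator.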
Mediators Versus Third Variables
[Figure: three diagrams labeled Mediation Model, 3rd Variable Problem, and Moderator Effect; example variables shown are Gender, Extroversion, and Group conversations]
Indicate whether each statement below is describing a mediation hypothesis, a third variable argument, or a
moderator result. First, identify the key bivariate relationship. Next decide whether the extra variable comes
between the two key variables or is causing the two key variables simultaneously. Then draw a sketch of
each explanation, following the examples in Figure 9.13 in the text.
1. Having a cognitively demanding job is associated with cognitive benefits in later years, because
people who are highly educated take cognitively demanding jobs, and people who are highly
educated have better cognitive skills.
2. Having a cognitively demanding job is associated with cognitive benefits in later years, but only
among men, not among women.
3. Having a cognitively demanding job is associated with cognitive benefits in later years, because
cognitive challenges build lasting connections in the brain.
1. Viewing violent television is associated with aggressive behavior because children model what
they see on TV.
2. Viewing violent television is associated with aggressive behavior because people who watch more
violent TV have more lenient parents, and these lenient parents also do not care if their children
are violent.
3. Viewing violent television is associated with aggressive behavior very strongly among teenagers,
but less strongly among young adults.
In Class Activity #4
Research Design in Counseling
Psychology
Class 8
2
Analyses Design selection
Three conditions for determining
causality
1. Co-variation (i.e., correlation)
2. Temporal precedence
3. Ruling out alternative explanations (due to extraneous
3rd variable threats…i.e., internal validity)
3
Testing “Causal” Hypotheses
• Review basic components of ‘Experiments’
o Independent variables
• Manipulated
o Dependent variables
• Measured
Three conditions of causality…
1. Establishing covariation
2. Establishing temporal precedence
3. Establishing internal validity
Two kinds of designs…
4
Testing “Causal” Hypotheses
Two kinds of designs that support causal claims
1. Independent-groups designs o (i.e., between-groups or between-persons or BP designs)
o Different groups of participants are assigned to different levels of the independent variable
2. Within-groups designs o (i.e., within-persons or WP designs)
o One group of participants is assigned to (or presented with) all levels of the independent variable
• “Enables researcher to treat each participant as his/her own control”
5
Testing “Causal” Hypotheses
1. Independent-groups designs o (i.e., between-groups or between-persons or BP designs)
o Two basic forms of this design
1. Posttest only designs: random assignment and 1 posttest
R X O2a
R O2b
2. Pretest-posttest designs: random assignment & key DVs are measured twice, once before and once after exposure to the IV
R O1 X O2
R O1 O2
6
Test for covariation by detecting differences in the dependent variable; establish temporal precedence because the IV precedes changes in the DV; if the study is conducted well (no design confounds, no selection effects), internal validity is established.
[Diagram, posttest only design: Randomly assign → IV: group 1 or IV: group 2 → Measure of DV]
[Diagram, pretest-posttest design: Randomly assign → Measure of DV → IV: group 1 or IV: group 2 → Measure of DV]
All of the above applies, plus: use the pre-posttest design to evaluate whether random assignment made the groups equal (relevant with small-n studies); it can also better track change over time in each group.
1. Independent-groups designs o (i.e., between-groups or between-persons or BP designs)
o Two basic forms of this design
1. Posttest only designs: random assignment and 1 posttest
R X O2a
R O2b
2. Pretest-posttest designs: random assignment & key DVs are measured twice, once before and once after exposure to the IV
R O1 X O2
R O1 O2
7
EXAMPLE
Study testing the effects of two kinds of praise on children’s problem-solving effort:
Process praise: ‘you must have worked hard at these problems’
Person praise: ‘you must be smart at these problems’
[Diagram, posttest only design: Randomly assign → IV: person praise or IV: process praise → # problems solved]
[Diagram, pretest-posttest design: Randomly assign → # problems solved → IV: person praise or IV: process praise → # problems solved]
[Figure: # problems solved (axis 0 to 7) for the Process and Person praise groups]
[Figure: # problems solved (axis 4.0 to 7.0) at Trial 1 (pre) and Trial 2 (post) for the Process and Person praise groups]
Posttest only designs
vs.
Pretest-posttest designs
o Which Design is Better…?
o It depends…..
o Posttest only design
• combines random assignment with a manipulated IV—enabling
powerful causal conclusions
o Pretest-posttest design
• Adds a pre-testing step…helps if you want to be sure that IV
levels are equivalent at pretesting (as long as the pretest doesn’t
change behavior…), and helps to more clearly map patterns of
change
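A rough sketch of how the two designs differ at the analysis stage: the posttest only design compares the groups once, while the pretest-posttest design also allows a check on pretest equivalence and a comparison of change scores. The scores and group labels below are hypothetical, and Cohen's d with a pooled standard deviation is used here simply as one common way to express the group difference.

```python
import statistics

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    pooled_var = ((na - 1) * statistics.variance(group_a)
                  + (nb - 1) * statistics.variance(group_b)) / (na + nb - 2)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_var ** 0.5

# Hypothetical scores (# problems solved), not data from the slides
process_pre, process_post = [4, 5, 3, 4], [6, 7, 5, 6]
person_pre, person_post = [4, 5, 3, 4], [4, 5, 3, 4]

# Posttest only: compare the groups once, after the manipulation
d_post = cohens_d(process_post, person_post)

# Pretest-posttest: check pretest equivalence, then compare change scores
pre_gap = statistics.mean(process_pre) - statistics.mean(person_pre)
process_gain = [post - pre for pre, post in zip(process_pre, process_post)]
person_gain = [post - pre for pre, post in zip(person_pre, person_post)]

print(round(d_post, 2), pre_gap,
      statistics.mean(process_gain), statistics.mean(person_gain))
```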
8
Testing “Causal” Hypotheses
1. Within-groups designs o (i.e., within-persons or WP designs)
o Concurrent-measures design
• Participants are exposed to all levels of an IV at roughly the same time, and a
single DV measure is taken
o e.g., Harlow’s study of attachment in baby monkeys
• Two ‘mothers’ are presented
IV: (mother type) A wire mother w/milk vs. A cloth mother w/no milk
DV: preference as measured by time spent clinging to either
o e.g., Coke v. Pepsi taste test
[Diagram: One group → presented with wire mom w/milk and cloth mom at the same time → clinging behavior measured]
Testing “Causal” Hypotheses
1. Within-groups designs o (i.e., within-persons or WP designs)
o Repeated-measures design
• Participants are measured on a DV more than once—after exposure to each
level of the IV
o e.g., Bick & Dozier’s (2008) study of social bonding in new mothers
• Two ‘toddlers’ are presented and mothers instructed to interact
closely with them
IV: (toddler type) own toddler vs. different toddler
DV: Oxytocin levels in bloodstream (social neuropeptide central to human bonding)
[Diagram: One group → Interact w/own toddler → Measure oxytocin → Interact w/different toddler → Measure oxytocin]
Testing “Causal” Hypotheses
1. Within-groups designs o (i.e., within-persons or WP designs)
o Advantages of Within-groups designs
• Ensures participants in (or exposed to) all levels of the IV are equivalent.
Why ____________________________?
• Gives the research study more (statistical) power to see differences across
conditions if they exist. Why ___________________? As per MAXMINCON,
when extraneous differences in demographic and other personality
variables, etc. are held constant across all levels, we can more easily detect
an effect of the IV manipulation if there is one.
• These designs require fewer participants overall
11
Within-groups designs
(i.e., within-persons or WP designs)
• Do within-group designs allow you to make causal
claims? • Covariation_____?
• Temporal precedence____________?
• Threats to internal validity_______________?
• Potential threat to internal validity for WP designs = if being exposed to one
condition changes how someone reacts to the other condition(s)
o Called: order effects or practice effects or carryover effects
• Solution? o Counter-balancing controls for order effects
[Diagram, counterbalanced design: Randomly assign participants to one of two orders. Order 1: Interact w/own toddler → Measure oxytocin → Interact w/different toddler → Measure oxytocin. Order 2: Interact w/different toddler → Measure oxytocin → Interact w/own toddler → Measure oxytocin]
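Counterbalancing amounts to listing every order of the conditions and randomly spreading participants evenly across those orders. A minimal sketch, assuming 8 hypothetical participants and the two toddler conditions from the example:

```python
import itertools
import random

rng = random.Random(3)  # fixed seed so the sketch is repeatable

conditions = ["own toddler", "different toddler"]
orders = list(itertools.permutations(conditions))  # every possible order

# Hypothetical participant IDs
participants = [f"mother_{i}" for i in range(1, 9)]
rng.shuffle(participants)

# After shuffling, alternate through the orders so each is used equally often
assignment = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

for participant, order in sorted(assignment.items()):
    print(participant, "->", " then ".join(order))
```

With more than two conditions the number of possible orders grows quickly, so full counterbalancing of this kind is most practical for small numbers of conditions.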
Testing Causal Hypotheses
• Designing and evaluating studies that make a causal claim (via the 4 big validities)
• Construct validity o How well were the variables measured and manipulated?
• External validity o To whom or to what can you generalize the causal claim?
• To other people…?
• To other situations…?
• Statistical conclusion validity o How well do your data support your causal conclusion?
1. Is the difference statistically significant?
2. How large is the effect?
• Internal validity o Are there (plausible) alternative explanations for the outcome?
13
Testing Causal Hypotheses
• Designing and evaluating studies that make a causal
claim (via the 4 big validities)
• Internal validity o Are there (plausible) alternative explanations for the outcome?
o Three fundamental questions worth asking…
1. Did the design of the experiment ensure there were no design confounds? Or did some other variable accidentally covary along with the intended independent variable?
2. If the experimenters used an independent-groups design, did they control
for selection effects by using random assignment or matching?
3. If the experimenters used a within-groups design, did they control for
order effects by counterbalancing?
14
Threats to Internal Validity that can apply to
an experiment • Many threats to validity of studies can be corrected for simply by adding
a comparison group.
• A few threats may apply to any intervention study/experiment
1. Observer bias
• Possible in any study with behavioral/observed DVs o Occurs when researchers’ expectations influence their ratings/scores/interpretation of the results
• Threatens internal validity (an alternative explanation now exists…) and construct validity (ratings/scores don’t represent ‘true’ scores)
• Solution: ensure staff who measure the DV are blind to study hypotheses
2. Demand characteristics
• A problem when participants guess what the study is supposed to be about & change their behavior in the expected direction
• Solution: conduct a double-blind study, where neither staff nor participants know which condition they are in; at minimum, ensure staff are blind to condition
3. Placebo effects
• Occur when participants improve after treatment, but only because they believe they received an effective intervention
15
In-class activity #5: Article review Prinz et al., 2009
• Research question o Specific hypotheses
o IV = __________________; # levels of the IV = _______________
o Levels of the IV are:
o DVs: # of DVs = ______; Specific DVs are:_______________, ___________, and ____________________
o Design: ___________________________
• Diagram the design
• Describe the random assignment process. Who/what was randomized?
• Who were the participants?
• Describe the Triple P intervention condition
• Were the hypotheses supported?
• Did they acknowledge plausible threats to validity? What are some examples…?
16
Beth Stormshak, Ph.D.
Professor, College of Education
University of Oregon
An intervention is one thing
Implementation is something
else altogether
Implementation Science
According to NIH (2008):
The use of strategies to adopt and integrate evidence-based health interventions and change practice patterns within and across specific systems
Action Oriented
Within Settings or Systems
AND collects data
Chambers DA. Advancing the science of implementation: A workshop summary. Administration and Policy in Mental Health and Mental Health Services Research. 2008;35(1-2):3-10.
3
1. We know a lot about what works
10K reviewed studies in What Works Clearinghouse
2. We are short on implementation action strategies to put what works into practice:
3. It takes too long for research to affect practice
4
T1 – Type 1 – The application of basic research findings to the development of interventions
T2 – Type 2 – Investigates the process and mechanism through which tested and proven interventions are integrated into practice and policy
T1 research is more common, T2 research is more limited
The use of effective interventions without implementation strategies is like serum without a syringe; the cure is available, but the delivery system is not
Fixsen, Blase, Duda, Naoom, & Van Dyke, 2010.
Only a small percentage of interventions implemented by community based delivery systems are evidence based.
Longitudinal Studies of a Variety of Comprehensive School Reforms
Effective Interventions | Actual Supports (Years 1-3) | Outcomes (Years 4-5)
Every Teacher Trained | Fewer than 50% of the teachers received some training | Fewer than 10% of the schools used the CSR as intended
Every Teacher Continually Supported | Fewer than 25% of those teachers received support | Vast majority of students did not benefit
Aladjem & Borman, 2006; Vernez, Karam, Mariano, & DeMartini, 2006
“17 Year Gap” in Health Care
8
Is the gap between research and practice similar in education to that existing in health?
Types of Gaps?
As long as?
As important to shorten? Which way?
As resistant to change?
9
Making a Program
Work
Does a Program Work?
Could a Program Work?
10
[Figure: Traditional Translation Pipeline (IOM 2009; Landsverk, Brown et al. 2012; Aarons et al., 2011). Stages: Preintervention, Efficacy Studies, Effectiveness Studies, Implementation (Exploration, Adoption / Preparation, Implementation, Sustainment). Axis labels: Real World Relevance; Local knowledge to Generalized knowledge]
Intervention: Program, Practice, Policy, Principles
Practice Setting: Delivery Support System
Ecological System: Population and Community/Cultural Context
Preadoption
• How do preferences for EBI impact consumer choices?
• What are the key channels for stakeholders to obtain EBI information?
Adoption
• What are key market, organizational, and other factors influencing adoption decisions?
• What evidence is used by decision makers in the adoption phase?
Implementation
• What are the most effective delivery systems for different settings?
• What influences consumer participation?
• What are the factors that impact implementation quality?
Sustainability
• What funding models are needed to sustain the program?
• What are the effective leadership strategies for long-term implementation?
The Baltimore City Public School System (BCPSS) has
collaborated in 3 generations of education and
prevention field trials.
Trials were directed at helping children master
obeying rules of behaving, attending, academic
learning, socializing appropriately in 1st grade
classroom.
Interventions were tested separately in 1st generation
(our focus today), then together in later trials.
15
Levels of Prevention and Treatment
[Figure: levels labeled Universal, Selective, Indicated, and Rx (Med, MH, Soc Welfare)]
Early Risk in Prevention Research
Over the last four decades much has been learned about early
risk factors and paths leading to drug abuse, and other behavioral, mental health, and school problems.
Aggressive, disruptive behavior as early as 1st grade has been repeatedly found to be a risk factor for later drug and alcohol abuse and disorders, delinquency, violence, tobacco use, high risk sex, school failure and other high risk behaviors.
Parenting interventions are among the most effective interventions for reducing aggressive behavior over time.
17
You have decided to implement your intervention in schools
What are the barriers?
What are the strengths?
How will you go about doing this?
Do you think you will be successful?
[Figure: boxes labeled Developmental & Measurement Models; Intervention Design & Experiment; Test and Tailor for Real World Conditions; Revise for Public Health Service Settings. Improved: Effectiveness, Efficiency, Expense, Ethics]
An Overview of the Family Check-Up and Follow-Up Services
[Figure: The Family Check-Up (Initial Interview; Assess Child & Family; Parent Feedback & Planning) and follow-up services (Brief, tailored PMT; PMT Treatment; Child CBT; Community Treatment Resources)]
[Figure: FCU shown with four components: Mindful Parenting (proactive, Monitoring); Positive Behavior Support; Setting Healthy Limits; Family Relationship Building]
Project Alliance 1: Portland Public Schools, 1995-present
Project Alliance 2: Portland Public Schools, 2005-2010
Early Steps: Children involved in WIC, ages 2-10
Shadow Project: American Indian families in PNW
Community Mental Health (CDC): CMH agencies in Portland, 120 families
Positive Family Support: 44 Oregon Middle Schools
Positive Family Support, Elementary school: 5 Oregon Elementary schools
Service Systems Affecting Mental Health of Children and Adolescents
[Figure: service systems by developmental stage (Early Childhood, Childhood, Early Adolescence, Adolescence). Systems shown: WIC, Preschools; Public School Setting; Community Programs: Treatment and Rehabilitation]
Effects of the Early Childhood Family Check-up: Average 2 Annual Sessions, 70% Engagement
Outcome Domain | Intervention Effects | Period of Development | Authors
Behavioral | Problem behavior | Age 2 to 4 | Shaw et al., 2006
Behavioral | Problem behavior | Age 2 to 7.5 | Dishion et al., 2013
Affective | Co-morbid depression | Age 2 to 4 | Connell et al., 2009
Affective | Maternal depression | Age 2 to 4 | Shaw et al., 2009
Parenting | Observed PBS | Ages 2 to 3 | Dishion et al., 2008
Parenting | Reduced coercion | Ages 2 to 4 | Smith et al., 2013
Cognitive/Educational | Improved effortful control and language | Ages 2 to 7 | Chang et al., in press
Cognitive/Educational | School readiness | Ages 2 to 7 | Brennan et al., 2013
Effects of the School-based Family Check-up: Average 6 Sessions over 2 years and 25-50% Engagement
Outcome Domain | Intervention Effects | Period of Development | Authors
Behavioral | Antisocial behavior | Age 11 to 19 | Van Ryzin et al., 2012
Behavioral | Early drug use | Age 11 to 14 | Dishion et al., 2002
Behavioral | Drug (ab)use | Age 11 to 23 | Veronneau et al., in press
Behavioral | Problem behavior | Age 11 to 14 | Stormshak et al., 2010
Behavioral | High-risk sex | Age 11 to 22 | Caruthers et al., 2013
Affective | Depression | Age 11 to 15 | Connell et al., 2006
Affective | Depression | Age 11 to 14 | Fosco et al., in press
Parenting | Observed monitoring | Ages 11 to 14 | Dishion et al., 2003
Parenting | Reduced conflict | Ages 11 to 16 | Van Ryzin et al., 2012
Cognitive/Educational | Improved grades and attendance | Ages 11 to 17 | Stormshak et al., 2010
Phase 1: Exploration and Readiness
1) Information/brochure, cost structure
2) Assessment process and review
3) Plan and scope
Phase 2: Installation
1) Role definition
2) Priority and staging
3) Work site training
4) Technology transfer
5) Supervision training
Phase 3: Implementation consultation
1) Ongoing COACH supervision
2) Feedback monitoring
3) Clinical outcome monitoring
Phase 4: Sustainability
1) Certification of therapists
2) Certification of supervisors
3) Certification of agency
4) Plan for fidelity monitoring
Funding for this research was provided by the Department of Education IES, grant R324A090111, awarded to John Seeley, Ph.D., Tom Dishion, Ph.D., Beth Stormshak, Ph.D., & Keith Smolkowski, Ph.D.
Increased problem behavior
Increased peer group influence
Decreased attendance
Decreased parent involvement
Decreased academic performance
Robust evidence linking parenting practices and family engagement in school to positive outcomes for adolescents and young adults (Biglan et al., 2004; Dishion et al., 1996, 2002; Fosco et al., 2013; Henderson & Berla, 1994; Henderson & Mapp, 2002)
According to the public health perspective:
◦ Effective interventions should reach large numbers of people (Biglan, 1995; Biglan, Sprague, & Moore, 2006)
◦ Interventions should be designed to fit in or alter existing service-delivery systems (Hoagwood & Koretz, 1996)
Schools are the largest, and often only, providers of child behavioral health services for many communities (Burns et al., 1995; Hoagwood et al., 2001, 2003)
A school-based system to form effective partnerships with parents to support student success
What it is: Strengths-based program
Integrated into PBIS tiers
Focused on family-school partnerships
Proactive
Inform, Invite, Involve parents in response to student needs
Foundation in empirically-supported strategies
[Figure: tiered model of family supports and student supports]
Indicated
◦ Family: Family Check-Up; Parenting Support Sessions; Parent Management Training; Community Referrals
◦ Student: Individualized Supports; Functional Behavioral Assessments
Selected
◦ Family: Parent Integration CICO; Attendance & Homework Support; Home-School Beh Change Plans; Email and Text messages
◦ Student: Specialized Supports; Check-In/Check-Out
Universal
◦ Family: Family Resource Center; Parenting Materials (Brochures/Videos/Handouts); Positive Family Outreach; Proactive Parent Screening
◦ Student: School Rules & Expectations; Positive Reinforcement; Student Needs Screening
Assist middle school staff as they implement Positive Family Support within their existing Positive Behavioral Interventions and Supports infrastructure.
Brochures, TV/DVD, Supplies Meeting Table, Computer, Coffee/Danishes on counter
Invite Parents to Join CI/CO
Use Home Incentives Plan
Check-In/ Check-Out
For teachers & family resource specialists
For parents and students (with teacher & family
resource specialist help)
For teachers and parents
[Figure: Parent Readiness Screener (school entry); Teacher & Staff Readiness Screener (fall-spring); Family Check Up; School-Parent PBS plan; Tailored Student & Family Support]
Tier I Family Support: Parent Student Readiness Screener
A unidimensional, psychometrically sound parent screener
Linked with proximal attributes of student functioning (e.g., completes homework and assignments on time, shows up to school on time)
Moore et al. (2014)
Tier III Family Support: The Family Check-up
Dishion & Stormshak (2007); Dishion, Stormshak, & Kavanagh (2012)
Recruitment
◦ All middle schools in Oregon implementing PBIS invited to participate
   Strict adherence to PBIS later revised due to recruitment difficulties
◦ Interested schools provided with personal visit to explain project and implementation process
◦ Schools randomly assigned to intervention or wait-list control (N=41)
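The random assignment step can be illustrated with a minimal sketch. The slides do not describe the actual randomization procedure, so simple (unblocked) randomization is an assumption here, as are the school labels, the function name `assign_schools`, and the seed. The 21/20 split matches the intervention and control group sizes reported later in Table 1.

```python
import random


def assign_schools(school_ids, n_intervention, seed=None):
    """Randomly split schools into intervention and wait-list control groups.

    Hypothetical helper: simple randomization is assumed, not taken from the study.
    """
    rng = random.Random(seed)      # seeded generator so the split is reproducible
    shuffled = list(school_ids)    # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    intervention = sorted(shuffled[:n_intervention])
    control = sorted(shuffled[n_intervention:])
    return intervention, control


# Placeholder labels for the 41 participating schools (not the real school names).
schools = [f"school_{i:02d}" for i in range(1, 42)]
intervention, control = assign_schools(schools, n_intervention=21, seed=2014)
print(len(intervention), len(control))  # prints: 21 20
```

A real trial might instead block or stratify the assignment (for example by district or school size) so the two groups are balanced on key characteristics.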
Workshops
◦ Spring before implementation: All staff introduction to PFS to increase school-wide awareness and buy-in
◦ Summer before implementation: 2-day training for core PFS staff to familiarize with goals and develop learning community
   Had to be revised due to drastic budget cuts throughout implementation
◦ Fall of implementation: All staff training to increase positive communication with parents
Consultation
◦ Intervention schools provided two years of consultation
   Planned visits and requested assistance
◦ Consisted of:
   Modeling positive family interventions
   Problem solving regarding when and how to involve families in intervention
   Integration of family involvement into existing school interventions
   Setting up family resource center
   Provision of parenting resources (brochures, videos, books, etc.)
   Increasing positive and proactive family outreach
Family-School Wide Evaluation Tool (FamSET)
Multi-method, multi-source assessment completed by trained assessor with appropriate middle school staff member
Maintains alignment with the School-Wide Evaluation Tool (SET; Horner et al., 2004)
Example items
◦ “Are parents contacted before a child’s behavior gets out of hand?” (1 = never, 4 = always)
◦ “At this school, do you offer family-based services or educational material?” (1 = never, 4 = always)
[Figure: bar chart of FamSET items, axis from 0.0 to 3.0. Tier labels: U, S, I. Items as listed on the chart:]
◦ School budget contained an allocated amount of money for school-wide behavioral support (U)
◦ Followed up with parents about previously discussed concerns (I)
◦ Worked directly with parents to support positive parenting practices (I)
◦ Asked parents to participate in positive reward systems for targeted school behaviors (S)
◦ Parents had input into school-wide policies regarding student discipline practices (U)
◦ Offered family-based services or educational material (U)
◦ Worked directly with parents to support family involvement in academic issues (S)
◦ Provided assessment-based feedback about parenting related to academics (S)
◦ Defined system for regular, positive contact with families (U)
◦ Parents contacted before a child's behavior got out of hand (U)
◦ Number of resources available to families at school (U)
◦ Provided questionnaire to assess parents' perspectives on student strengths and risk factors (U)
Adapted from Brown et al., 2013
80% of schools with the highest FamSET scores were in the intervention condition
60% of schools with the lowest FamSET scores were in the control condition
Level | Poor Implementation | Adequate Implementation | Strong Implementation
Universal Level | 4.8% | 28.6% | 66.7%
Selected Level | 4.8% | 42.9% | 52.4%
Indicated Level | 23.8% | 71.4% | 4.8%
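The percentages reported for each level are consistent with whole-number counts out of the 21 intervention schools. The sketch below checks that arithmetic. The counts themselves are back-calculated assumptions, not figures given in the slides, and the function name `percent_of_schools` is hypothetical.

```python
def percent_of_schools(count, total=21):
    """Convert a count of schools into a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


# Assumed counts per level, in the order (poor, adequate, strong).
# These are inferred from the reported percentages, not reported in the source.
assumed_counts = {
    "Universal": (1, 6, 14),
    "Selected": (1, 9, 11),
    "Indicated": (5, 15, 1),
}

percentages = {
    level: tuple(percent_of_schools(c) for c in counts)
    for level, counts in assumed_counts.items()
}

for level, values in percentages.items():
    print(level, values)
```

Because each row's counts sum to 21, one school corresponds to about 4.8 percentage points, which is why a single school moving between categories changes the picture noticeably.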
Intervention School Universal Selected Indicated Overall
Mad. 8 8 7 7.67
CP 9 9 5 7.67
Bro. 9 8 5 7.33
Cof. 9 7 6 7.33
HD 9 7 6 7.33
Ro. 9 8 5 7.33
BC 9 7 5 7.00
RR 8 8 5 7.00
Dam. 8 6 5 6.33
AS 8 8 2 6.00
Aza. 9 5 4 6.00
CR 8 5 5 6.00
WM 7 6 5 6.00
Sha. 7 8 2 5.67
WM 6 7 4 5.67
Ast. 6 4 6 5.33
Cre. 4 6 5 5.00
Tal. 6 5 4 5.00
DC 6 5 2 4.33
Lin. 5 4 1 3.33
Pio. 2 2 2 2.00
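In every row of the table above, the Overall column equals the mean of the Universal, Selected, and Indicated scores, rounded to two decimals. A minimal sketch of that calculation follows. The function name `overall_score` is hypothetical, and the slides do not state how Overall was derived, so the unweighted mean is an inference from the numbers.

```python
def overall_score(universal, selected, indicated):
    """Unweighted mean of the three tier scores, rounded to two decimals."""
    return round((universal + selected + indicated) / 3, 2)


# A few rows copied from the table: (school, universal, selected, indicated, reported overall)
rows = [
    ("Mad.", 8, 8, 7, 7.67),
    ("BC", 9, 7, 5, 7.00),
    ("Ast.", 6, 4, 6, 5.33),
    ("Lin.", 5, 4, 1, 3.33),
    ("Pio.", 2, 2, 2, 2.00),
]

for school, u, s, i, reported in rows:
    computed = overall_score(u, s, i)
    print(school, computed, computed == reported)
```

Note that an unweighted mean lets a strong Universal score offset a weak Indicated score, so two schools with the same Overall value can have quite different tier profiles (compare AS at 8, 8, 2 with CR at 8, 5, 5).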
Conditions During Implementation: National
School Year | Operating Expenditure per Student | Capital Expenditure per Student
2008-2009 | $9392 | $1364
2009-2010 | $9275 | $990
2010-2011 | $9363 | $777
2011-2012 | $9366 | $763
2012-2013 | $9364 | $556
From Oregon Department of Education, 2008-2013
Schools | Principal | SST | SPED | Counselor
Highest FamSET Scores | 20% | 31.6% | 26.7% | 8.3%
Lowest FamSET Scores | 60% | 66.7% | 73.8% | 66.7%
Note. Percent turnover from year 1 to year 2, n=10
Table 1
FamSET Implementation Findings
Control Schools (n = 20); Intervention Schools (n = 21). Tier rows report Mean (SD); item rows report %b.
Implementation Tier and Sample Items | Control Time 1 | Control Time 2/3a | Intervention Time 1 | Intervention Time 2/3a | d
Universal Implementation (range = 0 – 22) | 10.65 (4.95) | 14.25 (3.58) | 10.86 (4.36) | 18.86 (2.35) | 1.58
  Does your school have a room dedicated to parent or family services? | 30% | 45% | 23.8% | 85.7% | 1.85
  Did your school offer parent topic nights? | 35% | 50% | 38.1% | 85.7% | 1.22
Selected and Indicated Implementation (range = 0 – 22) | 16.15 (3.76) | 18.90 (1.67) | 15.38 (3.83) | 19.71 (1.77) | 0.47
  Offer family-based assessments for students struggling academically or behaviorally? | 45% | 40% | 33% | 76.2% | 1.51
  Is there consistent follow-through on family support services discussed in team meetings? | 90% | 95% | 71.4% | 95.2% | 0.40
Number of Resources Available to Families (range = 0 – 11) | 1.30 (1.95) | 3.67 (4.56) | 1.48 (2.99) | 7.48 (3.69) | 0.96
  Is there a family support person identified at the school? | 25% | 35% | 19% | 71.4% | 1.28
a Third assessment for Wave A and B schools; second assessment for Wave C schools
b Item level data indicate the percent of schools implementing each intervention component
School readiness assessment important component of the pre-implementation process
Implementation models rarely address the increased response cost to school staff of changing routines and expectations
◦ More attention needed regarding how to reinforce school staff for implementation efforts
Interventions more likely to be sustained when implemented at the state or district level and supported with internal funds
◦ High staff turnover often prohibits embedding interventions at the individual school level
Intervention implementation most effective when scaffolded and supported over a number of years
◦ Funding for maintenance of implementation critical