7/31/2019 Notes on Jackson's Research Methods and Statistics 3rd edition Text
Table of Contents
Chapter 1: Thinking like a scientist
Chapter 2: Getting started: ideas, resources, ethics
Chapter 3: defining, measuring and manipulating variables
Chapter 4: descriptive methods
Chapter 5: data organization and descriptive statistics
Chapter 6: correlational methods and statistics
Chapter 7: probability and hypothesis testing
Chapter 8: introduction to inferential statistics
Chapter 9: the logic of experimental design
Chapter 10: inferential statistics: two group designs
Chapter 11: experimental designs with more than two levels of an independent variable
Chapter 12: complex experimental designs
Chapter 13: quasi-experimental and single-case designs
Chapter 1 thinking like a scientist
Sources of knowledge: p.6
Superstition; intuition (e.g. "couples are more likely to conceive after adopting", an illusory correlation); authority (e.g. parents, actors); tenacity (repetition increases believability, as in advertising); rationalism (logical reasoning, e.g. syllogisms).
A categorical syllogism consists of three parts: the major premise, the minor premise and the conclusion. Each of the premises has one term in common with the conclusion: in a major premise, this is the major term (i.e., the predicate of the conclusion); in a minor premise, it is the minor term (the subject) of the conclusion. For example:
o Major premise: All men are mortal.
o Minor premise: All Greeks are men.
o Conclusion: All Greeks are mortal.
Empiricism: knowledge through observation and experience; it yields a long list of observable facts, and rationalism is needed to assemble these facts logically. Aristotle was an empiricist, while Plato was a theorist.
Science: rationalism + empiricism. A hypothesis is a prediction regarding the outcome of a single study. Many hypotheses may be tested and several research studies conducted before a comprehensive theory on a topic is put forth.
Publicly verifiable knowledge: research can be observed, replicated, criticized, and tested for veracity (p.11). Principle of falsifiability: a theory must be stated in such a way that it is possible to refute it.
Scientific research has three basic goals: (1) to describe behavior, (2) to predict behavior, and (3) to explain behavior
p.14
Research Methods in Science
Descriptive methods
o Observational: naturalistic observation and laboratory observation (p.15)
o Case study method: in-depth study of one or more individuals, e.g. Piaget's theory of cognitive development in children, developed by simply describing the individual(s) being studied.
o Survey method: question individuals on topic(s) and then describe their responses
Predictive (relational) methods: we do not systematically manipulate the variables of interest; we only measure them. Since alternative explanations can't be ruled out, causation cannot be established.
o Correlational method: assesses the degree of relationship between two measured vars
o Quasi-experimental method: differs from the experimental method in that subjects choose to be members of the different groups being studied, i.e. the subject/participant var can't be changed (it is not a manipulated variable), e.g. sorority vs. non-sorority girls (p.17)
Experimental method: controls are very important in such experiments; you control who is in the study (get a representative sample), who participates in each group (control for differences in participants by random assignment between the control (baseline) group and the experimental group), and the treatment each group receives (e.g. some take Vit C and some do not). Other vars such as amt of sleep, type of diet, and amt of exercise might also need to be controlled (p.19).
Chapter 2: Getting Started on a Research Idea
Selecting a Problem: review past research on the problem OR review the pertinent chapter in the psychology text OR
observe a problem in nature and decide how to address it
Reviewing the Literature (p.33,34): A list of psychology journals is on p.32; Psych Abstracts, published by the APA, lists
abstracts on a monthly basis of all published work; PsycINFO is an electronic database that provides abstracts and
citations to the scholarly literature in the behavioral sciences and mental health. To help you choose appropriate
keywords, use the APA's Thesaurus of Psychological Index Terms. Whereas Psych Abstracts finds articles published on a given topic within a given year, the Social Science Citation Index (SSCI) can help you to work from a given article (a key article) and see what has been published on that topic since the key article was published (p.34).
PsycARTICLES is an online database that provides full-text articles from many psychology journals and is available through many academic libraries. ProQuest is an online database that searches both scholarly journals and popular media sources. Full-text articles are often available (p.35).
Journal Article Structure (p.37)
Research articles usually have five main sections: Abstract, Introduction, Method, Results, and Discussion. The Abstract
is a brief description of the entire paper that typically discusses each section of the paper (Introduction, Method,
Results, and Discussion). It should not exceed 120 words. The Introduction has 3 components: (1) intro to the problem, (2) review of relevant previous research, and (3) purpose/rationale for the study. The Method section includes participants (selection processes), materials/apparatus (testing materials, equipment), and procedure (groups used in the study, instructions to participants, experimental manipulation, controls, etc.). The Results section summarizes the data collected
and the type of statistic used to analyze the data. This section should include a description of the results only, not an
explanation of the results. Discussion: The results are evaluated and interpreted in the Discussion section. It typically
begins with a restatement of the predictions of the study and tells whether or not the predictions were supported.
Institutional Review Boards (IRBs) oversee all federally funded research involving human participants. P.44 If the
participants in a study are classified as at minimal risk, then an informed consent is not mandatory. P.47 In studies
where anonymity and confidentiality are at risk, an informed consent form should be used.
Chapter 3: Defining, Measuring, and Manipulating Variables (p.57)
Operational definition: defining a variable in concrete terms e.g. hunger: >12hrs w/o
food intake, anxiety: galvanic skin response or HR. Purpose: communicate clearly to
others and measure/manipulate vars consistently.
Properties of measurement are listed at the right p.58
Scales of Measurement
A nominal scale is one in which objects or individuals are assigned to categories that have no numerical properties. Nominal scales have the characteristic of identity. Such variables are categorical b/c data are grouped into categories. E.g. ethnicity
In an ordinal scale, the categories form a rank order along a continuum and have the properties of identity and magnitude but lack equal unit size (the diff bet. ranks 1 and 2 need not equal the diff bet. ranks 2 and 3) and absolute zero. Also known as ranked data. E.g. class rank
In an interval scale, the units of measurement (intervals) between the numbers on the scale are all equal in size. When you use an interval scale, the criteria of identity, magnitude, and equal unit size are met. E.g. the Celsius and Fahrenheit temperature scales, neither of which has an absolute zero. Because of this, you cannot form ratios based on these scales (for example, 100 degrees is not twice as hot as 50 degrees).
Ratio data have all four properties of measurement: identity, magnitude, equal unit size, and absolute zero. Examples of ratio scales of measurement include weight, time, and height.
Aptitude tests measure an individual's potential to do something, whereas achievement tests measure an individual's competence in an area (p.63). Behavioral measures are often referred to as observational measures because they involve observing anything that a participant does. Most physical measures, or measures of bodily activity, are not directly observable.
Reliability refers to the consistency or stability of a measuring instrument (p.65). Examples of errors include trait errors (e.g. participant truthfulness) and method errors (e.g. operator using equipment). In effect, a measurement is a combination of the true score and an error score: Observed score = True score + Measurement error
Reliability is measured using correlation coefficients. A correlation coefficient measures the degree of relationship between two sets of scores and can vary between -1.00 and +1.00. To establish the reliability (or consistency) of a measure, we expect a strong correlation coefficient (usually in the .80s or .90s) between the two variables or scores being measured. A positive coefficient indicates that those who scored high on the measuring instrument at one time also scored high at another time, and those who scored low at one point scored low again (p.68).
Types of reliability: test/retest reliability (lowered by practice effects: a person can get better between testings from practice), alternate-forms reliability (diff but equivalent questions on the tests), split-half reliability, and inter-rater reliability
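As a rough illustration of test/retest reliability, the sketch below correlates two sets of scores from the same people. The scores and the helper name `pearson_r` are made up for the example; they are not from the text.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of cross-products of deviations, and sums of squared deviations.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people tested on two occasions.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 16, 17, 21]

test_retest_r = pearson_r(time1, time2)
print(round(test_retest_r, 2))
```

A coefficient in the .80s or .90s, as described above, would be read as acceptable consistency for these invented data.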
Validity: a measure that is valid measures what it claims to measure (p.70). Content validity: a systematic examination of the test content to determine whether it covers a representative sample of the domain of behaviors to be measured. Criterion validity: the extent to which the measure can estimate present performance (concurrent validity) or predict future performance (predictive validity). Construct validity: the extent to which a measuring instrument accurately measures the theoretical construct or trait that it is designed to measure.
p.72: A test can be reliable but not valid, but it can never be valid without being reliable.
Chapter 4: Descriptive Methods
Descriptive methods: describe what has been observed in a group of people or animals, but don't allow one to make accurate predictions or determine cause-and-effect relationships. Five different types of descriptive methods: observational methods, case studies, archival method, qualitative methods, and surveys (p.79).
Observational Methods: two types, naturalistic and laboratory (systematic).
Naturalistic: Ecological validity refers to the extent to which research can be generalized to real-life situations. In nonparticipant observation (e.g. Goodall), there is the issue of reactivity: participants reacting in an unnatural way to someone obviously watching them. Disguised observation mitigates this. Expectancy effects are the effect of the researcher's expectations on the outcome of the study. Naturalistic observation has greater flexibility but less control than laboratory observation.
Laboratory: also concerned with reactivity and expectancy effects. Advantage is that the situation is contrived so the likelihood that participants will perform the behavior is higher (p.83).
Observational Methods: Data Collection
Narrative records: full narrative descriptions of a participant's behavior. E.g. Piaget's studies of cognitive development in children
Checklists: a static item is a means of collecting data on characteristics that will not change while the observations are being made, e.g. # ppl present, age, gender; an action item is used to record whether specific behaviors were present or absent during the observational time period. Disadvantage is missing behavior not present on the checklist.
Qualitative Methods (case studies, archival, interviews/focus groups, field studies, action research): These are
distinguished from observational methods as follows: researchers are typically not interested in simplifying, objectifying
or quantifying what they observe.
Case Study Method: e.g. Piaget; an in-depth study of one or more individuals in the hope of revealing things that are
true of all of us. Problems: atypical individual causes erroneous generalizations to population. Also, expectancy effects.
P.85
Archival Method: describing data that existed before the time of the study. E.g. whether more babies are born when the moon is full. Use US Census Bureau data etc. Risks are selection bias (cherry-picking data sources) and unknown reliability or validity b/c you are using someone else's data
Interviews/focus group methods: 3 types of interviews: standardized interview (fixed questions), semistandardized, and
unstandardized (unstructured)
Field studies: similar to naturalistic observation; difference is that data are always collected in narrative form and left
that way. P.90 text.
Qualitative Method: Qualitative research focuses on phenomena that occur in natural settings, and the data are
typically analyzed without the use of statistics. Both the naturalistic observational method and the case study method
can be qualitative in nature.
Surveys (summary table on p.102): closed-ended, open-ended, partially open-ended (closed-ended questions with an additional "other" option). A Likert rating scale (most psychs view it as interval, but some as ordinal) presents a statement rather than a question, and respondents are asked to rate their level of agreement with the statement (p.89). A loaded question is one that includes nonneutral or emotionally laden terms (e.g. "eliminating wasteful excesses"). A leading question is one that sways the respondent to answer in a desired manner, e.g. "most people believe...". A double-barreled question asks more than one thing in a single item. Survey questions should not be randomly arranged: sensitive questions (e.g. drug use/sexual behavior) at end, demographic questions at end b/c boring, group Qs on similar topics
together (p.90). A socially desirable response is one that is given because participants believe it is deemed appropriate by society, rather than because it truly reflects their own views or behaviors.
Mail survey: less sampling bias than phone/email b/c of wide availability; also ppl are more comfortable answering sensitive questions; disadv: if Qs are unclear, no clarification is possible; low response rate (about 20%)
Sampling Techniques for Surveys: There are two ways to sample individuals from a population: probability sampling and nonprobability sampling.
Probability sampling (p.95): random selection; stratified random sampling (guarantees that the sample accurately represents the population on specific characteristics); cluster sampling (e.g. sample from classes that are required of all students at the university, such as English composition).
Non-probability sampling: individual members of the population do not have an equal likelihood of being selected
o Convenience (haphazard) sampling: if you wanted a sample of 100 college students, you could stand outside of the library and ask people who pass by to participate
o Quota sampling: quota sampling is to nonprobability sampling what stratified random sampling is to probability sampling, but still not much effort is devoted to creating a sample that is truly representative of the population
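The difference between simple random selection and stratified random sampling can be sketched in a few lines. The population, the strata labels, and the function name `stratified_sample` below are all hypothetical; proportional allocation per stratum is one common choice, assumed here for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60 first-years and 40 seniors.
population = ([("first-year", i) for i in range(60)]
              + [("senior", i) for i in range(40)])

# Simple random sample: every member has an equal chance of selection.
simple_sample = random.sample(population, 10)

def stratified_sample(pop, n):
    """Sample within each stratum in proportion to the stratum's size."""
    strata = {}
    for member in pop:
        strata.setdefault(member[0], []).append(member)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(pop))
        sample.extend(random.sample(members, k))
    return sample

strat_sample = stratified_sample(population, 10)
n_first_years = sum(1 for label, _ in strat_sample if label == "first-year")
print(n_first_years, len(strat_sample) - n_first_years)
```

The simple random sample may over- or under-represent seniors by chance, whereas the stratified sample matches the 60/40 split by construction.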
Chapter 5: Data Organization and Descriptive Statistics
In a class interval frequency distribution, individual scores are combined into categories, or intervals, and then listed
along with the frequency of scores in each interval. P.106
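A class interval frequency distribution can be built by mapping each score to the lower bound of its interval and counting. This is only a sketch: the scores, the interval width of 10, and the function name are assumptions made for the example.

```python
from collections import Counter

# Hypothetical exam scores.
scores = [45, 52, 58, 61, 63, 67, 70, 71, 74, 78, 82, 85, 88, 91, 95]

def class_interval_frequencies(scores, width):
    """Group scores into intervals of the given width and count each interval."""
    counts = Counter((s // width) * width for s in scores)
    return {(low, low + width - 1): counts[low] for low in sorted(counts)}

freq = class_interval_frequencies(scores, width=10)
for (low, high), f in freq.items():
    print(f"{low}-{high}: {f}")
```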
For nominal scale or qualitative data, a bar graph (graphical representation of frequency distribution) is most
appropriate e.g. democrats, independents, republicans. For quantitative data in ordinal, interval, or ratio scales, a
histogram is used. P.107 Unlike in a bar graph, in a histogram, the bars touch each other to indicate that the scores on
the variable represent related, increasing values.
Frequency polygon: a line graph of the frequencies of individual scores or intervals. Again, scores (or intervals) are shown on the x-axis and frequencies on the y-axis. After all the frequencies are plotted, the data points are connected.
Use with quantitative, continuous data like height, weight.
Measures of Central Tendency: A measure of central tendency is a representative number that characterizes the
middleness of an entire set of data. E.g. Mean, median, and mode p. 110
Mean: The mean is appropriate for interval and ratio data but not for ordinal or nominal data
For the sample mean, $\bar{X} = \frac{\sum X}{N}$
Median: not affected by extreme scores.
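A quick way to see why the median resists extreme scores while the mean does not is to add one outlier to a small data set. The salary figures below are invented for the example.

```python
from statistics import mean, median, mode

# Hypothetical salaries in thousands of dollars.
salaries = [30, 32, 32, 35, 38]
with_outlier = salaries + [500]  # one extreme score added

print(mean(salaries), median(salaries), mode(salaries))
print(mean(with_outlier), median(with_outlier))
```

The single extreme score pulls the mean far above most of the data, while the median moves only slightly.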
Measures of Variation: A measure of central tendency provides information about the middleness of a distribution of
scores but not about the width or spread of the distribution. P.114 A measure of variation indicates the degree to
which scores are either clustered or spread out in a distribution.
Range: The simplest measure of variation is the range: the difference between the lowest and the highest scores in a distribution. The range is usually reported with the mean of the distribution.
Standard Deviation for a population (p.117): $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$
Standard Deviation for a sample: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$
Compare this to the average deviation for a population, which is given by $AD = \frac{\sum |X - \mu|}{N}$. Note that the standard deviation is never smaller than the average deviation, and is usually larger, because the squaring of the terms gives more weight to outlying values.
If, however, sample data is being used to estimate the population standard deviation, then an unbiased estimator modification of N - 1 must be used (p.126): $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$
Notice that the symbol for the unbiased estimator of the population standard deviation is s (lowercase), whereas the symbol for the sample standard deviation is S (uppercase). The estimate has N - 1 in the denominator to compensate for small samples not containing as much variability as the real population.
Variance is the square of the standard deviation.
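The three measures just described differ only in how deviations are combined and in the denominator. The sketch below computes each for one made-up set of scores; the function names are illustrative, not from the text.

```python
from math import sqrt

def average_deviation(xs):
    """Mean of the absolute deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

def sd_descriptive(xs):
    """Standard deviation with N in the denominator (sigma or S)."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def sd_unbiased(xs):
    """Estimated population standard deviation with N - 1 in the denominator (s)."""
    m = sum(xs) / len(xs)
    return sqrt(sum((x - m) ** 2 for x in xs) / (len(xs) - 1))

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores
print(average_deviation(scores), sd_descriptive(scores), sd_unbiased(scores))
print(sd_descriptive(scores) ** 2)  # variance is the squared standard deviation
```

For these scores the average deviation is smaller than the standard deviation, and dividing by N - 1 gives a slightly larger estimate than dividing by N.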
Normal distributions are bell-shaped, symmetrical, and have an identical mean, median, and mode. They are unimodal; most observations are centrally clustered. Last, when standard deviations are plotted on the x-axis, the percentage of scores falling between the mean and any point on the x-axis is the same for all normal curves (p.121).
Kurtosis: how flat or peaked a normal distribution is; Platykurtic = short and wide (think: platypus = close to the ground, flat); Mesokurtic = medium height/breadth; Leptokurtic = tall and thin (think: lepto = leap)
In a positively skewed distribution, the peak is to the left of the center point, and the tail extends toward the right. Reason for its name: a few individuals have extremely high scores that pull the distribution in that direction. Negatively skewed is just the opposite (p.122). If your disease has a low median survival time, you would prefer a positive skew: this means some people live for a very long time post-diagnosis.
The z-score (p.124): A z-score or standard score is a measure of how many standard deviation units an individual raw score falls from the mean of the distribution. Thus, when calculating a z-score for an individual in comparison to a sample, we use $z = \frac{X - \bar{X}}{S}$, while for a population, we use $z = \frac{X - \mu}{\sigma}$.
If the distribution of scores for which you are calculating transformations (z-scores) is normal (symmetrical and unimodal), then the distribution of z-scores is referred to as the standard normal distribution: a normal distribution with a mean of 0 and a standard deviation of 1 (p.126).
The standard normal curve can also be used to determine an individual's percentile rank: the percentage of scores equal to or below the given raw score (p.131).
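As a small worked sketch of the z-score and percentile rank, assume a normally distributed measure with a mean of 100 and a standard deviation of 15 (illustrative values, not taken from the text). Python's standard library `statistics.NormalDist` supplies the standard normal curve.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviation units that x falls from the mean."""
    return (x - mean) / sd

# Hypothetical raw score of 115 on a scale with mean 100 and sd 15.
z = z_score(115, mean=100, sd=15)

# Percentile rank: percentage of the standard normal curve at or below z.
percentile_rank = NormalDist().cdf(z) * 100

print(z, round(percentile_rank, 1))
```

This only applies when the underlying distribution is approximately normal; for skewed data the percentile rank read from the normal curve would be misleading.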
Chapter 6: Correlational Methods and Statistics
Correlational Methods: determine whether two variables are related to one another (p.148). In addition to describing a relationship, correlations allow us to make predictions from one variable to another. If two variables are correlated, we can predict from one variable to the other with a certain degree of accuracy. The magnitude or strength of a relationship is determined by the absolute value of the correlation coefficient describing the relationship: 0 = no correlation; 0 to 0.29: weak correlation; 0.3 to 0.69: moderate correlation; 0.7 to 1.0: strong correlation; 1.0 = perfect correlation. In a perfect correlation, an increase/decrease in one variable is always accompanied by an increase/decrease in the other variable.
Thus, in a graph, when there is a perfect correlation, the data points all fall exactly on a straight line (the slope is irrelevant unless it is zero). Accompanying scatterplot shows no relationship. Also, it is possible for a correlation coefficient of zero to indicate a curvilinear relationship (the positive and negative correlations nullify each other, e.g. anxiety vs. test performance, memory and age) (p.144).
Misinterpreting Correlations
Causality refers to the assumption that the correlation indicates a causal relationship between two variables, whereas
directionality refers to the inference made with respect to the direction of a causal relationship between two variables.
P.146.
Third variable effects: a strong correlation between two variables is not really a meaningful relationship and is really the product of a third variable. E.g. researchers found contraceptive use strongly correlated w/ # of electrical appliances; the third var was socioeconomic status. To remove the effect of the 3rd var, use partial correlation (p.148).
Restrictive range: the correlated vars are examined over a range too narrow to reveal the correlation.
Curvilinear relationships mask correlations.
Pearson product-moment correlation coefficient: Pearson's r is used for data measured on an interval or ratio scale of measurement (p.151). E.g. consider a list of 20 individuals' heights and weights. Step 1: calculate the mean and S.D. for the heights and weights. Next, convert each value to a z-score. If the correlation is strong and positive, we should find that positive z-scores on one variable go with positive z-scores on the other variable and vice versa. Step 2: calculate the cross-products, i.e. multiply each person's two z-scores together and sum the products. If the paired z's consistently have the same sign (both positive or both negative), the sum is a large positive value; if they consistently have opposite signs, it is a large negative value; either way the correlation is strong.
The overall formula is $r = \frac{\sum z_X z_Y}{N}$. General rule of thumb: at least 10 ppl per variable. An alternative, computational formula is listed on p.154.
Coefficient of Determination: Calculated by squaring the correlation coefficient, the coefficient of determination ($r^2$) is a measure of the proportion of the variance in one variable that is accounted for by another variable. $r^2$ is typically reported as a percentage (p.154).
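The two-step z-score procedure for Pearson's r, followed by the coefficient of determination, might look like the sketch below. The heights and weights are invented, and the standard deviation is computed with N in the denominator so that the sum of cross-products divided by N gives r.

```python
from math import sqrt

def z_scores(xs):
    """Convert raw scores to z-scores using the N-denominator standard deviation."""
    n = len(xs)
    m = sum(xs) / n
    sd = sqrt(sum((x - m) ** 2 for x in xs) / n)
    return [(x - m) / sd for x in xs]

def pearson_r(xs, ys):
    """Step 1: z-score both variables. Step 2: average the cross-products."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)

# Hypothetical heights (inches) and weights (pounds) for six people.
heights = [60, 62, 65, 68, 70, 72]
weights = [110, 120, 140, 150, 165, 180]

r = pearson_r(heights, weights)
r_squared = r ** 2  # coefficient of determination
print(round(r, 3), f"{r_squared:.1%}")
```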
Correlations for Nominal or Ordinal Data:
Spearman's rank-order correlation coefficient: both vars are on an ordinal (ranking) scale. If one var is interval/ratio, it must first be converted into the ordinal scale.
Point-biserial correlation coefficient: one var is a two-value dichotomous nominal var (e.g. gender) and the other is interval or ratio
Phi coefficient: both vars are dichotomous nominal vars.
Regression Analysis (p.156): A tool that enables us to predict an individual's score on one variable based on knowing one or more other variables is regression analysis. Regression analysis involves determining the equation for the best-fitting
line for a data set. The line has the form of y = mx + b, but is written as follows: $Y' = bX + a$, where $Y'$ is the predicted value on the Y variable, b is the slope of the line, X represents an individual's score on the X variable, and a is the y-intercept. To compute the slope: $b = r\left(\frac{S_Y}{S_X}\right)$, where $S_Y$ and $S_X$ are the standard deviations of Y and X. To compute the y-intercept: $a = \bar{Y} - b\bar{X}$, where the bars are the respective sample means. Multiple regression analysis involves combining several predictor variables in a single regression equation to increase the predictive accuracy because in the real world, it is unlikely that one variable is affected by only one other variable.
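For a single predictor, the slope and intercept of the best-fitting line can be obtained from r, the two standard deviations, and the two means. The sketch below assumes the usual form in which the slope is r times the ratio of the standard deviation of Y to that of X; the data and function name are hypothetical.

```python
from math import sqrt

def regression_line(xs, ys):
    """Return (slope b, intercept a) of the line Y' = bX + a."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
    b = r * (sd_y / sd_x)      # slope
    a = mean_y - b * mean_x    # y-intercept
    return b, a

# Hypothetical data that fall exactly on the line Y = 2X + 1.
xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]

b, a = regression_line(xs, ys)
predicted = b * 5 + a  # predicted Y for a new X of 5
print(b, a, predicted)
```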
Chapter 7: Hypothesis Testing and Inferential Statistics
Probability: multiplication and addition rule p.178
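As a brief reminder of the two rules in their simplest forms (stated here as an assumption about what the text covers): for independent events the probabilities multiply, and for mutually exclusive events they add. The coin and die examples are illustrative.

```python
from fractions import Fraction

# Multiplication rule (independent events): two heads in two fair coin tosses.
p_heads = Fraction(1, 2)
p_two_heads = p_heads * p_heads

# Addition rule (mutually exclusive events): rolling a 1 or a 2 on a fair die.
p_one = Fraction(1, 6)
p_two = Fraction(1, 6)
p_one_or_two = p_one + p_two

print(p_two_heads, p_one_or_two)
```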
It is impossible statistically, however, to demonstrate that something is true. In fact, statistical techniques are much better at demonstrating that something is not true. Whatever the research topic, the null hypothesis always predicts that there is no difference between the groups being compared.
One-tailed test (p.184): E.g. Do students in after-school programs have higher IQs than those in the general population? The null and alternative hypotheses are: $H_0: \mu_{\text{after-school}} \le \mu_{\text{general}}$ and $H_a: \mu_{\text{after-school}} > \mu_{\text{general}}$.
Two-tailed test: e.g. the researcher just wants to show that there are IQ differences between the two groups, but isn't concerned with the direction of those differences.
Errors (p.186): a Type I error is rejecting a null hypothesis that is actually true; a Type II error is failing to reject a null hypothesis that is actually false.
The p-value or alpha level: When a result is statistically significant at the 0.05 (or 5%) level, it means that, if the null hypothesis were true, a difference as large as the one observed between the sample and the population would occur by chance no more than 5 out of every 100 times. In other words, the variation between groups is more plausibly due to true/real differences between them than to chance. In this case, the risk of a Type I error is 5%.
Chapter 8: Introduction to Inferential Statistics
Inferential Statistics (p.197): three tests: the z test, the t test, and the chi-square ($\chi^2$) test. The z test and the t test are used with interval or ratio data and are parametric: assumptions such as knowing the population mean ($\mu$) and standard deviation ($\sigma$) are needed. The chi-square test is used with ordinal or nominal data and is non-parametric.
The z test: a parametric inferential statistical test. It needs population parameters such as the mean and standard deviation, determines the likelihood that the sample is part of the sampling distribution, and allows us to test the null hypothesis for a single sample when the population variance is known. Remember that a z-score tells us how many standard deviations above or below the mean of the distribution an individual score falls. But in the IQ problem above, we are not comparing
an individual's score to the population mean; rather, a sample mean must be compared with a distribution of sample means, known as the sampling distribution.
Standard Error of the Mean (p.198): the standard error of the mean (the standard deviation of the sampling distribution), $\sigma_{\bar{X}}$, can never be as large as $\sigma$, the standard deviation for the distribution of individual scores (for samples of more than one score). Think about it this way: if the size of each of these samples were to approach the population size, their means would all be tightly clustered around the pop. mean and the standard deviation of the sampling distribution would be very small. Thus, the central limit theorem states that for any population with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means for sample size N will have a mean of $\mu$ and a standard deviation of $\sigma/\sqrt{N}$ and will approach a normal distribution as N approaches infinity (p.198). Thus, $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$, where $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$.
The z-score will tell us how many standard deviation units a sample mean is from the population mean, or the likelihood that the sample is from that population (p.175). E.g. if we wind up with z = 2.06 for the one-tailed test, the critical z = 1.645, i.e. the area under the curve to the right of that value is 5%. The z-value would be significant and H0 would be rejected. In APA style, report result as Z (n = 50) = 2.06, p
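A one-tailed z test can be sketched as follows. The sample mean, population mean, population standard deviation, and sample size are invented for illustration and are not the figures behind the z = 2.06 result above; the critical value comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n):
    """z for a sample mean: distance from the population mean in standard errors."""
    standard_error = pop_sd / sqrt(n)
    return (sample_mean - pop_mean) / standard_error

# Hypothetical values: 75 students with a mean IQ of 103.5; population mean 100, sd 15.
z = z_test(sample_mean=103.5, pop_mean=100, pop_sd=15, n=75)

# One-tailed critical value for alpha = .05 (about 1.645).
z_critical = NormalDist().inv_cdf(0.95)
reject_null = z > z_critical

print(round(z, 2), round(z_critical, 3), reject_null)
```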
The t test and the estimated standard error of the mean: $t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$, where $s_{\bar{X}} = \frac{s}{\sqrt{N}}$. $s_{\bar{X}}$ is the estimated standard error of the mean, i.e. an estimate of the standard deviation of the sampling distribution based on sample data, since the pop. standard dev is not known. s (the estimated standard deviation for a population, based on sample data): $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$
APA style: t(9) = 2.06, p
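When the population standard deviation is unknown, the same comparison is made with the estimated standard error. The sketch below assumes the usual single-sample t statistic with N - 1 degrees of freedom; the ten scores and the comparison mean of 100 are hypothetical and unrelated to the t(9) = 2.06 result above.

```python
from math import sqrt

def one_sample_t(scores, pop_mean):
    """Return (t, degrees of freedom) for a single-sample t test."""
    n = len(scores)
    m = sum(scores) / n
    s = sqrt(sum((x - m) ** 2 for x in scores) / (n - 1))  # unbiased estimator s
    estimated_se = s / sqrt(n)                              # estimated standard error
    return (m - pop_mean) / estimated_se, n - 1

# Hypothetical sample of ten scores compared against a population mean of 100.
scores = [104, 98, 110, 102, 107, 99, 105, 103, 108, 101]
t, df = one_sample_t(scores, pop_mean=100)
print(round(t, 2), df)
```

The obtained t would then be compared with the critical t for the given degrees of freedom from a t table.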
History: change in dependent var due to external circumstances, e.g. stress reduction b/c exams at start and vacation at end of study
Maturation: participants mature physically, socially, and cognitively over course of study
Testing: the testing effect is a change in performance due to familiarity with and practice on test items. Includes both the practice effect and the fatigue effect
Regression to the mean: extreme scores that are the product of chance will moderate upon retesting
Instrumentation effect: observer becomes better at, or more fatigued with, taking measures
Attrition/Mortality: e.g. heaviest smokers in experimental cessation group drop out; post-test measures would be unduly optimistic
Diffusion of treatment: people receive treatment info from other participants
Experimenter/Participant effects: experimenter bias or expectancy effects influence outcome, e.g. Clever Hans, the mathematical horse receiving cues from owners. Solve via single blind (either the experimenter or the participants are blind to the manipulation being made) or double blind (both unaware). Participant effects include reactivity (change in behavior due to being watched). Also, placebo effect.
Floor and ceiling effects: e.g. measuring rat weight in pounds, no change detected (floor effect); measuring an elephant with a bathroom scale that has a 350 lb max limit (ceiling effect).
Threats to External Validity
Generalization to populations: hampered by the college sophomore problem
Generalization from lab settings: control is maximized in lab settings (the artificiality criticism); solve by conceptual replication: test the concepts via a diff indep var or dep var.
Correlated-Group Designs: participants in experimental and control groups are related
Within-participant design: also known as repeated measures designs; all participants serve in all conditions. Benefits: you need fewer participants (e.g. if there are 4 conditions and you need 15 ppl per condition, then the between-participants design needs 60 ppl, whereas the within-participant design needs only 15), it takes less time, and it increases statistical power b/c it reduces variability due to individual differences; this mode is popular in psychological research (p.240). Downside: b/c participants are tested at least twice, practice/fatigue effects; solve via counterbalancing (systematically vary the order of conditions across participants, e.g. reverse the order of tasks for half of them). However, three conditions have 6 possible orderings and 4 conditions have 24; therefore, complete counterbalancing (using all of the orderings of conditions) quickly becomes impractical. Also carry-over effects: a drug administered in one condition affects performance in subsequent conditions
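The counts quoted above (6 orderings for three conditions, 24 for four) are simply the number of permutations of the conditions, which can be enumerated directly. The condition labels and the helper name below are placeholders.

```python
from itertools import permutations

def all_orderings(conditions):
    """Every order in which the conditions could be presented."""
    return list(permutations(conditions))

three = all_orderings(["A", "B", "C"])
four = all_orderings(["A", "B", "C", "D"])
print(len(three), len(four))

# Participants needed with 15 per condition and 4 conditions.
per_condition, n_conditions = 15, 4
between_participants = per_condition * n_conditions  # separate people in each condition
within_participants = per_condition                  # same people in every condition
print(between_participants, within_participants)
```

Because the number of orderings grows factorially, complete counterbalancing needs many participants even for a handful of conditions.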
Matched-Participants Experimental design: for each participant in one condition, there is a participant in the other condition(s) who matches him or her on some relevant variable or variables. Has advantages over the between-participant design (groups are more similar) and the within-participant design (fewer carryover and testing effects). Downsides: more people needed; mortality effects (if one person drops out, the pair is compromised); difficulty finding matched participants (p.242)
Chapter 10: Inferential Statistics: Two-Group Designs
The inferential statistics discussed in Chapter 7 compared single samples with populations (z test, t test, and $\chi^2$ test). The statistics discussed in this chapter are designed to test differences between two equivalent groups or treatment conditions.
The t Test for Independent Groups (Samples) (p.251): It indicates whether the two samples perform so similarly that we conclude they are likely from the same population, or whether they perform so differently that we conclude they represent two different populations (p.227). E.g. a researcher wants to determine whether spacing (studying the same amt of material at intervals) is superior to cramming. Thus, $H_0: \mu_{\text{spaced}} \le \mu_{\text{massed}}$ and $H_a: \mu_{\text{spaced}} > \mu_{\text{massed}}$.
The dependent var is participants' scores on a test
Statistical significance indicates that an observed difference between two descriptive statistics (such as means) is
unlikely to have occurred by chance.
Rather than comparing a single sample mean to a population mean, we are comparing two
sample means. To determine how far the difference between the sample means is from the difference between the
population means, we convert the mean differences to standard errors.
The standard error of the difference between the means does have a logical meaning. If we took thousands of pairs of
samples from these two populations and found $\bar{X}_1 - \bar{X}_2$ for each pair, those differences between means would not
all be the same. They would form a distribution. The mean of that distribution would be the difference between the
means of the populations and its standard deviation would be $s_{\bar{X}_1 - \bar{X}_2}$. Thus,
$t_{obt} = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$, where $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$; $s_1^2$ and $s_2^2$
are the variances of the two groups (p.252). The
degrees of freedom for this independent-groups t test are $(n_1 - 1) + (n_2 - 1)$. Refer to Table A.3 for the critical value of t. APA
style: t(18) = 4.92, p
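As a rough illustration of the independent-groups t test, the sketch below assumes the standard error of the difference is sqrt(s1^2/n1 + s2^2/n2), with s^2 the unbiased sample variance. The scores are made up for the example and are not the textbook's data.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """Return (t, df) for two independent samples.

    Assumes SE = sqrt(s1^2/n1 + s2^2/n2), where s^2 is the
    unbiased (n - 1) sample variance of each group.
    """
    n1, n2 = len(group1), len(group2)
    se_diff = sqrt(variance(group1) / n1 + variance(group2) / n2)
    t = (mean(group1) - mean(group2)) / se_diff
    df = (n1 - 1) + (n2 - 1)
    return t, df

# Hypothetical test scores for two small groups
t_obt, df = independent_t([2, 4, 6], [1, 2, 3])
print(round(t_obt, 3), df)
```

The obtained t would then be compared with the critical value for that df in a t table.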
Effect size: the proportion of variance in the dependent variable that is accounted for by the manipulation of the
independent variable. It is an estimate of the effect of the independent variable, regardless of sample size (p.232). For
the t test, one formula for effect size, known as Cohen's d, is $d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{2} + \frac{s_2^2}{2}}}$.
According to Cohen (1988, 1992), a small effect size is one of at least 0.20, a medium effect size is at least 0.50, and a large effect size is at least 0.80. e.g. APA: t(18) = 4.92, p < .05 (one-tailed), d = 2.198
$r^2$: the proportion of variance accounted for in the dependent variable based on knowing which treatment group the
participants were assigned to for the independent variable.
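The two effect-size measures can be sketched as follows. This assumes Cohen's d is the mean difference divided by sqrt(s1^2/2 + s2^2/2), and it assumes the common conversion r^2 = t^2 / (t^2 + df), which these notes do not spell out. The function names are illustrative.

```python
from math import sqrt

def cohens_d(mean1, mean2, var1, var2):
    """Cohen's d for two independent groups (assumed formula, see lead-in)."""
    return (mean1 - mean2) / sqrt(var1 / 2 + var2 / 2)

def r_squared(t, df):
    """Proportion of variance accounted for, assuming r^2 = t^2 / (t^2 + df)."""
    return t ** 2 / (t ** 2 + df)

def effect_label(d):
    """Label |d| using Cohen's conventions quoted in the notes."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"

print(effect_label(2.198))             # the d reported in the notes
print(round(r_squared(4.92, 18), 3))   # r^2 for t(18) = 4.92
```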
Confidence Intervals: Same formula as before (Ch. 7), except that rather than using the sample mean and the standard
error of the mean, we use the difference between the means and the standard error of the difference between means.
p.257
t Test for Correlated Groups
The same people are used in each group (a within-participants design) or different participants are matched between
groups (a matched-participants design) (p.260). In a correlated-groups design, the sample includes two scores for each
person, instead of just one. The null hypothesis is that there is no difference between the two scores. The degrees of
freedom for a correlated-groups t test are equal to N - 1.
Step 1: We compute a difference score (D) for each person by subtracting one score from the other for that person (or the
two individuals in a matched pair).
The standard error of the difference scores is the standard deviation of the sampling distribution of mean
differences between dependent samples:
$s_{\bar{D}} = \frac{s_D}{\sqrt{N}}$, where $s_D$ is the unbiased estimator of the standard deviation of the difference scores and N is the
number of participants in each group. The test statistic is then $t = \bar{D} / s_{\bar{D}}$.
Effect size: Cohen's d and $r^2$ (p.262):
$d = \frac{\bar{D}}{s_D}$. The $r^2$ formula is the same as that listed above.
Confidence interval: $CI = \bar{D} \pm t_{cv}(s_{\bar{D}})$
e.g. on word memorization differences between concrete and abstract words, we could
answer that we are 95% confident that the difference in performance on the 20-item memory test between the two
word type conditions would be between 0.96 and 4.04 words recalled correctly.
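The correlated-groups steps can be sketched in one function: difference scores, their standard error, t, Cohen's d, and a confidence interval. The recall scores are hypothetical and will not reproduce the interval quoted above exactly. The critical value 2.365 is the two-tailed .05 value for df = 7 from a standard t table, passed in by hand because the sketch does not look it up.

```python
from math import sqrt
from statistics import mean, stdev

def correlated_t(scores1, scores2, t_critical):
    """Correlated-groups t test from paired scores.

    t_critical is supplied by the caller (from a t table for N - 1 df).
    """
    diffs = [a - b for a, b in zip(scores1, scores2)]  # one D per person
    n = len(diffs)
    d_bar = mean(diffs)
    s_d = stdev(diffs)              # unbiased (n - 1) standard deviation
    se = s_d / sqrt(n)              # standard error of the difference scores
    margin = t_critical * se
    return {
        "t": d_bar / se,
        "df": n - 1,
        "d": d_bar / s_d,           # Cohen's d for correlated groups
        "ci": (d_bar - margin, d_bar + margin),
    }

# Hypothetical words recalled by 8 participants in two conditions
concrete = [13, 11, 19, 13, 15, 10, 12, 13]
abstract = [10, 9, 13, 12, 11, 8, 10, 13]
paired = correlated_t(concrete, abstract, t_critical=2.365)
print(paired)
```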
Nonparametric Tests
A nonparametric test does not use any population parameters, such as the mean and standard deviation. Three
nonparametric tests: the Wilcoxon rank-sum test, the Wilcoxon matched-pairs signed-ranks T test (both used with ordinal data), and the chi-square test of independence, used with nominal data (p.240).
Wilcoxon Rank-Sum Test: p.265
The Wilcoxon rank-sum test is similar to the independent-groups t test; however, it uses ordinal data (rankings) rather
than interval-ratio data and compares medians rather than means. Interval or ratio data may be converted to ranked
ordinal data. The underlying distribution need not be normal. First, rank all scores from both groups together and sum the ranks for the group expected to have the smaller
total. This value needs to be equal to or less than the critical value in Table A.6 to be statistically significant. In Table A.6, n1
(the number of participants in a group) is always the smaller of the two groups. Table A.6 presents the critical values for one-tailed tests
only. If a two-tailed test is used, the table can be adapted by dividing the alpha level in half.
Assumptions of this test: p.266
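The ranking step can be sketched as below: scores from both groups are ranked together, tied scores share the average of the positions they occupy, and the ranks are summed per group. The data are invented, and the critical value would still come from the table.

```python
def average_ranks(values):
    """Map each value to its rank (1 = smallest); ties share the mean position."""
    positions = {}
    for index, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(index)
    return {value: sum(p) / len(p) for value, p in positions.items()}

def rank_sums(group1, group2):
    """Return the sum of ranks for each group after ranking all scores together."""
    rank_of = average_ranks(group1 + group2)
    return (sum(rank_of[v] for v in group1),
            sum(rank_of[v] for v in group2))

# Hypothetical scores; group1 is the smaller group (n1 = 3)
sum1, sum2 = rank_sums([3, 5, 8], [5, 9, 10, 12])
print(sum1, sum2)
```

A quick sanity check is that the two sums add up to N(N + 1)/2, here 28 for seven scores.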
Wilcoxon Matched-Pairs T Test
This is a nonparametric statistic and is necessary whenever the distribution is skewed (i.e. not normal). P.243
e.g. during the first term, the teacher measures the number of books her students read and ranks them ordinally; during
the second term, a rewards program is instituted and the students are again ranked. Is there a statistically significant
difference between the # of books read? The null hypothesis is that the median number of books read does not differ;
the alternative hypothesis is that the median number of books read during rewards is greater.
Step 1: for each student, compute a difference score (subtract books read in the 2nd month from those read in the 1st month); if the
program had no effect, we would expect most scores to be close to 0.
Step 2: rank the absolute values of the difference scores. If the two scores tied for positions 1 and 2 have the same numerical value,
they are both ranked 1.5 and the next score gets a 3. Note that any difference scores of zero are not
ranked and do not figure into the N value.
Step 3: give each rank the sign of the difference score it represents.
Step 4: sum the positive and negative ranks separately. For a two-tailed test, Tobt is equal to the smaller of the summed ranks. In
contrast, the Tobt for a one-tailed test is the sum of the signed ranks predicted to be smaller (p.268). As with the Wilcoxon
rank-sum test, the obtained value needs to be equal to or less than the critical value to be statistically significant.
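Steps 1 to 4 can be sketched as follows. The book counts are hypothetical; the example is built so that two difference scores tie for the first two positions (both ranked 1.5) and one difference of zero is dropped.

```python
def signed_rank_sums(first, second):
    """Return (sum of positive ranks, sum of negative ranks, N) for paired scores."""
    # Step 1: difference scores, dropping zero differences
    diffs = [a - b for a, b in zip(first, second) if a != b]
    # Step 2: rank absolute differences, averaging ranks for ties
    positions = {}
    for index, value in enumerate(sorted(abs(d) for d in diffs), start=1):
        positions.setdefault(value, []).append(index)
    rank_of = {value: sum(p) / len(p) for value, p in positions.items()}
    # Steps 3 and 4: attach signs and sum each set of ranks
    positive = sum(rank_of[abs(d)] for d in diffs if d > 0)
    negative = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return positive, negative, len(diffs)

# Hypothetical books read in the first and second month by six students
month1 = [3, 5, 2, 4, 6, 4]
month2 = [5, 6, 2, 7, 5, 8]
pos_sum, neg_sum, n_pairs = signed_rank_sums(month1, month2)
t_two_tailed = min(pos_sum, neg_sum)   # T obtained for a two-tailed test
print(pos_sum, neg_sum, n_pairs, t_two_tailed)
```

For a one-tailed test, the sum predicted to be smaller (here the positive ranks, if more reading is expected in the second month) would be used instead of the minimum.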
Chi-Square Test of Independence
This nonparametric test compares an observed frequency distribution to an expected frequency distribution of two
nominal variables (p.245). The difference between the chi-square test of independence and the chi-square goodness-of-fit
test (ch.7) is that the goodness-of-fit test compares how well an observed frequency distribution of one nominal
variable fits some expected pattern of frequencies, whereas the test of independence compares how well an observed
frequency distribution of two nominal variables fits some expected pattern of frequencies. The degrees of freedom for
this test are equal to (r - 1)(c - 1), where r is the number of rows and c is the number of columns.
Objective: determine whether babysitters are more likely to have taken first aid than teenagers who have never worked as
babysitters.
To determine the expected frequency for each cell:
$E = \frac{(RT)(CT)}{N}$, where RT is the row total, CT is the column total, and N is the total number of observations (p.246).
The test statistic is $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency in each cell.
If $\chi^2_{obt}$ exceeds $\chi^2_{cv}$, then the null hypothesis can be rejected.
Chi-Square test and effect size: Phi Coefficient
As with the t tests discussed earlier in this chapter, we can also compute the effect size for a test of independence:
$\phi = \sqrt{\frac{\chi^2}{N}}$. Cohen's (1988) specifications for the phi coefficient indicate that a phi coefficient of .10 is a small effect,
.30 is a medium effect, and .50 is a large effect. In our particular example, if the phi value is small, then the difference
observed in whether a teenager had taken a first aid class is not strongly accounted for by being a babysitter.
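A minimal sketch of the test of independence follows: expected frequencies from row and column totals, the chi-square sum, the degrees of freedom, and phi. The 2 x 2 counts are invented for illustration and are not the textbook's babysitter data.

```python
from math import sqrt

def chi_square_independence(observed):
    """Return (chi2, df, phi) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    chi2 = 0.0
    for r, row in enumerate(observed):
        for c, obs in enumerate(row):
            expected = row_totals[r] * col_totals[c] / n   # E = (RT)(CT) / N
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    phi = sqrt(chi2 / n)
    return chi2, df, phi

# Hypothetical counts: rows = babysitter / non-babysitter,
# columns = took first aid / did not
chi2, df_chi, phi = chi_square_independence([[30, 10], [20, 40]])
print(round(chi2, 2), df_chi, round(phi, 2))
```

With these made-up counts phi falls between the medium (.30) and large (.50) benchmarks quoted above.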
Summary:
First consideration: determine whether to use a parametric or a nonparametric statistic. If the data are not normally
distributed, use a nonparametric test; likewise, if population parameters such as the mean and standard deviation are not
provided, use a nonparametric test (Wilcoxon or chi-square). If the data are normally distributed, use a parametric test, such as the t test.
Second consideration: whether a between-participants or correlated-groups design has been used (p.248).
A nonparametric test is one that does not involve the use of any population parameters, such as the mean and standard
deviation. In addition, a nonparametric test does not assume a bell-shaped distribution. The test is nonparametric
because it fits this definition.
Chapter 11: Experimental Designs with More than Two Levels of an Independent Variable
The experiments described in Chapter 9 involved manipulating one independent variable with only two levels (aka
treatments): either a control group and an experimental group or two experimental groups. Researchers may want
more than 2 levels of an independent variable because they can then compare multiple treatments, e.g. compare a placebo group with
control and experimental groups (p.281).
If group 1 is compared to group 2, 2 to 3, 3 to 4, and so on, the probability of a Type I error rises to $1 - (1 - \alpha)^c$,
where c equals the number of comparisons performed. One way of counteracting this is to use a more stringent alpha
level by performing the Bonferroni adjustment, in which the desired alpha level is divided by the number of tests or
comparisons. However, this increases the risk of a Type II error. A better method is to use a single statistical test that compares all
groups: ANOVA.
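The inflation of the Type I error rate and the Bonferroni adjustment can be sketched in a few lines. This assumes the familywise rate is 1 - (1 - alpha)^c for c comparisons; the function names are illustrative.

```python
def familywise_error(alpha, comparisons):
    """Probability of at least one Type I error across c comparisons."""
    return 1 - (1 - alpha) ** comparisons

def bonferroni_alpha(alpha, comparisons):
    """Per-comparison alpha after the Bonferroni adjustment."""
    return alpha / comparisons

# Three comparisons at alpha = .05
print(round(familywise_error(0.05, 3), 4))   # about .14 instead of .05
print(round(bonferroni_alpha(0.05, 3), 4))   # about .0167 per comparison
```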
ANOVA is an inferential parametric statistical test for comparing the means of three or more groups that have interval or
ratio data (p.286). If the data are ordinal, use the Kruskal-Wallis analysis of variance for a between-subjects design; for a
within-subjects design, where the data are skewed and/or ordinal, use the Friedman rank test. If the data are nominal, use the
chi-square test. If the Fobt value is greater than the Fcrit value, the results of the ANOVA indicate that at least one of the
sample means differs significantly from the others. In that case, a post hoc test comparing each of the groups in the
study with each of the other groups must be conducted to determine which ones differ significantly from each other, e.g.
Tukey's HSD test (p.297). Also, see p.296 for the assumptions of the ANOVA (interval-ratio data, normally distributed, etc.).
One-way randomized ANOVA
A significant ANOVA result (i.e. F value) indicates that at least one of the sample means differs significantly from the
others. To determine which means differ significantly from the others, one needs to perform a post hoc test (such as
Tukey's HSD) (p.297). Assumptions (p.296): data are interval/ratio, normally distributed, observations are independent,
etc. The term randomized indicates that participants are randomly assigned to conditions in a between-participants
design. The term one-way indicates that the design uses only one independent variable. E.g. rote rehearsal vs. imagery
vs. story-telling on # of words recalled. This is a design with one independent variable with 3 levels. The null hypothesis is
$H_0: \mu_1 = \mu_2 = \mu_3$. The alternative hypothesis is that at least one $\mu$ is not equal to another $\mu$. When a researcher
rejects H0 using an ANOVA, it means that the independent
variable affected the dependent variable to the extent that at least one group mean differs from the others by more
than would be expected based on chance.
The grand mean is the mean performance across all participants in all conditions. Since none of the participants scored
the grand mean, there is variability between conditions. Is this variability due to the independent variable or due to error
variance (chance or uncontrolled variables such as individual differences between participants)?
Within-groups variance
This is an estimate of the population error variance. Error variance can be ascertained from the variability within
each condition, because participants within a condition were treated similarly.
Between-groups variance
This is systematic variance (due either to the effects of the independent variable or to uncontrolled confounding variables) plus error variance.
The F-ratio
$F = \frac{\text{between-groups variance}}{\text{within-groups variance}} = \frac{\text{systematic variance} + \text{error variance}}{\text{error variance}}$
If we assume that the systematic variance is due to the effects of the independent variable, then if the independent variable
has a strong effect, the F-ratio will be substantially greater than 1; otherwise it will be around 1 (p.264).
Step 1: Sum of Squares (p.291): Several types of sums of squares (SS) are used in the calculation of an ANOVA; SSwithin + SSbetween = SStotal
Total sum of squares (SStotal): $\sum (X - \bar{X}_G)^2$, the sum of the squared deviations of each score from the grand mean $\bar{X}_G$. The squared deviations of all scores in all groups are added together to produce the total sum of squares value.
Within-groups sum of squares: $\sum (X - \bar{X}_g)^2$, where X is each individual score and $\bar{X}_g$ is the mean for each group or condition. This is the sum of the squared deviations of each score from its group or condition mean and
is a reflection of the amount of error variance.
Between-groups sum of squares: $\sum [(\bar{X}_g - \bar{X}_G)^2 n]$. This is the sum of the squared deviations of each group's mean from the grand mean, multiplied by the number of participants in each group. The between-groups
variance is an indication of the systematic variance across the groups. The basic idea: if the independent
variable has no effect, the group means would be similar to the grand mean, and there would be little variance across
conditions.
Step 2: Mean Square (MS) is the mean squared deviation, an estimate of the variance between and within the groups.
MSwithin and MSbetween are calculated by dividing each SS by the appropriate df. dftotal = N - 1, where N is the total
number of subjects in the study; dfwithin = N - k, where k = # of groups; dfbetween = k - 1. Note that if the dfwithin number
is not present in the table at the back, use the next lowest number (because when df values decrease, the critical value
increases) (p.294).
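Steps 1 and 2 can be sketched as follows, with the F-ratio taken as MSbetween divided by MSwithin. The three groups of scores are made up for the example and chosen so the arithmetic is easy to check by hand.

```python
from statistics import mean

def one_way_anova(groups):
    """Return SS, df, MS and F for a list of groups (one list of scores each)."""
    scores = [x for group in groups for x in group]
    grand_mean = mean(scores)
    # Step 1: sums of squares
    ss_total = sum((x - grand_mean) ** 2 for x in scores)
    ss_within = sum((x - mean(group)) ** 2 for group in groups for x in group)
    ss_between = sum(len(group) * (mean(group) - grand_mean) ** 2 for group in groups)
    # Step 2: degrees of freedom and mean squares
    df_between = len(groups) - 1            # k - 1
    df_within = len(scores) - len(groups)   # N - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_total": ss_total, "ss_within": ss_within, "ss_between": ss_between,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }

# Hypothetical words recalled under three rehearsal conditions
anova = one_way_anova([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(anova)
```

In this example SSwithin (6) plus SSbetween (42) equals SStotal (48), matching the identity stated in Step 1.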
Step 3: Calculate the F-ratio (p.293): $F = \frac{MS_{between}}{MS_{within}}$
In APA format, to say that a test with a between-groups df of 2 and a within-groups df of 21 has a value of 11.07 and is
significant at the 0.01 level, we write: F(2, 21) = 11.07, p < .01
within groups sum of squares, the error sum of squares is left. Thus error sum of squares, SSerror, equals $SS_{within} - SS_{participants}$.
Step 3: calculate F = MSbetween/MSerror (p.302)
MS is the mean square. dftotal = N - 1, where N is the total number
of scores in the study; dfparticipants = n - 1, where n = # of participants (p.304); dfbetween = k - 1, where k is # of conditions; dferror =
dfbetween X dfparticipants. In Table A.8, use dfbetween and dferror to find the Fcv.
Effect size in the repeated measures ANOVA is calculated similarly to one-way ANOVA (p.280).
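A rough sketch of the repeated-measures calculation is given below. It rests on two assumptions that the surviving notes only partly state: SSparticipants is computed as the number of conditions times the squared deviation of each participant's mean from the grand mean, and SSerror is SSwithin minus SSparticipants. The scores are invented.

```python
from statistics import mean

def repeated_measures_anova(scores):
    """scores[p][c] is the score of participant p in condition c."""
    n, k = len(scores), len(scores[0])
    conditions = list(zip(*scores))
    grand_mean = mean([x for row in scores for x in row])
    ss_between = sum(n * (mean(c) - grand_mean) ** 2 for c in conditions)
    ss_within = sum((x - mean(c)) ** 2 for c in conditions for x in c)
    # Assumed formula: variability due to individual differences
    ss_participants = sum(k * (mean(row) - grand_mean) ** 2 for row in scores)
    ss_error = ss_within - ss_participants
    df_between = k - 1
    df_participants = n - 1
    df_error = df_between * df_participants
    f_ratio = (ss_between / df_between) / (ss_error / df_error)
    return {"ss_participants": ss_participants, "ss_error": ss_error,
            "df_error": df_error, "F": f_ratio}

# Hypothetical data: 3 participants, each measured in 3 conditions
rm = repeated_measures_anova([[1, 3, 6], [2, 2, 7], [3, 4, 8]])
print(rm)
```

Removing participant variability shrinks the error term, which is why the same condition means give a larger F here than in a between-participants analysis.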
Tukey's Post Hoc HSD test:
Chapter 12: Complex Experimental Designs p.316
In the previous chapter, we discussed designs with more than two levels of an independent variable. In this chapter, we
will look at designs with more than one independent variable: factorial designs (p.316). A complete factorial design is
one in which all levels of each independent variable are paired with all levels of every other independent variable. In an
incomplete factorial design, not all levels are paired with all levels of every other variable.
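The factorial notation for a factorial design is determined as follows:
(number of levels of independent variable 1) X (number of levels of independent variable 2) X ...
Thus, a 3 X 6 factorial design is one with two independent variables, the first of which has 3 levels and the second
6 levels, for a total of 18 possible conditions. It is not possible to have a 1 X 3 factorial design, because an independent variable must have at least two levels.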
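The notation translates directly into a count of conditions: multiply the number of levels of each factor. The sketch below also lists the cells of a complete 2 X 2 design, using rehearsal type and word type as example factor names.

```python
from itertools import product
from math import prod

def number_of_conditions(levels_per_factor):
    """Total conditions in a complete factorial design, e.g. [3, 6] -> 18."""
    return prod(levels_per_factor)

def design_cells(factors):
    """List every condition (cell) given a dict of factor name -> levels."""
    return list(product(*factors.values()))

print(number_of_conditions([3, 6]))  # 18

cells_2x2 = design_cells({
    "rehearsal": ["rote", "imagery"],
    "word_type": ["concrete", "abstract"],
})
print(cells_2x2)
```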
A main effect is an effect of a single independent variable. The main effect of each independent variable tells us about
the relationship between that single independent variable and the dependent variable. In other words, do different
levels of one independent variable bring about changes in the dependent variable? For example, in a study about the effects of different rehearsal types (rote, imagery) and different word types (concrete, abstract) on memory, rehearsal type and word type
are the independent variables, and memory is the dependent variable (p.317). There can be as many main effects as
there are independent variables. An interaction effect is the effect of each independent variable across the levels of the
other independent variable.
The relationship can be graphed. The dependent variable always goes on the y-axis. One independent variable is
placed on the x-axis, and the levels of the other independent variable are captioned in the graph (p.294). Possible
outcomes of a 2 X 2 factorial design are Main effect of A? Main Effect of B? Interaction Effect? So 2*2*2 = 8 possible
outcomes (p.296).
Question p.322: How many main effect(s) and interaction effect(s) are possible in a 4 X 6 factorial design? A 4 X 6
factorial design has two independent variables. Thus, there is the possibility of two main effects (one for each
independent variable) and one interaction effect (the interaction between the two independent variables).
Two-Way ANOVA p.323
For the factorial designs discussed in this chapter, a two-way ANOVA would be used. The term two-way indicates that
there are two independent variables in the study. As with one-way ANOVA, if either of the variables has an effect, the
variance between the groups should be greater than the variance within the groups. In a 2 X 2 factorial design, such as
the one we have been looking at in this chapter, there are three null and alternative hypotheses. The null hypothesis for
factor A states that there is no main effect for factor A, and the alternative hypothesis states that there is an effect of
factor A. A second null hypothesis states that there is no main effect for factor B. The third null hypothesis states that
there is no interaction of factors A and B.
Step 1: Calculate SStotal. This is calculated in the same manner as one-way ANOVA. The dftotal also is the same: N - 1.
Step 2: Calculate SSA (p.325). This is the sum of the squared deviations of each factor A group mean from the
grand mean, multiplied by the number of scores in each factor A condition (column). The definitional formula is:
$SS_A = \sum [(\bar{X}_A - \bar{X}_G)^2 n_A]$, where $\bar{X}_A$ is the mean for each condition of factor A, $\bar{X}_G$ is the grand mean, and
$n_A$ is the number of people in each of the factor A conditions. dfA = the number of levels of factor A minus 1 (p.325).
SSB is calculated similarly.
Step 3: Calculate the sum of squares interaction (SSA X B):
$SS_{A \times B} = \sum [(\bar{X}_C - \bar{X}_G)^2 n_C] - SS_A - SS_B$, where $\bar{X}_C$ is the mean for each condition (cell), $\bar{X}_G$ is the grand
mean, and $n_C$ is the number of scores in each condition or cell. The degrees of freedom for the interaction are based on
the number of conditions in the study. To determine the degrees of freedom across the conditions, we multiply the
degrees of freedom for the factors involved in the interaction (p.327).
Step 4: Calculate sum of squares error (SSError): The sum of squares error (SSError) is the sum of the squared deviations of
each score from its condition (cell) mean:
$SS_{Error} = \sum (X - \bar{X}_C)^2$. dfError is calculated as follows: the number of conditions in the study is multiplied by the
number of participants in each condition minus the one score not free to vary, or AB(n - 1) (p.303).
In the expression AB(n - 1), A = # of conditions in factor A (e.g. concrete vs. abstract), B = # of conditions in factor B (e.g. rote vs. imagery).
To determine the Fcritical value in Table A.8, we use dferror running down the left side of the table and dfbetween
running across the top of the table (p.329). However, note that there are three dfbetween values and thus three Fcv
values. For factor A, dfbetween is dfA; for factor B, dfbetween is dfB; for the interaction, dfbetween is dfinteraction. If FA is
significant, this means that there was a significant main effect for factor A.
Note that Tukey's post hoc test needs to be completed only if either or both of the independent variables have more than
two levels (assuming that the main effects are significant to begin with), e.g. in a 2 X 6 factorial design for which both
main effects are significant, the post hoc test needs to be calculated only for the independent variable that has six levels, to
determine which pairs of these six levels differ significantly (p.331).
eta-squared = SSbetween/SStotal; here SSbetween equals SSA, SSB, and SSA X B, respectively (p.331)
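The two-way steps and the eta-squared formula can be pulled together in one sketch. It assumes equal numbers of scores per cell and the definitional formulas SSA = sum of nA times squared deviations of factor A means from the grand mean, SSAxB = cell sum of squares minus SSA minus SSB, and SSError = squared deviations of scores from their cell means. The data and level names are hypothetical.

```python
from statistics import mean

def two_way_anova(cells):
    """cells[(a_level, b_level)] is the list of scores in that condition."""
    scores = [x for values in cells.values() for x in values]
    grand = mean(scores)
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))   # scores per cell (assumed equal)

    def ss_factor(position, levels):
        # Pool all scores at each level of one factor, ignoring the other factor
        total = 0.0
        for level in levels:
            pooled = [x for key, values in cells.items()
                      if key[position] == level for x in values]
            total += len(pooled) * (mean(pooled) - grand) ** 2
        return total

    ss = {"A": ss_factor(0, a_levels), "B": ss_factor(1, b_levels)}
    ss_cells = sum(len(v) * (mean(v) - grand) ** 2 for v in cells.values())
    ss["AxB"] = ss_cells - ss["A"] - ss["B"]
    ss_error = sum((x - mean(v)) ** 2 for v in cells.values() for x in v)
    ss_total = sum((x - grand) ** 2 for x in scores)

    df = {"A": len(a_levels) - 1, "B": len(b_levels) - 1}
    df["AxB"] = df["A"] * df["B"]
    df["error"] = len(a_levels) * len(b_levels) * (n - 1)   # AB(n - 1)

    ms_error = ss_error / df["error"]
    f = {name: (ss[name] / df[name]) / ms_error for name in ss}
    eta_sq = {name: ss[name] / ss_total for name in ss}
    return ss, df, f, eta_sq

# Hypothetical 2 X 2 data: word type (A) by rehearsal type (B), n = 2 per cell
cells = {
    ("abstract", "imagery"): [1, 3],
    ("abstract", "rote"): [3, 5],
    ("concrete", "imagery"): [5, 7],
    ("concrete", "rote"): [11, 13],
}
ss, df, f, eta_sq = two_way_anova(cells)
print(ss, df, f, eta_sq)
```

Each of the three F values would be compared with its own critical value, looked up with the matching dfbetween and the shared dferror.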
Chapter 13: Quasi-Experimental and Single-Case Designs
Non-manipulated independent variables (aka participant variables, e.g. gender, age, ethnicity, political affiliation): as with
experimental studies, groups are compared and hypotheses regarding causality are tested; however, the participants are
not assigned randomly and the groups occur naturally. (p.345)
Single-group posttest-only design: involves the use of a single group of participants to whom some treatment is given.
There is neither a comparison group nor a comparison of the results to any previous measurements. The single-group
pretest/posttest design is an improvement over the posttest-only design in that measures are taken twice: before the
treatment and after the treatment. The single-group time-series design involves using a single group of participants,
taking multiple measures over a period of time before introducing the treatment, and then continuing to take several
measures after the treatment. The nonequivalent control group posttest-only design is similar to the single-group
posttest-only design; however, a nonequivalent control group is added as a comparison group. Nonequivalent means
that group membership is not random, but already established. Thus, the differences observed between the two groups on the dependent variable may be due to the nonequivalence of the groups and not to the treatment (p.323). An
improvement over the previous design involves the addition of a pretest measure, making it a nonequivalent control
group pretest/posttest design. A pretest allows us to assess whether the groups are equivalent on the dependent
measure before the treatment is given to the experimental group. The logical extension of the previous design is to take
more than one pretest and posttest. In a multiple-group time-series design, several measures are taken on
nonequivalent groups before and after treatment.
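Internal validity is the extent to which the results of an experiment can be attributed to the manipulation of the
independent variable, rather than to some confounding variable. Because participants in quasi-experimental designs are not randomly
assigned to conditions, confounds cannot be ruled out; thus, these designs have weaker internal validity than true experiments.
p.325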
Statistical Analysis:
Depending on the type of data (nominal, ordinal, or interval-ratio), the number of levels of the independent variable, the
number of independent variables, and whether the design is between-participants or within-participants, we choose the
appropriate statistic as we did for the experimental designs.
Cross-sectional Designs p.352
Researchers study individuals of different ages at the same time. The advantage of this design is that a wide variety of
ages can be studied in a short period of time. The main issue is that the researcher is typically attempting to determine
whether or not there are differences across different ages; however, the reality of the design is such that the researcher
tests not only individuals of different ages but also individuals who were born at different times and raised in different
generations or cohorts, so rather than testing age differences, may be testing generational differences.
Longitudinal Design
With a longitudinal design, the same participants are studied repeatedly over a period of time. Disadvantage: attrition; people
who drop out may differ from those who remain in the study.
Sequential Designs
A researcher begins with participants of different ages (a cross-sectional design) and tests or measures them. Then,
a number of months or years later, the researcher retests or measures the same individuals (a longitudinal
design). P.352
Single Case Research: versions of a within-participants experiment in which only one person is measured repeatedly.
Often the research is replicated on one or two other participants. Thus, we sometimes refer to these studies as small-n designs.
A reversal design is a within-participants design with only one participant in which the independent variable is introduced and removed one or more times.
o An ABA reversal design involves taking baseline measures (A), introducing the independent variable (B) and measuring behavior again, and then removing the independent variable and retaking the baseline
measures (A). The reversal controls for confounds that may be changing the dependent variable.
o The ABAB reversal design involves reintroducing the independent variable after the second baseline measurement.
Multiple-baseline designs: Because single-case designs are a type of within-participants design, carryover effects from one condition to another are of concern.
So, here we assess the effect of introducing the treatment over multiple participants, behaviors, or situations. We control for confounds not by reversing back to
baseline after treatment, as in a reversal design, but by introducing the treatment at different times
across different people, behaviors, or situations (p.331). This makes it unlikely that some other
extraneous variable produced the results.
o Multiple baselines across participants: the treatment is introduced at different times for different participants.
o Multiple baselines across behaviors: An alternative multiple-baseline design uses only one participant and assesses the effects of introducing a treatment over several behaviors. E.g. first introduce treatment
for aggressive behaviors, then days later for talking out of turn, then days later for temper tantrums.
o Multiple baselines across situations: introduce treatment across different situations. E.g. treat first for bad behavior in math class, then days later for bad behavior in English class. Introducing the treatment at different times in the two classes minimizes the possibility that a confounding variable is responsible
for the behavior change.