Notes on Jackson's Research Methods and Statistics 3rd edition Text

7/31/2019 Notes on Jackson's Research Methods and Statistics 3rd edition Text


Table of Contents

Chapter 1: Thinking like a scientist

Chapter 2: Getting started: ideas, resources, ethics

Chapter 3: defining, measuring and manipulating variables

Chapter 4: descriptive methods

Chapter 5: data organization and descriptive statistics

Chapter 6: correlational methods and statistics

Chapter 7: probability and hypothesis testing

Chapter 8: introduction to inferential statistics

Chapter 9: the logic of experimental design

Chapter 10: inferential statistics: two group designs

Chapter 11: experimental designs with more than two levels of an independent variable

Chapter 12: complex experimental designs

Chapter 13: quasi-experimental and single-case designs

Chapter 1 thinking like a scientist

Sources of knowledge: p.6

Superstition, intuition (e.g. the belief that couples are more likely to conceive after adopting, an illusory correlation), authority (e.g. parents, actors), tenacity (repetition increases believability, as in advertising), rationalism: logical reasoning (syllogisms). A categorical syllogism consists of three parts: the major premise, the minor premise and the conclusion. Each of the premises has one term in common with the conclusion: in a major premise, this is the major term (i.e., the predicate of the conclusion); in a minor premise, it is the minor term (the subject) of the conclusion. For example:

o Major premise: All men are mortal.
o Minor premise: All Greeks are men.
o Conclusion: All Greeks are mortal.

Empiricism: knowledge through observation and experience; yields a long list of observable facts; need rationalism to assemble these facts logically; Aristotle was an empiricist, while Plato was a theorist.

Science: rationalism + empiricism; A hypothesis is a prediction regarding the outcome of a single study. Many hypotheses may be tested and several research studies conducted before a comprehensive theory on a topic is put forth.

Publicly verifiable knowledge: research can be observed, replicated, criticized, and tested for veracity (p.11). Principle of falsifiability: a theory must be stated in such a way that it is possible to refute it.



Scientific research has three basic goals: (1) to describe behavior, (2) to predict behavior, and (3) to explain behavior

p.14

Research Methods in Science

Descriptive Methods

o Observational: Naturalistic observation and Laboratory observation (p.15)
o Case Study Method: in-depth study of one or more individuals, e.g. Piaget's theory of cognitive development in children was developed by simply describing the individual(s) being studied.
o Survey Method: question individuals on topic(s) and then describe their responses

Predictive (relational) methods: we do not systematically manipulate the variables of interest; we only measure them; since alternative explanations can't be ruled out, causation cannot be established

o Correlational method: assesses the degree of relationship between two measured vars
o Quasi-experimental method: differs from the experimental method in that subjects choose to be members of the different groups being studied, i.e. the subject/participant var can't be changed (it is not a manipulated variable), e.g. sorority vs. non-sorority girls (p.17)

Experimental method: Controls are very important in such experiments; you control who is in the study (get a representative population), who participates in each group (control for differences in participants by random assignment between the control (baseline) group and the experimental group), and the treatment each group receives (e.g. some take Vit C and some do not). Other vars such as amt of sleep, type of diet, amt of exercise might also need to be controlled (p.19).

Chapter 2: Getting Started on a Research Idea

Selecting a Problem: review past research on the problem OR review the pertinent chapter in the psychology text OR

observe a problem in nature and decide how to address it

Reviewing the Literature (p.33,34): A list of psychology journals is on p.32; Psych Abstracts, published by the APA, lists abstracts on a monthly basis of all published work; PsycINFO is an electronic database that provides abstracts and citations to the scholarly literature in the behavioral sciences and mental health. To help you choose appropriate keywords, use the APA's Thesaurus of Psychological Index Terms. Whereas Psych Abstracts finds articles published on a given topic within a given year, the Social Science Citation Index (SSCI) can help you to work from a given article (a key article) and see what has been published on that topic since the key article was published (p.34).

PsycARTICLES is an online database that provides full-text articles from many psychology journals and is available through many academic libraries. ProQuest is an online database that searches both scholarly journals and popular media sources. Full-text articles are often available (p.35).

Journal Article Structure (p.37)

Research articles usually have five main sections: Abstract, Introduction, Method, Results, and Discussion. The Abstract is a brief description of the entire paper that typically discusses each section of the paper (Introduction, Method, Results, and Discussion). It should not exceed 120 words. The Introduction has 3 components: (1) intro to the problem, (2) review of relevant previous research, and (3) purpose/rationale for study. The Method section includes participants (selection processes), materials/apparatus (testing materials, equipment), and procedure (groups used in study, instructions to participants, experimental manipulation, controls etc). The Results section summarizes the data collected and the type of statistic used to analyze the data. This section should include a description of the results only, not an explanation of the results. Discussion: The results are evaluated and interpreted in the Discussion section. It typically begins with a restatement of the predictions of the study and tells whether or not the predictions were supported.

Institutional Review Boards (IRBs) oversee all federally funded research involving human participants. P.44 If the

participants in a study are classified as at minimal risk, then an informed consent is not mandatory. P.47 In studies

where anonymity and confidentiality are at risk, an informed consent form should be used.



Chapter 3: Defining, Measuring, and Manipulating Variables (p.57)

Operational definition: defining a variable in concrete terms e.g. hunger: >12hrs w/o

food intake, anxiety: galvanic skin response or HR. Purpose: communicate clearly to

others and measure/manipulate vars consistently.

Properties of measurement are listed at the right p.58

Scales of Measurement

A nominal scale is one in which objects or individuals are assigned to categories that have no numerical properties. Nominal scales have the characteristic of identity. Such variables are categorical b/c. data is grouped into categories. E.g. ethnicity

In an ordinal scale, the categories form a rank order along a continuum and have the properties of identity and magnitude but lack equal unit size (the diff bet. rank 1 and 2 need not equal the diff bet. rank 2 and 3) and absolute zero. Also known as ranked data. E.g. class rank

In an interval scale, the units of measurement (intervals) between the numbers on the scale are all equal in size. When you use an interval scale, the criteria of identity, magnitude, and equal unit size are met. E.g. the Celsius temp scale; the Fahrenheit scale likewise does not have an absolute zero. Because of this, you cannot form ratios based on this scale (for example, 100 degrees is not twice as hot as 50 degrees).

Ratio data have all four properties of measurement: identity, magnitude, equal unit size, and absolute zero. Examples of ratio scales of measurement include weight, time, and height.
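A quick numeric check of why ratios are not meaningful on an interval scale. This is only a sketch: it assumes the 100 and 50 degrees above are Fahrenheit readings, and the helper name fahrenheit_to_kelvin is hypothetical.

```python
def fahrenheit_to_kelvin(deg_f: float) -> float:
    """Convert Fahrenheit (interval scale) to Kelvin (ratio scale with a true zero)."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Ratio of the raw Fahrenheit numbers: looks like "twice as hot".
naive_ratio = 100 / 50

# Ratio of the same temperatures on a scale with an absolute zero.
kelvin_ratio = fahrenheit_to_kelvin(100) / fahrenheit_to_kelvin(50)

print(naive_ratio, round(kelvin_ratio, 3))
```

On the ratio (Kelvin) scale the two temperatures differ by only about 10%, which is why the "twice as hot" reading of the interval-scale numbers fails.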

Aptitude tests measure an individual's potential to do something, whereas achievement tests measure an individual's competence in an area (p.63). Behavioral measures are often referred to as observational measures because they involve observing anything that a participant does. Most physical measures, or measures of bodily activity, are not directly observable.

Reliability refers to the consistency or stability of a measuring instrument (p.65). Examples of errors include trait errors (e.g. participant truthfulness) and method errors (e.g. the operator using the equipment). In effect, a measurement is a combination of the true score and an error score. Observed score = True score + Measurement error

Reliability is measured using correlation coefficients. A correlation coefficient measures the degree of relationship between two sets of scores and can vary between -1.00 and +1.00. To establish the reliability (or consistency) of a measure, we expect a strong correlation coefficient (usually in the .80s or .90s) between the two variables or scores being measured. A positive coefficient indicates that those who scored high on the measuring instrument at one time also scored high at another time, and those who scored low at one point scored low again (p.68).

Types of reliability: test/retest reliability (lowered by practice effects: a person can get better between testings from practice), alternate-forms reliability (diff but equivalent questions on the tests), split-half reliability, and inter-rater reliability

Validity: a measure that is valid measures what it claims to measure (p.70). A systematic examination of the test content to determine whether it covers a representative sample of the domain of behaviors to be measured assesses content validity. Criterion validity: the extent to which a measure can estimate present performance (concurrent validity) or predict future performance (predictive validity). The construct validity of a test assesses the extent to which a measuring instrument accurately measures a theoretical construct or trait that it is designed to measure.

p.72: A test can be reliable, but not valid, but it can never be valid without being reliable.



Chapter 4: Descriptive Methods

Descriptive methods: describe what has been observed in a group of people or animals, but don't allow one to make accurate predictions or determine cause-and-effect relationships. Five different types of descriptive methods: observational methods, case studies, archival method, qualitative methods, and surveys (p.79).

Observational Methods: two types, naturalistic and laboratory (systematic).

Naturalistic: Ecological validity refers to the extent to which research can be generalized to real-life situations. In nonparticipant observation (e.g. Goodall), there is the issue of reactivity: participants reacting in an unnatural way to someone obviously watching them. Disguised observation mitigates this. Expectancy effects are the effect of the researcher's expectations on the outcome of the study. Naturalistic observation has greater flexibility but less control than laboratory observation.

Laboratory: also concerned with reactivity and expectancy effects. Advantage is that the situation is contrived so the likelihood that participants will perform the behavior is higher (p.83).

Observational Methods: Data Collection

Narrative records: full narrative descriptions of a participant's behavior. E.g. Piaget's studies of cognitive development in children

Checklists: A static item is a means of collecting data on characteristics that will not change while the observations are being made, e.g. # ppl present, age, gender; an action item is used to record whether specific behaviors were present or absent during the observational time period. Disadvantage is missing behavior not present on the checklist.

Qualitative Methods (case studies, archival, interviews/focus groups, field studies, action research): These are

distinguished from observational methods as follows: researchers are typically not interested in simplifying, objectifying

or quantifying what they observe.

Case Study Method: e.g. Piaget; an in-depth study of one or more individuals in the hope of revealing things that are true of all of us. Problems: an atypical individual causes erroneous generalizations to the population. Also, expectancy effects (p.85).

Archival Method: describing data that existed before the time of the study. E.g. whether more babies are born when the moon is full. Use US Census Bureau etc. Risks are selection bias (cherry-picked data sources) and unknown reliability or validity b/c. you are using someone else's data.

Interviews/focus group methods: 3 types of interviews: standardized interview (fixed questions), semistandardized, and

unstandardized (unstructured)

Field studies: similar to naturalistic observation; difference is that data are always collected in narrative form and left

that way. P.90 text.

Qualitative Method: Qualitative research focuses on phenomena that occur in natural settings, and the data are

typically analyzed without the use of statistics. Both the naturalistic observational method and the case study method

can be qualitative in nature.

Surveys (summary table on p.102): closed-ended, open-ended, partially open-ended (closed-ended questions with an additional "other" option). A Likert rating scale (most psychs view it as interval, but some as ordinal) presents a statement rather than a question, and respondents are asked to rate their level of agreement with the statement (p.89). A loaded question is one that includes nonneutral or emotionally laden terms (e.g. "eliminating wasteful excesses"). A leading question is one that sways the respondent to answer in a desired manner, e.g. "most people believe...". A double-barreled question asks more than one thing in a single item. Survey questions should not be randomly arranged: sensitive questions (e.g. drug/sexual use) at end, demographic questions at end b/c. boring, group Qs on similar topics



together (p.90). A socially desirable response is one that is given because participants believe it is deemed appropriate by society, rather than because it truly reflects their own views or behaviors.

Mail survey: less sampling bias than phone/email b/c. wide availability; also ppl are more comfortable answering sensitive stuff; disadv: if Qs are unclear, no clarification; low response rate: 20%

Sampling Techniques for Surveys: There are two ways to sample individuals from a population: probability sampling and

nonprobability sampling.

Probability sampling (p.95): random selection; stratified random sample (guarantees that the sample accurately represents the population on specific characteristics); cluster sampling: e.g. sample from classes that are required of all students at the university, such as English composition.

Non-probability Sampling: individual members of the population do not have an equal likelihood of being selected

o Convenience (haphazard) sampling: if you wanted a sample of 100 college students, you could stand outside of the library and ask people who pass by to participate
o Quota sampling: Quota sampling is to nonprobability sampling what stratified random sampling is to probability sampling, but still not much effort is devoted to creating a sample that is truly representative of the population

Chapter 5: Data Organization and Descriptive Statistics

In a class interval frequency distribution, individual scores are combined into categories, or intervals, and then listed

along with the frequency of scores in each interval. P.106

For nominal scale or qualitative data, a bar graph (graphical representation of frequency distribution) is most

appropriate e.g. democrats, independents, republicans. For quantitative data in ordinal, interval, or ratio scales, a

histogram is used. P.107 Unlike in a bar graph, in a histogram, the bars touch each other to indicate that the scores on

the variable represent related, increasing values.

Frequency polygon: a line graph of the frequencies of individual scores or intervals. Again, scores (or intervals) are shown on the x-axis and frequencies on the y-axis. After all the frequencies are plotted, the data points are connected. Use with quantitative, continuous data like height, weight.

Measures of Central Tendency: A measure of central tendency is a representative number that characterizes the

middleness of an entire set of data. E.g. Mean, median, and mode p. 110

Mean: The mean is appropriate for interval and ratio data but not for ordinal or nominal data

For the sample mean, $\bar{X} = \frac{\sum X}{N}$

Median: not affected by extreme scores.

Measures of Variation: A measure of central tendency provides information about the middleness of a distribution of

scores but not about the width or spread of the distribution. P.114 A measure of variation indicates the degree to

which scores are either clustered or spread out in a distribution.

Range: The simplest measure of variation is the range: the difference between the lowest and the highest scores in a distribution. The range is usually reported with the mean of the distribution.

Standard Deviation for a population (p.117): $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$



Standard Deviation for a sample: $S = \sqrt{\frac{\sum (X - \bar{X})^2}{N}}$

Compare this to the average deviation for a population, which is given by $AD = \frac{\sum |X - \mu|}{N}$. Note that the standard deviation will always be at least as large as the average deviation (and usually larger) because the squaring of the terms gives more weight to outlying values.

If, however, sample data is being used to estimate the population standard deviation, then an unbiased estimator modification of N - 1 must be used (p.126): $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

Notice that the symbol for the unbiased estimator of the population standard deviation is s (lowercase), whereas the

symbol for the sample standard deviation is S (uppercase). The estimate has N-1 in the denominator to compensate for

the small samples not containing as much variability as the real population.

Variance is the square of the standard deviation.
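The measures of variation above can be sketched in a few lines. The scores are made-up illustration data, and the variable names (S, s, avg_dev) simply mirror the symbols used in these notes.

```python
import math

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, for illustration only
n = len(scores)
mean = sum(scores) / n

# Sum of squared deviations from the mean, shared by both SD formulas.
sum_sq = sum((x - mean) ** 2 for x in scores)

S = math.sqrt(sum_sq / n)        # descriptive standard deviation (N in the denominator)
s = math.sqrt(sum_sq / (n - 1))  # unbiased estimator of the population SD (N - 1)
avg_dev = sum(abs(x - mean) for x in scores) / n  # average deviation
variance = S ** 2                # variance is the square of the standard deviation

print(S, round(s, 3), avg_dev, variance)
```

Here s comes out larger than S, which is the N - 1 correction at work, and the average deviation is smaller than either.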

Normal distributions are bell-shaped, symmetrical, and have an identical mean, median, and mode. They are unimodal; most observations are centrally clustered. Last, when standard deviations are plotted on the x-axis, the percentage of scores falling between the mean and any point on the x-axis is the same for all normal curves (p.121).

Kurtosis: how flat or peaked a normal distribution is; Platykurtic = short and wide (think: platypus = close to the ground, flat); Mesokurtic = medium height/breadth; Leptokurtic = tall and thin (think: lepto = leap)

In a positively skewed distribution, the peak is to the left of the center point, and the tail extends toward the right. Reason for its name: a few individuals have extremely high scores that pull the distribution in that direction. Negatively skewed is just the opposite (p.122). If your disease has a low median survival rate, you would prefer a positive skew: this means some people live for a very long time post-diagnosis.

The Z-score (p.124): A z-score or standard score is a measure of how many standard deviation units an individual raw score falls from the mean of the distribution. Thus, when calculating a z-score for an individual in comparison to a sample, we use $z = \frac{X - \bar{X}}{S}$, while for a population, we use $z = \frac{X - \mu}{\sigma}$.

If the distribution of scores for which you are calculating transformations (z-scores) is normal (symmetrical and unimodal), then the resulting distribution of z-scores is referred to as the standard normal distribution: a normal distribution with a mean of 0 and a standard deviation of 1 (p.126).

The standard normal curve can also be used to determine an individual's percentile rank: the percentage of scores equal to or below the given raw score (p.131).
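A minimal sketch of the z-score and percentile-rank calculation, assuming the scores are normally distributed. The raw score, mean, and standard deviation are hypothetical, and z_score is an illustrative helper; NormalDist is the normal-distribution class from Python's standard statistics module.

```python
from statistics import NormalDist


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviation units that x falls from the mean."""
    return (x - mean) / sd


# Hypothetical example: a raw score of 115 in a distribution with mean 100, SD 15.
z = z_score(115, 100, 15)

# Percentile rank: percentage of the standard normal curve at or below z.
percentile = NormalDist().cdf(z) * 100

print(z, round(percentile, 2))
```

A score one standard deviation above the mean lands at roughly the 84th percentile.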



Chapter 6: Correlational Methods and Statistics

Correlational Methods: determine whether two variables are related to one another (p.148). In addition to describing a relationship, correlations allow us to make predictions from one variable to another. If two variables are correlated, we can predict from one variable to the other with a certain degree of accuracy. The magnitude or strength of a relationship is determined by the absolute value of the correlation coefficient describing the relationship: 0 = no correlation; 0 to 0.29: weak correlation; 0.3 to 0.69: moderate correlation; 0.7 to 1.0: strong correlation; 1.0 = perfect correlation. In a perfect correlation, an increase/decrease in one variable is always accompanied by a proportional increase/decrease in the other variable.

Thus, in a graph, when there is a perfect correlation, the data points all fall exactly on a straight line (the slope is irrelevant unless it is zero). Accompanying scatterplot shows no relationship. Also, it is possible for a correlation coefficient of zero to coexist with a curvilinear relationship (the + and - correlations nullify each other, e.g. anxiety vs. test performance, memory and age) (p.144).

Misinterpreting Correlations

Causality refers to the assumption that the correlation indicates a causal relationship between two variables, whereas

directionality refers to the inference made with respect to the direction of a causal relationship between two variables.

P.146.

Third variable effects: a strong correlation between two variables is not really a meaningful relationship and is really the product of a third variable. E.g. researchers found contraceptive use strongly correlated w/. # of electrical appliances; the third var was socioeconomic status; to remove the effect of the 3rd var, use partial correlation (p.148).

Restrictive Range: examining the correlated vars over a very short range that isn't big enough to observe a correlation.

Curvilinear relationships mask correlations.

Pearson product-moment correlation coefficient: Pearson's r is used for data measured on an interval or ratio scale of measurement (p.151). E.g. consider a list of 20 individuals' heights and weights. Step 1: calculate the mean and S.D. for the heights and weights. Next, convert each value to a z-score. If the correlation is strong and positive, we should find that positive z-scores on one variable go with positive z-scores on the other variable and vice versa. Step 2: calculate the cross-products, i.e. multiply each person's two z-scores together and sum the products. If both z's are consistently positive together or negative together, you end up with a large positive sum and a strong positive correlation; if one is consistently positive when the other is negative, you end up with a large negative sum and a strong negative correlation. The overall formula is: $r = \frac{\sum z_X z_Y}{N}$. General rule of thumb: at least 10 ppl per variable. An alternative, computational formula is listed on p.154.

Coefficient of Determination: Calculated by squaring the correlation coefficient, the coefficient of determination ($r^2$) is a measure of the proportion of the variance in one variable that is accounted for by another variable. $r^2$ is typically reported as a percentage (p.154).
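The two-step z-score procedure for Pearson's r, followed by the coefficient of determination, can be sketched as below. The height and weight values are hypothetical, the function names are illustrative, and the z-scores use the descriptive standard deviation (N in the denominator) so that the sum of cross-products divided by N gives r.

```python
import math


def z_scores(values):
    """Step 1: convert raw scores to z-scores using the descriptive SD (N denominator)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson_r(xs, ys):
    """Step 2: sum the cross-products of paired z-scores and divide by N."""
    zx, zy = z_scores(xs), z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


heights = [60, 62, 65, 68, 70, 72]        # hypothetical heights (inches)
weights = [110, 120, 140, 155, 160, 180]  # hypothetical weights (pounds)

r = pearson_r(heights, weights)
r_squared = r ** 2  # coefficient of determination

print(round(r, 3), round(r_squared * 100, 1))
```

With these made-up values the correlation is strong and positive, so nearly all of the variance in weight is accounted for by height.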

Correlations for Nominal or Ordinal Data:

Spearman's rank-order correlation coefficient: both vars are on an ordinal (ranking) scale. If one var is interval/ratio, it must first be converted into the ordinal scale.

Point-biserial correlation coefficient: one var is a two-value dichotomous nominal var (e.g. gender) and the other is interval or ratio

Phi coefficient: both vars are dichotomous nominal vars.

Regression Analysis (p.156): A tool that enables us to predict an individual's score on one variable based on knowing one or more other variables is regression analysis. Regression analysis involves determining the equation for the best-fitting


line for a data set. The line has the form of y = mx + b, but is written as follows: $Y' = bX + a$, where $Y'$ is the predicted value on the Y variable, b is the slope of the line, X represents an individual's score on the X variable, and a is the y-intercept. To compute the slope: $b = r \left( \frac{S_Y}{S_X} \right)$. To compute the y-intercept: $a = \bar{Y} - b\bar{X}$, where the bars are the respective sample means. Multiple regression analysis involves combining several predictor variables in a single regression equation to increase the predictive accuracy because in the real world, it is unlikely that one variable is affected by only one other variable.
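A small sketch of fitting the regression line from r, the two standard deviations, and the two means, as described above. The function name regression_line and the data are hypothetical; the slope is computed as r times the ratio of the Y and X standard deviations.

```python
import math


def regression_line(xs, ys):
    """Return (slope b, intercept a) for the best-fitting line Y' = bX + a."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    # Pearson's r from the deviation cross-products.
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n * sd_x * sd_y)
    b = r * (sd_y / sd_x)    # slope
    a = mean_y - b * mean_x  # y-intercept
    return b, a


# Hypothetical data lying exactly on the line Y = 2X + 1.
b, a = regression_line([1, 2, 3, 4], [3, 5, 7, 9])
predicted = b * 5 + a  # predicted Y for a new individual with X = 5

print(round(b, 3), round(a, 3), round(predicted, 3))
```

Because the example points fall on a straight line, the fitted slope and intercept recover that line and the prediction for X = 5 is 11.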

Chapter 7: Hypothesis Testing and Inferential Statistics

Probability: multiplication and addition rule p.178

It is impossible statistically, however, to demonstrate that something is true. In fact, statistical techniques are much better at demonstrating that something is not true. Whatever the research topic, the null hypothesis always predicts that there is no difference between the groups being compared.

One-tailed test p.184: E.g. Do students in after-school programs have higher IQs than those in the general population? The null and alternative hypotheses are: $H_0: \mu_{\text{after-school}} \le \mu_{\text{general}}$ and $H_a: \mu_{\text{after-school}} > \mu_{\text{general}}$.

Two-tailed test: e.g. the researcher just wants to show that there are IQ differences between the two groups, but isn't concerned with the direction of those differences.

Errors: p.186

The p-value or alpha level: When a result is statistically significant at the 0.05 (or 5%) level, it means that, if the null hypothesis were true, a difference between the sample and the population at least as large as the one observed would occur by chance no more than 5 out of every 100 times. In other words, the variation between groups is more plausibly due to true/real differences between them than to chance. In this case, the risk of a Type I error is 5%.

Chapter 8: Introduction to Inferential Statistics

Inferential Statistics (p.197): three tests: the z test, the t test, and the chi-square ($\chi^2$) test; the z test and the t test are used with interval or ratio data and are parametric: assumptions such as knowing the population mean ($\mu$) and standard deviation ($\sigma$) are needed; the chi-square test is used with ordinal or nominal data and is non-parametric.

The z-test: a parametric inferential statistical test. It needs population parameters such as the mean and standard deviation. It determines the likelihood that the sample is part of the sampling distribution, and allows us to test the null hypothesis for a single sample when the population variance is known. Remember that a z-score tells us how many standard deviations above or below the mean of the distribution an individual score falls. But in the IQ problem above, we are not comparing



an individual's score to the population mean; rather, a sample mean must be compared with a distribution of sample means, known as the sampling distribution.

Standard Error of the Mean p.198: the standard error of the mean (the standard deviation of the sampling distribution) can never be as large as $\sigma$, the standard deviation for the distribution of individual scores (for samples of more than one score). Think about it this way: if the size of each of these samples were to approach the population size, their means would all be tightly clustered around the pop. mean and the standard deviation of the sampling distribution would be very small. Thus, the central limit theorem states that for any population with mean $\mu$ and standard deviation $\sigma$, the distribution of sample means for sample size N will have a mean of $\mu$ and a standard deviation of $\sigma / \sqrt{N}$ and will approach a normal distribution as N approaches infinity (p.198). Thus, $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$ and $z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$.

The z-score will tell us how many standard deviation units a sample mean is from the population mean, or the likelihood that the sample is from that population (p.175). E.g. if we wind up with z = 2.06 for the one-tailed test, the critical z = 1.64, i.e. the area under the curve to the right of that value is 5%. The z-value would be significant and H0 would be rejected. In APA style, report result as Z (n = 50) = 2.06, p
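A minimal sketch of the one-tailed z test for a sample mean. All numbers are assumptions for illustration: the population mean and standard deviation (100 and 15) and the sample mean (104.37) are not given in these notes and were picked so that z lands near the 2.06 quoted above for n = 50.

```python
import math
from statistics import NormalDist

mu, sigma = 100.0, 15.0      # hypothetical population parameters
sample_mean, n = 104.37, 50  # hypothetical sample mean and sample size

standard_error = sigma / math.sqrt(n)    # standard error of the mean
z = (sample_mean - mu) / standard_error  # z for the sample mean

z_critical = NormalDist().inv_cdf(0.95)  # one-tailed critical value at alpha = .05
p_one_tailed = 1 - NormalDist().cdf(z)   # area to the right of the obtained z
reject_null = z > z_critical

print(round(z, 2), round(z_critical, 3), reject_null)
```

The exact critical value is about 1.645; z tables commonly list it as 1.64 or 1.65.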



The estimated standard error of the mean: $t = \frac{\bar{X} - \mu}{s_{\bar{X}}}$, where $s_{\bar{X}} = \frac{s}{\sqrt{N}}$. $s_{\bar{X}}$ is the estimated standard error of the mean, i.e. an estimate of the standard deviation of the sampling distribution based on sample data, since the pop. standard dev is not known. $s$ (the estimated standard deviation for a population, based on sample data): $s = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

APA style: t(9) = 2.06, p
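A sketch of the single-sample t test built from the estimated standard error. The ten scores and the comparison mean of 100 are hypothetical and do not reproduce the t(9) = 2.06 result above; the critical value 1.833 is the standard one-tailed table value for df = 9 at alpha = .05.

```python
import math
from statistics import mean, stdev


def one_sample_t(sample, mu):
    """Return (t, df) for a single sample compared with a population mean mu."""
    n = len(sample)
    s = stdev(sample)                    # estimated population SD (N - 1 denominator)
    estimated_se = s / math.sqrt(n)      # estimated standard error of the mean
    t = (mean(sample) - mu) / estimated_se
    return t, n - 1


sample = [104, 98, 110, 105, 101, 99, 108, 103, 107, 105]  # hypothetical scores
t, df = one_sample_t(sample, mu=100)

T_CRITICAL_ONE_TAILED_05 = 1.833  # table value for df = 9, one-tailed, alpha = .05
significant = t > T_CRITICAL_ONE_TAILED_05

print(round(t, 3), df, significant)
```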



History: change in dependent var due to external circumstances, e.g. stress reduction b/c. exams at start and vacation at end of study

Maturation: participants mature physically, socially, and cognitively over the course of the study

Testing: the testing effect is a change in performance due to familiarity with and practice on test items. Includes both the positive practice effect and the fatigue effect

Regression to the Mean: extreme scores that are the product of chance will moderate upon retesting

Instrumentation effect: observer becomes better at, or more fatigued with, taking measures

Attrition/Mortality: e.g. heaviest smokers in experimental cessation group drop out; post-test measures would be unduly optimistic

Diffusion of treatment: people receive treatment info from other participants

Experimenter/Participant Effects: experimenter bias or expectancy effects influence the outcome, e.g. Clever Hans, the mathematical horse receiving cues from owners. Solve via single blind (either the experimenter or the participants are blind to the manipulation being made) or double blind (both unaware). Participant effects include reactivity (change in behavior due to being watched) and the placebo effect.

Floor and ceiling effects: e.g. floor effect: measuring rat weight in pounds, so no change is detected; ceiling effect: measuring an elephant w/. a bathroom scale that has a 350 lb max limit



Threats to External Validity

Generalization to Populations: hampered by the college sophomore problem

Generalizations from Lab settings: control is maximized in lab settings (the artificiality criticism); solve by conceptual replication: test the same concepts via a diff indep var or dep var.

Correlated-Group Designs: participants in experimental and control groups are related

Within-participant design: also known as repeated measures designs; all participants serve in all conditions; benefit is that you need fewer participants (e.g. if there are 4 conditions and you need 15 ppl per condition, then the between-participants design needs 60 ppl, whereas the within-participant design needs only 15), takes less time, and increases statistical power b/c. it reduces variability due to individual differences; this mode is popular in psychological research (p.240). Downside: b/c. participants are tested at least twice, practice/fatigue effects; solve via counterbalancing: vary the order in which the conditions are presented across participants; however, three conditions have 6 possible orderings and 4 conditions have 24, so complete counterbalancing (using all of the orderings of conditions) is often not practical; also carry-over effects: a drug administered in one condition affects performance in subsequent conditions



Matched-Participants Experimental design: for each participant in one condition, there is a participant in the other condition(s) who matches him or her on some relevant variable or variables. Has advantages over the between-participant design (groups are more similar) and the within-participant design (fewer carryover and testing effects); downside: more people needed; also mortality effects: if one person drops out, the pair is compromised; also difficulty finding participants (p.242)
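The counterbalancing arithmetic and the participant counts mentioned for the within-participant design can be checked with a short sketch. The condition labels and helper names are hypothetical.

```python
from itertools import permutations


def orderings(conditions):
    """All possible orders in which the conditions could be presented."""
    return list(permutations(conditions))


def participants_needed(n_conditions, per_condition, within=False):
    """Within-participant designs reuse the same people in every condition."""
    return per_condition if within else n_conditions * per_condition


three = orderings(["A", "B", "C"])
four = orderings(["A", "B", "C", "D"])

print(len(three), len(four))                         # number of orderings
print(participants_needed(4, 15))                    # between-participants design
print(participants_needed(4, 15, within=True))       # within-participant design
```

The number of orderings grows as the factorial of the number of conditions, which is why complete counterbalancing quickly becomes impractical.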

Chapter 10: Inferential Statistics: Two-group Designs

The inferential statistics discussed in Chapter 7 compared single samples with populations (z test, t test, and $\chi^2$ test). The statistics discussed in this chapter are designed to test differences between two equivalent groups or treatment conditions.

The t Test for Independent Groups (Samples): p.251

It indicates whether the two samples perform so similarly that we conclude they are likely from the same population, or whether they perform so differently that we conclude they represent two different populations (p.227). E.g. a researcher wants to determine whether spacing (studying the same amt of material at intervals) is superior to cramming; the dependent var is participants' scores on a test.

Statistical significance indicates that an observed difference between two descriptive statistics (such as means) is

unlikely to have occurred by chance.

Rather than comparing a single sample mean to a population mean, we are comparing two sample means. To determine how far the difference between the sample means is from the difference between the population means, we convert the mean differences to standard errors.

The standard error of the difference between the means does have a logical meaning. If we took thousands of pairs of samples from these two populations and found $\bar{X}_1 - \bar{X}_2$ for each pair, those differences between means would not all be the same. They would form a distribution. The mean of that distribution would be the difference between the means of the populations and its standard deviation would be $s_{\bar{X}_1 - \bar{X}_2}$. Thus, $t = \frac{\bar{X}_1 - \bar{X}_2}{s_{\bar{X}_1 - \bar{X}_2}}$, where $s_{\bar{X}_1 - \bar{X}_2} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$. $s_1^2$ and $s_2^2$ are the variances of the two groups (p.252). The degrees of freedom for this independent-groups t test are $(n_1 - 1) + (n_2 - 1)$. Refer to Table A.3 for the critical t value. APA style: t(18) =4.92, p



Effect size: the proportion of variance in the dependent variable that is accounted for by the manipulation of the independent variable. It is an estimate of the effect of the independent variable, regardless of sample size. P.232 For the t test, one formula for effect size, known as Cohen's d, is

$d = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\frac{s_1^2}{2} + \frac{s_2^2}{2}}}$

According to Cohen (1988, 1992), a small effect size is one of at least 0.20, a medium effect size is at least 0.50, and a large effect size is at least 0.80. e.g. APA: t(18) = 4.92, p < .05 (one-tailed), d = 2.198

$r^2$: the proportion of variance accounted for in the dependent variable based on knowing which treatment group the

participants were assigned to for the independent variable.
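The independent-groups calculations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the textbook's worked example: the scores are made up, the function name `independent_t` is hypothetical, Cohen's d uses the equal-group-size form (each variance divided by 2), and $r^2 = t^2 / (t^2 + df)$ is the standard conversion from t, which these notes do not spell out.

```python
from math import sqrt
from statistics import mean, variance

def independent_t(group1, group2):
    """t test for independent groups, plus Cohen's d and r squared."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    v1, v2 = variance(group1), variance(group2)   # unbiased sample variances
    se_diff = sqrt(v1 / n1 + v2 / n2)             # standard error of the difference between means
    t = (m1 - m2) / se_diff
    df = (n1 - 1) + (n2 - 1)
    d = (m1 - m2) / sqrt(v1 / 2 + v2 / 2)         # Cohen's d (equal-n form, an assumption)
    r2 = t ** 2 / (t ** 2 + df)                   # proportion of variance accounted for
    return {"t": t, "df": df, "d": d, "r2": r2}

# Hypothetical test scores for two study conditions
spaced = [8, 9, 7, 10, 6]
massed = [5, 6, 4, 7, 3]
result = independent_t(spaced, massed)
print(result)
```

The obtained t is then compared with the critical value for the matching df in a t table.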

Confidence Intervals: Same formula as before (Ch. 7), except that rather than using the sample mean and the standard

error of the mean, we use the difference between the means and the standard error of the difference between means.

p.257

t Test for Correlated Groups
The same people are used in each group (a within-participants design) or different participants are matched between groups (a matched-participants design). P.260 In a correlated groups design, the sample includes two scores for each person, instead of just one. The null hypothesis is that there is no difference between the two scores. The degrees of freedom for a correlated-groups t test are equal to N - 1

Step 1: We compute a difference score for each person by subtracting one score from the other for that person (or the

two individuals in a matched pair).

The standard error of the difference scores ($s_{\bar{D}}$) is the standard deviation of the sampling distribution of mean differences between dependent samples.

$s_{\bar{D}} = \frac{s_D}{\sqrt{N}}$, where $s_D$ is the unbiased estimator of the standard deviation of the difference scores and N is the number of participants in each group.

Effect size: Cohen's d and $r^2$ p.262

$d = \frac{\bar{D}}{s_D}$, where $\bar{D}$ is the mean of the difference scores. The $r^2$ formula is the same as that listed above



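Confidence interval: $CI = \bar{D} \pm t_{cv}(s_{\bar{D}})$, where $\bar{D}$ is the mean difference score, $s_{\bar{D}}$ is the standard error of the difference scores, and $t_{cv}$ is the critical value of t.

e.g. on word memorization differences between concrete and abstract words, we could answer that we are 95% confident that the difference in performance on the 20-item memory test between the two word type conditions would be between 0.96 and 4.04 words recalled correctly.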
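The correlated-groups steps can be sketched as follows. The scores are invented for illustration (they are not the concrete/abstract word data), the function name `correlated_t` is hypothetical, and the critical value 2.306 is the two-tailed t for df = 8 at alpha = .05, passed in by hand as it would be read from a t table.

```python
from math import sqrt
from statistics import mean, stdev

def correlated_t(scores1, scores2, t_cv):
    """Correlated-groups t test with a confidence interval and Cohen's d."""
    diffs = [a - b for a, b in zip(scores1, scores2)]  # one difference score per person or pair
    n = len(diffs)
    d_bar = mean(diffs)                 # mean difference score
    s_d = stdev(diffs)                  # unbiased estimate of the SD of the difference scores
    se = s_d / sqrt(n)                  # standard error of the difference scores
    t = d_bar / se
    ci = (d_bar - t_cv * se, d_bar + t_cv * se)
    return {"t": t, "df": n - 1, "ci": ci, "d": d_bar / s_d}

# Hypothetical paired scores for nine participants
condition1 = [18, 10, 20, 11, 17, 12, 15, 14, 16]
condition2 = [10, 10, 12, 11, 11, 10, 11, 10, 12]
paired = correlated_t(condition1, condition2, t_cv=2.306)
print(paired)
```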

Nonparametric Tests

A nonparametric test does not use any population parameters, such as the mean and standard deviation. Three nonparametric tests: the Wilcoxon rank-sum test, the Wilcoxon matched-pairs signed-ranks T test (both used with ordinal data), and the chi-square test of independence, used with nominal data. P.240

Wilcoxon Rank-Sum Test: p.265

The Wilcoxon rank-sum test is similar to the independent-groups t test; however, it uses ordinal data (ranking) rather than interval-ratio data and compares medians rather than means. Interval or ratio data may be converted to ranked ordinal data. The underlying distribution is not normal. First, sum the ranks for the group expected to have the smaller total. This value needs to be equal to or less than the critical value in Table A.6 to be statistically significant. In Table A.6, n1 (the number of participants in a group) is always the smaller of the two groups. Table A.6 presents the critical values for one-tailed tests only. If a two-tailed test is used, the table can be adapted by dividing the alpha level in half. Assumptions of this test: p.266
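A minimal sketch of the ranking step, assuming tied scores share the average of the ranks they occupy. The helper names and the scores are hypothetical; the resulting sum for the group expected to have the smaller total would then be compared with the critical value from the rank-sum table.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values get the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        # Positions i+1 .. j are occupied by the same value; share their average.
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]

def rank_sums(group1, group2):
    """Rank all scores together, then sum the ranks within each group."""
    ranks = average_ranks(group1 + group2)
    return sum(ranks[:len(group1)]), sum(ranks[len(group1):])

# Hypothetical scores; group1 is the smaller group (n1 = 4)
group1 = [3, 5, 6, 8]
group2 = [7, 9, 10, 12, 12]
sums = rank_sums(group1, group2)
print(sums)  # (11.0, 34.0)
```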

Wilcoxon Matched-Pairs T Test

This is a nonparametric statistic for correlated-groups designs; it is needed when the data are ordinal or the distribution is skewed (i.e. not normal). P.243

e.g. during the first term, the teacher measures the number of books her students read and ranks them ordinally; during

the second term, a rewards program is instituted and the students are again ranked. Is there a statistically significant

difference between the # of books read? The null hypothesis is that the median number of books read does not differ;

the alternative hypothesis is that the median number of books read during rewards is greater.

Step 1: for each student, compute a difference score (subtract books read in the second term from those read in the first term); if the program had no effect, we would expect most scores to be close to 0.

Step 2: rank the absolute values of the difference scores. If two scores at position 1 have the same numerical value, they are both ranked 1.5 and the next score gets a 3. Note that any values with a difference score of zero are not ranked and do not figure into the N value.

Step 3: give each rank the sign of the difference score it represents.

Step 4: sum the positive ranks and the negative ranks separately. For a two-tailed test, Tobt is equal to the smaller of the summed ranks. In contrast, the Tobt for a one-tailed test is the sum of the signed ranks predicted to be smaller. p.268 As with the Wilcoxon rank-sum test, the obtained value needs to be equal to or less than the critical value to be statistically significant.
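The four steps can be sketched as below. The book counts are invented, the function names are hypothetical, and the obtained T would still need to be compared with the critical value from the appropriate table.

```python
def average_ranks(values):
    """Rank values from 1 (smallest); tied values get the mean of their ranks."""
    ordered = sorted(values)
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2
        i = j
    return [rank_of[v] for v in values]

def wilcoxon_T(first, second):
    """Wilcoxon matched-pairs signed-ranks T, following steps 1 to 4."""
    # Step 1: difference scores; zero differences are dropped and do not count toward N
    diffs = [a - b for a, b in zip(first, second) if a != b]
    # Step 2: rank the absolute values of the difference scores
    ranks = average_ranks([abs(d) for d in diffs])
    # Steps 3 and 4: attach signs, then sum positive and negative ranks separately
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return {"N": len(diffs), "positive": positive, "negative": negative,
            "T_two_tailed": min(positive, negative)}

# Hypothetical number of books read by six students in each term
first_term = [10, 8, 7, 12, 9, 6]
second_term = [12, 11, 7, 11, 13, 8]
books = wilcoxon_T(first_term, second_term)
print(books)
```

For a one-tailed test, the sum predicted to be smaller (here the positive ranks, if the rewards program is expected to increase reading) is used as Tobt instead of the minimum.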

Chi-Square Test of Independence
This nonparametric test compares an observed frequency distribution to an expected frequency distribution of two nominal variables. P.245 The difference between the Chi-Square test of independence and the Chi-Square goodness-of-fit test (ch.7) is that the goodness-of-fit test compares how well an observed frequency distribution of one nominal variable fits some expected pattern of frequencies, whereas the test of independence compares how well an observed frequency distribution of two nominal variables fits some expected pattern of frequencies. The degrees of freedom for this test are equal to (r - 1)(c - 1), where r is the number of rows and c is the number of columns.



Objective: determine whether babysitters are more likely to have taken first aid than those who have never worked as

babysitters.

To determine the expected frequency for each cell:

$E = \frac{(RT)(CT)}{N}$, where RT is the row total, CT is the column total, and N is the total number of observations. P.246

If the $\chi^2_{obt}$ exceeds the $\chi^2_{cv}$, then the null hypothesis can be rejected.

Chi-Square test and effect size: Phi Coefficient

As with the t tests discussed earlier in this chapter, we can also compute the effect size for a $\chi^2$ test of independence.

$\phi = \sqrt{\frac{\chi^2}{N}}$. Cohen's (1988) specifications for the phi coefficient indicate that a phi coefficient of .10 is a small effect,

.30 is a medium effect, and .50 is a large effect. In our particular example, if the phi value is small, then the difference

observed in whether a teenager had taken a first aid class is not strongly accounted for by being a babysitter.
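A rough sketch of the test of independence and the phi coefficient. The frequencies are invented (not the babysitter data from the text), and the function name is hypothetical. Expected frequencies use (row total)(column total)/N, the statistic is the usual sum of (O - E)^2 / E, and phi is the square root of chi-square over N.

```python
from math import sqrt

def chi_square_independence(observed):
    """Chi-square test of independence for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency for each cell: (row total)(column total) / N
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    phi = sqrt(chi2 / n)   # effect size
    return {"expected": expected, "chi2": chi2, "df": df, "phi": phi}

# Hypothetical counts: rows = babysitter / non-babysitter,
# columns = took first aid / did not take first aid
observed = [[30, 20],
            [20, 30]]
table = chi_square_independence(observed)
print(table)
# With df = 1 and alpha = .05 the critical value is 3.841
print(table["chi2"] > 3.841)
```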

Summary:

First consideration: determine whether to use a parametric or a nonparametric statistic; if the data are not normally distributed, use nonparametric; also if certain population parameters such as the mean and standard deviation are not provided, use nonparametric (Wilcoxon or chi-square); if the data are normal, use parametric, such as the t test.

Second consideration: whether a between-participants or correlated-groups design has been used. P.248

A nonparametric test is one that does not involve the use of any population parameters, such as the mean and standard

deviation. In addition, a nonparametric test does not assume a bell-shaped distribution. The test is nonparametric

because it fits this definition.

Chapter 11: Experimental Designs with More than Two Levels of an Independent Variable

The experiments described in Chapter 9 involved manipulating one independent variable with only two levels (aka treatments): either a control group and an experimental group or two experimental groups. Researchers may want more than 2 levels of an independent var because they can compare multiple treatments, e.g. compare a placebo group with control and experimental groups. P.281

If group 1 is compared to group 2, 2 to 3, 3 to 4, and so on, we increase the risk of a Type I error to $1 - (1 - \alpha)^c$, where c equals the number of comparisons performed. One way of counteracting this is to use a more stringent alpha level by performing the Bonferroni adjustment, in which the desired alpha level is divided by the number of tests or comparisons. However, the risk of a Type II error is increased. A better method is to use a single statistical test that compares all groups: ANOVA.
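To see how quickly the Type I error risk inflates, here is a small sketch using the usual expression 1 - (1 - alpha)^c for c comparisons, alongside the Bonferroni-adjusted alpha. The function names are illustrative only.

```python
def familywise_error(alpha, c):
    """Probability of at least one Type I error across c comparisons."""
    return 1 - (1 - alpha) ** c

def bonferroni_alpha(alpha, c):
    """Per-comparison alpha after the Bonferroni adjustment."""
    return alpha / c

for c in (1, 3, 6, 10):
    print(c, round(familywise_error(0.05, c), 4), round(bonferroni_alpha(0.05, c), 4))
```

With three comparisons at alpha = .05 the overall risk is already about .14, which is why a single omnibus test is preferred.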

ANOVA is an inferential parametric statistical test for comparing the means of three or more groups that have interval or ratio data. P.286. If the data are ordinal, use the Kruskal-Wallis analysis of variance for a between-subjects design; for a within-subjects design, where the data are skewed and/or ordinal, use the Friedman rank test. If the data are nominal, use the chi-square test. If the Fobt value is greater than the Fcrit value, the results of ANOVA indicate that at least one of the sample means differs significantly from the others. In that case, a post hoc test for comparing each of the groups in the study with each of the other groups must be conducted to determine which ones differ significantly from each other, e.g. Tukey's HSD test. p.297 Also, see p. 296 for the assumptions of the ANOVA (interval-ratio data, normally distributed, etc.)

One-way randomized ANOVA



A significant ANOVA result (i.e. a significant F-value) indicates that at least one of the sample means differs significantly from the others. To determine which means differ significantly from the others, one needs to perform a post hoc test (such as Tukey's HSD). p.297 Assumptions (p.296): data are interval/ratio, normally distributed, observations are independent etc. The term randomized indicates that participants are randomly assigned to conditions in a between-participants design. The term one-way indicates that the design uses only one independent variable. E.g. rote rehearsal vs. imagery vs. story-telling on # of words recalled. This is a design with one independent var with 3 levels. The null hypothesis is $H_0: \mu_1 = \mu_2 = \mu_3$. The alternative hypothesis is that at least one $\mu$ is not equal to another $\mu$. When a researcher rejects H0 using an ANOVA, it means that the independent variable affected the dependent variable to the extent that at least one group mean differs from the others by more than would be expected based on chance.

The grand mean is the mean performance across all participants in all conditions. Since none of the participants scored

the grand mean, there is variability between conditions. Is this variability due to the independent var or due to error variance (chance or uncontrolled variables such as individual differences between participants)?

Within-groups variance

This is an estimate of the population error variance. Error variance can be ascertained by looking at the variability within each condition, because participants within a condition were treated similarly.

Between-groups variance

This reflects systematic variance (due either to the effects of the independent variable or to uncontrolled confounding vars) plus error variance.

The F-ratio

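F = between-groups variance / within-groups variance = (systematic variance + error variance) / error variance. If we assume that the systematic variance is due to the effects of the independent variable, then if the independent var has a strong effect, the F-ratio will be substantially greater than 1; otherwise it will be around 1. P.264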

Step 1: Sum of Squares p.291: Several types of sums of squares (SS) are used in the calculation of an ANOVA; SSwithin + SSbetween = SStotal

Total sum of squares (SStotal): the sum of the squared deviations of each score from the grand mean. The within-groups and between-groups sums of squares add together to produce the total sum of squares value.

Within-groups sum of squares: $\sum (X - \bar{X}_g)^2$, where X is each individual score, and $\bar{X}_g$ is the mean for each group or condition. This is the sum of the squared deviations of each score from its group or condition mean and is a reflection of the amount of error variance.

Between-groups sum of squares: $\sum [(\bar{X}_g - \bar{X}_G)^2 n]$, where $\bar{X}_G$ is the grand mean and n is the number of participants in each group. This is the sum of the squared deviations of each group's mean from the grand mean, multiplied by the number of participants in each group. The between-groups variance is an indication of the systematic variance across the groups. The basic idea: if the independent var has no effect, the group means would be similar to the grand mean, and there would be little variance across conditions.

Step 2: Mean Square (MS) is the mean squared deviation that is an estimate of variance between and within the groups. MSwithin and MSbetween are calculated by dividing each SS by the appropriate df. dftotal = N - 1, where N is the total number of subjects in the study; dfwithin = N - k, where k = # of groups; dfbetween = k - 1. Note that if the dfwithin number is not present in the table at the back, use the next lowest number (because when df values decrease, the critical value increases) p.294.



Step 3: Calculate the F-ratio p.293
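Steps 1 to 3 can be sketched end to end. The recall scores are invented and the function name is hypothetical; the point is only to show that the within-groups and between-groups sums of squares add up to the total and that F is MSbetween divided by MSwithin.

```python
from statistics import mean

def one_way_anova(groups):
    """One-way randomized ANOVA from a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    n_total, k = len(all_scores), len(groups)
    # Step 1: sums of squares
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Step 2: mean squares
    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    # Step 3: F-ratio
    return {"ss_total": ss_total, "ss_within": ss_within, "ss_between": ss_between,
            "df_between": df_between, "df_within": df_within,
            "F": ms_between / ms_within}

# Hypothetical words recalled under three rehearsal conditions
rote, imagery, story = [2, 3, 4], [4, 5, 6], [6, 7, 8]
anova = one_way_anova([rote, imagery, story])
print(anova)
```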

In APA format, to say that a test with a between groups df of 2 and a within groups df of 21 has a value of 11.07 and is

significant at the 0.01 level, we write: F(2,21) = 11.07, p



within groups sum of squares, the error sum of squares is left. Thus error sum of squares, SSerror, equals SSwithin - SSparticipants.

Step 3: calculate F = MSbetween/MSerror p.302

MS is the mean square. dftotal = N - 1, where N is the total number of scores in the study; dfparticipants (dfsubjects) = n - 1, where n is the number of participants (p.304); dfbetween = k - 1, where k is the # of conditions; dferror = dfbetween X dfparticipants. In Table A.8, use dfbetween and dferror to find the Fcv.

Effect size in the repeated measures ANOVA is calculated similarly to one-way ANOVA. P.280

Tukey's Post Hoc HSD test:

Chapter 12: Complex Experimental Designs p.316

In the previous chapter, we discussed designs with more than two levels of an independent variable. In this chapter, we will look at designs with more than one independent variable: factorial designs. P.316 A complete factorial design is one in which all levels of each independent variable are paired with all levels of every other independent variable. In an incomplete factorial design, not all levels are paired with all levels of every other var.

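The factorial notation for a factorial design is determined as follows: (number of levels of independent variable 1) X (number of levels of independent variable 2), with one number for each independent variable.

Thus, a 3 X 6 factorial design is one with two independent variables, the first one of which has 3 levels and the second one, 6 levels, for a total of 18 possible conditions. It is not possible to have a 1 X 3 factorial design, because a variable with only one level does not vary.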

A main effect is an effect of a single independent variable. The main effect of each independent variable tells us about

the relationship between that single independent variable and the dependent variable. In other words, do different

levels of one independent variable bring about changes in the dependent variable? For example, in a study about the effects of different rehearsal types (rote, imagery) and different word types (concrete, abstract) on memory, rehearsal type and word type are the independent variables, and memory is the dependent variable. p.317 There can be as many main effects as

there are independent variables. An interaction effect is the effect of each independent variable across the levels of the

other independent variable.

The relationship can be graphed. The dependent variable always goes on the y-axis. One independent variable is placed on the x-axis, and the levels of the other independent variable are captioned in the graph. P.294 The possible outcomes of a 2 X 2 factorial design depend on three yes/no questions: Main effect of A? Main effect of B? Interaction effect? So there are 2*2*2 = 8 possible outcomes (p.296).

Question p.322: How many main effect(s) and interaction effect(s) are possible in a 4 X 6 factorial design? A 4 X 6

factorial design has two independent variables. Thus, there is the possibility of two main effects (one for each

independent variable) and one interaction effect (the interaction between the two independent variables).

Two-Way ANOVA p.323

For the factorial designs discussed in this chapter, a two-way ANOVA would be used. The term two-way indicates that

there are two independent variables in the study. As with one-way ANOVA, if either of the variables has an effect, the

variance between the groups should be greater than the variance within the groups. In a 2 X 2 factorial design, such as

the one we have been looking at in this chapter, there are three null and alternative hypotheses. The null hypothesis for

factor A states that there is no main effect for factor A, and the alternative hypothesis states that there is an effect of

factor A. A second null hypothesis states that there is no main effect for factor B. The third null hypothesis states that

there is no interaction of factors A and B.



Step 1: Calculate SStotal. This is calculated in the same manner as one-way ANOVA. The dftotal also is the same: N - 1;

Step 2: Calculate SSA. p.325 This is the sum of the squared deviations of each group mean for factor A from the grand mean, multiplied by the number of scores in each factor A condition (column). The definitional formula is:

$SS_A = \sum [(\bar{X}_A - \bar{X}_G)^2 n_A]$, where $\bar{X}_A$ is the mean for each condition of factor A, $\bar{X}_G$ is the grand mean, and $n_A$ is the number of people in each of the factor A conditions. dfA = the number of levels of factor A minus 1. P.325.

SSB is calculated similarly.

Step 3: Calculate the sum of squares interaction (SSA X B):

$SS_{A \times B} = \sum [(\bar{X}_C - \bar{X}_G)^2 n_C] - SS_A - SS_B$, where $\bar{X}_C$ is the mean for each condition (cell), $\bar{X}_G$ is the grand mean, and $n_C$ is the number of scores in each condition or cell. The degrees of freedom for the interaction are based on the number of conditions in the study. To determine the degrees of freedom across the conditions, we multiply the degrees of freedom for the factors involved in the interaction: (A - 1)(B - 1). p.327

Step 4: Calculate sum of squares error (SSError): The sum of squares error (SSError) is the sum of the squared deviations of each score from its condition (cell) mean:

$SS_{Error} = \sum (X - \bar{X}_C)^2$, where $\bar{X}_C$ is the cell mean. dfError is calculated as follows: the number of conditions in the study is multiplied by the number of participants in each condition minus the one score not free to vary, or AB(n - 1). P.303

In the table below, A = # of conditions in A (e.g. concrete vs. abstract), B = # of conditions in B (e.g. rote vs. imagery)

To determine the F critical value in Table A.8, we use dferror running down the left side of the table and the dfbetween running across the top of the table. p.329 However, note that there are three dfbetween values and thus three Fcv values. For factor A, dfbetween is dfA; for factor B, dfbetween is dfB; for the interaction, dfbetween is dfinteraction. If FA is significant, this means that there was a significant main effect for factor A.
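The two-way calculations can be sketched as follows, assuming equal cell sizes. The scores and the function name are hypothetical; each sum of squares follows the verbal definitions in steps 1 to 4, with the interaction obtained by subtracting SSA and SSB from the cell-based sum of squares.

```python
from statistics import mean

def two_way_anova(cells):
    """cells[i][j] holds the scores for level i of factor A and level j of factor B."""
    a_levels, b_levels = len(cells), len(cells[0])
    n = len(cells[0][0])   # participants per cell (assumes equal cell sizes)
    all_scores = [x for row in cells for cell in row for x in cell]
    grand = mean(all_scores)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    # Main effect of A: pool scores across the levels of B
    a_groups = [[x for cell in row for x in cell] for row in cells]
    ss_a = sum(len(g) * (mean(g) - grand) ** 2 for g in a_groups)
    # Main effect of B: pool scores across the levels of A
    b_groups = [[x for row in cells for x in row[j]] for j in range(b_levels)]
    ss_b = sum(len(g) * (mean(g) - grand) ** 2 for g in b_groups)
    # Interaction: cell-based sum of squares minus the two main effects
    ss_cells = sum(len(c) * (mean(c) - grand) ** 2 for row in cells for c in row)
    ss_axb = ss_cells - ss_a - ss_b
    # Error: deviations of each score from its own cell mean
    ss_error = sum((x - mean(c)) ** 2 for row in cells for c in row for x in c)
    df = {"A": a_levels - 1, "B": b_levels - 1,
          "AxB": (a_levels - 1) * (b_levels - 1),
          "error": a_levels * b_levels * (n - 1)}
    ms_error = ss_error / df["error"]
    ss = {"A": ss_a, "B": ss_b, "AxB": ss_axb, "error": ss_error, "total": ss_total}
    f = {name: (ss[name] / df[name]) / ms_error for name in ("A", "B", "AxB")}
    return {"ss": ss, "df": df, "F": f}

# Hypothetical 2 X 2 data: rows = word type, columns = rehearsal type
cells = [[[4, 5, 6], [8, 9, 10]],
         [[4, 5, 6], [4, 5, 6]]]
two_way = two_way_anova(cells)
print(two_way)
```

Each of the three F values is then compared with its own critical value, using the matching df for the numerator and dferror for the denominator.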

Note that Tukey's post hoc test needs to be completed only if either or both of the independent variables have more than two levels (assuming that the main effects are significant to begin with), e.g. in a 2 X 6 factorial design for which both main effects are significant, the post hoc test needs to be calculated only for the independent variable that has six levels, to determine which pairs of these six differ significantly. p.331

eta-squared = SSbetween/SStotal; here SSbetween equals SSA, SSB, and SSA X B, respectively p.331

Chapter 13: Quasi-Experimental and Single-Case Designs

Non-manipulated Independent variables (aka participant vars e.g. gender, age, ethnicity, political affiliation): as with

experimental studies, groups are compared and hypotheses regarding causality are tested; however, the participants are

not assigned randomly and the groups occur naturally. (p.345)



Single-group posttest-only design: involves the use of a single group of participants to whom some treatment is given. There is neither a comparison group nor a comparison of the results to any previous measurements.

The single-group pretest/posttest design is an improvement over the posttest-only design in that measures are taken twice: before the treatment and after the treatment.

The single-group time-series design involves using a single group of participants, taking multiple measures over a period of time before introducing the treatment, and then continuing to take several measures after the treatment.

The nonequivalent control group posttest-only design is similar to the single-group posttest-only design; however, a nonequivalent control group is added as a comparison group. Nonequivalent means that group membership is not random, but already established. Thus, the differences observed between the two groups on the dependent variable may be due to the nonequivalence of the groups and not to the treatment. P.323.

An improvement over the previous design involves the addition of a pretest measure, making it a nonequivalent control group pretest/posttest design. A pretest allows us to assess whether the groups are equivalent on the dependent measure before the treatment is given to the experimental group.

The logical extension of the previous design is to take more than one pretest and posttest. In a multiple-group time-series design, several measures are taken on nonequivalent groups before and after treatment.

Internal validity is the extent to which the results of an experiment can be attributed to the manipulation of the

independent variable, rather than to some confounding variable. Because participants are not randomly assigned, confounds cannot be ruled out; thus, quasi-experimental designs lack internal validity. p.325

Statistical Analysis:

Depending on the type of data (nominal, ordinal, or interval-ratio), the number of levels of the independent variable, the

number of independent variables, and whether the design is between-participants or within-participants, we choose the

appropriate statistic as we did for the experimental designs.

Cross-sectional Designs p.352

Researchers study individuals of different ages at the same time. The advantage of this design is that a wide variety of

ages can be studied in a short period of time. The main issue is that the researcher is typically attempting to determine

whether or not there are differences across different ages; however, the reality of the design is such that the researcher

tests not only individuals of different ages but also individuals who were born at different times and raised in different

generations or cohorts, so rather than testing age differences, may be testing generational differences.

Longitudinal Design

With a longitudinal design, the same participants are studied repeatedly over a period of time. Disadvantage: people who drop out (attrition) may differ from those who remain in the study.

Sequential Designs

A researcher begins with participants of different ages (a cross-sectional design) and tests or measures them. Then,

either a number of months or years later, the researcher retests or measures the same individuals (a longitudinal

design). P.352

Single Case Research: versions of a within-participants experiment in which only one person is measured repeatedly. Often the research is replicated on one or two other participants. Thus, we sometimes refer to these studies as small-n designs.

A reversal design is a within-participants design with only one participant in which the independent variable is introduced and removed one or more times.

o An ABA reversal design involves taking baseline measures (A), introducing the independent variable (B) and measuring behavior again, and then removing the independent variable and retaking the baseline measures (A). The reversal controls for confounds that may be changing the dependent variable.

o The ABAB reversal design involves reintroducing the independent variable after the second baseline measurement.

Multiple-baseline designs: Because single-case designs are a type of within-participants design, carryover effects from one condition to another are of concern.



o Multiple baselines across participants: here we assess the effect of introducing the treatment over multiple participants, behaviors, or situations. We control for confounds not by reversing back to baseline after treatment, as in a reversal design, but by introducing the treatment at different times across different people, behaviors, or situations. P.331 This eliminates the possibility that some other extraneous variable produced the results.

o Multiple baselines across behaviors: An alternative multiple-baseline design uses only one participant and assesses the effects of introducing a treatment over several behaviors. E.g. first introduce treatment for aggressive behaviors, then days later, for talking out of turn, then days later for temper tantrums.

o Multiple baselines across situations: introduce treatment across different situations. E.g. treat first for bad behavior in math class, then days later, for bad behavior in English class. Introducing the treatment at different times in the two classes minimizes the possibility that a confounding variable is responsible for the behavior change.