Measurement, Data Collection, Validity & Reliability

Data is your friend


Agenda

• Measurement

• Measures (aka, ways to collect data)

• Validity/reliability, up close and personal

Educational Measurement

• Measurement: assignment of numbers to differentiate values of a variable

• GOOD RESEARCH MUST HAVE SOUND MEASUREMENT!!

Thought Question

• Consider the following scores on a test

Marco 90, Adriane 85, Linda 75, Christy 99, Chantelle 88, Jay 45, Remi 68, Marcus 97, Chi Bo 92, Donnie 85

• Which measure of central tendency would Adriane use when telling her parents about her performance?

Descriptive Statistics

• Statistics: procedures that summarize and analyze quantitative data

• Descriptive statistics: statistical procedures that summarize a set of numbers in terms of central tendency or variation

• Important for understanding what the data tells the researcher
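As a worked example, the sketch below computes the three measures of central tendency for the test scores in the thought question, using Python's standard `statistics` module. The `compare` helper is an illustrative addition, not something from the slides.

```python
from statistics import mean, median, mode

# Test scores from the thought question
scores = {
    "Marco": 90, "Adriane": 85, "Linda": 75, "Christy": 99, "Chantelle": 88,
    "Jay": 45, "Remi": 68, "Marcus": 97, "Chi Bo": 92, "Donnie": 85,
}
values = list(scores.values())

summary = {
    "mean": mean(values),      # arithmetic average
    "median": median(values),  # middle value; with 10 scores, the average of the 5th and 6th
    "mode": mode(values),      # most frequent score
}

def compare(score, stat):
    """Describe where a single score sits relative to a summary statistic."""
    if score > stat:
        return "above"
    if score < stat:
        return "below"
    return "equal to"

for name, value in summary.items():
    print(f"{name}: {value} (Adriane's {scores['Adriane']} is {compare(scores['Adriane'], value)} it)")
```

With these ten scores the mean is 82.4, the median 86.5, and the mode 85, so the mean is the only measure that Adriane's 85 beats.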

Descriptive Statistics: A Caution

• Statistics can provide us with useful information, but they can be interpreted in different ways to say different things

Thought Question

If Jay scored an 85 instead of a 45, what changes?

Highly deviant scores (called "outliers") have no more effect on the median than those scores very close to the middle. However, outliers can greatly affect the mean.
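A minimal sketch of the Jay question: replacing his 45 with an 85 moves the mean but leaves the median where it was.

```python
from statistics import mean, median

original = [90, 85, 75, 99, 88, 45, 68, 97, 92, 85]
# Same scores, but Jay's outlier of 45 (position 5) replaced by 85
revised = original.copy()
revised[5] = 85

for label, data in (("Jay = 45", original), ("Jay = 85", revised)):
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data):.1f}")
```

The mean rises from 82.4 to 86.4, while the median stays at 86.5 because the two middle scores (85 and 88) do not change.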

Descriptive Statistics

• Frequency distributions (see Figure 6.2)

  • Normal - scores distributed symmetrically around the middle

  • Positively skewed - a large number of low scores and a small number of high scores; the mean is pulled toward the high (positive) end

  • Negatively skewed - a large number of high scores and a small number of low scores; the mean is pulled toward the low (negative) end
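The slides describe skew by where the mean gets pulled. One common way to put a number on it, not mentioned in the slides, is the moment coefficient of skewness (the average cubed z-score): positive values indicate a positive skew and negative values a negative skew. The two score lists below are hypothetical, and this sketch uses the population form of the statistic.

```python
from statistics import mean, median, pstdev

def skewness(data):
    """Population moment coefficient of skewness: the average cubed z-score."""
    m, sd = mean(data), pstdev(data)
    return sum(((x - m) / sd) ** 3 for x in data) / len(data)

def skew_direction(data, tol=1e-9):
    """Label the skew by the sign of the coefficient (tol absorbs rounding error)."""
    g = skewness(data)
    if g > tol:
        return "positive"
    if g < -tol:
        return "negative"
    return "symmetric"

# Hypothetical score lists
mostly_low = [60, 62, 63, 65, 66, 95, 98]   # many low scores, a few high ones
mostly_high = [40, 42, 90, 91, 93, 95, 96]  # many high scores, a few low ones

for label, data in (("mostly low", mostly_low), ("mostly high", mostly_high)):
    print(f"{label}: {skew_direction(data)} skew, "
          f"mean = {mean(data):.1f}, median = {median(data)}")
```

In the positively skewed list the mean (about 72.7) sits above the median (65); in the negatively skewed list the mean (about 78.1) sits below the median (91).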

Normal Distribution

An Extreme Example

• Consider the salaries of 10 people

• Group A – All are teachers.

Salaries: $45,000 $45,000 $45,000 $50,000 $50,000 $50,000 $50,000 $55,000 $55,000 $55,000

An Extreme Example

• Consider the salaries of 10 people

• Group B – Nine are teachers; 1 is Donovan McNabb.

Salaries: $45,000 $45,000 $45,000 $50,000 $50,000 $50,000 $50,000 $55,000 $55,000 $6,300,000

An Extreme Example

• What happens to the mean and median in these 2 examples? Does either one change?

• What happens to the shape of the distribution? Is it still normal?
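Working the salary example through in code (a minimal sketch with the standard `statistics` module) makes the answer concrete:

```python
from statistics import mean, median

# Group A: ten teachers
group_a = [45_000] * 3 + [50_000] * 4 + [55_000] * 3
# Group B: nine teachers plus one $6,300,000 salary
group_b = [45_000] * 3 + [50_000] * 4 + [55_000] * 2 + [6_300_000]

for label, salaries in (("Group A", group_a), ("Group B", group_b)):
    print(f"{label}: mean = ${mean(salaries):,.0f}, median = ${median(salaries):,.0f}")
```

Both groups have a median of $50,000, but the single $6,300,000 salary drags Group B's mean up to $674,500, far above every other salary in the group. A distribution with one extreme high value like this is strongly positively skewed, which is why the median is the better summary here.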

Positive Skew

Negative Skew

Case in Point: Teacher Salary

• Compare Radnor to Philadelphia

• Is the salary distribution for Philadelphia going to be positively or negatively skewed? (Hint: Look at the number of years of experience)

Descriptive Statistics

• Variability: how different are the scores?

• Types

  • Range: the difference between the highest and lowest scores

  • Standard deviation: roughly, the typical distance of the scores from the mean (the square root of the average squared distance)

• The relationship to the normal distribution

  • In a normal distribution, about 68% of scores fall within ±1 SD of the mean

  • About 95% fall within ±2 SD
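The sketch below computes the range and standard deviation for the ten test scores from the earlier thought question, then uses `statistics.NormalDist` to check the 68% / 95% figures against an exact normal curve. Whether to divide by N (`pstdev`) or N - 1 (`stdev`) depends on whether the scores are treated as a whole population or as a sample; the slides do not say, so both are shown.

```python
from statistics import NormalDist, pstdev, stdev

scores = [90, 85, 75, 99, 88, 45, 68, 97, 92, 85]

score_range = max(scores) - min(scores)  # highest minus lowest
pop_sd = pstdev(scores)                  # divides by N (scores treated as the whole population)
sample_sd = stdev(scores)                # divides by N - 1 (scores treated as a sample)

print(f"range = {score_range}")
print(f"population SD = {pop_sd:.2f}, sample SD = {sample_sd:.2f}")

# Share of a normal distribution lying within ±1 and ±2 SD of the mean
z = NormalDist()  # standard normal: mean 0, SD 1
within_1 = z.cdf(1) - z.cdf(-1)
within_2 = z.cdf(2) - z.cdf(-2)
print(f"within ±1 SD: {within_1:.1%}, within ±2 SD: {within_2:.1%}")
```

The exact normal-curve values come out at roughly 68.3% and 95.4%. These percentages describe a normal distribution; a small, skewed set like these ten scores (pulled down by Jay's 45) will not match them exactly.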

Variability

Variability

• Why does variability matter?

Descriptive Statistics

• Relationship: how two sets of scores relate to one another

• Correlation (positive)

  • Low: .10 - .39

  • Moderate: .40 - .69

  • High: .70 and above
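A minimal sketch of the Pearson correlation coefficient, written out from its definition so every step is visible, with a helper that labels the result using the bands on the slide. Two assumptions: the bands are applied to the absolute value of r (the slide lists them only for positive correlations), and the hours-studied data are invented for illustration.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (n >= 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in x
    syy = sum((b - my) ** 2 for b in y)                   # variation in y
    return sxy / sqrt(sxx * syy)

def strength(r):
    """Label |r| with the slide's bands (applying them to negative r is an assumption)."""
    size = abs(r)
    if size >= 0.70:
        return "high"
    if size >= 0.40:
        return "moderate"
    if size >= 0.10:
        return "low"
    return "below the slide's bands"

# Hypothetical data: hours studied vs. test score for five students
hours = [1, 2, 3, 4, 5]
test_scores = [60, 65, 70, 72, 80]
r = pearson_r(hours, test_scores)
print(f"r = {r:.2f} ({strength(r)})")
```

For these invented data r is about 0.99, which falls in the slide's "high" band. The function raises `ZeroDivisionError` if either list has no variability, since r is undefined in that case.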

Example of Correlation

Measures of Data Collection

• Tests

• Questionnaires

• Observations

• Interviews

Measures (Means of Data Collection)

You must match the instrument to the research question!

Questionnaires

http://www.authentichappiness.sas.upenn.edu/

• Thoughts on the questionnaires you responded to

• Approaches to Happiness

• Optimism

• Grit

Examples to critique

• Measures

• Questionnaire – Psychological School Membership Survey used with middle school students

• Interview protocol – for teachers & counselors regarding professional development issues

• Observation instrument – PDE 430 for student teachers

• What are 2 benefits and 2 limitations of this measure?

Questionnaires

• Used to obtain a subject’s perceptions, attitudes, beliefs, values, opinions, or other non-cognitive traits

• Scales - a continuum that describes a subject’s responses to a statement

  • Likert

  • Checklists

  • Ranked items

Questionnaires

• Likert scales

  • Response options require the subject to determine the extent to which they agree with a statement

• Debate over an odd vs. even number of response options (an odd number allows a neutral midpoint)

• Statements must reflect extreme positive or extreme negative positions

• Example – CATS evaluations

Questionnaires

• Checklists: respondents choose the options that apply

• Ranked items: respondents put items in sequential order

  • Avoids marking everything high or low

Questionnaires

• Problems with measuring non-cognitive traits

  • Difficulty clearly defining what is being measured (e.g., self-concept vs. self-esteem)

• Response set: responding the same way to every item (e.g., all 4’s on CATS)

• Social desirability: the “PC filter”

• Faking: agreeing with statements because of the negative consequences associated with disagreeing
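One simple screen for the response-set problem is to flag respondents who gave the identical answer to every item. This is a sketch with hypothetical data; an identical pattern is a reason to look more closely, not proof of a response set, since some respondents genuinely hold uniform views.

```python
def flag_response_sets(responses):
    """Return IDs of respondents who gave the identical answer to every item."""
    return [rid for rid, answers in responses.items() if len(set(answers)) == 1]

# Hypothetical 5-point ratings from three respondents on six items
responses = {
    "R1": [4, 4, 4, 4, 4, 4],  # same answer throughout: possible response set
    "R2": [5, 3, 4, 2, 4, 5],
    "R3": [1, 2, 1, 3, 2, 1],
}
print(flag_response_sets(responses))
```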

Questionnaires

• Controlling problems

  • Equal numbers of positively and negatively worded statements

  • Alternating positive and negative statements

  • Providing confidentiality or anonymity to respondents
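Mixing positively and negatively worded statements only helps if the negative items are reverse-coded before scores are combined. The sketch below assumes a 1-5 Likert scale and a hypothetical four-item questionnaire whose second and fourth items (indices 1 and 3) are negatively worded.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point Likert scale

def reverse(score):
    """Reverse-code a negatively worded item so high always means more of the trait."""
    return SCALE_MIN + SCALE_MAX - score

def scale_score(answers, negative_items):
    """Average one respondent's answers after reverse-coding the negative items (0-based indices)."""
    adjusted = [reverse(a) if i in negative_items else a for i, a in enumerate(answers)]
    return sum(adjusted) / len(adjusted)

# Hypothetical 4-item scale with alternating positive and negative wording
negative_items = {1, 3}
print(scale_score([5, 1, 4, 2], negative_items))  # answers consistent with the wording
print(scale_score([4, 4, 4, 4], negative_items))  # same answer to every item
```

The first respondent scores 4.5 after reverse-coding. The second, who answered 4 to everything, lands at 3.0, the scale midpoint, which is one way balanced wording exposes a response set instead of letting it inflate the score.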

Designing Questionnaires

• Online resources

  • http://pareonline.net/getvn.asp?v=5&n=3

  • http://www.peecworks.org/PEEC/PEEC_Inst/I0004E536

  • http://www.statpac.com/surveys/

Observations

• Observations - direct observations of behaviors

  • Provide a firsthand account (avoids the self-reporting issues of questionnaires)

• Natural or controlled settings

  • Ex – classroom vs. lab (child attachment studies)

• Structured or unstructured observations

  • Ex – frequency counts vs. narrative record

• Detached or involved observers

Observations

• Inference

  • Low inference - involves little, if any, inference on the observer’s part

    • On-task/Off-task behavior instrument

  • High inference - involves high levels of inference on the observer’s part

    • Teacher effectiveness – PDE form 430

Observations

• Controlling observer effects

• Observer bias

  • Training

  • Inter-rater reliability (e.g., percent agreement or Cohen’s kappa)

  • Multiple observers

• Contamination - knowledge of the study influences the observation

  • Training

  • Targeting specific behaviors

  • Observers do not know of the expected outcomes

  • Observers are “blind” to which group is which

Observations

• Observer effects

  • Halo effect - initial ratings influence subsequent ratings

  • Hawthorne effect - increased performance results from awareness of being part of a study

  • Leniency - wanting everyone to do well

  • Central Tendency - rating everyone near the middle of the scale

  • Observer Drift - failing to record pertinent information

Interviews

• What are some challenges to doing this kind of interviewing?

http://www.youtube.com/watch?v=d6bXH2k9MKE

Interviews

• Advantages

  • Establish rapport & enhance motivation

  • Clarify responses through additional questioning

  • Capture the depth and richness of responses

  • Allow for flexibility

  • Reduce “no response” and/or “neutral” responses

Interviews

• Disadvantages

  • Time consuming

  • Expensive

  • Small samples

  • Subjective – interviewer characteristics, contamination, bias

Validity and Reliability

What’s all the fuss about?

Validity/Reliability and Trustworthiness

• Why do we need validity and reliability in quantitative studies and “trustworthiness” in qualitative studies?

We can’t trust the results if we can’t trust the methods!

Reader’s Digest version…

• Reliability

  • The extent to which scores are free from error

  • Error is measured by consistency

• Validity

  • The extent to which inferences are appropriate, meaningful, and useful

• “Does the instrument measure what it is supposed to measure??”

Thought Question

• On the ACT and SAT assessments, there is a definitive script that test administrators are required to follow exactly. What measurement issue are the test makers addressing?

Reliability of Measurement

• Reliability - The extent to which measures are free from error

• Error is measured by consistency

Reliability of Measurement

• Measuring reliability (reliability coefficients)

  • 0.00 indicates no reliability or consistency

  • 1.00 indicates total reliability or consistency

  • < .60 = weak reliability

  • > .80 = sufficient reliability

Reliability of Measurement

• Types of reliability evidence

• Stability (i.e., test-retest)

  • Testing the same subject using the same test on two occasions

  • Limitation - carryover effects from the first to second administration of the test

• Equivalence (i.e., parallel forms)

  • Testing the same subject with two parallel (i.e., equal) forms of the same test taken at the same time

  • Limitation - difficulty in creating parallel forms
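Stability (test-retest) evidence is usually reported as the correlation between the two administrations. This sketch assumes that approach, uses invented scores for six students, and labels the coefficient with the cutoffs from the earlier reliability slide (< .60 weak, > .80 sufficient); the slides give no label for values in between. The same calculation applies to parallel forms, with Form A and Form B scores in place of the two administrations.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def interpret(coefficient):
    """Apply the slide's cutoffs to a reliability coefficient."""
    if coefficient < 0.60:
        return "weak"
    if coefficient > 0.80:
        return "sufficient"
    return "between the slide's weak and sufficient cutoffs"

# Hypothetical scores for six students who took the same test twice
first = [78, 85, 62, 90, 70, 88]
second = [80, 83, 65, 92, 68, 85]
r = pearson_r(first, second)
print(f"test-retest r = {r:.2f}: {interpret(r)}")
```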

Reliability of Measurement

• Equivalence and stability

  • Testing the same subject with two forms of the same test taken at different times

  • Limitation - difficulty in creating parallel forms

Reliability of Measurement

• Internal consistency

  • Testing the same subject with one test and “artificially” splitting the test into two halves

  • Limitation - must have a minimum of ten (10) questions

  • “Cronbach’s alpha” is often reported as the reliability coefficient (ex – Learning styles)
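The sketch below shows both internal-consistency approaches on a deliberately tiny, hypothetical data set (five people, four items, far fewer than the ten-question minimum mentioned above): an odd/even split-half correlation stepped up with the Spearman-Brown formula, 2r / (1 + r), and Cronbach's alpha, k/(k - 1) × (1 - sum of item variances / variance of total scores). The odd/even split is one common choice of halves, assumed here.

```python
from math import sqrt
from statistics import pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def split_half_reliability(items_by_person):
    """Odd/even split-half correlation, stepped up to full length with Spearman-Brown."""
    odd = [sum(row[0::2]) for row in items_by_person]   # items 1, 3, ...
    even = [sum(row[1::2]) for row in items_by_person]  # items 2, 4, ...
    r_half = pearson_r(odd, even)
    return 2 * r_half / (1 + r_half)

def cronbach_alpha(items_by_person):
    """Cronbach's alpha from item variances and total-score variance."""
    k = len(items_by_person[0])
    item_vars = [pvariance([row[j] for row in items_by_person]) for j in range(k)]
    total_var = pvariance([sum(row) for row in items_by_person])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses: 5 people x 4 items on a 1-5 scale
data = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(f"split-half (Spearman-Brown) = {split_half_reliability(data):.2f}")
print(f"Cronbach's alpha = {cronbach_alpha(data):.2f}")
```

For this toy data set alpha comes out near .97 and the corrected split-half coefficient near .93, both above the > .80 benchmark on the earlier slide. Using population variances throughout is fine for alpha because the N in each variance cancels in the ratio.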

Reliability of Measurement

• Agreement/inter-rater reliability

  • Observational measures

  • Multiple observers coding similarly
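For categorical codes, inter-rater agreement is often summarized as simple percent agreement or as Cohen's kappa, which discounts the agreement two observers would reach by chance. The slide does not name a statistic, so kappa is an assumption here, and the on-task/off-task codes are hypothetical.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of units that both raters coded the same way."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: probability both raters independently pick the same category
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical codes for 10 observation intervals: 1 = on task, 0 = off task
observer_1 = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
observer_2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]

print(f"percent agreement = {percent_agreement(observer_1, observer_2):.0%}")
print(f"Cohen's kappa = {cohens_kappa(observer_1, observer_2):.2f}")
```

The two observers agree on 8 of 10 intervals (80%), but because each codes "on task" 70% of the time, chance alone would produce 58% agreement, so kappa is only about .52. The function divides by zero if chance agreement is 1, for example when both observers use a single code for every interval.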

Reliability of Measurement

• Enhancing reliability

  • Standardized administration procedures (e.g., directions, conditions, etc.)

  • Appropriate reading level

  • Reasonable length of the testing period

  • Counterbalancing the order of testing if several tests are being given
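One simple way to counterbalance is to rotate the order of the tests so that each test appears once in every position across groups (a cyclic Latin square). This sketch assumes three hypothetical tests; rotation balances position, but it does not balance which test immediately follows which, so designs worried about carryover need a more carefully constructed square.

```python
def latin_square_orders(tests):
    """Rotate the test list so each test appears once in each serial position."""
    n = len(tests)
    return [[tests[(start + pos) % n] for pos in range(n)] for start in range(n)]

tests = ["Reading", "Math", "Attitude survey"]  # hypothetical test battery
orders = latin_square_orders(tests)
for group, order in enumerate(orders, start=1):
    print(f"Group {group}: {' -> '.join(order)}")
```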

Validity of Measurement

• Validity: the extent to which inferences are appropriate, meaningful, and useful

• Current example – content tests and teacher licensure

Validity of Measurement

• For research results to have any value, the measurement of a variable must be valid

• Use of established and “new” instruments and the implications for establishing validity

• Importance of establishing validity prior to data collection (e.g., pilot tests)

Validity

• Content

• Predictive (criterion-related)

• Concurrent

• Construct

Thought Question

• Criticisms of standardized tests like the SAT claim that they discriminate against particular groups of students (especially minorities) and do not represent a broad enough domain of knowledge to adequately assess a student’s academic potential. What issue of validity is operating in these arguments?

Thought Question

• Other arguments against the SAT state that the tests do not adequately estimate an individual’s ability to succeed in college. What issue of validity is operating here?

Reliability & Validity of Measurement

• What is the relationship of reliability to validity?

• If a watch consistently gives the time as 1:10 when it is actually 1:00, it is ____ but not ____.

• ______ is a necessary but not sufficient condition for _______.

• To be _____, an instrument must be ______, but a ____ instrument is not necessarily _____.

Reliability & Validity of Measurement

• What is the relationship of reliability to validity?

• If a watch consistently gives the time as 1:10 when it is actually 1:00, it is reliable but not valid.

• Reliability is a necessary but not sufficient condition for validity.

• To be valid, an instrument must be reliable, but a reliable instrument is not necessarily valid.

Midterm

• Multiple Choice: 50 pts

• Short Answer: 25 pts

• Article Critique: 25 pts

Bring article with you to class. It’s ok to have notes on it.
