mazhar-poohlah
CH. 7
Scaling is a procedure for assigning numbers (or other symbols) to a property of objects in order to impart some of the characteristics of numbers to the property in question.
Scaling
Nominal Scales
Ordinal Scales
Interval Scales
Ratio Scales
Four Scales of Measurement;
Measurement and Scaling
A scale is a mechanism by which individuals are
distinguished as to how they differ from one another on the
variables of interest.
A scale is a continuous series of categories and has been
defined as any series of items that are arranged
progressively according to value or magnitude, into which
an item can be placed according to its quantification
Four popular scales in business research are:
1. Nominal scales
2. Ordinal scales
3. Interval scales
4. Ratio scales
SCALES
Nominal Scales: splits data into groups, e.g., men,
women
Ordinal Scales: ranks data in some order, e.g.,
exercising for 20 minutes is good, for 30 minutes is
better, for 40 minutes is best
Interval Scales: sets data on a continuum, e.g.,
1 (very low)   2   3   4   5 (very high)
Ratio Scales: starts with absolute zero and indicates
proportion, e.g.,
0   5   10 (ten is twice as big as five)
A Nominal Scale is the simplest of the four
scale types; the numbers or letters
assigned to objects serve only as labels for
identification or classification.
Example: variable of gender
Males = 1, Females = 2
Sales Zone A = Islamabad, Sales Zone B = Rawalpindi
Drink A = Pepsi Cola, Drink B = 7-Up, Drink C = Mirinda
An Ordinal Scale is one that arranges objects or
alternatives according to their magnitude
Examples:
Career Opportunities = Moderate, Good, Excellent
Investment Climate = Bad, inadequate, fair, good, very
good
Merit = A grade, B grade, C grade, D grade
A problem with ordinal scales is that the difference
between categories on the scale is hard to quantify, i.e.,
excellent is better than good, but how much better is
excellent?
An Interval Scale allows us to perform certain arithmetical
operations on the data collected from respondents. This scale
measures the distance between any two points on the scale.
It taps the differences and the magnitudes of the differences in
the variable.
A Ratio Scale is a scale that possesses absolute rather than relative qualities and has an absolute zero point.
Examples: Money
Weight
Distance
Temperature on the Kelvin Scale
Unlike interval scales, which allow only comparisons of the differences of magnitude (e.g. of attitudes), ratio scales also allow determinations of the actual strength of the magnitude.
Type of Scale   Numerical Operation                                  Descriptive Statistics
Nominal         Counting                                             Frequency in each category, percentage in each category, mode
Ordinal         Rank ordering                                        Median, range, percentile ranking
Interval        Arithmetic operations on intervals between numbers   Mean, standard deviation, variance
Ratio           Arithmetic operations on actual quantities           Geometric mean, coefficient of variation
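As a rough illustration of which descriptive statistics fit each scale type, the sketch below uses Python's standard `statistics` module. The drink responses and grade codes are invented for illustration; the ratings and finishing times reuse the runner figures (8.2, 9.1, 9.6 and 15.2, 14.1, 13.4 seconds) that appear in these slides.

```python
import statistics
from collections import Counter

# Nominal: only counting is meaningful (frequencies, percentages, mode).
drinks = ["Pepsi", "7-Up", "Pepsi", "7-Up", "Pepsi"]  # hypothetical responses
freq = Counter(drinks)
mode_drink = statistics.mode(drinks)
pct_pepsi = 100 * freq["Pepsi"] / len(drinks)

# Ordinal: rank order is meaningful (median, range, percentiles).
grades = [1, 2, 2, 3, 4]  # hypothetical coding: 1 = A grade ... 4 = D grade
median_grade = statistics.median(grades)
grade_range = max(grades) - min(grades)

# Interval: differences are meaningful (mean, standard deviation, variance).
ratings = [8.2, 9.1, 9.6]  # performance ratings on a 0 to 10 scale
mean_rating = statistics.mean(ratings)
sd_rating = statistics.stdev(ratings)

# Ratio: a true zero makes ratios meaningful
# (geometric mean, coefficient of variation).
times = [15.2, 14.1, 13.4]  # time to finish, in seconds
gmean_time = statistics.geometric_mean(times)
cv_time = statistics.stdev(times) / statistics.mean(times)

print(mode_drink, pct_pepsi)                       # Pepsi 60.0
print(median_grade, grade_range)                   # 2 3
print(round(mean_rating, 2), round(sd_rating, 2))
print(round(gmean_time, 2), round(cv_time, 3))
```

Each statistic is also valid for the scale types below it in the table, but not for those above it: a mean of the nominal drink codes, for example, would have no interpretation.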
Four Scales of Measurement: example of runners finishing a race
Nominal: numbers assigned to runners (7 38)
Ordinal: rank order of winners (third place, second place, first place)
Interval: performance rating on a 0 to 10 scale (8.2, 9.1, 9.6)
Ratio: time to finish, in seconds (15.2, 14.1, 13.4)
Classification of Scaling Techniques;
Scales
Nominal: Dichotomous, Category
Ordinal: Fixed sum, Graphic rating
Interval: Likert, Semantic differential, Numerical, Itemized rating, Stapel
Ratio
Nominal scales require a respondent to provide only some type of descriptor as the raw response.
Example.
Please indicate your current marital status.
__ Married __ Single __ Single, never married __ Widowed
Ordinal scales allow the respondent to express “relative magnitude” between the raw responses to a question
Example.
Which one statement best describes your opinion of an Intel PC processor?
__ Higher than AMD’s PC processor
__ About the same as AMD’s PC processor
__ Lower than AMD’s PC processor
Interval scales demonstrate the absolute
differences between each scale point
Example.
How likely are you to recommend the new phone to a friend?
Definitely will not   1   2   3   4   5   6   7   Definitely will
Ratio scales allow for the identification of absolute differences between each scale point, and absolute comparisons between raw responses
Example 1.
Please circle the number of children under 18 years of age
currently living in your household.
0 1 2 3 4 5 6 7 (if more than 7, please specify ___.)
Chapter 7
MEASUREMENT:
SCALING, RELIABILITY,
VALIDITY
Rating scales
Have several response categories and are used to obtain responses with regard to the object, event, or person studied.
Ranking scales
Make comparisons between or among objects, events, persons and extract the preferred choices and ranking among them.
Methods of Scaling;
Measurement scales that allow a respondent to register the degree (or amount) of a characteristic or attribute possessed by an object directly on the scale.
Rating Scales;
1. Dichotomous scale
2. Category scale
3. Likert scale
4. Numerical scales
5. Semantic differential scale
6. Itemized rating scale
7. Constant sum scale
8. Stapel scale
9. Graphic scale
10. Consensus scale
Types of rating scales Formats:
Dichotomous scale
Is used to obtain a Yes or No answer.
Nominal scale
Do you own a car?
Yes
No
Rating Scales Formats;
Category scale
Uses multiple items to elicit a single response.
Nominal scale
A category rating scale is one in which the response options provided for a closed-ended question are labeled with specific verbal descriptions.
Example:
Please rate car model A on each of the following dimensions:
Poor Fair Good V. good Excellent
a) Durability [ ] [ ] [ ] [ ] [ ]
b) Fuel consumption [ ] [ ] [ ] [ ] [ ]
A simple category scale with only two response categories
(or scale points) both of which are labeled.
Example:
Please rate brand A on each of the following dimensions:
poor excellent
a) Durability [ ] [ ]
b) Fuel consumption [ ] [ ]
Likert scale
Is designed to examine how strongly subjects
agree or disagree with statements on a
5-point scale.
Interval scale
The Likert Scale (Summated Ratings Scale)
A multiple item rating scale in which the degree of an attribute
possessed by an object is determined by asking respondents to
agree or disagree with a series of positive and/or negative
statements describing the object.
Example: Attitude toward buying from the Internet
                                                          Totally                             Totally
                                                          disagree  Disagree  Neutral  Agree  agree
a) Shopping takes much longer on the Internet             [ ]       [ ]       [ ]      [ ]    [ ]
b) It is a good thing that Saudi consumers have
   the opportunity to buy products through the Internet   [ ]       [ ]       [ ]      [ ]    [ ]
c) Buying products over the Internet is not a
   sensible thing to do                                   [ ]       [ ]       [ ]      [ ]    [ ]
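Because a Likert scale is a summated ratings scale, each respondent's answers are usually added into one score, with negatively worded statements reverse-scored first. The sketch below is one way this might look; it assumes a 1 to 5 coding (1 = totally disagree, 5 = totally agree) and treats statements (a) and (c) as the negative ones. The coding, the function name `summated_score`, and the sample answers are assumptions for illustration, not part of the original slides.

```python
def summated_score(responses, negative_items, points=5):
    """Sum Likert responses after reverse-scoring negatively worded items.

    responses: dict mapping item label -> answer coded 1..points
    negative_items: labels of the negatively worded statements
    """
    total = 0
    for item, answer in responses.items():
        if not 1 <= answer <= points:
            raise ValueError(f"answer for {item!r} must be between 1 and {points}")
        # Reverse-score negative items: 1 <-> 5, 2 <-> 4 on a 5-point scale.
        total += (points + 1 - answer) if item in negative_items else answer
    return total


# Hypothetical respondent: disagrees with the negative statements (a) and (c)
# and agrees with the positive statement (b).
answers = {"a": 2, "b": 4, "c": 1}
score = summated_score(answers, negative_items={"a", "c"})
print(score)  # 4 + 4 + 5 = 13
```

A higher total then indicates a more favorable attitude toward buying from the Internet, regardless of how each statement was worded.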
Likert scale
My work is very interesting
Strongly disagree
Disagree
Neither agree nor disagree
Agree
Strongly agree
Semantic differential scale
Several bipolar attributes are identified at the
extremes of the scale, and respondents are asked to
indicate their attitudes.
Interval scale
A semantic differential is a rating scale in which bipolar adjectives
are placed at both ends (or poles) of the scale, and response
options are expressed as “semantic” space.
Example:
Please rate car model A on each of the following dimensions:
Durable ---:-X-:---:---:---:---:--- Not durable
Low fuel consumption ---:---:---:---:---:-X-:--- High fuel consumption
Numerical scale
Similar to the semantic differential scale, with the difference
that numbers on a 5-point or 7-point scale are provided, with
bipolar adjectives at both ends.
Interval scale
Durability: Poor 1 2 3 4 5 6 7 Excellent
Durability: Durable 1 2 3 4 5 6 7 Not Durable
Itemized rating scale
A 5-point or 7-point scale with anchors, as needed, is
provided for each item and the respondent states the
appropriate number on the side of each item, or circles the
relevant number against each item.
Interval scale
I will be changing my job within the next 12 months
1 = Very Unlikely   2 = Unlikely   3 = Neither Unlikely Nor Likely   4 = Likely   5 = Very Likely
Fixed or constant sum scale
The respondents are asked to distribute a given number
of points across various items.
Ordinal scale
A constant-sum scale is a rating scale in which respondents divide a
constant sum among different attributes of an object (usually to
indicate the relative importance of each attribute).
Assumed to have ratio level properties.
Example: Divide 100 points among the following dimensions to
indicate their level of importance to you when you purchase a
car:
Durability
Fuel Consumption
Total 100
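When constant-sum answers are processed, a common first step is to check that the points really add up to the required total before reading them as relative importance. The sketch below assumes a total of 100; the function name `check_constant_sum` and the 70/30 split are hypothetical.

```python
def check_constant_sum(allocation, total=100):
    """Return each attribute's share if the points add up to the required total."""
    allocated = sum(allocation.values())
    if allocated != total:
        raise ValueError(f"points add up to {allocated}, expected {total}")
    return {attribute: points / total for attribute, points in allocation.items()}


# Hypothetical respondent: 70 points to durability, 30 to fuel consumption.
shares = check_constant_sum({"Durability": 70, "Fuel Consumption": 30})
print(shares)  # {'Durability': 0.7, 'Fuel Consumption': 0.3}
```

Because the shares are proportions of a fixed total, statements such as "durability matters more than twice as much as fuel consumption" become possible, which is why the scale is assumed to have ratio level properties.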
Stapel scale
This scale simultaneously measures both the direction and
intensity of the attitude toward the items under study.
A simplified version of the semantic differential scale in which
a single adjective or descriptive phrase is used instead of
bipolar adjectives.
Interval data
Model A
-3 -2 -1 Durable Car 1 2 3
-3 -2 -1 Good Fuel Consumption 1 2 3
The Stapel scale is a unipolar rating scale with ten categories
numbered from -5 to +5, without a neutral point (zero). This scale
is usually presented vertically.
SEARS
     +5                 +5
     +4                 +4
     +3                 +3
     +2                 +2 X
     +1                 +1
HIGH QUALITY       POOR SERVICE
     -1                 -1
     -2                 -2
     -3                 -3
     -4 X               -4
     -5                 -5
The data obtained by using a Stapel scale can be analyzed in the
same way as semantic differential data.
Graphic rating scale
A graphical representation helps the respondents to indicate
their answers to a particular question on this scale by placing a
mark at the appropriate point on the line.
Rating scales in which respondents rate an object on a
graphic continuum, usually a straight line.
Modified versions are the ladder scale and happy face scale.
Ordinal scale
Paired Comparison
Used when, among a small number of objects, respondents are
asked to choose between two objects at a time.
Example: Choose any combination
Package A: 512 kbps, 8 GB, Rs. 750
Package B: 1 Mbps, 8 GB, Rs. 850
Package C: 512 kbps, 12 GB, Rs. 900
Package D: 1 Mbps, 12 GB, Rs. 1000
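With n objects, a full paired comparison presents n(n - 1)/2 pairs, which is why the method suits only a small number of objects. The sketch below lists the pairs for the four packages and turns a set of choices into a rank order by counting wins; the respondent's choices are invented for illustration, and counting wins is only one simple way to summarize them.

```python
from itertools import combinations

packages = ["A", "B", "C", "D"]

# Every unordered pair is shown once: n * (n - 1) / 2 comparisons.
pairs = list(combinations(packages, 2))
print(len(pairs))  # 6

# Hypothetical choices: the package the respondent preferred in each pair.
choices = {
    ("A", "B"): "B", ("A", "C"): "A", ("A", "D"): "D",
    ("B", "C"): "B", ("B", "D"): "B", ("C", "D"): "D",
}

# Count how often each package was preferred, then sort by number of wins.
wins = {package: 0 for package in packages}
for pair in pairs:
    wins[choices[pair]] += 1
ranking = sorted(packages, key=lambda package: wins[package], reverse=True)
print(ranking)  # ['B', 'D', 'A', 'C']
```

With 10 objects the same design would already need 45 comparisons per respondent.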
Ranking Scales Formats;
Forced Choice
Enable respondents to rank objects relative to one another,
among the alternatives provided.
Comparative Scale
Provides a benchmark or a point of reference to assess
attitudes toward the current object, event, or situation under
study.
Characteristics of Different Types of Rating Scales
2. Category scale
Subject must: indicate a response category
Advantages: flexible, easy to respond
Disadvantages: ambiguous items, few categories, only gross distinction
3. Likert scale
Subject must: evaluate statements on a 5-point scale
Advantages: easiest scale to construct
Disadvantages: hard to judge what a single score means
4. Semantic differential and numerical scales
Subject must: choose points between bipolar adjectives on relative dimensions
Advantages: easy to construct, norms exist for comparison, e.g. profile analysis
Disadvantages: bipolar adjectives must be found, data may be ordinal, not interval
5. Constant sum scale
Subject must: divide a constant sum among response alternatives
Advantages: scale approximates an interval measure
Disadvantages: difficult for respondents with low education levels
6. Stapel scale
Subject must: choose point on scale with 1 center adjective
Advantages: easier to construct than semantic differential
Disadvantages: endpoints are numerical, not verbal
7. Graphic scale
Subject must: choose a point on a continuum
Advantages: visual impact, unlimited scale points
Disadvantages: no standard answers
8. Graphic scale-picture response
Subject must: choose a visual picture
Advantages: visual impact, easy for poor readers
Disadvantages: hard to attach a verbal explanation to response
Goodness of Measures
Understanding Validity and Reliability
Figure 8.1 Illustrations of Possible Reliability and Validity Situations in Measurement
Situation 1: Neither Reliable nor Valid
Situation 2: Highly Reliable but Not Valid
Situation 3: Highly Reliable and Valid
Testing Goodness of Measures: Forms of Reliability and Validity.
Goodness of data
  Reliability (accuracy in measurement)
    Stability: test-retest reliability, parallel-form reliability
    Consistency: interitem consistency reliability, split-half reliability
  Validity (are we measuring the right thing?)
    Face validity
    Logical validity (content)
    Criterion-related validity: predictive, concurrent
    Congruent validity (construct): convergent, discriminant
Goodness of Measures
It is important to make sure that the instrument that we develop to
measure a particular concept is indeed accurately measuring the
variable, and that in fact, we are actually measuring the concept
that we set out to measure.
This ensures that in operationally defining perceptual and
attitudinal variables, we have not overlooked some important
dimensions and elements or included some irrelevant ones.
Item Analysis
Item analysis is done to see if the items in the instrument belong
there or not.
Each item is examined for its ability to discriminate between those
subjects whose total scores are high and those with low scores.
In item analysis, the means of the high-score group and the
low-score group are tested for significant differences
through t-values.
The items with a high t-value (a test that identifies the
highly discriminating items in the instrument) are then included in
the instrument.
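The item analysis described above might be sketched as follows. The example splits respondents into a low-score and a high-score half by total score and computes a t-value per item; the half split, the use of Welch's form of the t statistic, the function names, and the data are all assumptions for illustration (other splits, such as top and bottom quartiles, are also possible).

```python
import statistics
from math import sqrt


def t_value(group1, group2):
    """Welch's t statistic for the difference between two group means.

    Assumes at least one group has non-zero variance.
    """
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    return (m1 - m2) / sqrt(v1 / len(group1) + v2 / len(group2))


def item_t_values(scores):
    """scores: one list of item answers per respondent.

    Ranks respondents by total score, compares the top half with the
    bottom half, and returns one t-value per item.
    """
    ranked = sorted(scores, key=sum)
    half = len(ranked) // 2
    low, high = ranked[:half], ranked[-half:]
    n_items = len(scores[0])
    return [
        t_value([row[i] for row in high], [row[i] for row in low])
        for i in range(n_items)
    ]


# Hypothetical answers of 8 respondents to 3 items on a 5-point scale.
data = [
    [5, 4, 3], [4, 5, 2], [5, 5, 4], [4, 4, 3],
    [2, 1, 3], [1, 2, 4], [2, 2, 2], [1, 1, 3],
]
t_values = item_t_values(data)
print([round(t, 2) for t in t_values])
```

In this made-up data the first two items separate the groups clearly, while the third has the same mean in both groups and would be a candidate for removal.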
Reliability
The reliability of a measure indicates the extent to which it
is without bias (error free) and hence ensures consistent
measurement across time and across the various items in
the instrument.
In other words, the reliability of a measure is an indication
of the stability and consistency with which the instrument
measures the concept and helps to assess the “goodness”
of a measure.
Stability of Measures
The ability of a measure to remain the same over time, despite
uncontrollable testing conditions or the state of the respondents
themselves, is indicative of its stability and low vulnerability to
changes in the situation.
This attests to its “goodness” because the concept is stably
measured, no matter when it is done. Two tests of stability are
test-retest reliability and parallel-form reliability.
Test-Retest Reliability
The reliability coefficient obtained with a repetition of the same
measure on a second occasion is called test-retest reliability.
That is, when a questionnaire is administered to a set of
respondents now, and again to the same respondents, say
several weeks to 6 months later, then the correlation between
the scores obtained at the two different times from one and the
same set of respondents is called the test-retest coefficient.
The higher it is, the better the test-retest reliability, and
consequently, the stability of the measure across time.
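The test-retest coefficient is simply the correlation between the two sets of scores. A minimal sketch, assuming the Pearson correlation is used and with invented scores for five respondents:

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores of the same five respondents on two occasions.
first_occasion = [12, 15, 9, 20, 17]
second_occasion = [13, 14, 10, 19, 18]

test_retest = pearson_r(first_occasion, second_occasion)
print(round(test_retest, 2))  # about 0.97
```

The same calculation applies to parallel-form reliability, with the second list holding scores from the alternate form instead of the second occasion.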
Parallel-Form Reliability
When responses on two comparable sets of measures tapping
the same construct are highly correlated, we have parallel-form
reliability.
Both forms have similar items and the same response format, the
only changes being the wordings and the order or sequence of
the questions.
What we try to establish here is the error variability resulting
from wording and ordering of the questions.
If two such comparable forms are highly correlated the measures
are reasonably reliable.
Interitem Consistency Reliability
This is a test of the consistency of respondents’ answers to all
the items in a measure.
To the degree that items are independent measures of the same
concept, they will be correlated with one another.
The most popular tests of interitem consistency reliability are
Cronbach’s coefficient alpha (Cronbach’s alpha; Cronbach,
1946), which is used for multipoint-scaled items, and the
Kuder-Richardson formulas (Kuder & Richardson, 1937), used for
dichotomous items.
The higher the coefficients, the better the measuring instrument.
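Cronbach's alpha is commonly computed as k/(k - 1) times (1 minus the sum of the item variances divided by the variance of the total scores), where k is the number of items. The sketch below applies that formula with the standard `statistics` module; the function name and the answers of five respondents to three items are invented for illustration.

```python
import statistics


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of per-respondent item answers."""
    k = len(scores[0])  # number of items
    item_variances = [
        statistics.variance([row[i] for row in scores]) for i in range(k)
    ]
    total_variance = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of 5 respondents to 3 items on a 5-point scale.
answers = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 5],
    [2, 2, 3],
    [4, 4, 4],
]
alpha = cronbach_alpha(answers)
print(round(alpha, 3))  # 0.939
```

The more the items move together, the larger the variance of the totals relative to the item variances, and the closer alpha gets to 1.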
Split-Half Reliability
Split-half reliability reflects the correlations between two halves
of an instrument.
The estimates would vary depending on how the items in the
measure are split into two halves.
Split-half reliabilities could be higher than Cronbach’s alpha only
in the circumstance of there being more than one underlying
response dimension tapped by the measure and when certain
other conditions are met as well.
Hence, in almost all cases, Cronbach’s alpha can be considered
a perfectly adequate index of the interitem consistency reliability.
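To see how split-half estimates depend on the split, the sketch below correlates two different pairs of halves of the same four-item instrument. The data and helper names are invented for illustration, and no length correction is applied to the resulting correlations.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(scores, first_half, second_half):
    """Correlate respondents' sums over two disjoint sets of item indexes."""
    sums_a = [sum(row[i] for i in first_half) for row in scores]
    sums_b = [sum(row[i] for i in second_half) for row in scores]
    return pearson_r(sums_a, sums_b)


# Hypothetical answers of 5 respondents to 4 items on a 5-point scale.
answers = [
    [4, 5, 4, 3],
    [3, 3, 2, 4],
    [5, 5, 5, 4],
    [2, 2, 3, 1],
    [4, 4, 4, 5],
]
odd_even = split_half(answers, [0, 2], [1, 3])      # items 1, 3 vs items 2, 4
first_second = split_half(answers, [0, 1], [2, 3])  # items 1, 2 vs items 3, 4
print(round(odd_even, 2), round(first_second, 2))   # about 0.78 and 0.88
```

The two splits give noticeably different coefficients for the same data, which is one reason a single index such as Cronbach's alpha is usually preferred.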
5. Validity
Several types of validity tests are used to test the goodness of measures and
writers use different terms to denote them. For the sake of clarity, we may
group validity tests under three broad headings: content validity,
criterion-related validity, and construct validity.
5.1 Content Validity
Content validity ensures that the measure includes an adequate and
representative set of items that tap the concept. The more the scale items
represent the domain or universe of the concept being measured, the greater
the content validity. To put it differently, content validity is a function of how
well the dimensions and elements of a concept have been delineated.
Face validity is considered by some as a basic and a very minimum index of
content validity. Face validity indicates that the items that are intended to
measure a concept, do on the face of it look like they measure the concept.
5.2 Criterion-Related Validity
Criterion-related validity is established when the measure differentiates
individuals on a criterion it is expected to predict. This can be done by
establishing concurrent validity or predictive validity, as explained below.
Concurrent validity is established when the scale discriminates individuals
who are known to be different; that is, they should score differently on the
instrument.
5.3 Construct Validity
Construct validity testifies to how well the results obtained from the use of the
measure fit the theories around which the test is designed. This is assessed
through convergent and discriminant validity, which are explained below.
Convergent validity is established when the scores obtained with two different
instruments measuring the same concept are highly correlated.
Discriminant Validity is established when, based on theory, two variables are
predicted to be uncorrelated, and the scores obtained by measuring them are
indeed empirically found to be so.
Chapter 9:
Measurement: Scaling, Reliability, Validity
Table 9.1 Types of Validity
Validity                    Description
Content validity            Does the measure adequately measure the concept?
Face validity               Do “experts” validate that the instrument measures what its name suggests it measures?
Criterion-related validity  Does the measure differentiate in a manner that helps to predict a criterion variable?
Concurrent validity         Does the measure differentiate in a manner that helps to predict a criterion variable currently?
Predictive validity         Does the measure differentiate individuals in a manner that helps to predict a future criterion?
Construct validity          Does the instrument tap the concept as theorized?
Convergent validity         Do two instruments measuring the concept correlate highly?
Discriminant validity       Does the measure have a low correlation with a variable that is supposed to be unrelated to this variable?
Reliability
Indicates the extent to which a measure is without bias (error
free) and hence ensures consistent measurement across time and across the various items in the instrument.
Goodness of Measures
Stability of measures:
Test-retest reliability
Parallel-form reliability
Correlation
Internal consistency of measures:
Interitem consistency reliability
Cronbach’s alpha
Split-half reliability
Correlation
Goodness of Measures-Reliability
Validity
Ensures the ability of a scale to measure the intended concept.
Content validity
Criterion related validity
Construct validity
Goodness of Measures-Validity
Content validity
Ensures that the measure includes an adequate and
representative set of items that tap the concept.
A panel of judges
Criterion related validity
Is established when the measure differentiates individuals on a
criterion it is expected to predict
Concurrent validity: established when the scale differentiates
individuals who are known to be different
Predictive validity: indicates the ability of the measuring
instrument to differentiate among individuals with reference
to a future criterion
Correlation
Construct validity
Testifies to how well the results obtained from the use of the
measure fit the theories around which the test is designed.
Convergent validity: established when the scores obtained
with two different instruments measuring the same concept are
highly correlated
Discriminant validity: established when, based on theory, two
variables are predicted to be uncorrelated, and the scores
obtained by measuring them are indeed empirically found to
be so
Correlation, factor analysis, convergent-discriminant
techniques, multitrait-multimethod analysis
Diagram 9.1 Testing Goodness of Measures: Forms of Reliability and Validity (reliability is tested through stability and consistency; validity through face, content, criterion-related, and construct validity).