Psych 230
Psychological Measurement and Statistics
Pedro Wolf
September 2, 2009
Previously on “let’s learn statistics in five weeks”
• the logic of research
– samples, populations, and variables
• descriptive and inferential statistics
– statistics and parameters
• understanding experiments
– experimental and correlational studies
– independent and dependent variables
• characteristics of scores
– nominal, ordinal, interval, and ratio scales
– continuous and discrete
Which Scale?
• Does the variable have an intrinsic value?
– NO: Nominal
– YES: go to the next question
• Does the variable have equal values between scores?
– NO: Ordinal
– YES: go to the next question
• Does the variable have a real zero point?
– NO: Interval
– YES: Ratio
Continuous
• A continuous scale allows for fractional amounts
– it ‘continues’ between the whole-number amounts
– decimals make sense
• Examples:
– Height
– Weight
– IQ
Discrete
• In a discrete scale, only whole-number amounts can be measured
– decimals do not make sense
– usually, nominal and ordinal scales are discrete
– some interval and ratio variables are also discrete, e.g. number of children in a family
• Special type of discrete variable: dichotomous
– only two amounts or categories
– pass/fail; living/dead; male/female
Today…
• Why graphical representations of data?
• Stem and leaf plots
• Box plots
• Frequency
– what it is
– how a frequency distribution is created
• Graphing frequency distributions
– bar graphs, histograms, polygons
• Types of distribution
– normal, skewed, bimodal
• Relative frequency and the normal curve
– percentiles, area under the normal curve
“… look at the data” (Robert Bolles, 1998)
• Raw data is often messy, overwhelming, and un-interpretable.
• Many data sets can have thousands of measurements and hundreds of variables.
• Graphical representations of data can make data interpretable
• Looking at the data can inspire ideas.
What in the world could these data mean?
Imagine over 30,000 observations
Time                   Lat       Long
930485:23:06.8600001   32.20497  -111.028
930497:04:34.77        32.20482  -111.028
930497:04:59.7599998   32.20487  -111.028
930497:05:46.7600002   32.20485  -111.029
930497:06:05.7600002   32.20578  -111.029
930497:06:16.7600002   32.20678  -111.029
930497:06:28.7599998   32.20698  -111.028
930497:09:31.77        32.20687  -110.999
930497:09:58.77        32.2055   -110.993
930497:10:07.77        32.20555  -110.992
930497:10:37.77        32.20687  -110.986
930497:11:38.77        32.20672  -110.979
After plotting those data
• By plotting the data and superimposing them on map data, the previous slide’s data can suddenly tell a story
• Of course, not all data can tell such a story
• People have developed various ways to visualize their data graphically
Stem and Leaf Plots
Stem | Leaves    | Count
   5 | 4 6 7 9 9 | 5
   6 | 3 4 6 8 8 | 5
   7 | 2 2 5 6   | 4
   8 | 1 4 8     | 3
   9 |           | 0
  10 | 6         | 1
N = 18
• data: 54, 56, 57, 59, 59, 63, 64, 66, 68, 68, 72 …
• preserves the data intact while showing the shape of the distribution
• numbers on the left of the line are called the stems and represent the leading digit(s) of each number
• numbers on the right of the line are called the leaves; each leaf completes its stem to give one individual score (stem 5 with leaf 4 is the score 54)
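As a rough illustration of how stems and leaves are split, the sketch below rebuilds the plot from the 18 scores. The full score list is read back off the leaves of the plot (the slide's own data line is truncated), and the function name `stem_and_leaf` is made up for this example.

```python
from collections import defaultdict

# The 18 scores, read back off the leaves of the stem-and-leaf plot.
scores = [54, 56, 57, 59, 59, 63, 64, 66, 68, 68,
          72, 72, 75, 76, 81, 84, 88, 106]

def stem_and_leaf(values):
    """Return {stem: [leaves]}, where stem = tens part and leaf = units digit."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    # Keep empty stems (such as 9) so that gaps in the data stay visible.
    return {s: groups.get(s, []) for s in range(min(groups), max(groups) + 1)}

plot = stem_and_leaf(scores)
for stem, leaves in plot.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Each printed row is one stem followed by its leaves, so the row lengths trace out the shape of the distribution.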
Box Plots
• Each of the lines in a box plot represents either a quartile or an end of the range of the data.
• In this particular plot, the dots represent outliers.
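The numbers behind a box plot can be sketched as below. Two things here are assumptions rather than anything stated on the slide: the data are borrowed from the stem-and-leaf example, and outliers are flagged with the common 1.5 × IQR convention. The slide does not say which rule its plot used.

```python
import statistics

# Hypothetical reuse of the 18 scores from the stem-and-leaf example.
scores = [54, 56, 57, 59, 59, 63, 64, 66, 68, 68,
          72, 72, 75, 76, 81, 84, 88, 106]

# The three cut points that divide the sorted data into quarters.
q1, median, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1  # interquartile range: the length of the box

# Assumption: points beyond 1.5 * IQR from the box are drawn as outlier dots.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [x for x in scores if x < low_fence or x > high_fence]

print("min, Q1, median, Q3, max:", min(scores), q1, median, q3, max(scores))
print("outliers:", outliers)
```

Different software computes quartiles slightly differently, so Q1 and Q3 may not match a hand calculation exactly.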
Frequency distributions - why?
• Standard method for graphing data
– easy way of visualizing group data
• Introduction to the Normal Distribution
– underlies all of the statistical tests we will be studying this semester
– understanding the concepts behind statistical testing will make life a lot easier later on
Frequency
Frequency - some definitions
• Raw scores are the scores we initially measure in a study
• The number of times a score occurs in a set is the score’s frequency
• A distribution is the general name for any organized set of data
• A frequency distribution organizes the scores based on each score’s frequency
• N is the total number of scores in the data
Understanding Frequency Distributions
• A frequency distribution table shows the number of times each score occurs in a set of data
• The symbol for a score’s frequency is simply f
• N = ∑f
Raw Scores
• The following is a data set of raw scores. We will use these raw scores to construct a frequency distribution table.
14 14 13 15 11 15
13 10 12 13 14 13
14 15 17 14 14 15
Frequency Distribution Table
X f
17 1
16 0
15 4
14 6
13 4
12 1
11 1
10 1
N = 18
Frequency Distribution Table - Example
• Make a frequency distribution table for the following scores:
5, 7, 4, 5, 6, 5, 4
• List each value from highest to lowest and count how many times it occurs:
X f
7 1
6 1
5 3
4 2
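The same table can be built by counting. The sketch below uses Python's `collections.Counter` as one convenient way to tally scores; it is an illustration, not the method used in the course.

```python
from collections import Counter

scores = [5, 7, 4, 5, 6, 5, 4]

freq = Counter(scores)    # maps each score X to its frequency f
N = sum(freq.values())    # N = sum of f

print("X  f")
for x in sorted(freq, reverse=True):   # highest score first, as in the table
    print(f"{x}  {freq[x]}")
print("N =", N)
```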
Learning more about our data
• What are the values for N and ∑X for the scores below?
14 14 13 15 11 15
13 10 12 13 14 13
14 15 17 14 14 15
Results via Frequency Distribution Table
What is N?
N = ∑f = 1 + 0 + 4 + 6 + 4 + 1 + 1 + 1 = 18
What is ∑X?
Multiply each score by its frequency, then add the products:
(17 * 1) = 17
(16 * 0) = 0
(15 * 4) = 60
(14 * 6) = 84
(13 * 4) = 52
(12 * 1) = 12
(11 * 1) = 11
(10 * 1) = 10
__________
Total = ∑X = 246
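A quick way to check this arithmetic is to compute N and ∑X from the frequency table and compare them with the raw scores. This is only a sketch; the variable names are illustrative.

```python
from collections import Counter

scores = [14, 14, 13, 15, 11, 15,
          13, 10, 12, 13, 14, 13,
          14, 15, 17, 14, 14, 15]

freq = Counter(scores)

N = sum(freq.values())                        # N = sum of f
sum_x = sum(x * f for x, f in freq.items())   # sum of X = sum of (X * f)

# The table-based total must agree with adding the raw scores directly.
assert sum_x == sum(scores)
print("N =", N, " sum of X =", sum_x)
```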
Graphing Frequency Distributions
• A frequency distribution graph shows the scores on the X axis and their frequency on the Y axis
• Why?
– Because it’s not easy to make sense of this:
• On a scale of 0-10, how excited are you about this class: __________
0=absolutely dreading it 10=extremely excited/highlight of my semester
• Data (raw scores)
5 7 2 3 5 5 5 8 7 7 4 5 10 7 5 4 5 5 7 3 6 2 6 3 5 5 7 2 4 6 3 7 5 5 7 3 5 6 5 5 8 6 7 5 3 5 7 2 3 5 4 5 4 8 3 6 5 5 5 1 2 4 7 5 5 4 3 3 7 5 8 6 3 5 10 0 6 6 3 8 5 4 3 2 4 6 3 7 5 5 7 5 7 5 10 7 5 4 5 5 7 6 3 8 1 5 5 6 4 9 8 5 8 5 7 5 10 7 5 4 5 5 7 4 8 4 5 8 5 5 7 5 5 5 2 4 6 3 7 5 2 4 6 3 7 5 8 6 3 5 10 0 6 7 2 8 8 5 5 8 6 3 6 2 6 3 5 5 7 2 5 10 7 5 4 5 5 7 5 7 5 10 7 5 4 5 5 5 7 2 3 3 7 5 8 6 3 5 10 0 6
Graphing Frequency Distributions
X   f
10  4
9   7
8   35
7   40
6   33
5   43
4   11
3   11
2   3
1   6
0   4
[Figure: histogram of “Excited about course (0=no, 10=yes)”; x-axis: scores 0 to 10; y-axis: Frequency, 0 to 50]
• The type of measurement scale (nominal, ordinal, interval, or ratio) determines whether we use:
– a bar graph
– a histogram
– a frequency polygon
Graphs - bar graph
• A frequency bar graph is used for nominal and ordinal data
– values on the x-axis
– frequencies on the y-axis
– in a bar graph, bars do not touch
Graphs - histogram
• A histogram is used for a small range of different interval or ratio scores
– values on the x-axis
– frequencies on the y-axis
– in a histogram, adjacent bars touch
Graphs - frequency polygon
• A frequency polygon is used for a large range of different scores
– in a frequency polygon, there are many scores on the x-axis
Constructing a Frequency Distribution
• Step 1: make a frequency table
• Step 2: put values along x-axis (bottom of page)
• Step 3: put a scale of frequencies along y-axis (left edge of page)
• Step 4 (bar graphs and histograms)
– make a bar for each value
• Step 4 (frequency polygons)
– mark a point above each value with a height for the frequency of that value
– connect the points with lines
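The steps above can be sketched in a few lines. This is a text-only illustration under two assumptions: it reuses the 18 raw scores from earlier in the lecture, and it draws the bars sideways (values down the page, frequencies across) so that no plotting library is needed.

```python
from collections import Counter

scores = [14, 14, 13, 15, 11, 15,
          13, 10, 12, 13, 14, 13,
          14, 15, 17, 14, 14, 15]

# Step 1: make a frequency table.
freq = Counter(scores)

# Step 2: lay out every value on the score axis, including 16, which has f = 0.
values = range(min(scores), max(scores) + 1)

# Steps 3 and 4 (bars): one bar per value, with length equal to its frequency.
for x in values:
    print(f"{x:>2} | {'#' * freq[x]} ({freq[x]})")

# Step 4 (frequency polygon): the (value, frequency) points to connect with lines.
points = [(x, freq[x]) for x in values]
print(points)
```

Keeping the empty value 16 in the layout matters: dropping it would make the gap between 15 and 17 disappear from the graph.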
Graphing - example
• A researcher observes driving behavior on a road, noting the gender of drivers, type of vehicle driven, and the speed at which they are traveling. Which type of graph should be used for each variable?
• Gender?
– nominal: bar graph
• Vehicle Type?
– nominal: bar graph
• Speed?
– ratio: frequency polygon
Use and Misuse of Graphs
[Figure: number of felonies per year, 2000 to 2003, plotted twice: once with the y-axis running from 0 to 600 and once with the y-axis running from 210 to 240]
• Which graph is correct?
• Neither does a very good job of summarizing the data
• Beware of graphing tricks
Types of Distributions
Distributions
• Frequency tables, bar-graphs, histograms and frequency polygons describe frequency distributions
Distributions - Why?
• Describing the shape of this frequency distribution is important for both descriptive and inferential statistics
• The benefit of descriptive statistics is being able to understand a set of data without examining every score
Distributions : The Normal Curve
• It turns out that many, many variables have a distribution that looks the same. This has been called the ‘normal distribution’.
• A bell-shaped curve
• Symmetrical
• Extreme scores have a low frequency
– extreme scores: scores that are relatively far above or far below the middle score
The Ideal Normal Curve
• Symmetrical
• Most scores in middle range
• Few extreme scores
• In theory, tails never reach the x-axis
Normal Curve - height
[Figure: histogram of “How tall are you (in inches)?”; x-axis: Height (inches), 52.5 to 80.0; y-axis: Frequency, 0 to 40]
Normal Curve - hours slept
[Figure: histogram of “Hours of Sleep last night”; x-axis: hours, 0 to 12; y-axis: Frequency, 0 to 60]
Normal Curve - GPA
[Figure: histogram of GPA; x-axis: GPA, 1.75 to 4.50; y-axis: Frequency, 0 to 50]
Normal Distributions
• While the scores in the population may approximate a normal distribution, it is not necessarily so for a sample of scores
[Figure: histogram of “How tall are you (in inches)?” for a sample of N=10; x-axis: Height (inches), 61.5 to 72.5; y-axis: Frequency, 0 to 3]
Skewed Distributions
• A skewed distribution is not symmetrical. It has only one pronounced tail
• A distribution may be either negatively skewed or positively skewed
• Negative or positive depends on which side of the distribution the longer tail is on
– the side with the longer tail describes the distribution
– tail on the negative (low-score) side: negatively skewed
– tail on the positive (high-score) side: positively skewed
Negatively Skewed Distributions
• Tail on negative side: negatively skewed
• Contains extreme low scores
• Does not contain extreme high scores
• Can occur due to a “ceiling effect”
Positively Skewed Distributions
• Tail on positive side: positively skewed
• Contains extreme high scores
• Does not contain extreme low scores
• Can occur due to a “floor effect”
[Figure: histogram of “Rank in Family”; x-axis: rank, 1 to 6; y-axis: Frequency, 0 to 100]
Bimodal Distributions
• A bimodal distribution has two distinct humps; in its ideal form it is symmetrical
Bimodal - birth month
[Figure: histogram of “What month were you born?”; x-axis: Month Born, Jan to Dec; y-axis: Frequency, 0 to 25]
Distributions - data
• How many alcoholic drinks do you have per week?
[Figure: histogram of “Alcoholic drinks per week”; x-axis: 0.5 to 24.5; y-axis: Frequency, 0 to 100]
• Positively skewed
Distributions - data
• How much did you spend on textbooks for this semester?
[Figure: histogram of “Spent on Textbooks ($)”; x-axis: $50 to $900; y-axis: Frequency, 0 to 60]
• Normal, with one outlier
Kurtosis
• meso-: forming chiefly scientific terms with the sense ‘middle, intermediate’
• lepto-: small, fine, thin, delicate
• platy-: forming nouns and adjectives, particularly in biology and anatomy, with the sense ‘broad, flat’
Relative Frequency and the Normal Curve
Relative Frequency
• Another way to organize scores is by relative frequency
• Relative frequency is the proportion of the time that a particular score occurs
– remember: a proportion is a number between 0 and 1
• Simple frequency: the number of times a score occurs
• Relative frequency: the proportion of times a score occurs
Relative Frequency - Why?
• We are still asking how often certain scores occurred
• Sometimes, relative frequency is easier to interpret than simple frequency
• Example:
– 82 people in the class reported drinking no alcohol weekly (simple frequency)
– 0.42 of the class (42%) reported drinking no alcohol weekly (relative frequency)
Relative Frequency
• The formula for a score’s relative frequency is:
relative frequency = f / N
Relative Frequency Distribution
Example
• Using the following data set, find the relative frequency of the score 12
14 14 13 15 11 15
13 10 12 13 14 13
14 15 17 14 14 15
• The frequency table for this set of data is:
X f
17 1
16 0
15 4
14 6
13 4
12 1
11 1
10 1
• The frequency for the score of 12 is 1, and N = 18
• Therefore, the relative frequency of 12 is:
relative frequency = f / N = 1 / 18 = 0.06
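The same calculation as a minimal sketch; the helper name `relative_frequency` is made up for this example.

```python
from collections import Counter

scores = [14, 14, 13, 15, 11, 15,
          13, 10, 12, 13, 14, 13,
          14, 15, 17, 14, 14, 15]

def relative_frequency(score, data):
    """Relative frequency = f / N for one score."""
    f = Counter(data)[score]   # simple frequency of the score
    return f / len(data)       # N is the total number of scores

rf_12 = relative_frequency(12, scores)
print(round(rf_12, 2))  # 1 / 18, which rounds to 0.06
```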
Relative Frequencies
• We can also add relative frequencies together.
– For example, what proportion of people scored a passing mark in this exam (>3)?
Value  Frequency  Relative Frequency
6      5          5/18 = 0.28
5      6          6/18 = 0.33
4      3          3/18 = 0.17
3      2          2/18 = 0.11
2      1          1/18 = 0.06
1      1          1/18 = 0.06
N=18              Total=1.00
– Adding the relative frequencies of the values above 3 (6, 5, and 4): 0.28 + 0.33 + 0.17 = 0.78
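Adding relative frequencies can be sketched as follows, using the value and frequency columns from the table. The pass mark (>3) comes from the slide; the variable names are illustrative.

```python
# Frequency table from the slide: exam value -> frequency.
freq = {6: 5, 5: 6, 4: 3, 3: 2, 2: 1, 1: 1}

N = sum(freq.values())

# Relative frequency of each value: f / N.
rel = {x: f / N for x, f in freq.items()}

# Proportion of people with a passing mark: add the relative frequencies above 3.
passing = sum(p for x, p in rel.items() if x > 3)

print(round(passing, 2))  # 14 / 18, which rounds to 0.78
```

The exact relative frequencies add up to exactly 1. Rounding each one to two decimals first, as in the table, can make a hand total drift by 0.01.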
Relative Frequency and the Normal Curve
• When the data are normally distributed (as many variables approximately are), we can use the normal curve directly to determine relative frequency.
• There is a known proportion of scores above or below any point
– for example, exactly 0.50 of the scores lie above the mean
• The proportion of the total area under the normal curve at certain scores corresponds to the relative frequency of those scores.
[Figure: normal distribution showing the area under the curve to the left of selected scores]
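As a rough sketch of reading relative frequency from the area under the curve, Python's standard library offers `statistics.NormalDist`. The mean of 100 and standard deviation of 15 below are hypothetical values chosen for illustration; they do not come from the lecture.

```python
from statistics import NormalDist

# Assumption: an ideal normal curve with a made-up mean and standard deviation.
curve = NormalDist(mu=100, sigma=15)

# cdf(x) is the area under the curve to the left of x, i.e. the
# proportion of scores at or below x.
below_mean = curve.cdf(100)
above_mean = 1 - below_mean
below_115 = curve.cdf(115)   # one standard deviation above the mean

print(above_mean)            # half of the scores lie above the mean
print(round(below_115, 2))
```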
Percentiles
• A percentile is the percent of all scores in the data that are at or below a score
– Example: a score at the 98th percentile has 98% of the scores at or below it
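The definition above can be applied directly to a small data set. The sketch below reuses the 18 raw scores from earlier in the lecture; the function name `percentile_rank` is hypothetical.

```python
scores = [14, 14, 13, 15, 11, 15,
          13, 10, 12, 13, 14, 13,
          14, 15, 17, 14, 14, 15]

def percentile_rank(score, data):
    """Percent of all scores in data that are at or below the given score."""
    at_or_below = sum(1 for x in data if x <= score)
    return 100 * at_or_below / len(data)

rank_14 = percentile_rank(14, scores)   # 13 of the 18 scores are at or below 14
print(round(rank_14, 1))
```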
Homework
• Complete exercises 1, 6, and 9 for chapter 3.
• Read chapters 4 and 5 for next week.