Distributions & Descriptive statistics Dr William Simpson Psychology, University of Plymouth


Defining and measuring variables

Independent & dependent variables

• Independent variable (IV): something we manipulate in an experiment
• Dependent variable (DV): something we measure
• By manipulating the IV, we expect to produce a change in the DV

Scales of measurement

• Variables are classified according to type of scale
  – The type of analysis depends on the type of scale
• Worst to best (least to most informative): nominal, ordinal, interval, ratio

Nominal

• Nominal data: assign categorical labels to observations
• Not really measurement
• E.g. male/female; married/single/widowed/divorced
• Numbers on football jerseys

Ordinal

• Ordinal data: values can be ranked (ordered). Categorical but rankable
• E.g. small, medium, large; movie rating 1-5; Likert scale
• Can only be ranked. A rating scale is not like cm: the difference between one pair of adjacent ratings is not necessarily the same as the difference between another pair

• Averaging a response of "strongly agree" (5) with two responses of "disagree" (2) would give us a mean of 3, but what is the meaning of that number?

Interval

• Interval data: ordinary measurement, e.g. temperature
• Unlike ordinal data, we can say the difference between 1 & 2 deg C is the same as the difference between 4 & 5 deg C

Ratio

• Ratio data: ordinary measurements, but with an absolute, non-arbitrary zero point
• E.g. weight, length: any scale must start at zero
• Deg C: not ratio, because 0 is arbitrarily set at the freezing point of water

Discrete & continuous variables

• Variables measured on interval & ratio scales are further identified as either:
  – discrete: integers, no intermediate values. E.g. number of Smarties in a box
  – continuous: measurable to any level of accuracy. E.g. weight of Smarties contents

Frequency distributions

• We have a pile of scores
• Not all scores are equally likely
• How were scores distributed?

• Subjects were timed (in sec) while completing a problem-solving task:
• 7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2

Stem & leaf

• Two components: the stem and the leaf
• In the problem-solving example, stem = ones, leaf = tenths
• Stems range between 5 and 9

• 7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2

    5|98
    6|821
    7|6347
    8|1182
    9|2
    Key: 9|2 means 9.2

• Heights in cm: 154, 143, 148, 139, 143, 147, 153, 162, 136, 147, 144, 143, 139, 142, 143, 156, 151, 164, 157, 149, 146
• Put 2 digits in the stem; split each stem into leaves 0-4 and leaves 5-9

    13|969
    14|334323
    14|87796
    15|431
    15|67
    16|24
    Key: 13|6 means 136

• GSR values: 23.25, 24.13, 24.76, 24.81, 24.98, 25.31, 25.57, 25.89, 26.28, 26.34, 27.09
• Round the last 2 digits to a single leaf digit

    23|3
    24|188
    25|0369
    26|33
    27|1
    Key: 23|3 means 23.3
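As a rough illustration of how the first display is built, the R sketch below splits each problem-solving time into its stem (ones) and leaf (tenths) and groups the leaves by stem. The variable names are made up for this example. Base R's stem() does a similar job but may pick different stems.

```r
# Problem-solving times (sec) from the example above
times <- c(7.6, 8.1, 9.2, 6.8, 5.9, 6.2, 6.1, 5.8, 7.3, 8.1, 8.8, 7.4, 7.7, 8.2)

stems  <- floor(times)                  # ones digit
leaves <- round((times - stems) * 10)   # tenths digit

# Collect the leaves belonging to each stem, in the order they occur in the data
display <- sapply(split(leaves, stems), paste, collapse = "")

# Print one "stem|leaves" row per stem
cat(paste0(names(display), "|", display), sep = "\n")
```

The leaves are left in data order, as on the slide. Sorting them within each stem is a common refinement.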

Histogram

• Alternative way to look at the distribution
• It is like a stem-and-leaf display turned 90 deg

Example

• Time to complete task (min):
• 8 2 6 12 9 14 1 7 7 9 11 8 12 10 5 7 10 9 10 11 4 8 2 11 10 11 13 13 14 11 13 10 12 13 5 16 11 17 10 6 13 11 5 9 12 14 8 2 12 4

• Sort scores into about 10 or so bins (similar to the stems in stem-and-leaf)
• Decide on sensible bins
• Count the number of observations in each bin (length of each leaf in stem-and-leaf)
• This number in each bin is called the frequency

    time     frequency
    0-1       1
    2-3       3
    4-5       5
    6-7       5
    8-9       8
    10-11    13
    12-13    10
    14-15     3
    16-17     2
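The counting step can be sketched in R with cut() and table(). The bin edges and label names below are choices made for this illustration (edges placed halfway between whole minutes so that each bin holds two values). hist() would choose its own bins unless told otherwise.

```r
# Time to complete task (min), as listed above
x <- c(8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11,
       4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17,
       10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4)

# Bin edges at -0.5, 1.5, 3.5, ..., 17.5 give the bins 0-1, 2-3, ..., 16-17
edges      <- seq(-0.5, 17.5, by = 2)
bin_labels <- c("0-1", "2-3", "4-5", "6-7", "8-9",
                "10-11", "12-13", "14-15", "16-17")

# Assign every score to a bin, then count the scores per bin (the frequency)
bins <- cut(x, breaks = edges, labels = bin_labels)
freq <- table(bins)
print(freq)
```

The counts add up to the 50 scores, which is a quick check that no observation fell outside the bins.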

• This table is then used to make the histogram
• Histogram is a bar chart with frequency on the y axis and score on the x axis
• Sometimes done other ways, e.g. connect the dots (frequency polygon)

[Figure: histogram of the task times; x axis: Time (min), y axis: frequency]

    x <- c(8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11, 4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17, 10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4)
    hist(x)
    stem(x)
    boxplot(x)

Probability distributions

• Histogram is an estimate of the true probability distribution
• Many theoretical probability distributions exist
• They are the basis of statistical models used to make inferences about the population

Binomial distribution

• Binomial distribution is a discrete distribution
• The binomial distribution applies when:
  – there is a series of n trials (e.g. 10 coin tosses)
  – there are only 2 possible outcomes per trial
  – outcomes are mutually exclusive (head or tail)
  – the outcome of each trial is independent of the others

• The binomial distribution gives the chance of getting each total number of ‘successes’ after doing all the (binary) trials of the experiment
• E.g. it gives the chance of getting 1, 2, or 3 girls after giving birth to 6 children
• p = p(success) = p(girl) = 0.5 each trial
• q = p(failure) = p(boy) = 1-p = 0.5
• n = number of trials = 6
• The probability distribution where n = 6 and the probability of each outcome is 0.5 on each trial looks like:

[Figure: binomial distribution for n = 6, p = 0.5; x axis: number of girls, y axis: probability]

• For any probability distribution, the y-axis is given by a formula
• For the binomial, it looks like this:

$$P(k) = \binom{n}{k} p^{k} q^{n-k}$$

• k successes in n trials; $\binom{n}{k}$ is the binomial coefficient
• You don’t need to know it
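For readers who want to see the numbers, here is a small R sketch for the six-children example. It assumes the standard binomial formula, choose(n, k) * p^k * q^(n - k), and compares it with R's built-in dbinom(). The variable names are illustrative only.

```r
n <- 6        # number of trials (children)
p <- 0.5      # p(success) = p(girl) on each trial
q <- 1 - p    # p(failure) = p(boy)
k <- 0:n      # every possible number of girls

# Probability of each k from the formula: binomial coefficient times p^k q^(n-k)
by_formula <- choose(n, k) * p^k * q^(n - k)

# The same probabilities from R's built-in binomial function
built_in <- dbinom(k, size = n, prob = p)

print(round(by_formula, 4))
print(all.equal(by_formula, built_in))
```

With p = 0.5 the probabilities are 1, 6, 15, 20, 15, 6 and 1 sixty-fourths, so 3 girls is the most likely outcome and the distribution is symmetric.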

Normal distribution

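• Continuous probability distribution
• Every probability distribution’s y-axis is given by a formula
• For the normal distribution, the y-axis (probability density) is:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right)$$

• $\mu$ is the mean and $\sigma$ is the standard deviation of the distribution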
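A minimal R sketch of the normal density, assuming the usual formula with mean mu and standard deviation sigma (the values 0 and 1 are chosen only for illustration). It evaluates the formula by hand and checks it against the built-in dnorm().

```r
mu    <- 0    # illustrative mean
sigma <- 1    # illustrative standard deviation
xs <- seq(-3, 3, by = 0.5)   # points at which to evaluate the density

# Density from the formula: 1/(sigma*sqrt(2*pi)) * exp(-(x - mu)^2 / (2*sigma^2))
by_formula <- 1 / (sigma * sqrt(2 * pi)) * exp(-(xs - mu)^2 / (2 * sigma^2))

# The same density from R's built-in function
built_in <- dnorm(xs, mean = mu, sd = sigma)

print(all.equal(by_formula, built_in))
```

Plotting by_formula against xs gives the familiar bell-shaped curve, highest at the mean.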

Descriptive statistics

• We have a pile of scores
• Have made stem-and-leaf, histogram
• Want to summarise further: descriptive statistics

1. Centre (location)

•What is the ‘typical’ score? If you were to make a prediction for a new score, what would it be?

a) Mean (average)

•Mean = sum(x)/n

Mean as balance point

• Imagine that each observation is a toy block
• Place the blocks on a ruler; the position (1, 2, etc. inches) represents the value
• The balance point is the mean

    1 2 2 3    mean = 2
    1 2 2 5    mean = 2.5
    1 2 2 9    mean = 3.5

Mean is pulled towards extreme observation (outlier)

b) Median

• Median is the middle score; 50th percentile
• Useful when extreme scores (outliers) lie in one tail of the distribution (skewed)

Calculate the median

• Sort scores
• If odd n, median is the middle value
• If even n, median is the mean of the 2 middle values
• 25 13 9 18 1 -> 1 9 13 18 25; median = 13
• 25 13 9 18 -> 9 13 18 25; median = (13+18)/2 = 15.5
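The rule above can be written out as a short R function. The name my_median is made up for this sketch; in practice R's built-in median() does the same thing.

```r
# Median by the sort-and-pick-the-middle rule
my_median <- function(scores) {
  s <- sort(scores)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                  # odd n: the middle value
  } else {
    (s[n / 2] + s[n / 2 + 1]) / 2   # even n: mean of the 2 middle values
  }
}

odd_example  <- my_median(c(25, 13, 9, 18, 1))   # 13
even_example <- my_median(c(25, 13, 9, 18))      # 15.5
print(c(odd_example, even_example))
```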

Median and outliers

• 1 2 2 3
• 1 2 2 5
• 1 2 2 9
• Median = 2 in all cases
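A quick R check of the three block examples shows the contrast: the mean moves as the largest value grows, while the median stays put. The list name blocks is just a label for this sketch.

```r
# The three sets of block positions used in the balance-point example
blocks <- list(c(1, 2, 2, 3), c(1, 2, 2, 5), c(1, 2, 2, 9))

means   <- sapply(blocks, mean)     # 2.0 2.5 3.5: pulled towards the outlier
medians <- sapply(blocks, median)   # 2 2 2: unaffected by the outlier

print(rbind(mean = means, median = medians))
```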

c) Mode

• Mode is the most frequently occurring score
• Mean should really be used only for interval/ratio data. Mode good otherwise
• E.g. mean movie rating: not really sensible. Mode sensible
• Sometimes no unique mode exists (e.g. bimodal)
• Bimodality can be due to a mixture of two different populations (e.g. male and female)

• Mean = 9.36, Median = 10, Mode = 11

[Figure: histogram of time to complete task; x axis: Time (min)]

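    mean(x)
    median(x)
    # R has no built-in for the statistical mode, so define one:
    # return the most frequent value (the first one found if there is a tie)
    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }
    Mode(x)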

Likert scale

• E.g. Brief Psychiatric Rating Scale (BPRS)
• Interview + observations of patient's behaviour over preceding 2–3 days
• Each item scored 0-7

• Suppose we have a new treatment
• Does it reduce anxiety?
• Define “anxiety” as score on Q2
• We use BPRS on lots of patients
• Compare treatment and placebo
• How? Find mean(treatment) vs mean(placebo)?

• The numbers 0-7 are not really numbers!
• They have only rank (order) info
• Ordinal
• The “numbers” are really ordered labels: “normal”, “a bit anxious”, … , “extremely anxious”
• They lack a quantitative distance between them; calculating a mean level of anxiety for the group is not really appropriate
• It makes sense to find the mode
• Most frequently occurring anxiety score
• It makes sense to measure the median: the person in the middle of the group in terms of anxiety, with half the responses below and the other half above
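As an illustration only, the R sketch below finds the median and mode of some anxiety ratings. The scores are invented for this example and do not come from any study; the variable names are also made up.

```r
# Hypothetical anxiety ratings (0-7) for 11 patients; made-up data
anxiety <- c(2, 3, 3, 1, 5, 3, 4, 2, 6, 3, 1)

# Median: the rating of the person in the middle once patients are ranked
middle_rating <- median(anxiety)

# Mode: the rating that occurs most often
counts     <- table(anxiety)
modal_rating <- as.numeric(names(counts)[which.max(counts)])

print(c(median = middle_rating, mode = modal_rating))
```

Both summaries use only the order or the counts of the labels, so they do not rely on equal distances between ratings.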

Example

• Family-Focused Treatment Versus Individual Treatment for Bipolar Disorder: Results of a Randomized Clinical Trial. J. Consulting & Clinical Psychology, 2003, 71, 482–492

“The psychiatrist made ratings of compliance on a 7-point Likert scale ranging from full compliance (1) to discontinued medication against medical advice (7)” p.486

• “On the whole, the participants were quite compliant with their medication, with at least 78% of the patients scoring within the compliant range at each assessment point” p.489

• The paper must have made a mistake earlier (p.486): 1 is bad, 7 is good compliance

• For each 3-month follow-up period, participants were placed in one of the following clinical outcome categories:

(a) relapse, defined as a rating of 6 or 7 on the BPRS/SADS-C core symptoms of depression (depressed mood, loss of interest), mania (hostility, elevated mood, grandiosity), or psychosis (unusual thought content, suspiciousness, hallucinations, conceptual disorganization) and at least two ancillary symptoms (suicidality, guilt, sleep disturbance, appetite disturbance, lack of energy, negative evaluation, discouragement, increased energy activity), or

(b) nonrelapse, defined as a score of 5 or below on all relevant BPRS/SADS-C core symptoms during the 3-month interval

2. Spread (dispersion)

• Measure of centre (e.g. mean) tells what value we expect
• Measure of spread tells how close a value will typically be to the centre

a) Interquartile range

• Interquartile range (IQR) is the distance between the cutoff for the bottom 25% of scores (Q1) and the cutoff for the top 25% of scores (Q3)

Quartiles

• Quartiles divide the data into quarters
• The median (Q2) divides the data into 2 piles (50% above, 50% below)
• Q1 is the cutoff below which fall the bottom 25% of scores
• Q3 is the cutoff below which fall the bottom 75%
  – Q1 has 25% of scores below it, Q2 has 50% (i.e. it is the median), and Q3 has 75% of scores below it (25% above)

Finding quartiles

1. Sort the data
2. Find the median = Q2 = value that splits the data into two equal piles, half below it and half above
3. Q1 = median of lower half
4. Q3 = median of upper half
5. IQR = Q3 – Q1

    x <- c(8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11, 4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17, 10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4)
    x <- sort(x); x
    # 1 2 2 2 4 4 5 5 5 6 6 7 7 7 8 8 8 8 9 9 9 9 10 10 10 10 10 10 11 11 11 11 11 11 11 12 12 12 12 12 13 13 13 13 13 14 14 14 16 17

• n = 50
• Q2 = (x[25] + x[26])/2 = (10+10)/2 = 10
• Q1 = x[13] = 7
• Q3 = x[38] = 12
• IQR = Q3 - Q1 = 12 - 7 = 5
• We expect scores near 10, plus-or-minus 5 points

    fivenum(x)   # 1 7 10 12 17 = min, Q1, Q2, Q3, max
    IQR(x)       # 5
    boxplot(x)
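The median-of-halves steps can be wrapped in a small R function. This is a sketch under one assumption that the slides do not settle: when n is odd, the middle score is left out of both halves (other conventions exist, and R's quantile() interpolates instead). The function name quartiles is made up, and it expects at least 2 scores.

```r
x <- c(8, 2, 6, 12, 9, 14, 1, 7, 7, 9, 11, 8, 12, 10, 5, 7, 10, 9, 10, 11,
       4, 8, 2, 11, 10, 11, 13, 13, 14, 11, 13, 10, 12, 13, 5, 16, 11, 17,
       10, 6, 13, 11, 5, 9, 12, 14, 8, 2, 12, 4)

# Quartiles by the median-of-halves method (needs at least 2 scores)
quartiles <- function(scores) {
  s <- sort(scores)
  n <- length(s)
  half  <- floor(n / 2)          # size of each half
  lower <- s[1:half]             # bottom half of the sorted scores
  upper <- s[(n - half + 1):n]   # top half (middle score dropped if n is odd)
  c(Q1 = median(lower), Q2 = median(s), Q3 = median(upper))
}

q   <- quartiles(x)
iqr <- q[["Q3"]] - q[["Q1"]]
print(q)
print(iqr)
```

For these 50 scores the function reproduces the hand calculation: Q1 = 7, Q2 = 10, Q3 = 12, IQR = 5.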

b) Standard deviation

• Each point is some distance away from the mean
• Each distance from the mean is a deviation
• Deviation = score - mean
• Each deviation contributes to the spread of the data about the mean
• Is the total spread just the sum of the deviations, then?
• No. The mean is a balance point, so positive and negative deviations cancel out
• Can find a “sort of” average or “typical” deviation if we get rid of the signs

“Average” deviation

• The average deviation actually is zero because the signs cancel. Need to get rid of the signs
• Idea: square each deviation, average, then take the (positive) square root [root mean square, RMS]
• That is the standard deviation!

Calculating the SD

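• Find the deviations
• Square them
• Find the average
• Take the square root to undo the squaring
• In symbols:

$$SD = \sqrt{\frac{\sum (x - \bar{x})^{2}}{N}}$$

• Divide by N, or by n-1 when estimating the population SD from a sample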
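The steps can be followed one at a time in R. The sketch below uses the block values 1, 2, 2, 9 from the balance-point example and computes the SD with both divisors; the variable names are illustrative. Note that R's built-in sd() and var() divide by n-1.

```r
scores <- c(1, 2, 2, 9)   # block positions from the balance-point example
n <- length(scores)

deviations <- scores - mean(scores)   # deviation = score - mean
total_dev  <- sum(deviations)         # 0: positive and negative deviations cancel

squared <- deviations^2               # squaring gets rid of the signs

sd_n  <- sqrt(sum(squared) / n)        # divide by N
sd_n1 <- sqrt(sum(squared) / (n - 1))  # divide by n-1 (what sd() uses)

print(c(sd_n = sd_n, sd_n1 = sd_n1))
print(all.equal(sd_n1, sd(scores)))      # matches the built-in SD
print(all.equal(sd_n1^2, var(scores)))   # variance = SD squared
```

Dividing by n-1 always gives a slightly larger value than dividing by N, and the gap shrinks as the number of scores grows.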

c) variance

•Variance = SD squared

•Useful for ANOVA (ANalysis Of VAriance)

Likert scale

• These “numbers” are not really numbers
• Therefore cannot do operations like subtraction, division, sqrt
• Use IQR

Statistical Inference

• Usually we are interested in more than describing or summarising the numbers we have on hand
• E.g. have a sample, calculate the mean. What is the mean of the larger population?
• E.g. have done an experiment, means differ. Is this a fluke or “real”?

• The data we have on hand are samples from some (real or theoretical) population

• We want to make inferences about population

Summary

• IV, DV
• Nominal, ordinal, interval, ratio
• Continuous, discrete
• Stem & leaf, histogram
• Probability distribution
• Mean, median, mode
• IQR, SD, variance
