Basic stats

Basic Statistics

A Brief Introduction

Allison Titcomb, Ph.D.

ICYF, SFCR, U of A

Types of Data

Stevens' Levels (Scales) of Measurement:

• Nominal (Categories)
  • Numbers indicate difference in kind
  • e.g., ethnicity, gender, ID #s

• Ordinal (Ordered)
  • Numbers represent rank orderings; distances are not equal (e.g., grades, rank orderings on a survey)

Stevens' Levels cont.

• Interval
  • Equal intervals, “arbitrary” zero
  • Ratios have no meaning
  • e.g., temperature in degrees F

• Ratio
  • Equal intervals, absolute zero
  • Equal ratios are equivalent
  • e.g., weight, height
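A small worked example may help show why ratios have no meaning on an interval scale but do on a ratio scale. The temperatures and weights below are made-up illustration values, and the helper names are hypothetical.

```python
def fahrenheit_to_celsius(f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (f - 32) * 5 / 9


def pounds_to_kilograms(lb):
    """Convert pounds to kilograms."""
    return lb * 0.45359237


# Interval scale: the zero point is arbitrary, so a ratio changes with the unit.
ratio_f = 80 / 40
ratio_c = fahrenheit_to_celsius(80) / fahrenheit_to_celsius(40)

# Ratio scale: zero means "none", so a ratio survives a change of unit.
ratio_lb = 200 / 100
ratio_kg = pounds_to_kilograms(200) / pounds_to_kilograms(100)

print(ratio_f, ratio_c)    # the two temperature ratios disagree
print(ratio_lb, ratio_kg)  # the two weight ratios agree
```

80 degrees F is not "twice as hot" as 40 degrees F: the same two temperatures give a different ratio in Celsius. A 200 lb object is twice as heavy as a 100 lb object in any unit.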

Other Types of Data

Qualitative (nominal and ordinal) vs. Quantitative (interval and ratio)

Discrete (countable, separate values) vs. Continuous (can potentially take on any numerical value)

Dichotomous (only 2 values)

What kind of data are these?

• Number of crimes in a county
• Religious preference
• Pass/Fail on a test
• Income
• Other examples?

Data Reduction

Descriptive Statistics a.k.a. Summary Statistics
• numbers that represent some characteristics of the set of scores
• unorganized -> organized
• graph, shape

More data reduction

Frequency Distributions
• bar diagram/histogram
  • discrete vs. continuous data
  • nominal level
  • (ordinal data-- why don’t you graph it?)
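As a minimal sketch of a frequency distribution for nominal data, the snippet below counts categories and prints a text bar diagram. The responses are hypothetical, and `frequency_distribution` is an illustrative helper name.

```python
from collections import Counter


def frequency_distribution(values):
    """Return (category, count) pairs, most frequent first."""
    return Counter(values).most_common()


# Hypothetical nominal data: party preference of ten respondents.
responses = ["Dem", "Rep", "Ind", "Dem", "Rep", "Dem", "Ind", "Dem", "Rep", "Dem"]

table = frequency_distribution(responses)
for category, count in table:
    # One "#" per respondent gives a crude bar diagram.
    print(f"{category:<4} {'#' * count} ({count})")
```

Because the categories are nominal, the order of the bars carries no meaning; sorting by count is only a display choice.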

More data reduction

• frequency distributions
  • interval/ratio
  • shapes include skewed, bimodal, J-shaped…

(See samples on board/overhead)

More data reduction

Measures of Central Tendency
• describing and typifying
• used for comparison
• Mean (typical/average score, sensitive to extreme scores)
• Median (middlemost score)
• Mode (most “common” score)
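The three measures above can be sketched with Python's standard `statistics` module. The scores are invented for illustration; the second list adds one extreme score to show how the mean reacts while the median and mode barely move.

```python
import statistics


def summarize(data):
    """Return the mean, median, and mode of a list of scores."""
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "mode": statistics.mode(data),
    }


scores = [70, 75, 75, 80, 85]   # hypothetical test scores
with_outlier = scores + [200]   # same scores plus one extreme score

print(summarize(scores))
print(summarize(with_outlier))
```

Adding the single score of 200 pulls the mean from 77 up to 97.5, while the median moves only from 75 to 77.5 and the mode stays at 75.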

More data reduction

Measures of Variability
• dispersion/degree of heterogeneity
• Range
• Variance (degree of variability of individual scores)
• Standard Deviation (sq. root of variance; typical “distance” between individual scores and the mean of the sample)
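A minimal sketch of the three measures of variability, assuming the sample formulas (dividing by n - 1), which is what `statistics.variance` and `statistics.stdev` compute. The data are made up for illustration.

```python
import statistics


def variability(data):
    """Return the range, sample variance, and sample standard deviation."""
    return {
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (n - 1)
        "sd": statistics.stdev(data),           # square root of the variance
    }


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical scores, mean = 5
result = variability(data)
print(result)
```

The squared deviations from the mean sum to 32, so the sample variance is 32 / 7 and the standard deviation is its square root.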

More data reduction

Things that contribute to variability
• natural variability (true variance, tough to measure)
• sampling error
• measurement error
• systematic variance
• MAXMINCON (maximize systematic variance, minimize error variance, control extraneous variance)

More data reduction

Normal Curve
• With large numbers, many things are “normally distributed”
• majority of individuals measured are clustered close to the mean
• symmetric; mean, median, mode at same point; range is approx. 6 standard deviations
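The "approx. 6 standard deviations" rule of thumb can be checked with a short sketch using `statistics.NormalDist` from the standard library. `proportion_within` is an illustrative helper name.

```python
from statistics import NormalDist


def proportion_within(k):
    """Proportion of a normal distribution within k SDs of the mean."""
    z = NormalDist()  # standard normal: mean 0, SD 1
    return z.cdf(k) - z.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.4f}")
```

Roughly 68% of scores fall within 1 SD of the mean, 95% within 2, and 99.7% within 3, so nearly all scores lie in a span of about 6 SDs (3 below the mean to 3 above).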

More data reduction

Measures of relationship
• Pearson’s Product-Moment Correlation, more fondly known simply as “r”
• Correlation coefficient
• 2 sets of scores; question is the relationship between the 2. Is there a relationship?
• Allows us to predict; reliability

Correlation

Describing the relationship
• Direction
  • positive (high w/high, low w/low)
  • negative (low w/high, high w/low)
• Magnitude
  • +1.0 vs. -1.0
  • low correlation, no correlation
• Draw a picture a.k.a. scatterplot
• Assumption is that it is linear
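Direction and magnitude can be illustrated by computing Pearson's r by hand. This is a minimal sketch from the textbook formula; the paired scores are invented, and `pearson_r` is an illustrative name.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of the deviations.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


hours = [1, 2, 3, 4, 5]      # hypothetical hours studied
score = [2, 4, 5, 4, 5]      # hypothetical quiz scores
reversed_score = [10, 8, 6, 4, 2]

print(pearson_r(hours, score))           # positive: high goes with high
print(pearson_r(hours, reversed_score))  # perfect negative relationship
```

The sign of r gives the direction and its distance from zero gives the magnitude. Because r measures only linear relationships, a scatterplot is still worth drawing.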

Inferential Statistics

Statistics is never having to say you’re certain; judgment/leap/inference; generalization

population parameters and sample statistics

based on probability (relative frequency of occurrence of an event in the “long run”)


Errors in Statistical Reasoning
• Null hypothesis-- the “no difference” hypothesis

Types of Errors (See Handout)
• Type I
  • rejecting the null when it’s true
  • crying wolf/false alarm/trigger happy
  • in law, we don’t want to convict the innocent
  • “controlled” by alpha level (e.g., 0.05)


• Type II
  • NOT rejecting the null when it’s wrong
  • “nice puppy” as the wolf bites your fingers
  • In medicine, we’d rather treat someone who isn’t sick than NOT treat someone who is (HMOs might change that)
  • Beta, effect size, power of a test, alpha level
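How alpha, beta, effect size, and power fit together can be sketched for one simple case. The function below assumes a one-sided, one-sample z test with a known standard deviation, which is a simplification chosen for illustration; the name `power_one_sided_z` and the example numbers are hypothetical.

```python
from statistics import NormalDist


def power_one_sided_z(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z test.

    effect_size is the true mean shift in SD units; n is the sample size.
    """
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)   # reject the null above this z value
    shift = effect_size * n ** 0.5    # center of the test statistic when the null is false
    return 1 - z.cdf(critical - shift)


power = power_one_sided_z(effect_size=0.5, n=25)
beta = 1 - power                      # probability of a Type II error
false_alarm = power_one_sided_z(effect_size=0.0, n=25)  # null is true

print(f"power = {power:.3f}, beta = {beta:.3f}")
print(f"Type I error rate when the null is true = {false_alarm:.3f}")
```

When the effect size is zero the rejection rate equals alpha, which is the Type I error rate. A larger effect, a larger sample, or a more lenient alpha each raises power and lowers beta.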


Major types of statistical tests

Don’t forget: What’s the question?

• “t test” (or “t-test statistic”)
  • two means
  • t for two; Gosset at Guinness; Student’s t
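A minimal sketch of the t statistic for comparing two means, assuming independent groups with equal variances (the pooled-variance form). The group scores are made up, and `pooled_t` is an illustrative name.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return (t, df) for an independent-samples t test with pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    # Weighted average of the two sample variances.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


treatment = [5, 6, 7, 8, 9]   # hypothetical scores, mean 7
control = [2, 3, 4, 5, 6]     # hypothetical scores, mean 4

t, df = pooled_t(treatment, control)
print(t, df)
```

With 8 degrees of freedom the two-tailed critical value at alpha = 0.05 is about 2.306, so a t of 3.0 would lead to rejecting the null hypothesis of no difference.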

Inferential statistics

• F test
  • for more than two means
  • a t test is a baby F test
  • between/within; Fisher, an agrarian researcher (ever heard of a split plot design?)
  • interactions
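The between/within idea behind the F test can be sketched as a one-way analysis of variance computed by hand. The groups are invented, and `one_way_f` is an illustrative name; interactions need a factorial design and are not covered by this sketch.

```python
from statistics import mean


def one_way_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    k, n = len(groups), len(all_scores)
    # Between-groups: how far each group mean sits from the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-groups: how far each score sits from its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n - k
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within


group_a = [5, 6, 7, 8, 9]      # hypothetical scores
group_b = [2, 3, 4, 5, 6]
group_c = [8, 9, 10, 11, 12]

print(one_way_f(group_a, group_b))            # two groups
print(one_way_f(group_a, group_b, group_c))   # three groups
```

With only two groups, F equals the square of the pooled-variance t statistic for the same data, which is one way to read "a t test is a baby F test".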


• Chi square
  • nominal data (e.g., Democrats and Republicans; males and females)

• Correlation
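For the chi square test listed above, a minimal sketch of the statistic for a table of counts follows. The counts are hypothetical, and `chi_square` is an illustrative name.

```python
def chi_square(table):
    """Return (chi-square statistic, df) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Hypothetical counts: rows = males, females; columns = Democrats, Republicans.
observed = [[30, 20],
            [20, 30]]

stat, df = chi_square(observed)
print(stat, df)
```

Every expected count here is 25, so the statistic is 4.0 on 1 degree of freedom. That exceeds the 0.05 critical value of about 3.84, so these made-up counts would suggest party preference is related to sex.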


Statistical vs. Practical Significance
• Cost/Benefit question-- Ask “Compared to What?”
• Statistically significant other?
• Significant findings do NOT eliminate the need for replication


p = 0.0001 vs. p = 0.01: a smaller p value is NOT a larger effect size
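A short sketch can show why a p value is not an effect size. It assumes a two-sided, one-sample z test with a known standard deviation, chosen only to keep the arithmetic simple; `two_sided_p` and the numbers are hypothetical.

```python
from statistics import NormalDist


def two_sided_p(effect_size, n):
    """Two-sided p value when the sample mean sits effect_size SDs from the null value."""
    z_stat = effect_size * n ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z_stat)))


# Same small effect (0.2 SD), two different sample sizes.
p_small_sample = two_sided_p(effect_size=0.2, n=100)
p_large_sample = two_sided_p(effect_size=0.2, n=400)

print(f"n = 100: p = {p_small_sample:.4f}")
print(f"n = 400: p = {p_large_sample:.6f}")
```

The effect is identical in both cases; only the sample size changed. A very small p value can therefore come from a large sample rather than from a practically important difference.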

Nonsignificant findings often do not get published-- bias in the literature?

Technology
