B AD 6243: Applied Univariate Statistics Understanding Data and Data Distributions Professor Laku Chidambaram Price College of Business University of Oklahoma


B AD 6243: Applied Univariate Statistics

Understanding Data and Data Distributions

Professor Laku Chidambaram

Price College of Business

University of Oklahoma


Summarizing Data


Measures of Central Tendency

• Mean
– Summarizing sample data (continuous data)
– Estimating the population parameter (µ) from the sample statistic (x̄)
– Arithmetic average (sum of scores/number of scores)

• Median
– Midpoint of the distribution
– Can be used in summarizing ordinal data

• Mode
– Most frequently occurring value
– Does not consider distribution of all scores
– E.g., more males than females


An Example

• Take a sample distribution of hourly billing rates for a group of IT workers: 30, 20, 40, 50, 60, 60, 100, 60, 30

Σx = 450

n = 9

Mean = Σx/n = 450/9 = 50

Median = 50 (middle value of the sorted scores: 20, 30, 30, 40, 50, 60, 60, 60, 100)

Mode = 60
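The three measures for the billing-rate example can be checked with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative choices, not part of the original slides.

```python
from statistics import mean, median, mode

# Hourly billing rates from the slide's example
rates = [30, 20, 40, 50, 60, 60, 100, 60, 30]

total = sum(rates)          # sum of scores (Σx)
n = len(rates)              # number of scores
avg = mean(rates)           # arithmetic average, Σx / n
mid = median(rates)         # middle value of the sorted scores
most_common = mode(rates)   # most frequently occurring value

print(total, n, avg, mid, most_common)
```

With nine scores the median is simply the fifth value after sorting; for an even number of scores `median` would average the two middle values.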


Measures of Dispersion

• Range
– Difference between highest and lowest score
– Does not take into account all scores in distribution

• Variance
– Measure of how much scores deviate from the mean on average
– Use sample statistic (s²) to estimate population parameter (σ²)

• Standard Deviation
– Square root of the variance
– Measure of consistency of scores


Example (contd.)

  x        x − x̄     (x − x̄)²
 20        −30         900
 30        −20         400
 30        −20         400
 40        −10         100
 50          0           0
 60         10         100
 60         10         100
 60         10         100
100         50        2500
 50          0        4600
(Average)  (Sum)      (Sum)

Range = 100 − 20 = 80

Σ(x − x̄)² = 4600 (sum of squared differences)

s² = Σ(x − x̄)²/(n − 1) = 4600/8 = 575 (variance)

s = √575 ≈ 24 (standard deviation)
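The dispersion figures above can be reproduced step by step. This is a small sketch using only the standard library; the variable names are illustrative, and the final comparison against `statistics.stdev` is included only as a cross-check of the hand calculation.

```python
from statistics import mean, stdev

# Hourly billing rates from the slide's example
rates = [30, 20, 40, 50, 60, 60, 100, 60, 30]

x_bar = mean(rates)                              # sample mean
value_range = max(rates) - min(rates)            # highest minus lowest score
sum_sq = sum((x - x_bar) ** 2 for x in rates)    # sum of squared differences
s2 = sum_sq / (len(rates) - 1)                   # sample variance, divides by n - 1
s = s2 ** 0.5                                    # sample standard deviation

print(value_range, sum_sq, s2, round(s, 2))
print(abs(s - stdev(rates)) < 1e-9)              # agrees with the library routine
```

Dividing by n − 1 rather than n is what makes s² an estimate of the population variance rather than a description of this sample alone.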


Frequencies


Box Plot


Nature of the Distribution

• Skewness
– Symmetry of the distribution
– Value is zero in normal distribution
– If skewed positively (pile-up of scores on left) or negatively (pile-up of scores on right), standardized z scores can be useful

• Kurtosis
– "Peakedness" of the distribution
– Value (excess kurtosis) is zero in normal distribution
– If positive (peaked) or negative (flat), standardized z scores can be useful

• Statistical Tests

Tests of Normality

            Kolmogorov-Smirnov (a)        Shapiro-Wilk
            Statistic   df   Sig.         Statistic   df   Sig.
VAR00001    .227        9    .199         .909        9    .310

a. Lilliefors Significance Correction
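Skewness and kurtosis can be computed by hand from standardized deviations. The sketch below assumes the bias-adjusted sample formulas that statistical packages commonly report (both are zero for a normal distribution); the slides do not state which formulas were used, so treat this as one reasonable choice. The function names are hypothetical, the billing-rate data is reused from the earlier example, and the normality tests in the table are not reproduced here.

```python
from statistics import mean, stdev

def sample_skewness(data):
    """Bias-adjusted sample skewness (needs at least 3 scores)."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)
    cubed = sum(((x - x_bar) / s) ** 3 for x in data)
    return n / ((n - 1) * (n - 2)) * cubed

def sample_excess_kurtosis(data):
    """Bias-adjusted sample excess kurtosis (needs at least 4 scores)."""
    n = len(data)
    x_bar, s = mean(data), stdev(data)
    fourth = sum(((x - x_bar) / s) ** 4 for x in data)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth - correction

# Hourly billing rates from the earlier example
rates = [30, 20, 40, 50, 60, 60, 100, 60, 30]
print(round(sample_skewness(rates), 3), round(sample_excess_kurtosis(rates), 3))
```

For this sample both values come out positive: the single score of 100 stretches the right tail, which is the "pile-up of scores on left" pattern described above.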


A Summary of Results


Histogram with Normal Curve


Understanding Data Distributions


Data Distributions

• A data distribution is a way of representing the frequency of occurrence of values for a variable
• Data distributions can be discrete (e.g., Bernoulli, Binomial, Poisson) or continuous (e.g., Normal, Exponential, Uniform)
• A histogram, which approximates a probability density function, depicts a data distribution
• Data distributions are defined by the functional form and the values of parameters
• Our focus is on the shape of such distributions and their implications for statistical inference


Normal Distribution

• Refers to a family of distributions (a.k.a. Gaussian distributions) that are bell shaped and:
– Represent a continuous probability distribution
– Are symmetric (with scores concentrated in the middle)
– Can be specified mathematically in terms of two parameters: the mean (μ) and the standard deviation (σ)
– Have one mode
– Are asymptotic (the tails approach the horizontal axis without touching it)
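To make "specified by two parameters" concrete, here is a minimal sketch of the normal density written directly from its standard formula. The function name `normal_pdf` and the values μ = 50, σ = 24 are illustrative assumptions (they echo the billing-rate example) and do not come from the slides.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

mu, sigma = 50, 24  # illustrative parameter values only

# Symmetric: equal heights at equal distances on either side of the mean
print(normal_pdf(mu - 10, mu, sigma), normal_pdf(mu + 10, mu, sigma))

# One mode at the mean, and tails that shrink toward zero but stay positive
for x in (mu, mu + sigma, mu + 3 * sigma, mu + 6 * sigma):
    print(x, normal_pdf(x, mu, sigma))
```

Changing `mu` slides the curve left or right; changing `sigma` makes it wider or narrower. No other quantity is needed to draw it.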

An Example

Standard Normal Distribution

The area P under the standard normal probability curve, with the respective z-statistic


The z Distribution

• The standard normal distribution, sometimes called the z distribution (as indicated by the formula below), is a normal distribution with a mean of 0 and a standard deviation of 1
• Normal distributions can be transformed to a standard normal distribution using the formula:

z = (X − μ) / σ

where X is a score from the original normal distribution, μ is its mean and σ is the standard deviation

• A z-score represents the number of standard deviations above or below the mean
• Note that the z distribution will only be a normal distribution if the original distribution (X) is normal
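The transformation can be sketched in a few lines. As an assumption for illustration, the sample mean and sample standard deviation of the billing-rate data stand in for μ and σ; the helper name `z_score` is hypothetical.

```python
from statistics import mean, stdev

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma

# Hourly billing rates from the earlier example
rates = [30, 20, 40, 50, 60, 60, 100, 60, 30]
x_bar, s = mean(rates), stdev(rates)   # stand-ins for mu and sigma (assumption)

z_scores = [z_score(x, x_bar, s) for x in rates]
print([round(z, 2) for z in z_scores])

# Standardized scores have mean 0 and standard deviation 1
print(round(mean(z_scores), 10), round(stdev(z_scores), 10))
```

Standardizing rescales the scores but does not change the shape of their distribution, which is why the last bullet above matters: a skewed X yields a skewed z.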


Areas Under the Curve

The Empirical Rule: 68-95-99.7
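The three percentages in the rule are areas under the standard normal curve within 1, 2 and 3 standard deviations of the mean. A short sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later) recovers them; the dictionary name `areas` is an illustrative choice.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between -k and +k standard deviations, for k = 1, 2, 3
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"within {k} SD: {area:.4f}")
```

The rounded results correspond to the 68, 95 and 99.7 percent figures in the rule's name.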


An Example

• If IQ scores are normally distributed, with a mean of 100 and a standard deviation of 15,
– what proportion of scores would be greater than 125?
– what proportion of scores would fall between 90 and 120?
– what proportion of scores would be less than 85?
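One way to work these three questions is to convert each cut-off to a z-score and read the area from the normal curve. The sketch below does this with `statistics.NormalDist`; it is offered as a worked check under the stated assumption of normality, and the variable names are illustrative.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

# Proportion greater than 125: area to the right of z = (125 - 100) / 15
p_above_125 = 1 - iq.cdf(125)

# Proportion between 90 and 120: area between the two cut-offs
p_90_to_120 = iq.cdf(120) - iq.cdf(90)

# Proportion less than 85: area to the left of z = (85 - 100) / 15 = -1
p_below_85 = iq.cdf(85)

print(round(p_above_125, 4), round(p_90_to_120, 4), round(p_below_85, 4))
```

The answers are roughly 4.8%, 65.6% and 15.9%. The last one also follows from the empirical rule: about 68% of scores lie within one standard deviation of the mean, so about 16% lie below 85.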


Some Key Concepts

• Central Limit Theorem
– As sample size increases, the sampling distribution of the mean for simple random samples of n cases, taken from a population with a mean equal to μ and a finite variance equal to σ², approximates a normal distribution

• Sampling Distribution of the Mean*
• Standard Deviation vs. Standard Error of the Mean
• Sample Size vs. Number of Samples
• Other Distributions
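A small simulation can illustrate several of these concepts at once. The sketch below assumes an exponential population with rate 1 (mean 1, standard deviation 1), chosen only because it is clearly non-normal; the seed, the sample sizes and the function name are all illustrative choices. It draws many samples of size n, keeps each sample's mean, and compares the spread of those means with the standard error σ/√n.

```python
import random
from statistics import mean, stdev

random.seed(6243)  # fixed seed so the sketch is repeatable

POP_MEAN = 1.0  # mean of an exponential population with rate 1
POP_SD = 1.0    # its standard deviation

def simulate_sample_means(n, num_samples):
    """Draw num_samples random samples of size n and return their means."""
    return [mean([random.expovariate(1.0) for _ in range(n)])
            for _ in range(num_samples)]

results = {}
for n in (5, 30):                       # sample size
    means = simulate_sample_means(n, num_samples=2000)  # number of samples
    observed_se = stdev(means)          # spread of the sampling distribution
    expected_se = POP_SD / n ** 0.5     # standard error of the mean
    results[n] = (mean(means), observed_se, expected_se)
    print(n, round(mean(means), 3), round(observed_se, 3), round(expected_se, 3))
```

Sample size (n) controls how tight and how close to normal the sampling distribution is; the number of samples only controls how precisely the simulation traces that distribution.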


The Central Limit Theorem
