Quantitative Data Analysis: Statistics – Part 1

"... while man is an insoluble puzzle, in the aggregate he becomes a mathematical certainty. You can, for example, never foretell what any one man will do, but you can say with precision what an average number will be up to. Individuals vary, but percentages remain constant. So says the statistician."

Overview

Part 1
• Picturing the Data
• Pitfalls of Surveys
• Averages
• Variance and Standard Deviation

Part 2
• The Normal Distribution
• Z-Tests
• Confidence Intervals
• T-Tests

~ THE GOLDEN RULE ~

Statistics NEVER replace the judgment of the expert.

Approach to Statistical Research

1. Formulate a Hypothesis

2. State predictions of the hypothesis

3. Perform experiments or observations

4. Interpret experiments or observations

5. Evaluate results with respect to hypothesis

6. Refine hypothesis and start again

(Basically the same as all other research)

Hypothesis Testing

H0 : Null Hypothesis, status quo

HA : Alternative Hypothesis, research question

So, either :

"The data does not support H0"

or

"We fail to reject H0"

Types of Data

Continuous: height, age, time

Discrete: # of days worked this week, # of leaves on a tree

Ordinal: {Good, O.K., Bad}

Nominal: {Yes/No}, {Teacher/Chemist/Haberdasher}

Picturing The Data

Time-Series Plots

Time-related data, e.g. stock prices

Pie Charts

Nominal/Ordinal

Only suitable for data whose parts add up to a whole (proportions summing to 1)

Hard to compare values in the chart

Bar Charts

Nominal/Ordinal

Easier to compare values than pie chart

Suitable for a wider range of data

Histograms

Continuous Data

Divide Data into ranges
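As a rough illustration of dividing data into ranges, the sketch below counts values into equal-width bins and prints a text histogram. The function name `histogram_counts` and the height values are hypothetical, chosen only for this example.

```python
def histogram_counts(values, low, width, n_bins):
    """Count how many values fall in each of n_bins equal-width ranges.

    Bin i covers [low + i*width, low + (i+1)*width); the last bin also
    includes its upper edge so the maximum value is not dropped.
    """
    counts = [0] * n_bins
    high = low + width * n_bins
    for v in values:
        if v < low or v > high:
            continue  # value lies outside the plotted range
        i = min(int((v - low) // width), n_bins - 1)
        counts[i] += 1
    return counts


# Hypothetical heights in cm (illustrative only, not from the slides)
heights = [152, 158, 161, 163, 167, 168, 170, 171, 174, 176, 181, 190]
counts = histogram_counts(heights, low=150, width=10, n_bins=4)

for i, c in enumerate(counts):
    start = 150 + 10 * i
    print(f"{start}-{start + 10}: {'#' * c}")
```

The choice of bin width changes the shape of the picture, so it is worth trying more than one width before drawing conclusions.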

Dot Plots

Nominal/Ordinal

Represents all the data

Difficult to read

Scatter Plots

Excellent for examining association between two variables

Box Plots

Continuous data, optionally split by a Nominal/Ordinal grouping

Q1, Q3 - first quartile (Q1) and third quartile (Q3); the interquartile range is IQR = Q3 - Q1

Outliers
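A minimal sketch of the numbers behind a box plot, using Python's standard `statistics` module. The data values are hypothetical, and the 1.5 × IQR fences for flagging outliers are an assumed convention (the one commonly associated with box plots), not something stated on the slide.

```python
import statistics

# Hypothetical measurements (illustrative only, not from the slides)
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]

# quantiles(..., n=4) returns the three quartile cut points Q1, Q2 (median), Q3.
# Other quartile conventions give slightly different values.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: fences at 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"Q1={q1}, median={q2}, Q3={q3}, IQR={iqr}")
print("outliers:", outliers)
```

Here the value 40 falls above the upper fence and would be drawn as a separate point rather than as part of the whisker.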

John Tukey

• Born June 16, 1915
• Died July 26, 2000
• Born in New Bedford, Massachusetts
• He introduced the box plot in his 1977 book, "Exploratory Data Analysis"
• Also the Cooley–Tukey FFT algorithm and jackknife estimation

• While working with John von Neumann on early computer designs, Tukey introduced the word "bit" as a contraction of "binary digit". The term "bit" was first used in an article by Claude Shannon in 1948.

• The term "software", which Paul Niquette claims he coined in 1953, was first used in print by Tukey in a 1958 article in American Mathematical Monthly, and thus some people attribute the term to him.

[Photos: John Tukey, Paul Niquette, Claude Shannon, John von Neumann]

Question 1

In a telephone survey of 68 households, when asked whether they have pets, the responses were as follows:

16 : No Pets
28 : Dogs
32 : Cats

Draw the appropriate graphic to illustrate the results!

Question 1 - Solution

Total number surveyed = 68

Number with no pets = 16

=>Total with pets = (68 - 16) = 52

But total 28 dogs + 32 cats = 60

=> So some people have both cats and dogs

Question 1 - Solution

How many? It must be (60 - 52) = 8 households.

No pets   = 16
Dogs only = 20
Cats only = 24
Both      = 8
-------------------------
Total     = 68
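The overlap calculation can be checked with a few lines of arithmetic. This sketch assumes, as the solution does, that every pet-owning household has a dog, a cat, or both; the variable names are illustrative.

```python
total = 68
no_pets = 16
dogs = 28   # households reporting a dog (some also have a cat)
cats = 32   # households reporting a cat (some also have a dog)

with_pets = total - no_pets        # households with at least one pet
both = dogs + cats - with_pets     # counted twice, once as dog and once as cat
dogs_only = dogs - both
cats_only = cats - both

# The four non-overlapping categories must account for every household
assert no_pets + dogs_only + cats_only + both == total

print(f"both={both}, dogs only={dogs_only}, cats only={cats_only}")
```

Only the four non-overlapping categories add up to the whole, which is what a pie chart requires.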

Question 1 - Solution

Graphic: Pie Chart or Bar Chart

Pitfalls of Surveys

The Literary Digest Poll

1936 US Presidential Election Alf Landon (R) vs. Franklin D. Roosevelt (D)

The Literary Digest Poll

Literary Digest had been conducting successful presidential election polls since 1916

They had correctly predicted the outcomes of the 1916, 1920, 1924, 1928, and 1932 elections by conducting polls.

These polls were a lucrative venture for the magazine: readers liked them; newspapers played them up; and each “ballot” included a subscription blank.

The Literary Digest Poll

In 1936 they sent out 10 million ballots to two groups of people:

• prospective subscribers, “who were chiefly upper- and middle-income people”
• a list designed to "correct for bias" from the first list, consisting of names selected from telephone books and motor vehicle registries

The Literary Digest Poll

Response rate: approximately 25%, or 2,376,523 responses

Result: Landon in a landslide (predicted 57% of the vote, Roosevelt predicted 40%)

Election result: Roosevelt received approximately 60% of the vote

The Literary Digest Poll

POSSIBLE CAUSES OF ERROR

Selection Bias: By taking names and addresses from telephone directories, the survey systematically excluded poor voters. Republicans were markedly overrepresented: in 1936, Democrats were less likely to have phones, less likely to drive cars, and less likely to read the Literary Digest.

“Sampling Frame” is the actual population of individuals from which a sample is drawn. Selection bias results when the sampling frame is not representative of the population of interest.
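To see how an unrepresentative sampling frame shifts an estimate, consider the arithmetic sketch below. Every number in it is hypothetical and chosen only to show the mechanism; none is an estimate of the 1936 electorate, and the group labels and the helper `support` are illustrative.

```python
# All figures are hypothetical, for illustration only.
# Each group: (share of population, share of sampling frame, support for candidate A)
groups = {
    "has phone/car": (0.40, 1.00, 0.35),
    "no phone/car": (0.60, 0.00, 0.75),
}


def support(groups, weight_index):
    """Weighted average support, using the chosen column as the weights."""
    total_weight = sum(g[weight_index] for g in groups.values())
    weighted = sum(g[weight_index] * g[2] for g in groups.values())
    return weighted / total_weight


population_support = support(groups, 0)  # weights = true population shares
frame_support = support(groups, 1)       # weights = shares within the frame

print(f"support in population:     {population_support:.2f}")
print(f"support in sampling frame: {frame_support:.2f}")
```

A larger sample drawn from the same frame would reproduce the frame's figure more precisely, not the population's, which is why sample size alone does not cure selection bias.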

The Literary Digest Poll

POSSIBLE CAUSES OF ERROR

Non-response Bias: Because only about a quarter of the 10 million people returned surveys, non-respondents may have had different preferences from respondents.

Indeed, respondents favored Landon.

Greater response rates reduce the odds of biased samples.

Definitions and Formula

Terminology

Population: a set of entities about which statistical inferences are to be drawn.

Sample: a number of independent observations from the same probability distribution.

Parameter: a numerical value that characterises a distribution; the distribution of a random variable is treated as belonging to a family of probability distributions, distinguished from each other by the values of a finite number of parameters.

Bias: a factor that causes a statistical sample of a population to have some members of the population less represented than others.

Outliers (and their treatment)

An "outlier" is an observation that does not fit the pattern in the rest of the data

Check the data

Check with the measurer

If there is reason to believe it is NOT real, change it if possible, otherwise leave it out (but note).

If there is reason to believe it is real, leave it out and note.

The Mean

The Mean (Arithmetic)

The mean is defined as the sum of all the elements, divided by the number of elements.

The statistical mean of a set of observations is the average of the measurements in a set of data.

The Mode

The mode is defined as the most frequently occurring element in a set of elements.

For example, [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17] has a mode of 6.

Given the list of data [1, 1, 2, 4, 4] the mode is not unique (both 1 and 4 occur twice): the dataset may be said to be bimodal, while a set with more than two modes may be described as multimodal.

The Median

The median is defined as the middle element, or the value separating the higher half of a sample from the lower half.

If there is an even number of elements, it is half the sum of the middle two elements.

Given the list of data [1, 1, 2, 4, 4] the median is 2.
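The three averages can be checked against the example lists from the slides using Python's standard `statistics` module. The variable names `a` and `b` and the four-element list are illustrative.

```python
import statistics

a = [1, 3, 6, 6, 6, 6, 7, 7, 12, 12, 17]
b = [1, 1, 2, 4, 4]

print(statistics.mean(b))               # sum of 12 divided by 5 elements
print(statistics.median(b))             # middle element of the sorted list
print(statistics.median([1, 2, 4, 4]))  # even count: half the sum of 2 and 4
print(statistics.mode(a))               # single most frequent element
print(statistics.multimode(b))          # every value tied for most frequent
```

`multimode` is the safer call when ties are possible, since it returns all of the most frequent values rather than just one of them.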

The Variance

But there can be a lot of variance in individual elements,

e.g. teacher salaries

Average = €22,000

Lowest = € 12,000

Difference = 12,000 - 22,000 = -10,000

The Variance

The sum of (Sample - Average) over all samples is always 0, because the positive and negative differences cancel, so we need a different measure of spread: the variance.

The variance of a set of data is the sum of the squared differences between each data value and the mean, divided by the sample size minus one.
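A minimal sketch of that definition, written out step by step. The function name `sample_variance` and the data values are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
def sample_variance(values):
    """Sum of squared differences from the mean, divided by (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)


# Hypothetical data (illustrative only): the mean is 5
data = [2, 4, 4, 6, 9]
mean = sum(data) / len(data)
deviations = [x - mean for x in data]

print(sum(deviations))        # plain differences cancel out
print(sample_variance(data))  # squared differences do not
```

Squaring makes every difference non-negative, so values far from the mean in either direction increase the result.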

Standard Deviation

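The standard deviation of a set of data is the positive square root of the variance. For a sample of n values x_1, ..., x_n with mean \bar{x}:

\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]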

Karl Pearson

• Born 27 March 1857
• Died 27 April 1936
• Born in Islington, London, England
• Father of Mathematical Statistics
• Protégé of Francis Galton
• Inventor of the P-value, the Pearson correlation coefficient, Chi distance, the Method of moments, and Principal Component Analysis

Karl Pearson introduced the term "standard deviation" in 1893, "although the idea was by then nearly a century old" (Abbott; Stigler, page 328). The term was introduced in a lecture of 31 January 1893, as a convenient substitute for the cumbersome "root mean square error" and the older expressions "error of mean square" and "mean error." The term was first used in a publication in 1894 by Pearson in "Contributions to the Mathematical Theory of Evolution" (Philosophical Transactions of the Royal Society A, 185, (1894), 71-110).

http://jeff560.tripod.com/s.html

Question 2

Find the mean and variance of the following sample values :

36, 41, 43, 44, 46

Question 2

Mean:

=(36 + 41 + 43 + 44 + 46) / 5

=210 / 5

=42

Question 2

Variance

Difference          Square
36 – 42 = -6        36
41 – 42 = -1        1
43 – 42 = 1         1
44 – 42 = 2         4
46 – 42 = 4         16
------------------------------------
Sum of squares      58

Variance = 58 / (5 -1) = 58 / 4 = 14.5

Standard Deviation = SquareRoot(14.5) ≈ 3.8
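The hand calculation can be cross-checked with Python's standard `statistics` module, whose `variance` and `stdev` functions use the same n - 1 divisor. The variable names are illustrative.

```python
import math
import statistics

sample = [36, 41, 43, 44, 46]

mean = statistics.mean(sample)
variance = statistics.variance(sample)  # sample variance, divides by n - 1
std_dev = statistics.stdev(sample)      # square root of the sample variance

print(f"mean={mean}, variance={variance}, standard deviation={std_dev:.1f}")
```

Note that `statistics.pvariance` and `statistics.pstdev` divide by n instead and would give smaller values for the same data.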

http://www.oerrecommender.org/visits/94142

http://mathforum.org/library/drmath/view/65410.html
