MONDAY, DECEMBER 23 2013

Sriram seminar on introduction to statistics


Page 1: Sriram seminar on introduction to statistics

MONDAY, DECEMBER 23 2013

Page 2: Sriram seminar on introduction to statistics

INTRODUCTION TO THEORY OF STATISTICS

BY SRIRAM C

I sem M.Tech Geoinformatics KSRSAC

Page 3: Sriram seminar on introduction to statistics

Today: Central Tendency, Dispersion & Probability

• From frequency tables to distributions
• Types of distributions: Normal, Skewed
• Level of measurement: Nominal, Ordinal, Interval
• Central tendency: Mode, Median, Mean
• Dispersion: Variance, Standard Deviation

Page 4: Sriram seminar on introduction to statistics

Descriptive statistics are concerned with describing the characteristics of frequency distributions:

• Where is the center?
• What is the range?
• What is the shape [of the distribution]?

Page 5: Sriram seminar on introduction to statistics

Frequency Distributions OR HISTOGRAMS

• Simple depiction of all the data
• Graphic: easy to understand
• Problems:
   • Not always precisely measured
   • Not summarized in one number or datum

Page 6: Sriram seminar on introduction to statistics

Frequency Table: Test Scores

Observation   Frequency
65            1
70            2
75            3
80            4
85            3
90            2
95            1
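A frequency table like this one can be built by counting how often each score occurs. The sketch below is only an illustration: the raw list `scores` is an assumed expansion of the table (the slides do not show the raw data), and the text bar chart is a rough stand-in for a histogram.

```python
from collections import Counter

# Assumed raw scores: each observation repeated as often as its frequency.
scores = [65] + [70] * 2 + [75] * 3 + [80] * 4 + [85] * 3 + [90] * 2 + [95]

# Count the occurrences of each score and order the table by score.
frequency = dict(sorted(Counter(scores).items()))

# Print one row per score, with a bar of asterisks as a crude histogram.
for score, count in frequency.items():
    print(f"{score:>3}  {count}  {'*' * count}")
```

Reading the bars from top to bottom gives the same symmetric shape as the histogram of these scores.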

Page 7: Sriram seminar on introduction to statistics

Frequency Distributions

[Figure: histogram of the test scores. Horizontal axis: Test Score (65 to 95); vertical axis: Frequency (1 to 4).]

Page 8: Sriram seminar on introduction to statistics

Normally Distributed Curve

Page 9: Sriram seminar on introduction to statistics

Skewed Distributions

Page 10: Sriram seminar on introduction to statistics

Summarizing Distributions

Two key characteristics of a frequency distribution are especially important when summarizing data or when making a prediction from one set of results to another:

• Central Tendency: What is in the “Middle”? What is most common? What would we use to predict?
• Dispersion: How spread out is the distribution? What shape is it?

Page 11: Sriram seminar on introduction to statistics

Three measures of central tendency are commonly used in statistical analysis - the mode, the median, and the mean

Each measure is designed to represent a typical score

The choice of which measure to use depends on:

• the shape of the distribution (whether normal or skewed), and

• the variable’s “level of measurement” (data are nominal, ordinal or interval).

Page 12: Sriram seminar on introduction to statistics

Appropriate Measures of Central Tendency

• Nominal variables: Mode
• Ordinal variables: Median
• Interval level variables: Mean, if the distribution is normal (median is better with skewed distribution)

Page 13: Sriram seminar on introduction to statistics

Mode: Most Common Outcome

Find the Mode: 4 5 6 6 7 8 9 10 12

Ans: 6
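As a quick check, the mode can be found by counting occurrences. This sketch uses `multimode` from Python's standard `statistics` module (Python 3.8+), which returns every value tied for most frequent; the variable names are illustrative only.

```python
from statistics import multimode

data = [4, 5, 6, 6, 7, 8, 9, 10, 12]

# multimode returns a list, so ties between several modes are not hidden.
modes = multimode(data)
print(modes)  # [6]
```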

Page 14: Sriram seminar on introduction to statistics

Median: Middle-most Value

• 50% of observations are above the median, 50% are below it
• The difference in magnitude between the observations does not matter
• Therefore, it is not sensitive to outliers
• Formula: position of the median = (n + 1) / 2 in the rank-ordered data

Page 15: Sriram seminar on introduction to statistics

To compute the median, first rank order the values of X from low to high: 85, 94, 94, 96, 96, 96, 96, 97, 97, 98

Then count the number of observations (10), add 1 (11), and divide by 2 to get the position of the middle score: 5 ½. The median lies halfway between the 5th and 6th scores; both are 96, so 96 is the middle score.
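The position rule can be sketched in a few lines. The helper `median_by_position` is a hypothetical name introduced here for illustration; it ranks the values, finds position (n + 1) / 2, and averages the two neighbouring scores when that position falls between them. The standard library's `statistics.median` is used only as a cross-check.

```python
import math
from statistics import median

def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on rank-ordered data."""
    ordered = sorted(values)
    position = (len(ordered) + 1) / 2          # 1-based, may end in .5
    lower = ordered[math.floor(position) - 1]  # score just below the position
    upper = ordered[math.ceil(position) - 1]   # score just above the position
    return (lower + upper) / 2

scores = [85, 94, 94, 96, 96, 96, 96, 97, 97, 98]
print(median_by_position(scores))  # 96.0
print(median(scores))              # 96.0
```

With an odd number of observations the position is a whole number, so `lower` and `upper` are the same score.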

Page 16: Sriram seminar on introduction to statistics

Mean - Average

• Most common measure of central tendency
• Best for making predictions
• Applicable under two conditions:
   1. scores are measured at the interval level, and
   2. distribution is more or less normal [symmetrical].
• Symbolized as: $\bar{X}$ for the mean of a sample, $\mu$ for the mean of a population

Page 17: Sriram seminar on introduction to statistics

Finding the Mean

• $\bar{X} = \Sigma X / N$
• If X = {3, 5, 10, 4, 3}:

$\bar{X}$ = (3 + 5 + 10 + 4 + 3) / 5 = 25 / 5 = 5
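The same calculation as a minimal sketch; the function name `mean` is chosen here for illustration and simply divides the sum of the scores by their count.

```python
def mean(values):
    """Arithmetic mean: sum of the scores divided by how many there are."""
    return sum(values) / len(values)

x = [3, 5, 10, 4, 3]
print(mean(x))  # 5.0
```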

Page 18: Sriram seminar on introduction to statistics

Find the Mean

Q: 4, 5, 8, 7
A: Mean: 6; Median: 6

Q: 4, 5, 8, 1000
A: Mean: 254.25; Median: 6.5
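To see how a single extreme value affects each measure, the two data sets can be compared side by side. This sketch uses `mean` and `median` from Python's standard `statistics` module; the variable names are illustrative.

```python
from statistics import mean, median

without_outlier = [4, 5, 8, 7]
with_outlier = [4, 5, 8, 1000]

for label, data in [("without outlier", without_outlier),
                    ("with outlier", with_outlier)]:
    # The mean uses every magnitude; the median only uses rank order.
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")

# How far each measure moves when 7 is replaced by 1000.
mean_shift = mean(with_outlier) - mean(without_outlier)
median_shift = median(with_outlier) - median(without_outlier)
print(mean_shift, median_shift)
```

The mean moves by roughly 248 while the median moves by half a point, which is why the median is preferred for skewed distributions.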

Page 19: Sriram seminar on introduction to statistics

Why can’t the mean tell us everything?

Mean describes Central Tendency, what the average outcome is.

We also want to know something about how accurate the mean is when making predictions.

The question becomes how good a representation of the distribution is the mean? How good is the mean as a description of central tendency -- or how good is the mean as a predictor?

Answer -- it depends on the shape of the distribution. Is the distribution normal or skewed?

Page 20: Sriram seminar on introduction to statistics

Measures of Variability

• Central Tendency doesn’t tell us everything
• Dispersion/Deviation/Spread tells us a lot about how a variable is distributed.
• We are most interested in Standard Deviations (σ) and Variance (σ²)

Page 21: Sriram seminar on introduction to statistics

Dispersion

Once you determine that the variable of interest is normally distributed, ideally by producing a histogram of the scores, the next question to be asked about the Normally Distributed Curve is its dispersion: how spread out are the scores around the mean.

Dispersion is a key concept in statistical thinking.

The basic question being asked is how much do the scores deviate around the Mean? The more “bunched up” around the mean the better your ability to make accurate predictions.

Page 22: Sriram seminar on introduction to statistics

How well does the mean represent the scores in a distribution? The logic here is to determine how much spread is in the scores. How much do the scores "deviate" from the mean? Think of the mean as the true score or as your best guess. If every X were very close to the Mean, the mean would be a very good predictor.

If the distribution is very sharply peaked then the mean is a good measure of central tendency and if you were to use the mean to make predictions you would be right or close much of the time.

Page 23: Sriram seminar on introduction to statistics

Mean Deviation

The key concept for describing normal distributions and making predictions from them is called deviation from the mean.

• We could just calculate the average distance between each observation and the mean.
• We must take the absolute value of the distance, otherwise they would just cancel out to zero!

Formula: $\text{Mean Deviation} = \frac{\sum |X_i - \bar{X}|}{n}$

Page 24: Sriram seminar on introduction to statistics

Mean Deviation: An Example

Data: X = {6, 10, 5, 4, 9, 8}

1. Compute $\bar{X}$ (Average): $\bar{X}$ = 42 / 6 = 7
2. Compute $\bar{X} - X_i$ and take the Absolute Value to get Absolute Deviations
3. Sum the Absolute Deviations
4. Divide the sum of the absolute deviations by N

$\bar{X} - X_i$   Abs. Dev.
7 – 6     1
7 – 10    3
7 – 5     2
7 – 4     3
7 – 9     2
7 – 8     1
Total:    12

12 / 6 = 2
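The four steps can be sketched directly in code. The function name `mean_deviation` is hypothetical and introduced only for this illustration.

```python
def mean_deviation(values):
    """Average absolute distance between each observation and the mean."""
    average = sum(values) / len(values)                      # step 1
    absolute_deviations = [abs(average - x) for x in values]  # step 2
    total = sum(absolute_deviations)                          # step 3
    return total / len(values)                                # step 4

data = [6, 10, 5, 4, 9, 8]
print(mean_deviation(data))  # 2.0
```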

Page 25: Sriram seminar on introduction to statistics

What Does it Mean?

On average, each observation is two units away from the mean.

Is it Really that Easy?

• No!
• Absolute values are difficult to manipulate algebraically
• Absolute values cause enormous problems for calculus (the absolute value function is not differentiable at zero)
• We need something else…

Page 26: Sriram seminar on introduction to statistics

Variance and Standard Deviation

Instead of taking the absolute value, we square the deviations from the mean. This yields a positive value.

This will result in measures we call the Variance and the Standard Deviation

                      Sample   Population
Standard Deviation    s        σ
Variance              s²       σ²

Page 27: Sriram seminar on introduction to statistics

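Example:

Data: X = {6, 10, 5, 4, 9, 8}; N = 6

X           $X - \bar{X}$   $(X - \bar{X})^2$
6           -1              1
10          3               9
5           -2              4
4           -3              9
9           2               4
8           1               1
Total: 42                   Total: 28

Mean: $\bar{X} = \frac{\sum X}{N} = \frac{42}{6} = 7$

Variance: $s^2 = \frac{\sum (X - \bar{X})^2}{N} = \frac{28}{6} = 4.67$

Standard Deviation: $s = \sqrt{s^2} = \sqrt{4.67} = 2.16$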
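The worked example can be reproduced with Python's standard `statistics` module. Because the example divides the sum of squared deviations by N, the matching functions are `pvariance` and `pstdev`. As an aside that is not part of the slides, `variance` divides by N − 1 instead, a convention often used for samples, and is shown only for comparison.

```python
from statistics import pstdev, pvariance, variance

data = [6, 10, 5, 4, 9, 8]

# Sum of squared deviations from the mean (the "Total: 28" column).
average = sum(data) / len(data)
sum_of_squares = sum((x - average) ** 2 for x in data)

var_n = pvariance(data)         # divides by N, as in the worked example
sd_n = pstdev(data)             # square root of var_n
var_n_minus_1 = variance(data)  # divides by N - 1 (comparison only)

print(sum_of_squares)           # 28.0
print(round(var_n, 2))          # 4.67
print(round(sd_n, 2))           # 2.16
print(round(var_n_minus_1, 2))  # 5.6
```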

Page 28: Sriram seminar on introduction to statistics

Introduction to Probability

• Experiments, Counting Rules, and Assigning Probabilities
• Events and Their Probability
• Some Basic Relationships of Probability
• Conditional Probability

Page 29: Sriram seminar on introduction to statistics

Probability as a Numerical Measure of the Likelihood of Occurrence

[Figure: probability scale from 0 to 1, with increasing likelihood of occurrence from left to right.]

• Near 0: The event is very unlikely to occur.
• At .5: The occurrence of the event is just as likely as it is unlikely.
• Near 1: The event is almost certain to occur.

Page 30: Sriram seminar on introduction to statistics

An Experiment and Its Sample Space

• An experiment is any process that generates well-defined outcomes.
• The sample space for an experiment is the set of all experimental outcomes.
• An experimental outcome is also called a sample point.

Page 31: Sriram seminar on introduction to statistics

Events & Probabilities…

An individual outcome of a sample space is called a simple event [it cannot be broken down into several other events].

An event is a collection or set of one or more simple events in a sample space.

Roll of a die: S = {1, 2, 3, 4, 5, 6}

Simple event: the number “3” will be rolled

Event: an even number (one of 2, 4, or 6) will be rolled

Page 32: Sriram seminar on introduction to statistics

Events & Probabilities…

The probability of an event is the sum of the probabilities of the simple events that constitute the event.

E.g. (assuming a fair die) S = {1, 2, 3, 4, 5, 6} and

P(1) = P(2) = P(3) = P(4) = P(5) = P(6) = 1/6

Then:

P(EVEN) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 3/6 = 1/2
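The sum rule for the fair die can be sketched with exact fractions. The names `simple_event_probability` and `probability` are hypothetical helpers for this illustration; `Fraction` comes from Python's standard `fractions` module and avoids rounding.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]

# Fair die: every simple event gets the same probability, 1/6.
simple_event_probability = {
    outcome: Fraction(1, len(sample_space)) for outcome in sample_space
}

def probability(event):
    """Sum the probabilities of the simple events that make up the event."""
    return sum(simple_event_probability[outcome] for outcome in event)

even = {2, 4, 6}
print(probability(even))  # 1/2
```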

Page 33: Sriram seminar on introduction to statistics

Probability Rules: Mathematical Notation

Page 34: Sriram seminar on introduction to statistics

Random Variables

• A random variable is a variable whose value is a numerical outcome of a random phenomenon
   • often denoted with capital alphabetic symbols (X, Y, etc.)
   • a normal random variable may be denoted as X ~ N(µ, σ)
• The probability distribution of a random variable X tells us what values X can take and how to assign probabilities to those values

Page 35: Sriram seminar on introduction to statistics

Random Variables

Random variables that have a finite (countable) list of possible outcomes, with probabilities assigned to each of these outcomes, are called discrete

Random variables that can take on any value in an interval, with probabilities given as areas under a density curve, are called continuous

Page 36: Sriram seminar on introduction to statistics

Random Variables

Discrete random variables:

• number of pets owned (0, 1, 2, …)
• numerical day of the month (1, 2, …, 31)
• how many days of class missed

Continuous random variables:

• weight
• temperature
• time it takes to travel to work

Page 37: Sriram seminar on introduction to statistics

Conditional Probability…

Conditional probability is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event.

Experiment: randomly select one student in class.

P(randomly selected student is male) =

P(randomly selected student is male | student is on 3rd row) =

Conditional probabilities are written as P(A | B), read as “the probability of A given B”, and calculated as:

$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$

Page 38: Sriram seminar on introduction to statistics

Conditional Probability…

Again, the probability of an event given that another event has occurred is called a conditional probability…

P(A and B) = P(A)*P(B | A) = P(B)*P(A | B)

Both are true. Keep this in mind!
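Both forms of the multiplication rule can be checked on a small example. The sketch below reuses the fair die from the earlier slides; the events A (an even number) and B (a number greater than 3) and the helper names `prob` and `conditional` are assumptions made purely for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Probability of an event on a fair die: favourable / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def conditional(a, b):
    """P(a | b) = P(a and b) / P(b); assumes P(b) > 0."""
    return prob(a & b) / prob(b)

A = {2, 4, 6}  # an even number is rolled
B = {4, 5, 6}  # a number greater than 3 is rolled

print(prob(A & B))                 # 1/3
print(prob(A) * conditional(B, A)) # 1/3
print(prob(B) * conditional(A, B)) # 1/3
```

Knowing that B occurred raises the probability of A from 1/2 to 2/3, which is what it means for the two events to be related.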

Page 39: Sriram seminar on introduction to statistics

Data Exploration

SUMMARY

• Descriptive statistics help describe your data’s distribution
• Measures of both central tendency and dispersion are needed to describe your data’s distribution statistically
• Ideally your data fits the description of a normal distribution, with data distributed evenly on either side of the measure of central tendency
• The following are measures of central tendency: mean, median and mode
• The following are measures of dispersion: range, variance, and standard deviation
• Histograms and box plots can help you illustrate your data’s distribution
• Your descriptive statistics, histograms and/or box plots together help you describe the nature of your data
• After exploring your data using descriptive statistics it’s good to reflect on your question and modify or refine it as needed

Page 40: Sriram seminar on introduction to statistics
Page 41: Sriram seminar on introduction to statistics

Thanks
