Statistics and Quantitative Analysis U4320 Segment 4: Statistics and Quantitative Analysis Prof....

Preview:

Citation preview

Statistics and Quantitative Analysis U4320

Segment 4: Statistics and

Quantitative Analysis Prof. Sharyn O’Halloran

Probability Distributions A. Distributions: How do simple

probability tables relate to distributions? 1. What is the Probability of getting a head? ( 1 coin

toss)

1/2

0 1 Proportion of Heads

Prob.

Probability Distributions(cont.)

2. Now say we flip the coin twice. The picture now looks like:

1/2

0 1/2 Proportion of Heads

1/4

1

0 1headhead 2 headss

Probability Distributions(cont.)

As number of coin tosses increases, the distribution looks like a bell-shaped curve:

0

Proportion of Heads

1/2 1

Probability Distributions(cont.)

3. General: Normal Distribution Probability distributions are idealized bar graphs or histograms. As we get

more and more tosses, the probability of any one observation falls to zero. Thus, the final result is a bell-shaped curve

Probability Distributions(cont.)

B. Properties of a Normal Distribution 1. Formulas: Mean and Variance

Population Sample

Mean XN

ii

N

1 X

X

n

ii

n

1

Variance

2

2

1

( )X

N

ii

N

s

X X

n

ii

n

2

2

1

1

( )

Probability Distributions(cont.)

2. Note Difference with the Book The population mean is written as:

Variance as:

Example: Two tosses of a coin xpx(),

2 2( )()x px.

Number of Heads x

Probability p(x)

0 1/4 1 1/2 2 1/4

Probability Distributions(cont.)

2. Note Difference with the Book (cont.) So the average, or expected, number of heads in two tosses of a coin is:

0*1/4 + 1*1/2 + 2*1/4 = 1.

Probability Distributions(cont.)

3. Expected Value

E(x) = Average or Mean

Ex2 = Expected Variance

Probability Distributions(cont.)

C. Standard Normal Distribution 1.Definition: a normal distribution with mean 0 and standard deviation 1.

Z-values - Are points on the x-axis that show how many standard deviations that point is away from the mean m.

Z-values

Total Area of Curve = 1

Probability Distributions(cont.)

C. Standard Normal Distribution 2. Characteristics

symmetric Unimodal. continuous distributions

3. Example: Height of people are normally distributed with mean 5'7"

What is the proportion of people taller than 5'7"?

5'7" Z-values

Total Area of Curve = 1

area=1/2

Probability Distributions(cont.)

D. How to Calculate Z-scores Definition: Z-value is the number of standard deviations away from the mean Definition: Z-tables give the probability (score) of observing a particular z-

value.

Probability Distributions(cont.)

1. What is the area under the curve that is greater than 1 ? Prob (Z>1)

The entry in the table is 0.159, which is the total area to the right of 1.

Z-values

Total Area of Curve = 1

z=1

0.159

Probability Distributions(cont.)

2. What is the area to the right of 1.64? Prob (Z > 1.64)

The table gives 0.051, or about 5%.

Z-values

Total Area of Curve = 1

1.64

0.051

Probability Distributions(cont.)

3. What is the area to the left of -1.64? Prob (Z < -1.64)

Z-values

Total Area of Curve = 1

-1.64

0.051

Probability Distributions(cont.)

4. What is the probability that an observation lies between 0 and 1? Prob (0 < Z < 1)

Z-values 1.00

0.159

0.50

0.34

Probability Distributions(cont.)

5. How would you figure out the area between 1 and 1.5 on the graph? Prob (1 < Z < 1.5)

Z-values 1.00

0.159

1.50

0.0670.092

Probability Distributions(cont.)

6. What is the area between -1 and 2? Prob (-1 < Z < 2) P (-1<Z<0) = .341 P (0<z<2) = .50-.023 =.477 0.477 + 0.341 = .818

Z-values-1.00 2.00

0.023

0.159

0.341 0.477

Probability Distributions(cont.)

7. What is the area between -2 and 2? Prob(-2<Z<2) 1- Prob (Z< -2) - Prob (Z>2) = 1 - .023 - .023 = .954

Z-values-2.00 2.00

0.023

0.023

Probability Distributions(cont.)

E. Standardization 1. Standard Normal Distribution -- is a very special case where the

mean of distribution equals 0 and the standard deviation equals 1.

Z-values

Probability Distributions(cont.)

2. Case 1: Standard deviation differs from 1 For a normal distribution with mean 0 and some standard deviation , you

can convert any point x to the standard normal distribution by changing it to x/.

Z-values-2.00 2.00-2.00' 2.00'

SD=1

SD=3

Probability Distributions(cont.)

3. Case 2: Mean differs from 0 So starting with any normal distribution with mean and standard deviation 1, you can convert to a

standard normal by taking x- and using this as your Z-value. Now, what would be the area under the graph between 50 and 51?

Prob (0<Z<1) = .341

Z-values x=51

SD=1

Probability Distributions(cont.)

4. General Case: Mean not equal to 0 and SD not equal to 1 Say you have a normal distribution with mean & standard deviation . You can convert any point x in that distribution to the same point in the standard normal by computing

This is called standardization. The Z-value corresponds to x. The Z-table lets you look up the Z-Score of any number.

Zx.

Probability Distributions(cont.)

5. Trout Example: a. The lengths of trout caught in a lake are normally distributed with mean 9.5" and standard deviation 1.4". There is a law that you can't keep any fish below 12". What percent of the trout is this? (Can keep above 12)

Step 1: Standardize Find the Z-score of 12: Prob (x>12)

Step 2: Find z-score Find Prob (Z>1.79) Look up 1.79 in your table; only .037, or about 4% of the fish could be kept.

12 - 9.5Z = --------- = 1.79. 1.4

Probability Distributions(cont.)

5. Trout Example (cont.): b. Now they're thinking of changing the standard to 10" instead of 12". What proportion of fish could be kept under the new limit?

Standardize Prob (x>10)

Step 2: Find z-score Prob (Z>.36) In your tables, this gives .359, or almost 36% of the fish could be kept under the new law.

10 - 9.5Z = --------- = 0.36. 1.4

Joint Distributions A. Probability Tables

1. Example: Toss a coin 3 times. How many heads and how many runs do we observe? Def: A run is a sequence of one or more of the same event in a row Possible outcomes

Toss Probability Heads x

Runs y

TTT 1/8 0 1 TTH 1/8 1 2 THT 1/8 1 3 THH 1/8 2 2 HTT 1/8 1 2 HTH 1/8 2 3 HHT 1/8 2 2 HHH 1/8 3 1

Joint Distributions (cont.)

2. Joint Distribution Table

Runs Heads y x

1 2 3

0 1/8 0 0 1/8 1 0 1/4 (2/8) 1/8 3/8 2 0 1/4 (2/8) 1/8 3/8 3 1/8 0 0 1/8

1/4 1/2 1/4 1

Joint Distributions (cont.)

3. Definition: The joint probability of x and y is the probability that both x and y occur.

p(x,y) = Pr(X and Y)

p(0, 1) = 1/8, p(1, 2) = 1/4, and p(3, 3) = 0.

Joint Distributions (cont.)

B. Marginal Probabilities 1. Def: Marginal probability is the sum of the rows and columns. The overall probability of an event occurring.

So the probability that there are just 1 head is the prob of 1 head and 1 runs + 1 head and 2 runs + 1 head and 3 runs

= 0 + 1/4 + 1/8 = 3/8

px pxyy

() (,).

Joint Distributions (cont.)

C. Independence A and B are independent if P(A|B) = P(A).

P A BP A B

P B( | )

( & )

( ) ;

P A P A B( ) ( | ) ,

P A B P A P B( , ) ( ) ( ) .

Joint Distributions (cont.)

Are the # of heads and the # of runs independent?

# Runs y

# heads x

1 2 3 marg dist

0 1/8

1 3/8

2 3/8

3 1/8

marg dist 1/4 1/2 1/4 1

Correlation and Covariance A. Covariance

1.Definition of Covariance Which is defined as the expected value of the product of the differences from the means.

x y x Y

i x i Yi

N

i x i Y

E X Y

X Y

N

X Y p x y

, ( )( )

( )( )

( )( ) ( , ).

1

Correlation and Covariance

2.Graph

Correlation and Covariance B. Correlation

1.Definition of Correlation

x y

x y ySD,

* Covariance

SDx

Correlation and Covariance

2. Characteristics of Correlation

-1 1 if =1 then

x

y

Correlation and Covariance

2. Characteristics of Correlation (cont.)

if = -1

x

y

Correlation and Covariance

2. Characteristics of Correlation (cont.)

if = 0

x

y

Correlation and Covariance

2. Characteristics of Correlation (cont.)Why?

x y

x y

x i y

i xi

n

i yi

n

y

x

N

y

N

N,

)( )

( ) ( )

(x i

2

1

2

1

Sample HomeworkGET /FILE 'gss91.sys'.The SPSS/PC+ system file is read from file gss91.sys The SPSS/PC+ system file contains 1517 cases, each consisting of 203 variables (including system variables). 203 variables will be used in this session.------------------------------- COMPUTE AFFAIRS = XMARSEX.RECODE AFFAIRS (0,5,8,9 = SYSMIS) (1,2 = 0) (3,4 = 1).VALUE LABELS AFFAIRS 0 'BAD' 1 'OK'.The raw data or transformation pass is proceeding 1517 cases are written to the compressed active file. ***** Memory allows a total of 10345 Values, accumulated across all Variables. There also may be up to 1293 Value Labels for each Variable.

 

Sample Homework------------------------------------------------------------------------------- AFFAIRS Valid Cum Value Label Value Frequency Percent Percent Percent BAD .00 870 57.4 90.2 90.2 OK 1.00 94 6.2 9.8 100.0 . 553 36.5 Missing ------- ------- ------- Total 1517 100.0 100.0 -------------------------------------------------------------------------------

Sample Homework

AFFAIRS Mean .098 Std err .010 Median .000 Mode .000 Std dev .297 Variance .088 Kurtosis 5.398 S E Kurt .157 Skewness 2.718 S E Skew .079 Range 1.000 Minimum .000 Maximum 1.000 Sum 94.000 Valid cases 964 Missing cases 553

Sample HomeworkCOMPUTE MONEY = INCOME91. RECODE MONEY (0,22,98,99 = SYSMIS) (1 THRU 15 = 0) (16 THRU 21 = 1). VALUE LABELS MONEY 0 'LOW' 1 'HIGH'. FREQUENCIES /VARIABLES AFFAIRS MONEY /STATISTICS ALL. MONEY Valid Cum Value Label Value Frequency Percent Percent Percent LOW .00 787 51.9 57.5 57.5 HIGH 1.00 581 38.3 42.5 100.0 . 149 9.8 Missing ------- ------- ------- Total 1517 100.0 100.0 -------------------------------------------------------------------------------

Sample HomeworkMONEY Mean .425 Std err .013 Median .000 Mode .000 Std dev .494 Variance .245 Kurtosis -1.910 S E Kurt .132 Skewness .305 S E Skew .066 Range 1.000 Minimum .000 Maximum 1.000 Sum 581.000 Valid cases 1368 Missing cases 149

Sample HomeworkCROSSTABS /TABLES=MONEY BY AFFAIRS /CELLS /STATISTICS=CORR. Memory allows for 7,021 cells with 2 dimensions for general CROSSTABS. ------------------------------------------------------------------------------- MONEY by AFFAIRS

AFFAIRS MONEY

BAD OK COLUMN TOTAL

LOW COUNT 450 55 505 ROW % 89.1 10.9 COL. % 57.0 66.3 TOTAL % 51.6 6.3 57.9 HIGH COUNT 339 28 367 ROW % 92.4 7.6 COL. % 43.0 33.7 TOTAL % 38.9 3.2 42.1 ROW 789 83 872 TOTAL 90.5 9.5 100.0

Sample Homework

Approximate Statistic Value ASE1 T-value Significance -------------------- --------- -------- ------- ------------ Pearson's R -.05487 .03267 -1.62089 .10540 Spearman Correlation -.05487 .03267 -1.62089 .10540 Number of Missing Observations: 645

Recommended