Review of Probability and Statistics (Appendix B and C of Wooldridge's textbook)
What Is Statistics?
1. Statistics is about using a sample to understand a population
2. We are interested in the population, but we cannot get data for everybody. For instance, the government wants to know the proportion of people who have got coronavirus. This is a difficult task because it is impossible to give every person a medical test
3. However, it is usually much easier to get data for a sample, which by definition is a subset of the population. For instance, we may give every student in this eco311 class a coronavirus test
4. Then statistics tells us how to use the result from a sample to make statistical inference about the population. Suppose 1 out of 20 students is confirmed with the virus; then we can say about 5 percent of the population have the virus
5. We emphasize the word "about" because we may get a different result using a different sample (e.g., students in the eco317 class can be another sample). One key issue of statistics is accounting for sampling variability
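Points 4 and 5 can be sketched in a short simulation. The population size, infection rate, and sample size below are hypothetical numbers chosen for illustration, not data from the class:

```python
# A minimal sketch of sampling variability (hypothetical numbers):
# two random samples from the same population give different
# estimates of the same infection rate.
import random

random.seed(1)
population = [1] * 50 + [0] * 950   # 1000 people, 5 percent infected

sample_a = random.sample(population, 20)
sample_b = random.sample(population, 20)

rate_a = sum(sample_a) / 20   # estimate from the first sample
rate_b = sum(sample_b) / 20   # estimate from the second sample
print(rate_a, rate_b)
```

The two printed rates are both estimates of the same 5 percent population rate, and they typically differ from it and from each other; that gap is sampling variability.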
Random Sample
1. Statistics works well when the sample is representative of the population. The best sample is a random sample, or iid sample, which is obtained by randomly selecting members from the population. Random sampling ensures that everyone has an equal chance of being selected.
2. Obviously the sample consisting of eco311 students is not a random sample of all Americans; for one thing, young college students cannot represent old people, who are more vulnerable to the virus
3. We can redefine the population as all college students in Ohio. Then the eco311 sample supposedly works better
4. If the population is still all Americans, we may use random numbers, such as SSNs, to select the sample, and give those selected people medical tests
5. Lesson: define your population properly, and do not oversell the results from a specific sample
Distribution, Mean and Variance
1. Using statistics jargon, we assume the population follows a probability distribution. For instance, we can use the Bernoulli distribution to describe whether a person has the virus (y = 1) or has no virus (y = 0)
2. A probability distribution can be characterized by its moments
(a) the first moment is the expected value, also called the population mean or expectation

µ ≡ E(y) = weighted average of possible values = ∑_j y_j P(y = y_j)    (1)

where the weight is the probability. The mean measures the center of the distribution
(b) the second moment is variance

σ² ≡ var(y) = E(y − µ)² = ∑_j (y_j − µ)² P(y = y_j)    (2)

Variance measures the dispersion of the distribution
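Definitions (1) and (2) can be checked directly on the Bernoulli example. The infection probability p = 0.05 below is a hypothetical value:

```python
# Mean and variance of a Bernoulli(p) variable, computed directly
# from definitions (1) and (2): probability-weighted sums over the
# possible values. p = 0.05 is a hypothetical infection probability.
p = 0.05
values = [0, 1]
probs = [1 - p, p]

mu = sum(v * pr for v, pr in zip(values, probs))               # E(y) = p
var = sum((v - mu) ** 2 * pr for v, pr in zip(values, probs))  # p(1 - p)
print(mu, var)
```

The weighted sums reproduce the textbook formulas E(y) = p and var(y) = p(1 − p).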
Skewness, Kurtosis, Percentile, and Median
1. The third moment is skewness. A distribution has a long right (left) tail if skewness is greater (less) than zero. A distribution is symmetric if skewness is zero
2. The fourth moment is kurtosis. A distribution has a tail fatter (thinner) than the normal distribution if kurtosis is greater (less) than 3. The height of the tail measures the probability of values in the tail
3. The kth percentile is the value below which k percent of the observations may be found
4. The median is the 50th percentile
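A quick numerical illustration of percentiles and the median, using numpy on a small hypothetical data set:

```python
# kth percentile and median of a small sample (hypothetical data).
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9])
median = np.median(data)          # 50th percentile: midpoint of 4 and 5
p25 = np.percentile(data, 25)     # value below which about 25% of data lie
print(median, p25)                # 4.5 4.0
```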
Properties of Expectation
Let c be a constant, and let x and y be two random variables. We have the following three properties of expectation

E(c) = c    (3)

Proof: E(c) = cP(y = c) = c since P(y = c) = 1

E(cy) = cE(y)    (4)

Proof: E(cy) = ∑_j c y_j P(y = y_j) = c ∑_j y_j P(y = y_j) = cE(y)

E(x + y) = E(x) + E(y)    (5)
Properties of Variance
var(c) = 0    (6)

Proof: var(c) = E(c − µ)² = E(c − c)² = 0

var(cy) = c²var(y)    (7)

Proof: var(cy) = E(cy − cµ)² = c²E(y − µ)² = c²var(y)
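Properties (6) and (7) hold exactly for sample variances as well, which gives an easy numerical check. The data below are simulated for illustration:

```python
# Numerical check of var(c) = 0 and var(cy) = c^2 var(y),
# using simulated data and sample variances.
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=10_000)   # simulated data
c = 3.0

var_const = np.var(np.full(100, c))      # variance of a constant: 0
lhs = np.var(c * y)                      # var(cy)
rhs = c ** 2 * np.var(y)                 # c^2 var(y)
print(var_const, np.isclose(lhs, rhs))   # 0.0 True
```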
Covariance and Properties
Covariance measures the linear association between x and y, and is defined as

σ_{x,y} ≡ cov(x,y) = E[(x − µ_x)(y − µ_y)]    (8)

There are three properties of covariance

cov(c,y) = 0    (9)

cov(y,y) = var(y)    (10)

var(x + y) = var(x) + var(y) + 2cov(x,y)    (11)

The last property looks similar to this

(a + b)² = a² + b² + 2ab
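Property (11) also holds exactly for sample moments, so it can be verified numerically. The variables x and y below are simulated and built to be correlated:

```python
# Numerical check of (11): var(x+y) = var(x) + var(y) + 2cov(x,y).
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=100_000)
y = 0.5 * x + rng.normal(size=100_000)   # correlated with x by construction

cov_xy = np.cov(x, y, ddof=0)[0, 1]
lhs = np.var(x + y)
rhs = np.var(x) + np.var(y) + 2 * cov_xy
print(np.isclose(lhs, rhs))   # True: the identity holds in the sample
```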
Proof of (11)
By definition
var(x + y) = E((x + y) − E(x + y))²    (12)
= E((x − µ_x) + (y − µ_y))²    (13)
= E((x − µ_x)² + (y − µ_y)² + 2(x − µ_x)(y − µ_y))    (14)
= E(x − µ_x)² + E(y − µ_y)² + 2E((x − µ_x)(y − µ_y))    (15)
= var(x) + var(y) + 2cov(x,y)    (16)
After-class exercise: prove that

var(y) = E(y²) − µ_y²
Correlation
A bounded and unit-free measure of association is the correlation (coefficient)

ρ ≡ corr(x,y) = σ_{x,y} / (σ_x σ_y) = cov(x,y) / (√var(x) √var(y))    (17)

We can prove that

−1 ≤ ρ ≤ 1    (18)

Two variables are perfectly positively (negatively) correlated if ρ equals one (negative one). Two variables are linearly unrelated (uncorrelated) if ρ equals zero
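Definition (17) can be computed by hand and compared with numpy's built-in correlation. The data below are simulated with a positive relationship built in:

```python
# Correlation from definition (17), checked against np.corrcoef.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=50_000)
y = 2.0 * x + rng.normal(size=50_000)   # positively related by construction

# rho = cov(x,y) / (sd(x) * sd(y)), as in (17)
rho = np.cov(x, y, ddof=0)[0, 1] / (np.std(x) * np.std(y))
print(rho)                                         # positive, inside [-1, 1]
print(np.isclose(rho, np.corrcoef(x, y)[0, 1]))    # True
```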
iid Sample
A sample (y_1, y_2, ..., y_n) is an iid (independently and identically distributed) sample, also called a random sample, if all three conditions below are satisfied

E(y_i) = µ, var(y_i) = σ², cov(y_i, y_j) = 0, (∀ i ≠ j)    (19)

Recall that we obtain an iid sample by randomly drawing members from the population. An iid sample is the best sample.
Statistical Inference I: Estimation
1. Because data for the whole population are unavailable, the population mean is unknown. The most common estimation problem is to find µ
2. In statistics class we use the sample mean as the estimate of the population mean

ȳ ≡ (1/n) ∑_{i=1}^n y_i    (sample mean)    (20)

Note that the sample mean is a random variable because it varies across different samples
3. If we use an iid sample, then the expected value of the sample mean is the population mean, i.e., the sample mean from an iid sample is an unbiased estimator

E(ȳ) = (1/n) ∑_{i=1}^n E(y_i) = (1/n) ∑_{i=1}^n µ = nµ/n = µ    (21)
4. The dispersion of the sample mean is measured by its variance (the second equality uses cov(y_i, y_j) = 0 for i ≠ j)

var(ȳ) = var(∑_{i=1}^n y_i)/n² = (∑_{i=1}^n σ²)/n² = nσ²/n² = σ²/n    (22)

5. σ is the standard deviation; σ/√n is the standard error of the sample mean
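Result (22) can be verified by simulation: draw many iid samples, compute each sample mean, and compare the variance of those means with σ²/n. All numbers below are simulated for illustration:

```python
# Simulation check of (22): var(ybar) = sigma^2 / n.
import numpy as np

rng = np.random.default_rng(0)
n, reps, sigma = 25, 20_000, 2.0
samples = rng.normal(0.0, sigma, size=(reps, n))   # 20,000 iid samples
means = samples.mean(axis=1)                       # one ybar per sample

print(means.var())       # close to sigma^2 / n = 4 / 25 = 0.16
print(sigma ** 2 / n)
```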
Law of Large Number
1. Notice that as n → ∞, var(ȳ) approaches 0; see (22). That implies that when the sample size increases, the sample mean based on an iid sample converges to the population mean (a constant). This result is called the law of large numbers
2. So we prefer a large iid sample over a small iid sample, simply because a large sample contains more information
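The law of large numbers is easy to see in a simulation. Below we draw an iid Bernoulli sample with a hypothetical µ = 0.3 and compute ȳ from growing prefixes of the data; the gap from µ typically shrinks as n rises:

```python
# Law of large numbers: ybar computed from the first n draws
# approaches mu = 0.3 as n grows.
import numpy as np

rng = np.random.default_rng(1)
mu = 0.3
draws = rng.binomial(1, mu, size=100_000)   # iid Bernoulli(0.3) sample

for n in (10, 1_000, 100_000):
    print(n, draws[:n].mean())   # ybar from the first n observations
```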
Stata Interface
Introduction to Stata Interface
1. You will learn Stata in this course
2. To get access to Stata, you can go to the FSB computer lab or use the Virtual PC (google "miamioh FSB virtual pc").
3. Stata can be found in the department apps folder on the desktop of the virtual PC, in the ECO sub-folder
4. Stata has four windows. You type commands in window B, and results appear in window C. Old commands are stored in window A. Window D displays the names of variables
5. You can use the mouse to pick an old command in window A, then modify and execute it
6. You can use the mouse to select variables in window D, rather than typing their names
7. Stata allows you to use the menu to do certain things. For instance, you can open an Excel data file by selecting File — Import — Excel Spreadsheet
Example 1: Estimation
Remarks
1. We use simulated data
2. We generate 1000 observations (n = 1000) of random values that follow a standard normal distribution. Because the data are generated by ourselves, we know the true value of the population mean. In this case, we know µ = 0
3. Our first sample consists of the first five observations. We use the Stata command sum to report summary statistics. The first sample mean is x̄ = 0.0430
4. It is ok that x̄ ≠ µ, because we are using a sample
Example 1—continued
Remarks
1. Next we compare the first sample mean 0.0430 to the second sample mean −0.6231, obtained by using the next five observations.
2. The second sample mean is a bad estimate since it differs substantially from the true value µ = 0
3. This bad estimate is not unexpected given that we use a small sample with only five observations.
4. Lesson: a small sample can be dominated by extreme values (outliers); in other words, a small sample can be noisy
5. We get a much better sample mean that equals −0.022 using all 1000 observations. The large sample presents more pattern than noise. This finding is consistent with the law of large numbers
6. Lesson: the sample mean is a random variable; it varies from one sample to another. A bad (good) sample can produce a bad (good) estimate
Normal Distribution
1. Because of the central limit theorem, the normal distribution plays a key role in statistics
2. A general normal random variable can be expressed as

y ∼ N(µ, σ²)    (23)

where µ is the population mean, and σ² is the population variance
3. After standardizing y (computing its z-score), we get a standard normal random variable with mean 0 and variance 1

z ≡ (y − µ)/σ ∼ N(0, 1)    (24)

4. Table G.1 of the textbook reports P(z < some value). For instance

P(z < 1.96) = 0.975    (25)

P(−1.96 < z < 1.96) = 0.95    (26)

The last result defines the 95 percent confidence interval for a standard normal variable
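The table values (25) and (26) can be reproduced without a table, using the standard normal CDF written in terms of the error function, Φ(c) = 0.5·(1 + erf(c/√2)):

```python
# Standard normal probabilities P(z < c) via the error function.
import math

def phi(c):
    """Standard normal CDF: P(z < c)."""
    return 0.5 * (1.0 + math.erf(c / math.sqrt(2.0)))

print(round(phi(1.96), 3))               # 0.975, matching (25)
print(round(phi(1.96) - phi(-1.96), 2))  # 0.95, matching (26)
```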
Example 1—continued
Remarks
1. We obtain more descriptive statistics by using sum, detail
2. The sample skewness is 0.0356, close to the population skewness of 0 (the normal distribution is symmetric)
3. The sample kurtosis is 2.8660, close to the population kurtosis of 3
4. The median is −0.0360, close to the sample mean −0.0225, confirming the symmetry. The median is the 50th percentile, so 500 observations (half the sample) are below −0.0360
5. The Stata function normal reports the probability P(z < c) for a given c; the Stata function invnormal reports the c such that P(z < c) equals a given probability. These two functions can substitute for Table G.1 of the textbook
Confidence Intervals
1. The 95 percent confidence interval for a general normal variable is based on this fact

P(µ − 1.96σ < y < µ + 1.96σ) = 0.95    (27)

Proof: let's subtract the population mean and divide by the standard deviation for each term in the inequality

P(µ − 1.96σ < y < µ + 1.96σ) = P((µ − 1.96σ − µ)/σ < (y − µ)/σ < (µ + 1.96σ − µ)/σ) = P(−1.96 < z < 1.96) = 0.95    (28)

2. Confidence intervals are important because they attach a kind of certainty to randomness. In other words, confidence intervals make a random variable partially predictable.
3. We are almost sure (with 95 percent probability) that a general normal random variable takes a value between

(µ − 1.96σ, µ + 1.96σ)    (95 percent confidence interval)    (29)
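Fact (27) can be checked by simulation: draw many values from a general normal distribution and count how often they fall inside (µ − 1.96σ, µ + 1.96σ). The values of µ and σ below are hypothetical:

```python
# Simulation check of (27): about 95 percent of draws from N(mu, sigma^2)
# fall inside (mu - 1.96*sigma, mu + 1.96*sigma).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 10.0, 2.0                     # hypothetical parameters
y = rng.normal(mu, sigma, size=100_000)

inside = np.mean((y > mu - 1.96 * sigma) & (y < mu + 1.96 * sigma))
print(inside)   # close to 0.95
```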
Rejection Zone
1. The probability that a normal random variable takes a value outside the 95 percent confidence interval is only 0.05
2. That is a small probability, so it is unlikely for a normal variable to take those extreme values lying in the two tails
3. In short, the tails are an "unlikely zone", and hypothesis testing is based on this idea: a hypothesis will be rejected if we end up in the tail part, or rejection zone
Confidence Intervals and Rejection Zone (RZ)
Central Limit Theorem (CLT)
1. The CLT states that as the sample size rises, the sample mean of an iid sample (from any distribution) converges to a normal random variable

ȳ ∼ N(µ, σ²/n), (as n → ∞)    (30)

2. The amazing part is that the convergence of ȳ to a normal distribution occurs even though y does not follow a normal distribution
3. Do not read too much into the CLT: y does not converge to a normal distribution. It is ȳ that converges to a normal distribution as the sample size rises
4. We get a standard normal distribution after standardizing ȳ:

(ȳ − µ)/(σ/√n) ∼ N(0, 1), (as n → ∞)    (31)
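The CLT can be illustrated with data that are clearly non-normal. Below we draw samples from a skewed exponential distribution (µ = 1, σ = 1) and standardize each sample mean as in (31); the standardized means behave like a standard normal variable:

```python
# CLT sketch: standardized sample means of exponential (non-normal)
# data behave like N(0, 1) when n is large.
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 20_000
data = rng.exponential(1.0, size=(reps, n))   # skewed; mu = 1, sigma = 1

z = (data.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))   # as in (31)

# For N(0, 1), about 95 percent of values lie inside (-1.96, 1.96)
frac = np.mean(np.abs(z) < 1.96)
print(frac)   # close to 0.95
```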
One-Sample T Test
1. The standardized sample mean is nothing but the t statistic (t value, t ratio)
2. Under the null hypothesis

H0 : µ = c    (32)

the t statistic follows a standard normal distribution in large samples (i.e., n > 120)

t-statistic ≡ (ȳ − c)/(σ/√n) = (ȳ − c)/se ∼ N(0, 1), (when n > 120)    (33)

3. We reject H0 if the t-statistic is in the tail (rejection zone), and there are three approaches to tell whether that is the case

(a) (critical value approach): we reject H0 when |t| > 1.96
(b) (p-value approach): we reject H0 when 2·P(z > |t|) < 0.05
(c) (confidence interval approach): we reject H0 when c is outside the CI

The three approaches lead to the same conclusion
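The three approaches can be sketched on simulated data, mirroring the lecture's example (the seed and sample are illustrative, so the numbers differ from the Stata output shown later):

```python
# One-sample t test of H0: mu = c, using the three approaches
# from the slide on simulated data.
import math
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(0.0, 1.0, size=1_000)   # simulated sample, true mu = 0
c = 0.0                                # hypothesized value

n, ybar = len(y), y.mean()
se = y.std(ddof=1) / math.sqrt(n)      # standard error
t = (ybar - c) / se

# (a) critical value approach
reject_cv = abs(t) > 1.96
# (b) p-value approach, with P(z > |t|) from the normal CDF
p_value = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(t) / math.sqrt(2.0))))
reject_p = p_value < 0.05
# (c) confidence interval approach: reject when c is outside the 95% CI
ci = (ybar - 1.96 * se, ybar + 1.96 * se)
reject_ci = not (ci[0] < c < ci[1])

print(reject_cv, reject_p, reject_ci)   # the approaches agree
```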
Example 1—continued
Remarks
1. First we use the Stata command ttest to test the null hypothesis H0 : µ = 0
2. The standard error is the sample standard deviation over √n: 0.9990194/√1000 = 0.03159
3. The t-statistic is (ȳ − c)/se = (−0.0225097 − 0)/0.03159 = −0.7125. We do not reject H0 : µ = 0 because |−0.7125| < 1.96 (i.e., the t value is not in the tail or rejection zone)
4. The p-value is 2·P(z > 0.7125) = 0.4763, greater than 0.05. So we do not reject H0
5. We also do not reject H0 because 0 is inside the CI (−0.0845, 0.0394)
6. Nevertheless, we can reject H0 : µ = 1, since the t statistic equals −32.3663.
7. Intuitively, we reject H0 : µ = 1 because the sample mean −0.0225 is far away from the hypothesized value 1; we cannot reject H0 : µ = 0 because the sample mean −0.0225 is close to the hypothesized value 0. We use the standard error as a yardstick to measure the distance between the sample mean and the hypothesized value. H0 is rejected if the gap between the sample mean and the hypothesized value exceeds 1.96 standard errors in absolute value