54
06 - Collections of Random Variables Random Samples What is a Statistic Sample Mean Sample Variance Expectations in Samples Important Inequalities General Idea of Asymptotics Types of Convergence LLN CLT Delta Method References Collections of Random Variables Brian Vegetabile, Dustin Pluta 2018 Statistics Bootcamp Department of Statistics University of California, Irvine September 14th, 2018 1 / 54

Collections of Random Variables - Data Science Initiative

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Collections of Random Variables

Brian Vegetabile, Dustin Pluta

2018 Statistics BootcampDepartment of Statistics

University of California, Irvine

September 14th, 2018

1 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Considering Multiple Random Variables I

• While random variables are interesting in themselves,most of statistics revolves around collections of randomvariables.

• Need the tools to discuss the relationships between therandom variables in a sample

2 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Considering Multiple Random Variables II

• Ultimately we will want to make use of theorems andproperties of random samples to perform inference onparameters of the distributions underlying a sample.

• For example, a statistic such as the sample mean will beused to make inference about the population mean

• We often rely on approximations that are due to “large”samples

3 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Defining a Random Sample I

• We begin by defining a random sample

Definition (Random Sample)

Consider n random variables X1, X2, . . . , Xn, these randomvariables form a random sample if each Xi is independent ofall others and the marginal pmf or pdf of each RV is the samefunction f . Such random variables are said to be independentand identically distributed.

Definition (Sample Size)

The number of random variables, n, in a random sample isreferred to as the sample size.

4 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Defining a Random Sample II

• One way to think about the idea of the random sample isthat there is some large or infinite ‘population’ where eachrandom variable is selected from

• In this population, each random variable is generatedusing the same density or mass function f

• The sample size n is the number of random variablesselected from that population.

5 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Examples of Random Samples

• Consider 100 coin flips from a fair coin. Each coin flip canbe considered as a random variable from a Bernoullidistribution with success probability p = 0.5.

• Consider the height of students in high schools around thecountry. It may be reasonable to assume that the heightsof these students come from a population where height isrepresented as a normal distribution centered at someaverage µ with variance σ2.

• Consider measuring 250 failure times for light bulbs froma single production line. The time to failure may beassumed to come from a population of light bulbs wherefailure time can be modeled as an exponential distributionwith rate parameter λ.

The main idea is ‘repeated observations’ of the samephenomenon.

6 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Joint Distribution of a Sample

• Based upon the definition of a random sample, we canconstruct the joint distribution which represents theprobability distribution of the sample.

• Assuming a parameterized probability function f(xi|θ)and we let x = (x1, . . . , xn), then

f(x|θ) = f(x1|θ)f(x2|θ) . . . f(xn|θ) =

n∏i=1

f(xi|θ)

• Unfortunately this joint distribution will not always benice to work with

• Additionally we may actually be interested in thedistribution of a function of the random variables.

7 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Defining Statistics & Sampling Distributions I

• Similar to the definition to random variable, we provide afew definitions

Definition (Statistic (DeGroot))

Suppose that the observable random variables of interest areX1, . . . , Xn. Let r be an arbitrary real-valued function of nreal variables. Then the random variable T = r(X1, . . . , Xn) iscalled a statistic.

Definition (Statistic (Dudewicz))

Any function of the random variables that are being observedsay tn(X1, X2, . . . , Xn), is called a statistic. Further sinceX1, X2, . . . , Xn are random variables, it is a random variable.

8 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Defining Statistics & Sampling Distributions II

Definition (Statistic (Casella))

Let X1, X2, . . . , Xn be a random sample of size n from apopulation and let T (x1, . . . , xn) be a real-valued orvector-valued function whose domain includes the samplespace of (X1, X2, . . . , Xn). Then the random variable orrandom vector Y = T (X1, . . . , Xn) is called a statistic. Theprobability distribution of a statistic Y is called the samplingdistribution of Y .

9 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Defining Statistics & Sampling Distributions III

• The major take away from all of these definitions is that astatistic is a function of random variables

• Since it is a function of random variables, it is also arandom variable and therefore has a distribution!

• Often we will specifically be interested in the distributionof the statistic

• Since these distributions will often be unknown, we willlook to finding approximations for them

10 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Sums of Random Variables & the Sample Mean

• A primary statistic involves the sum of the randomvariables and functions of these sums

• That is

T (X1, X2, . . . , Xn) = X1 +X2 + · · ·+Xn =

n∑i=1

Xi

• The sample mean mean is a function of a sum of randomvariables,

X̄ =1

n

n∑i=1

Xi

11 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Sample Variance

• Another statistic is the sample variance defined as

S2 =1

n− 1

n∑i=1

(Xi − X̄)2

• The sample standard deviation is defined as S =√S2.

12 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Expectations of Sums of Random Variables I

• Claim: Let X1, X2, . . . , Xn be a random sample and letg(x) be a function such that E(g(X1)) and V ar(g(X1))exist, then

E

(n∑i=1

g(Xi)

)= nE(g(X1))

and

V ar

(n∑i=1

g(Xi)

)= n(V ar(g(X1))

13 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Expectations of Sums of Random Variables II

• Claim: Let X1, X2, . . . , Xn be a random sample and letg(x) be a function such that E(g(X1)) exists,thenE(

∑ni=1 g(Xi)) = nE(g(X1))

• Demonstrate full proof

• Short “Proof”:

E

(n∑i=1

g(Xi)

)=

n∑i=1

E (g(Xi)) = nE(g(X1))

14 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Expectations of Sums of Random Variables III

• Now consider the sample mean, assume that theexpectation of the population is µ and the variance in thepopulation is σ2

• This implies that

E(X̄) = E

(1

n

n∑i=1

Xi

)

=1

nE

(n∑i=1

Xi

)

=1

nnE(X1)

= E(X1) = µ

15 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Expectations of Sums of Random Variables IV

• Additionally we can find the variance of the sample mean

V ar(X̄) = V ar

(1

n

n∑i=1

Xi

)

=1

n2V ar

(n∑i=1

Xi

)

=1

n2nV ar(X1)

=1

nV ar(X1) =

σ2

n

• These two results imply that regardless of thedistribution of the statistic itself, we know specificproperties of the distribution!

16 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

MGF’s for Sample Means I

• We also can calculate the MGF for the sample mean,specifically, define Y = 1

n

∑∞i=1Xi, where Xi are from a

random sample (iid)...

MY (t) = E(etY )

= E(etn

∑ni=1Xi)

17 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

MGF’s for Sample Means II

MY (t) = E(etn

∑ni=1Xi)

=

∫· · ·∫∫

etn

∑ni=1 xif(x1, . . . , xn)dx1 . . . dxn

=

∫· · ·∫∫ n∏

i=1

etnxi

n∏i=1

f(xi)dx1 . . . dxn

= [MX(t/n)]n

18 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Example MGFs of the Sample Mean ofExponential Random Variables I

• The last result is mainly useful if we already know thedistribution of the underlying observations of the sample.

• Consider a random sample of size n, X1, X2, . . . from aexponential distribution with MGF

MX(t) =1

1− λt

19 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Example MGFs of the Sample Mean ofExponential Random Variables II

therefore

MX(t/n) =

(1

1− λn t

)and

MX̄(t) =

(1

1− λn t

)n• Looking this up we see this is the MGF of a Gamma(n, λn)

• We’ll compare this with some approximations later

20 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Bounding Probabilities on Statistics I

• The true distribution of a statistic is often difficult toobtain. Often, there are simple methods to get estimatesfor probability statements regarding the statistic

• Consider Markov’s Inequality

Theorem (Markov’s Inequality)

Suppose that X is a random variable such that P (X ≥ 0) = 1,then for every real number t > 0

Pr(X ≥ t) ≤ E(X)

t

• DeGroot and Schervish state that the Markov inequality isprimarily of interest for large values of t

21 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Bounding Probabilities on Statistics II

• Related to Markov’s Inequality and often more useful isChebyshev’s Inequality.

Theorem (Chebyshev’s Inequality)

Let X be a random variable for which V ar(X) exists. Thenfor every real number t > 0,

Pr(|X − E(X)| ≥ t) ≤ V ar(X)

t2

• The proof follows from Markov’s Inequality consideringthe random variable Y = (X − E(X))2.

22 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Bounding Probabilities on Statistics III

• If we consider the sample mean and apply Chebyshev’sInequality, it follows that

Pr(|X̄ − µ| ≥ t) ≤ σ2

nt2

• This can be a very useful inequality for boundingprobabilities, or helping to choose sample sizes

23 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Application: Utilizing Chebyshev’s Inequality

• From DeGroot and Schervish, Consider a random variableX with V ar(σ2) and consider t = 3σ, then byChebyshev’s

Pr(|X − E(X)| ≥ 3σ) ≤ σ2

(3σ)2=

1

9≈ 0.11

• This implies that the probability that the a randomvariable will differ from its mean by more than 3 standarddeviations is less than 0.11

24 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Application: Chebyshev’s and the Sample Mean I

DeGroot and Schervish Example 6.2.1 - Determining the Re-quired Number of Observations

• Suppose that a random sample is to be taken from adistribution for which the value of the mean µ is notknown, but for which it is known that the standarddeviation σ is 2 units or less. We shall determine howlarge the sample size must be in order to make theprobability at least 0.99 that |X̄ − µ| will be less than 1unit.

25 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Application: Chebyshev’s and the Sample Mean II

• Since σ2 ≤ 22 = 4, it follows that for every sample of sizen, that

Pr(|X̄ − µ| ≥ 1) ≤ 4

n

Further by our problem statement we would likePr(|X̄ − µ| < 1) = 0.99, thus 0.01 ≤ 4

n which implies thatwe need 400 observations.

26 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

General Idea of Asymptotics

• While the true distribution may be unknown for any givenstatistic, there may be assumptions for approximating thedistribution ‘as n grows large’

• That is we may be able to make some statements aboutthe distribution of the statistic in the limit.

• This is where the Law of Large Numbers, Central LimitTheorem, and the Delta Method come into play

• We first review a few types of convergence.

27 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Types of Convergence

• We’ll be interested in the idea of what happens to thedistribution of the statistic as the sample size grows toinfinity.

• There are three main types of convergence we’ll outlinetoday

• Convergence in Law (or Distribution)• Convergence in Probability• Almost-Sure Convergence

28 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Convergence in Distribution

• We define our first mode of convergence: convergence indistribution or convergence in law

Definition (Convergence in Distribution)

A sequence of random variables, X1, X2, . . . , converges indistribution to a random variable X if

limn→∞

FXn(x) = FX(x)

for all points x where FX(x) is continuous. Denoted XnL−→ X

or XnD−→ X

• We see here that we are first talking about distributionfunctions converging to another distribution

• This is fundamentally different than the next few types ofconvergence.

29 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Convergence in Probability and Almost Surely I

• A somewhat ‘weak’ convergence is outlined below

Definition (Convergence in Probability)

A sequence of random variables X1, X2, . . . , converges inprobability to a random variable X if, for every ε > 0,

limn→∞

P (|Xn −X| ≥ ε) = 0

denoted XnP−→ X.

• Notice that the form of this looks very similar to some ofthe inequalities that we have demonstrated...

30 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Convergence in Probability and Almost Surely II

• A stronger version of convergence is almost-sureconvergence

Definition (Almost-Sure Convergence)

A sequence of random variables X1, X2, . . . ,, converges almostsurely to a random variable X if for every ε > 0,

P ( limn→∞

|Xn −X| < ε) = 1

denoted Xna.s.−−→ X.

• This is sometimes referred to as convergence withprobability 1.

31 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Example - Converges Almost Surely

Dudewicz and Mishra Example 6.2.6

• Let Xn be a sequence of random variables defined by

Xn =

{0 with probability 1− (1

2)n

1 with probability (12)n

for n = 1, 2, 3, . . . . Then it can be shown thatP (limn→∞Xn = 0) = 1, hence Xn

a.s.−−→ 0.

32 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Example - Converges in Probability I

Dudewicz and Mishra Example 6.2.6

• Let Xn be a sequence of random variables defined by

Xn =

{0 with probability 1− (1

2)n

1 with probability (12)n

for n = 1, 2, 3, . . . . To show convergence in probability,we can appeal to Markov’s Inequality...

33 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Example - Converges in Probability II

• We see that E(Xn) = (12)n and E(X2) = (1

2)n.

• Therefore V ar(Xn) = 2n−122n

• Applying Markov’s Inequality we have for every ε > 0

Pr(|Xn| > ε) ≤ 1

2nε2

and therefore

limn→∞

P (|Xn| ≥ ε) = 0

• Therefore the sequence Xn converges in probability to arandom variable X that is “degenerate at zero” (takes onvalue 0 with probability 1)

34 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Weak Law of Large Numbers I

• Understanding convergence principles will allow us tounderstand properties of the sampling distribution as thesample size grows.

• The first result of major importance is the Weak Law ofLarge Numbers

Theorem (Weak Law of Large Numbers)

Suppose that X1, . . . , Xn form a random sample from adistribution for which the mean is µ and the variance exists.Let X̄n denote the sample mean. Then

X̄nP−→ µ

35 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Weak Law of Large Numbers II

• Proof:

Pr(|X̄n − µ| ≤ ε) ≥ 1− σ2

nε2

Hence

limn→∞

Pr(|X̄n − µ| < ε) = 1

showing the result.

• This result says that with high probability X̄n tends to avalue µ if the sample size is large

• This also suggests that if a large sample is taken of anunknown distribution, the sample mean will be a goodapproximation of the population mean with highprobability.

36 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Weak Law of Large Numbers R Demo

37 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Central Limit Theorem I

• The WLLN is great start for understanding thedistribution of the sample mean but fortunately, we canactually do better!

Theorem (Central Limit Theorem)

Let X1, X2, . . . be a sequence of independent and identicallydistributed random variables with mean µ and finite varianceσ2. Then √

n(X̄n − µ)L−→ N(0, σ2)

38 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Central Limit Theorem II

• Sketch of proof: Use moment generating functions ofcharacteristic functions to find the MGF ofY =

√n(X̄n − µ). Expand this characteristic function

using a Taylor series expansion and show that it convergesto the MGF of a normal distribution with variance σ2.

• There are additional versions of the Central LimitTheorem which reduce some of the assumptions, they willbe introduced throughout the year.

39 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Example - Distribution of the Sample Mean ofExponential Random Variables I

• The Central Limit Theorem may be one of the mostimportant results in all of statistics.

• Consider the random sample X1, X2, . . . Xn whereXi ∼ Exponential(λ). That is

f(x|λ) =1

λexp

{−xλ

}Find the distribution of the sample mean from such asample.

40 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Example - Distribution of the Sample Mean ofExponential Random Variables II

• Appealing to the Central Limit Theorem, we knowE(Xi) = λ exists and further that the varianceV ar(X) = λ2 is finite.

• This implies that

√n(X̄n − λ)

L−→ N(0, λ2)

• Thus we can say that

X̄n∼̇N(λ,λ2

n

)

41 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Central Limit Theorem Example

42 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Normal Approximation to the BinomialDistribution I

• In defining the binomial distribution, we stated that itcould be thought of as the sum of independent Bernoullitrials with success probability p.

• We can attempt to approximate the Binomial distributionthen using the Central Limit Theorem...

• First, E(X) = p and V ar(X) = p(1− p), therefore

√n(X̄n − p)

L−→ N(0, p(1− p))

based on the CLT

43 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Normal Approximation to the BinomialDistribution II

• This implies that

√n

(1

n

n∑i=1

Xi − p

)L−→ N(0, p(1− p))

factoring out a 1/n, this becomes

√n

n

(n∑i=1

Xi − np

)L−→ N(0, p(1− p))

44 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Normal Approximation to the BinomialDistribution III

• Now defining Y =∑n

i=1Xi (A binomial random variable),we have

1√n

(Y − np) L−→ N(0, p(1− p))

• Rearranging, this implies that

Y ∼̇N(np, np(1− p))

45 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Normal Approximation to the BinomialDistribution IV

• Why is such an approximation useful?

• Recall the mass function for the binomial distribution

f(x|p) =

(n

x

)px(1− p)n−x

•(nx

)= n!

x!(n−x)! which can become computationallychallenging for large n, where as the density of the normaldistribution is relatively easy to calculate computationally.

• Therefore these approximations can become very usefulthroughout statistics

46 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Normal Approximation to Binomial R Example

47 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

The Delta Method I

• The Central Limit Theorem is powerful in that it allows usto talk about the distribution of the sample mean.

• What if we’re interested in more complicated functions ofthe sample mean?

• This is where the Delta Method comes into play

• We’ll provide an informal derivation of the delta method

48 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

The Delta Method II

• Consider X1, X2, . . . , Xn which forms a random samplefrom a distribution that has a finite mean µ and finitevariance σ2.

• By CLT,√n(X̄n − µ)

L−→ N(0, σ2).

• Now suppose there exists a function g(X̄n) and we wouldlike to approximate its distribution.

49 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

The Delta Method III

• The delta method works by taking a Taylor seriesexpansion of g(X̄n) at the mean of the distribution, thatis

g(X̄n) ≈ g(µ) + g′(µ)(X̄n − µ) + . . .

and ignoring the higher order terms

50 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

The Delta Method IV

• Therefore

g(X̄n)− g(µ) = g′(µ)(X̄n − µ)√n(g(X̄n)− g(µ)) = g′(µ)

√n(X̄n − µ)

• We know√n(X̄n − µ)

L−→ N(0, σ2), which implies that

√n(g(X̄n)− g(µ))

L−→ N(0, (g′(µ))2σ2)

51 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Example - Delta Method I

Ferguson - Chapter 7 Example 1

• Consider a random sample with mean µ and variance σ2,

by the√n(X̄n − µ)

L−→ N(0, σ2). What is the distributionof X̄2

n?

52 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

Example - Delta Method II

• Here g(X̄n) = X̄2n, thus g′(X̄n) = 2X̄n, thus g′(µ) = 2µ.

Utilizing the delta method formula we have that

√n(X̄2

n − µ2)L−→ N(0, 4µ2σ2)

• Notice that if µ = 0 this becomes a degenerate randomvariable and thus this approximation may not be useful...

53 / 54

06 -Collections of

RandomVariables

RandomSamples

What is aStatistic

Sample Mean

SampleVariance

Expectationsin Samples

ImportantInequalities

General IdeaofAsymptotics

Types ofConvergence

LLN

CLT

Delta Method

References

References

Stephen Abbott. Understanding analysis. Springer, 2001.

George Casella and Roger L Berger. Statistical inference.Second edition, 2002.

EJ Dudewicz and SN Mishra. Modern mathematical statistics.john wilsey & sons. Inc., West Sussex, 1988.

Sheldon Ross. A first course in probability. Pearson, ninthedition, 2014.

Mark J Schervish. Probability and Statistics. 2014.

54 / 54