
5. Getting Started with Statistics

Dave Goldsman

H. Milton Stewart School of Industrial and Systems Engineering, Georgia Institute of Technology

7/12/20


Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions


Introduction to Descriptive Statistics

Lesson 5.1 — Introduction to Descriptive Statistics

What’s Coming Up: Three high-level lessons on what Statistics is (not involving much math).

Several lessons on estimating parameters of probability distributions.

One lesson on certain distributions that will come up in subsequent Statistics modules — normal, t, χ², and F.

Statistics forms a rational basis for decision-making using observed or experimental data. We make these decisions in the face of uncertainty.

Statistics helps us answer questions concerning:

The analysis of one population (or system).

The comparison of many populations.


Introduction to Descriptive Statistics

Examples:

Election polling.

Coke vs. Pepsi.

The effect of cigarette smoking on the probability of getting cancer.

The effect of a new drug on the probability of contracting hepatitis.

What’s the most popular TV show during a certain time period?

The effect of various heat-treating methods on steel tensile strength.

Which fertilizers improve crop yield?

King of Siam — etc., etc., etc.

Idea (Election polling example): We can’t poll every single voter. Thus, we take a sample of data from the population of voters, and try to make a reasonable conclusion based on that sample.


Introduction to Descriptive Statistics

Statistics tells us how to conduct the sampling (i.e., how many observations to take, how to take them, etc.), and then how to draw conclusions from the sampled data.

Types of Data

Continuous variables: Can take on any real value in a certain interval. For example, the lifetime of a lightbulb or the weight of a newborn child.

Discrete variables: Can only take on specific values. E.g., the number of accidents this week at a factory or the possible rolls of a pair of dice.

Categorical variables: These data are not typically numerical. What’s your favorite TV show during a certain time slot?


Introduction to Descriptive Statistics

Plotting Data

A picture is worth 1000 words. Always plot data before doing anything else, if only to identify any obvious issues such as nonstandard distributions, missing data points, outliers, etc.

Histograms provide a quick, succinct look at what you are dealing with. If you take enough observations, the histogram will eventually converge to the true distribution. But sometimes choosing the optimal number of cells is a little tricky — like Goldilocks!
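
For illustration, here is a minimal Python sketch of such a quick-look histogram; the data array and the bin count are arbitrary stand-ins, and numpy and matplotlib are assumed to be available.

```python
import numpy as np
import matplotlib.pyplot as plt

# Illustrative data only: 200 draws from a right-skewed population.
rng = np.random.default_rng(seed=1)
data = rng.exponential(scale=5.0, size=200)

# Plot before doing anything else, to spot skew, gaps, outliers, etc.
# The bin count is the "Goldilocks" knob: too few bins hides the shape,
# too many just shows noise.
plt.hist(data, bins=15, edgecolor="black")
plt.xlabel("observed value")
plt.ylabel("frequency")
plt.title("Quick-look histogram of the raw data")
plt.show()
```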


Summarizing Data

Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions


Summarizing Data

Lesson 5.2 — Summarizing Data

In addition to plotting data, how do we summarize data?

It’s nice to have lots of data. But sometimes it’s too much of a good thing! Need to summarize.

Example: Grades on a test (i.e., raw data):

23 62 91 83 82 64 73 94 94 52

67 11 87 99 37 62 40 33 80 83

99 90 18 73 68 75 75 90 36 55


Summarizing Data

Stem-and-Leaf Diagram of grades. Easy way to write down all of the data. Saves some space, and looks like a sideways histogram.

9 9944100

8 73320

7 5533

6 87422

5 52

4 0

3 763

2 3

1 81
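
For illustration, a small Python sketch that mechanically reproduces this display from the 30 grades (stem = tens digit, leaf = units digit):

```python
from collections import defaultdict

grades = [23, 62, 91, 83, 82, 64, 73, 94, 94, 52,
          67, 11, 87, 99, 37, 62, 40, 33, 80, 83,
          99, 90, 18, 73, 68, 75, 75, 90, 36, 55]

# Group each grade by its tens digit (the stem); keep the units digit (the leaf).
stems = defaultdict(list)
for g in grades:
    stems[g // 10].append(g % 10)

# Print high stems first so the display looks like a sideways histogram.
for stem in sorted(stems, reverse=True):
    leaves = "".join(str(leaf) for leaf in sorted(stems[stem], reverse=True))
    print(stem, leaves)
```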


Summarizing Data

Grouped Data

Range      Freq.   Cumul. Freq.   Proportion of observations so far
0–20         2          2          2/30
21–40        5          7          7/30
41–60        2          9          9/30
61–80       10         19         19/30
81–100      11         30          1
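
For illustration, this grouped-data table can be rebuilt with numpy; the bin edges below are chosen so that numpy's right-open bins match the ranges 0–20, 21–40, 41–60, 61–80, and 81–100.

```python
import numpy as np

grades = np.array([23, 62, 91, 83, 82, 64, 73, 94, 94, 52,
                   67, 11, 87, 99, 37, 62, 40, 33, 80, 83,
                   99, 90, 18, 73, 68, 75, 75, 90, 36, 55])

labels = ["0-20", "21-40", "41-60", "61-80", "81-100"]
edges = [0, 21, 41, 61, 81, 101]            # right-open bins, so 40 lands in 21-40, etc.
freq, _ = np.histogram(grades, bins=edges)  # frequency in each range
cumul = np.cumsum(freq)                     # cumulative frequency

n = grades.size
for label, f, c in zip(labels, freq, cumul):
    print(f"{label}: freq={int(f)}, cumulative={int(c)}, proportion so far={int(c)}/{n}")
```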


Summarizing Data

Summary Statistics:

n = 30 observations.

If $X_i$ is the $i$th score, then the sample mean is

$$\bar{X} \equiv \frac{1}{n} \sum_{i=1}^{n} X_i = 66.5.$$

The sample variance is

$$S^2 \equiv \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = 630.6.$$

Remark: Before you take any observations, $\bar{X}$ and $S^2$ must be regarded as random variables.
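
For a quick numerical check, a short Python sketch that recomputes both statistics for the 30 grades (numpy assumed available):

```python
import numpy as np

grades = np.array([23, 62, 91, 83, 82, 64, 73, 94, 94, 52,
                   67, 11, 87, 99, 37, 62, 40, 33, 80, 83,
                   99, 90, 18, 73, 68, 75, 75, 90, 36, 55])

n = grades.size            # n = 30 observations
xbar = grades.mean()       # sample mean, roughly 66.5
s2 = grades.var(ddof=1)    # sample variance with the n-1 divisor, roughly 630.6
print(n, round(float(xbar), 1), round(float(s2), 1))
```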


Summarizing Data

In general, suppose that we sample iid data $X_1, \ldots, X_n$ from the population of interest.

Example: $X_i$ is the lifespan of the $i$th lightbulb we observe.

We’re most interested in measuring the “center” and “spread” of the underlying distribution of the data.

Measures of Central Tendency:

Sample Mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$.

Sample Median: The “middle” observation when the $X_i$’s are arranged numerically.


Summarizing Data

Example: 16, 7, 83 gives a median of 16.

Example: 16, 7, 83, 20 gives a “reasonable” median of $(16 + 20)/2 = 18$.

Remark: The sample median is less susceptible to “outlier” data than the sample mean. One bad number can spoil the sample mean’s entire day.

Example: 7, 7, 7, 672, 7 results in a sample mean of 140 and a sample median of 7.

Sample Mode: “Most common” value. Sometimes not the most useful measure.

Example: 16, 7, 20, 83, 7 gives a mode of 7.
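
For illustration, the same comparisons in Python’s standard statistics module:

```python
import statistics

data = [7, 7, 7, 672, 7]
print(statistics.mean(data))    # 140 -- one wild observation drags the mean far away
print(statistics.median(data))  # 7   -- the median is unaffected by the outlier

print(statistics.median([16, 7, 83, 20]))   # (16 + 20) / 2 = 18 for an even sample size
print(statistics.mode([16, 7, 20, 83, 7]))  # 7, the most common value
```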


Summarizing Data

Measures of Variation (dispersion, spread)

Sample Variance:

$$S^2 \equiv \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n-1}\left( \sum_{i=1}^{n} X_i^2 - n\bar{X}^2 \right),$$

the latter expression being easier to compute.

Sample Standard Deviation: $S = +\sqrt{S^2}$.

Sample Range: $\max_i X_i - \min_i X_i$.
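
For illustration, a short sketch checking that the two forms of $S^2$ agree, along with $S$ and the sample range (the data values below are arbitrary):

```python
import numpy as np

x = np.array([16.0, 7.0, 20.0, 83.0, 7.0])   # arbitrary illustrative sample
n = x.size
xbar = x.mean()

s2_definition = np.sum((x - xbar) ** 2) / (n - 1)
s2_shortcut = (np.sum(x ** 2) - n * xbar ** 2) / (n - 1)  # same value, fewer passes
s = np.sqrt(s2_definition)                                # sample standard deviation
sample_range = x.max() - x.min()

print(s2_definition, s2_shortcut, s, sample_range)
```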


Summarizing Data

Remark: Suppose the data take on $p$ different values $X_1, \ldots, X_p$, with frequencies $f_1, \ldots, f_p$, respectively.

How do we calculate $\bar{X}$ and $S^2$ quickly?

$$\bar{X} = \frac{1}{n} \sum_{j=1}^{p} f_j X_j \quad \text{and} \quad S^2 = \frac{\sum_{j=1}^{p} f_j X_j^2 - n\bar{X}^2}{n-1}.$$

Example: Suppose we roll a die 10 times.

$X_j$   1   2   3   4   5   6
$f_j$   2   1   1   3   0   3

Then $\bar{X} = (2 \cdot 1 + 1 \cdot 2 + \cdots + 3 \cdot 6)/10 = 3.7$, and $S^2 = 3.789$. □
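
For illustration, the die example checked in a few lines of numpy:

```python
import numpy as np

values = np.array([1, 2, 3, 4, 5, 6])   # the p = 6 distinct values X_j
freqs = np.array([2, 1, 1, 3, 0, 3])    # their frequencies f_j
n = freqs.sum()                         # n = 10 rolls

xbar = np.sum(freqs * values) / n                           # 3.7
s2 = (np.sum(freqs * values**2) - n * xbar**2) / (n - 1)    # about 3.789
print(xbar, round(float(s2), 3))
```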


Summarizing Data

Remark: If the individual observations can’t be determined from the frequency distribution, you might just break the observations up into $c$ intervals.

Example: Suppose $c = 3$, where we denote the midpoint of the $j$th interval by $m_j$, $j = 1, \ldots, c$, and the total sample size $n = \sum_{j=1}^{c} f_j = 30$.

$X_j$ interval   $m_j$   $f_j$
100–150          125      10
150–200          175      15
200–300          250       5

$$\bar{X} \approx \frac{\sum_{j=1}^{c} f_j m_j}{n} = 170.833 \quad \text{and} \quad S^2 \approx \frac{\sum_{j=1}^{c} f_j m_j^2 - n\bar{X}^2}{n-1} = 1814. \;\; □$$
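
For illustration, the midpoint approximations computed the same way:

```python
import numpy as np

midpoints = np.array([125.0, 175.0, 250.0])   # interval midpoints m_j
freqs = np.array([10, 15, 5])                 # interval frequencies f_j
n = freqs.sum()                               # n = 30

xbar_approx = np.sum(freqs * midpoints) / n                                # about 170.833
s2_approx = (np.sum(freqs * midpoints**2) - n * xbar_approx**2) / (n - 1)  # about 1814
print(round(float(xbar_approx), 3), round(float(s2_approx)))
```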


Candidate Distributions

Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions


Candidate Distributions

Lesson 5.3 — Candidate Distributions

Time to make an informed guess about the type of probability distribution we’re dealing with. We’ll look at more-formal methodology for fitting distributions later in the course when we do goodness-of-fit tests. But for now, some preliminary things we should think about:

Is the data from a discrete, continuous, or mixed distribution?

Univariate/multivariate?

How much data is available?

Are experts around to ask about the nature of the data?

What if we do not have much/any data — can we at least guess at a good distribution?


Candidate Distributions

If the distribution is a discrete random variable, then we have a number of familiar choices to select from.

Bernoulli(p) (success with probability p)

Binomial(n, p) (number of successes in n Bern(p) trials)

Geometric(p) (number of Bern(p) trials until first success)

Negative Binomial (number of Bern(p) trials until multiple successes)

Poisson(λ) (counts the number of arrivals over time)

Empirical (the all-purpose “sample” distribution based on the histogram)


Candidate Distributions

If the data suggest a continuous distribution...

Uniform (not much is known from the data, except perhaps the minimum and maximum possible values)

Triangular (at least we have an idea regarding the minimum, maximum, and “most likely” values)

Exponential(λ) (e.g., interarrival times from a Poisson process)

Normal (a good model for heights, weights, IQs, sample means, etc.)

Beta (good for specifying bounded data)

Gamma, Weibull, Gumbel, lognormal (reliability data)

Empirical (our all-purpose friend)
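
For illustration, one informal first step is to overlay a fitted candidate density on the data’s histogram; a minimal sketch assuming scipy and matplotlib are available, with made-up data and a normal candidate (formal goodness-of-fit tests come later in the course):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Made-up data standing in for an observed sample.
rng = np.random.default_rng(seed=2)
data = rng.normal(loc=70, scale=10, size=200)

# Density-scaled histogram so a candidate pdf can be drawn on top of it.
plt.hist(data, bins=20, density=True, alpha=0.5, label="empirical histogram")

# Candidate: normal distribution with parameters estimated from the data.
mu, sigma = stats.norm.fit(data)
grid = np.linspace(data.min(), data.max(), 200)
plt.plot(grid, stats.norm.pdf(grid, mu, sigma),
         label=f"fitted Normal({mu:.1f}, {sigma:.1f}²)")

plt.legend()
plt.show()
```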


Introduction to Estimation

Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions

ISYE 6739 — Goldsman 7/12/20 21 / 74

Introduction to Estimation

Lesson 5.4 — Introduction to Estimation

Definition: A statistic is a function of the observations X1, . . . , Xn, and not explicitly dependent on any unknown parameters.

Examples of statistics: X̄ and S², but not (X̄ − µ)/σ.

Statistics are random variables. If we take two different samples, we’d expect to get two different values of a statistic.

A statistic is usually used to estimate some unknown parameter from the underlying probability distribution of the Xi’s.

Examples of parameters: µ, σ².

ISYE 6739 — Goldsman 7/12/20 22 / 74
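
A quick illustration in Python/numpy (the normal data and its parameters here are arbitrary): X̄ and S² can be computed from the data alone, whereas (X̄ − µ)/σ cannot, because it needs the unknown µ and σ.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(loc=10.0, scale=2.0, size=25)   # pretend these 25 numbers are our observed data

    xbar = x.mean()        # sample mean X-bar: a statistic (depends only on the data)
    s2 = x.var(ddof=1)     # sample variance S^2 (n-1 divisor): also a statistic
    print(xbar, s2)

    # (xbar - mu) / sigma is NOT a statistic: it needs the unknown mu and sigma,
    # which we only "know" here because we simulated the data ourselves.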


Introduction to Estimation

Let X1, . . . , Xn be iid RV’s and let T(X) ≡ T(X1, . . . , Xn) be a statistic based on the Xi’s.

Suppose we use T(X) to estimate some unknown parameter θ. Then T(X) is called a point estimator for θ.

Examples: X̄ is usually a point estimator for the mean µ = E[Xi], and S² is often a point estimator for the variance σ² = Var(Xi).

It would be nice if T(X) had certain properties:

Its expected value should equal the parameter it’s trying to estimate.

It should have low variance.

ISYE 6739 — Goldsman 7/12/20 23 / 74


Unbiased Estimation

Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions

ISYE 6739 — Goldsman 7/12/20 24 / 74

Unbiased Estimation

Lesson 5.5 — Unbiased Estimation

Definition: T(X) is unbiased for θ if E[T(X)] = θ.

Example/Theorem: Suppose X1, . . . , Xn are iid anything with mean µ. Then X̄ is always unbiased for µ:

E[X̄] = E[(1/n) ∑_{i=1}^n Xi] = E[Xi] = µ.

That’s why X̄ is called the sample mean. □

Baby Example: In particular, suppose X1, . . . , Xn are iid Exp(λ). Then X̄ is unbiased for µ = E[Xi] = 1/λ.

But be careful. . . . 1/X̄ is biased for λ in this exponential case, i.e., E[1/X̄] ≠ 1/E[X̄] = λ. □

ISYE 6739 — Goldsman 7/12/20 25 / 74
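
Both facts are easy to see by simulation. A sketch in Python/numpy (λ = 2 and n = 10 are arbitrary choices): averaging X̄ over many samples lands on µ = 1/λ, while averaging 1/X̄ overshoots λ.

    import numpy as np

    rng = np.random.default_rng(1)
    lam, n, reps = 2.0, 10, 100_000

    # numpy's exponential is parameterized by the scale 1/lambda
    samples = rng.exponential(scale=1/lam, size=(reps, n))
    xbars = samples.mean(axis=1)

    print(xbars.mean())       # ~ 0.5 = 1/lambda: X-bar is unbiased for mu
    print((1/xbars).mean())   # ~ 2.22 > 2 = lambda: 1/X-bar is biased high for lambda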


Unbiased Estimation

Example/Theorem: Suppose X1, . . . , Xn are iid anything with mean µ and variance σ². Then S² is always unbiased for σ²:

E[S²] = E[(1/(n−1)) ∑_{i=1}^n (Xi − X̄)²] = Var(Xi) = σ².

This is why S² is called the sample variance. □

Baby Example: Suppose X1, . . . , Xn are iid Exp(λ). Then S² is unbiased for Var(Xi) = 1/λ². □

ISYE 6739 — Goldsman 7/12/20 26 / 74
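
A small simulation sketch (Python/numpy; exponential data with λ = 2 and n = 5, chosen arbitrarily): with the n − 1 divisor, the average of S² over many samples sits at σ², while the n divisor comes in low by the factor (n − 1)/n.

    import numpy as np

    rng = np.random.default_rng(2)
    lam, n, reps = 2.0, 5, 200_000
    true_var = 1 / lam**2                                  # Var(Xi) = 1/lambda^2 = 0.25

    samples = rng.exponential(scale=1/lam, size=(reps, n))
    s2_unbiased = samples.var(axis=1, ddof=1)              # divide by n - 1
    s2_biased = samples.var(axis=1, ddof=0)                # divide by n

    print(true_var, s2_unbiased.mean(), s2_biased.mean())  # ~0.25, ~0.25, ~0.20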


Unbiased Estimation

Proof (of general result): First, some algebra gives

∑_{i=1}^n (Xi − X̄)² = ∑_{i=1}^n (Xi² − 2X̄Xi + X̄²)
= ∑_{i=1}^n Xi² − 2X̄ ∑_{i=1}^n Xi + nX̄²
= ∑_{i=1}^n Xi² − 2nX̄² + nX̄²
= ∑_{i=1}^n Xi² − nX̄².

So. . .

ISYE 6739 — Goldsman 7/12/20 27 / 74
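
The identity is easy to sanity-check numerically (a throwaway sketch; any small data vector will do):

    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.uniform(size=8)
    n, xbar = len(x), x.mean()

    lhs = np.sum((x - xbar)**2)
    rhs = np.sum(x**2) - n * xbar**2
    print(np.isclose(lhs, rhs))    # True: sum (Xi - Xbar)^2 = sum Xi^2 - n Xbar^2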


Unbiased Estimation

E[S²] = (1/(n−1)) E[∑_{i=1}^n (Xi − X̄)²]
= (1/(n−1)) E[∑_{i=1}^n Xi² − nX̄²]
= (1/(n−1)) (∑_{i=1}^n E[Xi²] − nE[X̄²])
= (n/(n−1)) (E[X1²] − E[X̄²])   (since the Xi’s are iid)
= (n/(n−1)) (Var(X1) + (E[X1])² − Var(X̄) − (E[X̄])²)
= (n/(n−1)) (σ² − σ²/n)   (since E[X1] = E[X̄] and Var(X̄) = σ²/n)
= σ². Done. □

Remark: S is not unbiased for the standard deviation σ.

ISYE 6739 — Goldsman 7/12/20 28 / 74
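
The closing remark is also easy to see by simulation (sketch; normal data with σ = 3 and n = 5, chosen arbitrarily): the average of S falls noticeably short of σ even though the average of S² matches σ².

    import numpy as np

    rng = np.random.default_rng(4)
    sigma, n, reps = 3.0, 5, 200_000

    samples = rng.normal(loc=0.0, scale=sigma, size=(reps, n))
    s = samples.std(axis=1, ddof=1)     # sample standard deviation S

    print(sigma, s.mean())              # E[S] comes out below sigma = 3 (roughly 2.8 here)
    print(sigma**2, (s**2).mean())      # while E[S^2] is right at sigma^2 = 9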


Unbiased Estimation

Big Example: Suppose that X1, . . . , Xn iid∼ Unif(0, θ), i.e., the pdf is f(x) = 1/θ, for 0 < x < θ. Think of it this way: I give you a bunch of random numbers between 0 and θ, and you have to guess what θ is.

We’ll look at three unbiased estimators for θ:

Y1 = 2X̄.

Y2 = ((n+1)/n) max_{1≤i≤n} Xi.

Y3 = 12X̄ w.p. 1/2, or −8X̄ w.p. 1/2.

If they’re all unbiased, which one’s the best?

ISYE 6739 — Goldsman 7/12/20 29 / 74
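
Before proving anything, a quick Monte Carlo sketch (Python/numpy; θ = 10 and n = 20 are arbitrary) suggests that all three do average out to θ, goofy or not:

    import numpy as np

    rng = np.random.default_rng(5)
    theta, n, reps = 10.0, 20, 100_000

    x = rng.uniform(0.0, theta, size=(reps, n))
    xbar = x.mean(axis=1)

    y1 = 2 * xbar                                  # "good"
    y2 = (n + 1) / n * x.max(axis=1)               # "better"
    coef = rng.choice([12.0, -8.0], size=reps)     # "ugly": 12*Xbar or -8*Xbar, each w.p. 1/2
    y3 = coef * xbar

    # all three averages land near theta = 10 (Y3 converges slowly -- its variance is huge)
    print(y1.mean(), y2.mean(), y3.mean())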


Unbiased Estimation

“Good” Estimator: Y1 = 2X̄.

Proof (that it’s unbiased): E[Y1] = 2E[X̄] = 2E[Xi] = θ. □

“Better” Estimator: Y2 = ((n+1)/n) max_{1≤i≤n} Xi.

Why might this estimator for θ make sense? (We’ll say why it’s “better” in a little while.)

Proof (that it’s unbiased): E[Y2] = ((n+1)/n) E[max_i Xi] = θ iff E[max_i Xi] = nθ/(n+1) (which is what we’ll show below).

ISYE 6739 — Goldsman 7/12/20 30 / 74


Unbiased Estimation

First, let’s get the cdf of M ≡ max_i Xi:

P(M ≤ y) = P(X1 ≤ y and X2 ≤ y and · · · and Xn ≤ y)
= P(X1 ≤ y) P(X2 ≤ y) · · · P(Xn ≤ y)   (Xi’s indep)
= [P(X1 ≤ y)]^n   (Xi’s identically distributed)
= [∫_0^y f_{X1}(x) dx]^n = [∫_0^y (1/θ) dx]^n = (y/θ)^n.

ISYE 6739 — Goldsman 7/12/20 31 / 74
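
A one-line empirical check of this cdf (sketch; θ = 2, n = 5, and y = 1.5 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(6)
    theta, n, y, reps = 2.0, 5, 1.5, 200_000

    m = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)
    print((m <= y).mean(), (y / theta)**n)    # empirical P(M <= y) vs (y/theta)^n = 0.2373...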


Unbiased Estimation

This implies that the pdf of M is

f_M(y) ≡ (d/dy)(y/θ)^n = n y^{n−1}/θ^n, 0 < y < θ,

and this implies that

E[M] = ∫_0^θ y f_M(y) dy = ∫_0^θ n y^n/θ^n dy = nθ/(n+1).

Whew! This finally shows that Y2 = ((n+1)/n) max_{1≤i≤n} Xi is an unbiased estimator for θ! □

Lastly, let’s look at. . .

ISYE 6739 — Goldsman 7/12/20 32 / 74
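
The same kind of simulation confirms the mean of the maximum and the unbiasedness of Y2 (sketch; θ = 2 and n = 5 are arbitrary):

    import numpy as np

    rng = np.random.default_rng(7)
    theta, n, reps = 2.0, 5, 200_000

    m = rng.uniform(0.0, theta, size=(reps, n)).max(axis=1)
    print(m.mean(), n * theta / (n + 1))     # both ~ 1.667: E[M] = n*theta/(n+1)
    print(((n + 1) / n * m).mean())          # Y2 averages out to ~ theta = 2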


Unbiased Estimation

“Ugly” Estimator: Y3 = 12X̄ w.p. 1/2, or −8X̄ w.p. 1/2.

Ha! It’s possible to get a negative estimate for θ, which is strange since θ > 0!

Proof (that it’s unbiased):

E[Y3] = 12E[X̄] · (1/2) − 8E[X̄] · (1/2) = 2E[X̄] = θ. □

Usually, it’s good for an estimator to be unbiased, but the “ugly” estimator Y3 shows that unbiased estimators can sometimes be goofy.

Therefore, let’s look at some other properties an estimator can have.

ISYE 6739 — Goldsman 7/12/20 33 / 74


Unbiased Estimation

For instance, consider the variance of an estimator.

Big Example (cont’d): Again suppose that X1, . . . , Xn iid∼ Unif(0, θ).

Recall that both Y1 = 2X̄ and Y2 = ((n+1)/n) M are unbiased for θ.

Let’s find Var(Y1) and Var(Y2). First,

Var(Y1) = 4 Var(X̄) = (4/n) · Var(Xi) = (4/n) · (θ²/12) = θ²/(3n).

ISYE 6739 — Goldsman 7/12/20 34 / 74
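
Numerically (sketch; θ = 10 and n = 20 again, arbitrary values), the simulated variance of Y1 matches θ²/(3n):

    import numpy as np

    rng = np.random.default_rng(8)
    theta, n, reps = 10.0, 20, 200_000

    y1 = 2 * rng.uniform(0.0, theta, size=(reps, n)).mean(axis=1)
    print(y1.var(), theta**2 / (3 * n))    # both ~ 1.667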


Unbiased Estimation

Meanwhile,

Var(Y2) = ((n+1)/n)² Var(M)
= ((n+1)/n)² E[M²] − (((n+1)/n) E[M])²
= ((n+1)/n)² ∫_0^θ n y^{n+1}/θ^n dy − θ²
= θ² (n+1)²/(n(n+2)) − θ² = θ²/(n(n+2)) < θ²/(3n).

Thus, both Y1 and Y2 are unbiased, but Y2 has much lower variance than Y1. We can break the “unbiasedness tie” by choosing Y2. □

ISYE 6739 — Goldsman 7/12/20 35 / 74
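
Extending the same sketch to Y2 shows the gap (arbitrary θ = 10, n = 20 as before):

    import numpy as np

    rng = np.random.default_rng(9)
    theta, n, reps = 10.0, 20, 200_000

    x = rng.uniform(0.0, theta, size=(reps, n))
    y1 = 2 * x.mean(axis=1)
    y2 = (n + 1) / n * x.max(axis=1)

    print(y1.var(), theta**2 / (3 * n))          # ~ 1.667
    print(y2.var(), theta**2 / (n * (n + 2)))    # ~ 0.227 -- far smaller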


Mean Squared Error

Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions

ISYE 6739 — Goldsman 7/12/20 36 / 74

Mean Squared Error

Lesson 5.6 — Mean Squared Error

We’ll now talk about a statistical performance measure that combines information about the bias and the variance of an estimator.

Definition: The Mean Squared Error (MSE) of an estimator T(X) of θ is

MSE(T(X)) ≡ E[(T(X) − θ)²].

Before giving an easier interpretation of MSE, define the bias of an estimator for the parameter θ,

Bias(T(X)) ≡ E[T(X)] − θ.

ISYE 6739 — Goldsman 7/12/20 37 / 74


Mean Squared Error

Theorem/Proof: Easier interpretation of MSE.

MSE(T(X)) = E[(T(X) − θ)²]
= E[T²] − 2θE[T] + θ²
= E[T²] − (E[T])² + (E[T])² − 2θE[T] + θ²
= Var(T) + (E[T] − θ)²
= Var(T) + Bias².

So MSE = Bias² + Var, and thus combines the bias and variance of an estimator. □

ISYE 6739 — Goldsman 7/12/20 38 / 74
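
A tiny numerical illustration of the decomposition (sketch; a deliberately biased estimator 0.9·X̄ of µ for normal data, with arbitrary parameter values):

    import numpy as np

    rng = np.random.default_rng(10)
    mu, sigma, n, reps = 5.0, 2.0, 10, 200_000

    xbar = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)
    t = 0.9 * xbar                       # a biased estimator of mu, purely for illustration

    mse_direct = ((t - mu)**2).mean()    # E[(T - mu)^2]
    bias = t.mean() - mu                 # ~ -0.5
    var = t.var()                        # ~ 0.81 * sigma^2 / n = 0.324
    print(mse_direct, bias**2 + var)     # the two agree: MSE = Bias^2 + Var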


Mean Squared Error

The lower the MSE the better. If T1(X) and T2(X) are two estimators of θ, we’d usually prefer the one with the lower MSE — even if it happens to have higher bias.

Definition: The relative efficiency of T2(X) to T1(X) is MSE(T1(X))/MSE(T2(X)). If this quantity is < 1, then we’d want T1(X).

Example: Suppose that estimator A has bias = 3 and variance = 10, while estimator B has bias = −2 and variance = 14. Which estimator (A or B) has the lower mean squared error?

Solution: MSE = Bias² + Var, so

MSE(A) = 9 + 10 = 19 and MSE(B) = 4 + 14 = 18.

Thus, B has lower MSE. □

ISYE 6739 — Goldsman 7/12/20 39 / 74
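
In code form the example is one line of arithmetic (same numbers as above):

    def mse(bias, var):
        return bias**2 + var

    mse_a, mse_b = mse(3, 10), mse(-2, 14)     # 19 and 18
    print(mse_a, mse_b, mse_a / mse_b)         # relative efficiency of B to A is 19/18 > 1, so take B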


Mean Squared Error

Example: X1, . . . , Xn iid∼ Unif(0, θ).

Two estimators: Y1 = 2X̄, and Y2 = ((n+1)/n) max_i X_i.

Showed before E[Y1] = E[Y2] = θ (so both estimators are unbiased).

Also, Var(Y1) = θ²/(3n), and Var(Y2) = θ²/(n(n+2)).

Thus,

MSE(Y1) = θ²/(3n) and MSE(Y2) = θ²/(n(n+2)),

so Y2 is better (by an order of magnitude, actually). ∎

ISYE 6739 — Goldsman 7/12/20 40 / 74
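
The comparison above is easy to check by simulation. The following is a minimal Python sketch (not from the lecture); the values of θ, n, and the replication count are arbitrary illustrative assumptions. It estimates MSE(Y1) and MSE(Y2) by Monte Carlo and compares them with the formulas above.

# Monte Carlo check of MSE(Y1) vs. MSE(Y2) for Unif(0, theta) samples.
import numpy as np

rng = np.random.default_rng(0)
theta, n, reps = 5.0, 20, 100_000          # hypothetical illustrative values

x = rng.uniform(0.0, theta, size=(reps, n))
y1 = 2.0 * x.mean(axis=1)                  # Y1 = 2 * sample mean
y2 = (n + 1) / n * x.max(axis=1)           # Y2 = (n+1)/n * sample maximum

print(np.mean((y1 - theta) ** 2), theta**2 / (3 * n))        # ~ theta^2/(3n)
print(np.mean((y2 - theta) ** 2), theta**2 / (n * (n + 2)))  # ~ theta^2/(n(n+2))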


Maximum Likelihood Estimation

Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions

ISYE 6739 — Goldsman 7/12/20 41 / 74

Maximum Likelihood Estimation

Lesson 5.7 — Maximum Likelihood Estimation

Definition: Consider an iid random sample X1, . . . , Xn, where each Xi has pmf/pdf f(x). Further, suppose that θ is some unknown parameter from Xi.

The likelihood function is L(θ) ≡ ∏_{i=1}^n f(x_i).

The maximum likelihood estimator (MLE) of θ is the value of θ that maximizes L(θ). The MLE is a function of the Xi's and is a RV.

Remark: We can very informally regard the MLE as the "most likely" estimate of θ.

ISYE 6739 — Goldsman 7/12/20 42 / 74


Maximum Likelihood Estimation

Example: Suppose X1, . . . , Xn iid∼ Exp(λ). Find the MLE for λ.

First of all, the likelihood function is

L(λ) = ∏_{i=1}^n f(x_i) = ∏_{i=1}^n λe^{−λx_i} = λ^n exp(−λ ∑_{i=1}^n x_i).

Now maximize L(λ) with respect to λ. Could take the derivative and plow through all of the horrible algebra. Too tedious. Need a trick. . . .

Useful Trick: Since the natural log function is one-to-one, it's easy to see that the λ that maximizes L(λ) also maximizes ℓn(L(λ))!

ℓn(L(λ)) = ℓn(λ^n exp(−λ ∑_{i=1}^n x_i)) = n ℓn(λ) − λ ∑_{i=1}^n x_i.

ISYE 6739 — Goldsman 7/12/20 43 / 74


Maximum Likelihood Estimation

The trick makes our job less horrible.

d/dλ ℓn(L(λ)) = d/dλ (n ℓn(λ) − λ ∑_{i=1}^n x_i) = n/λ − ∑_{i=1}^n x_i ≡ 0.

This implies that the MLE is λ̂ = 1/X̄. ∎

Remarks:

λ̂ = 1/X̄ makes sense, since E[X] = 1/λ.

At the end, we put a little hat over λ to indicate that this is the MLE. It's like a party hat!

At the end, we make all of the little xi's into big Xi's to indicate that this is a random variable.

Just to be careful, you "probably" ought to do a second-derivative test.

ISYE 6739 — Goldsman 7/12/20 44 / 74
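
As a numerical sanity check (a sketch, not part of the lecture), the Python snippet below draws Exp(λ) data and compares the closed-form MLE λ̂ = 1/X̄ with a direct numerical maximization of ℓn(L(λ)); the true rate and sample size are arbitrary assumptions.

# Verify that lambda_hat = 1/xbar maximizes the Exp(lambda) log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(1)
lam_true, n = 2.0, 500                     # hypothetical illustrative values
x = rng.exponential(scale=1.0 / lam_true, size=n)   # Exp(lambda) has mean 1/lambda

def neg_log_lik(lam):
    # -ln L(lambda) = -(n ln(lambda) - lambda * sum(x))
    return -(n * np.log(lam) - lam * np.sum(x))

res = minimize_scalar(neg_log_lik, bounds=(1e-6, 50.0), method="bounded")
print(1.0 / x.mean(), res.x)               # closed form vs. numerical maximizer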


Maximum Likelihood Estimation

Example: Suppose X1, . . . , Xn iid∼ Bern(p). Find the MLE for p.

Useful trick for this problem: Since

X_i = 1 w.p. p, and X_i = 0 w.p. 1 − p,

we can write the pmf as

f(x) = p^x (1 − p)^{1−x}, x = 0, 1.

Thus, the likelihood function is

L(p) = ∏_{i=1}^n f(x_i) = ∏_{i=1}^n p^{x_i} (1 − p)^{1−x_i} = p^{∑_{i=1}^n x_i} (1 − p)^{n − ∑_{i=1}^n x_i}.

ISYE 6739 — Goldsman 7/12/20 45 / 74


Maximum Likelihood Estimation

This implies that

ℓn(L(p)) = ∑_{i=1}^n x_i ℓn(p) + (n − ∑_{i=1}^n x_i) ℓn(1 − p)

⇒ d/dp ℓn(L(p)) = (∑_i x_i)/p − (n − ∑_i x_i)/(1 − p) ≡ 0

⇒ (1 − p)(∑_{i=1}^n x_i) = p(n − ∑_{i=1}^n x_i) ⇒ p̂ = X̄.

This makes sense since E[X] = p. ∎

ISYE 6739 — Goldsman 7/12/20 46 / 74
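
A quick numerical check (a sketch, not from the slides): for Bern(p) data the MLE is the sample proportion, which we can confirm by scanning ℓn(L(p)) over a grid; the true p and the sample size below are arbitrary assumptions.

# Confirm p_hat = xbar maximizes the Bernoulli log-likelihood on a grid.
import numpy as np

rng = np.random.default_rng(2)
p_true, n = 0.3, 1000                      # hypothetical illustrative values
x = rng.binomial(1, p_true, size=n)
s = x.sum()

grid = np.linspace(0.001, 0.999, 999)
loglik = s * np.log(grid) + (n - s) * np.log(1.0 - grid)
print(x.mean(), grid[np.argmax(loglik)])   # closed form vs. grid maximizer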


Trickier MLE Examples

Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions

ISYE 6739 — Goldsman 7/12/20 47 / 74

Trickier MLE Examples

Lesson 5.8 — Trickier MLE Examples

Example: X1, . . . , Xn iid∼ Nor(µ, σ²). Get simultaneous MLEs for µ and σ².

L(µ, σ²) = ∏_{i=1}^n f(x_i) = ∏_{i=1}^n (1/√(2πσ²)) exp{−(x_i − µ)²/(2σ²)} = (1/(2πσ²)^{n/2}) exp{−(1/(2σ²)) ∑_{i=1}^n (x_i − µ)²}.

⇒ ℓn(L(µ, σ²)) = −(n/2) ℓn(2π) − (n/2) ℓn(σ²) − (1/(2σ²)) ∑_{i=1}^n (x_i − µ)²

⇒ ∂/∂µ ℓn(L(µ, σ²)) = (1/σ²) ∑_{i=1}^n (x_i − µ) ≡ 0,

and so µ̂ = X̄.

ISYE 6739 — Goldsman 7/12/20 48 / 74


Trickier MLE Examples

Similarly, take the partial with respect to σ² (not σ),

∂/∂σ² ℓn(L(µ, σ²)) = −n/(2σ²) + (1/(2σ⁴)) ∑_{i=1}^n (x_i − µ)² ≡ 0,

and eventually get

σ̂² = (1/n) ∑_{i=1}^n (X_i − X̄)². ∎

Remark: Notice how close σ̂² is to the (unbiased) sample variance,

S² = (1/(n−1)) ∑_{i=1}^n (X_i − X̄)² = (n/(n−1)) σ̂².

σ̂² is a little bit biased, but it has slightly less variance than S². Anyway, as n gets big, S² and σ̂² become the same.

ISYE 6739 — Goldsman 7/12/20 49 / 74
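
The n vs. n − 1 divisors are easy to see numerically. Here is a minimal sketch (not from the lecture) using NumPy's ddof argument; the values of µ, σ, and n are arbitrary assumptions.

# MLE variance (divide by n) vs. unbiased sample variance (divide by n-1).
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n = 1.0, 2.0, 10                # hypothetical illustrative values
x = rng.normal(mu, sigma, size=n)

sigma2_mle = np.var(x, ddof=0)             # (1/n)     * sum((x - xbar)^2)
s2 = np.var(x, ddof=1)                     # (1/(n-1)) * sum((x - xbar)^2)
print(sigma2_mle, s2, n / (n - 1) * sigma2_mle)   # last two entries match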


Trickier MLE Examples

Example: The pdf of the Gamma distribution w/parameters r and λ is

f(x) = (λ^r / Γ(r)) x^{r−1} e^{−λx}, x > 0.

Suppose X1, . . . , Xn iid∼ Gam(r, λ). Find the MLEs for r and λ.

L(r, λ) = ∏_{i=1}^n f(x_i) = (λ^{nr} / [Γ(r)]^n) (∏_{i=1}^n x_i)^{r−1} e^{−λ ∑_i x_i}

⇒ ℓn(L) = rn ℓn(λ) − n ℓn(Γ(r)) + (r − 1) ℓn(∏_{i=1}^n x_i) − λ ∑_{i=1}^n x_i

⇒ ∂/∂λ ℓn(L) = rn/λ − ∑_{i=1}^n x_i ≡ 0,

so that λ = r/X̄.

ISYE 6739 — Goldsman 7/12/20 50 / 74


Trickier MLE Examples

The Trouble in River City is, we need to find r. To do so, we have

∂/∂r ℓn(L) = ∂/∂r [rn ℓn(λ) − n ℓn(Γ(r)) + (r − 1) ℓn(∏_{i=1}^n x_i) − λ ∑_{i=1}^n x_i]

= n ℓn(λ) − (n/Γ(r)) (d/dr) Γ(r) + ℓn(∏_{i=1}^n x_i)

= n ℓn(λ) − n Ψ(r) + ℓn(∏_{i=1}^n x_i) ≡ 0,

where Ψ(r) ≡ Γ′(r)/Γ(r) is the digamma function.

ISYE 6739 — Goldsman 7/12/20 51 / 74


Trickier MLE Examples

At this point, substitute in λ = r/X̄, and use a computer method (bisection, Newton's method, etc.) to search for the value of r that solves

n ℓn(r/X̄) − n Ψ(r) + ℓn(∏_{i=1}^n x_i) ≡ 0.

The gamma function is readily available in any reasonable software package; but if the digamma function happens to be unavailable in your town, you can take advantage of the approximation

Γ′(r) ≐ [Γ(r + h) − Γ(r)]/h (for any small h of your choosing). ∎

ISYE 6739 — Goldsman 7/12/20 52 / 74
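
Here is a minimal sketch of that search (not from the lecture), using SciPy's digamma function and the brentq bracketing root finder as one of the "computer methods" mentioned above; the true shape and rate used to generate test data, and the bracketing interval, are arbitrary assumptions.

# Solve n*ln(r/xbar) - n*digamma(r) + sum(ln x_i) = 0 for the Gamma MLE r_hat,
# then set lambda_hat = r_hat / xbar.
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(4)
r_true, lam_true, n = 3.0, 2.0, 1000       # hypothetical illustrative values
x = rng.gamma(shape=r_true, scale=1.0 / lam_true, size=n)   # rate = 1/scale

xbar, sum_log_x = x.mean(), np.log(x).sum()

def score(r):
    # the r-equation from the slide, with lambda = r/xbar substituted in
    return n * np.log(r / xbar) - n * digamma(r) + sum_log_x

r_hat = brentq(score, 1e-3, 100.0)         # bracketing root finder
lam_hat = r_hat / xbar
print(r_hat, lam_hat)                      # should land near (3, 2)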


Trickier MLE Examples

Example: Suppose X1, . . . , Xn iid∼ Unif(0, θ). Find the MLE for θ.

The pdf is f(x) = 1/θ, 0 < x < θ (beware of the funny limits). Then

L(θ) = ∏_{i=1}^n f(x_i) = 1/θ^n if 0 ≤ x_i ≤ θ, ∀i.

In order to have L(θ) > 0, we must have 0 ≤ x_i ≤ θ, ∀i. In other words, we must have θ ≥ max_i x_i.

Subject to this constraint, L(θ) = 1/θ^n is maximized at the smallest possible θ value, namely, θ̂ = max_i X_i.

This makes sense in light of the similar (unbiased) estimator, Y2 = ((n+1)/n) max_i X_i, from a previous lesson. ∎

Remark: We used very little calculus in this example!

ISYE 6739 — Goldsman 7/12/20 53 / 74


Invariance Property of MLEs

Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions

ISYE 6739 — Goldsman 7/12/20 54 / 74

Invariance Property of MLEs

Lesson 5.9 — Invariance Property of MLEs

We can get MLEs of functions of parameters almost for free!

Theorem (Invariance Property): If θ̂ is the MLE of some parameter θ and h(·) is any reasonable function, then h(θ̂) is the MLE of h(θ).

Remark: We noted before that such a property does not hold for unbiasedness. For instance, although E[S²] = σ², it is usually the case that E[√S²] ≠ σ.

Remark: The proof of the Invariance Property is "easy" when h(·) is a one-to-one function. It's not so easy — but still generally true — when h(·) is nastier.

ISYE 6739 — Goldsman 7/12/20 55 / 74


Invariance Property of MLEs

Example: Suppose X1, . . . , Xn iid∼ Nor(µ, σ²).

We saw that the MLE for σ² is σ̂² = (1/n) ∑_{i=1}^n (X_i − X̄)².

If we consider the function h(y) = +√y, then the Invariance Property says that the MLE of σ is

σ̂ = √(σ̂²) = √((1/n) ∑_{i=1}^n (X_i − X̄)²). ∎

Example: Suppose X1, . . . , Xn iid∼ Bern(p).

We saw that the MLE for p is p̂ = X̄. Then Invariance says that the MLE for Var(X_i) = p(1 − p) is p̂(1 − p̂) = X̄(1 − X̄). ∎

ISYE 6739 — Goldsman 7/12/20 56 / 74
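
A small numerical illustration of the second example (a sketch, not from the slides): plug the MLE p̂ = X̄ into p(1 − p) to estimate Var(X_i); the true p and the sample size are arbitrary assumptions.

# Invariance: the MLE of Var(X_i) = p(1-p) is p_hat*(1 - p_hat), p_hat = xbar.
import numpy as np

rng = np.random.default_rng(5)
p_true, n = 0.3, 2000                      # hypothetical illustrative values
x = rng.binomial(1, p_true, size=n)

p_hat = x.mean()
print(p_hat * (1 - p_hat), p_true * (1 - p_true))   # estimate vs. true variance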


Invariance Property of MLEs

Example: Suppose X1, . . . , Xn iid∼ Exp(λ).

We define the survival function as

F̄(x) = P(X > x) = 1 − F(x) = e^{−λx}.

In addition, we saw that the MLE for λ is λ̂ = 1/X̄.

Then Invariance says that the MLE of F̄(x) is

e^{−λ̂x} = e^{−x/X̄}.

This kind of thing is used all of the time in the actuarial sciences. ∎

ISYE 6739 — Goldsman 7/12/20 57 / 74
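
For instance (a sketch, not from the lecture), with exponential lifetime data the estimated survival curve is obtained by plugging λ̂ = 1/X̄ into e^{−λx}; the true rate, sample size, and evaluation points are arbitrary assumptions.

# MLE of the survival function P(X > x) = exp(-lambda*x), via invariance.
import numpy as np

rng = np.random.default_rng(6)
lam_true, n = 0.5, 400                     # hypothetical illustrative values
lifetimes = rng.exponential(scale=1.0 / lam_true, size=n)

lam_hat = 1.0 / lifetimes.mean()           # MLE of lambda
x_grid = np.array([1.0, 2.0, 5.0])
print(np.exp(-lam_hat * x_grid))           # estimated survival probabilities
print(np.exp(-lam_true * x_grid))          # true survival probabilities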


Method of Moments Estimation

Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions

ISYE 6739 — Goldsman 7/12/20 58 / 74

Method of Moments Estimation

Lesson 5.10 — Method of Moments Estimation

Recall that the kth moment of a random variable X is

$$\mu_k \equiv E[X^k] = \begin{cases} \sum_x x^k f(x) & \text{if } X \text{ is discrete} \\[4pt] \int_{\mathbb{R}} x^k f(x)\,dx & \text{if } X \text{ is continuous.} \end{cases}$$

Definition: Suppose X1, . . . , Xn are iid random variables. Then the method of moments (MoM) estimator for µk is $m_k \equiv \sum_{i=1}^{n} X_i^k / n$.

Remark: As n → ∞, the Law of Large Numbers implies that $\sum_{i=1}^{n} X_i^k / n \to E[X^k]$, i.e., mk → µk (so this is a good estimator).

Remark: You should always love your MoM!

ISYE 6739 — Goldsman 7/12/20 59 / 74
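
A tiny helper that computes the kth sample moment, as a Python sketch (numpy and the example data are assumptions for illustration):

```python
import numpy as np

def sample_moment(x, k):
    """kth sample moment m_k = (1/n) * sum(X_i^k); estimates mu_k = E[X^k]."""
    x = np.asarray(x, dtype=float)
    return np.mean(x**k)

# Hypothetical data (illustration only).
x = [1.2, 0.7, 2.5, 1.9, 0.4]
m1 = sample_moment(x, 1)   # estimates E[X]
m2 = sample_moment(x, 2)   # estimates E[X^2]
print(m1, m2)
```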


Method of Moments Estimation

Examples:

The MoM estimator for the true mean µ1 = µ = E[Xi] is the sample mean $m_1 = \bar{X} = \sum_{i=1}^{n} X_i / n$.

The MoM estimator for µ2 = E[Xi²] is $m_2 = \sum_{i=1}^{n} X_i^2 / n$.

The MoM estimator for Var(Xi) = E[Xi²] − (E[Xi])² = µ2 − µ1² is

$$m_2 - m_1^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \bar{X}^2 = \frac{n-1}{n}\,S^2.$$

(For large n, it's also OK to use S².)

General Game Plan: Express the parameter of interest in terms of the true moments µk = E[Xᵏ]. Then substitute in the sample moments mk.

ISYE 6739 — Goldsman 7/12/20 60 / 74
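
The identity m₂ − m₁² = ((n − 1)/n)S² is easy to check numerically; here is a short Python sketch (numpy and the data are assumptions for illustration):

```python
import numpy as np

# Hypothetical data (illustration only).
x = np.array([3.1, 2.4, 5.0, 4.2, 3.8, 2.9])
n = len(x)

m1 = x.mean()
m2 = np.mean(x**2)
mom_var = m2 - m1**2              # MoM estimator of Var(X_i)

S2 = x.var(ddof=1)                # usual sample variance S^2
print(mom_var, (n - 1) / n * S2)  # the two agree: m2 - m1^2 = (n-1)S^2/n
```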


Method of Moments Estimation

Example: Suppose X1, . . . , Xn are iid Pois(λ).

Since λ = E[Xi], a MoM estimator for λ is X̄.

But also note that λ = Var(Xi), so another MoM estimator for λ is ((n − 1)/n)S² (or plain old S²). □

Usually use the easier-looking estimator if you have a choice.

Example: Suppose X1, . . . , Xn are iid Nor(µ, σ²).

MoM estimators for µ and σ² are X̄ and ((n − 1)/n)S² (or S²), respectively.

For this example, these estimators are the same as the MLEs. □

Let's finish up with a less-trivial example. . . .

ISYE 6739 — Goldsman 7/12/20 61 / 74
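
To see the two Poisson MoM estimators side by side, here is a simulation sketch in Python (numpy, the seed, and λ = 3 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.poisson(lam=3.0, size=1000)   # simulated Pois(3) data (illustration only)
n = len(x)

lam_hat_mean = x.mean()                      # MoM via E[X_i] = lambda
lam_hat_var  = (n - 1) / n * x.var(ddof=1)   # MoM via Var(X_i) = lambda
print(lam_hat_mean, lam_hat_var)             # both should be near 3
```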


Method of Moments Estimation

Example: Suppose X1, . . . , Xn are iid Beta(a, b). The pdf is

$$f(x) = \frac{\Gamma(a+b)}{\Gamma(a)\Gamma(b)}\, x^{a-1}(1-x)^{b-1}, \quad 0 < x < 1.$$

It turns out (after lots of algebra) that

$$E[X] = \frac{a}{a+b} \quad \text{and} \quad \text{Var}(X) = \frac{ab}{(a+b)^2(a+b+1)}.$$

Let's estimate a and b via MoM.

ISYE 6739 — Goldsman 7/12/20 62 / 74


Method of Moments Estimation

We have

$$E[X] = \frac{a}{a+b} \;\Rightarrow\; a = \frac{b\,E[X]}{1 - E[X]} \doteq \frac{b\bar{X}}{1 - \bar{X}}, \qquad (1)$$

so

$$\text{Var}(X) = \frac{ab}{(a+b)^2(a+b+1)} = \frac{E[X]\,b}{(a+b)(a+b+1)}.$$

Plug into the above X̄ for E[X], S² for Var(X), and bX̄/(1 − X̄) for a. Then after lots of algebra, we can solve for b:

$$b \doteq \frac{(1-\bar{X})^2\,\bar{X}}{S^2} - 1 + \bar{X}.$$

To finish up, you can plug back into Equation (1) to get the MoM estimator for a.

ISYE 6739 — Goldsman 7/12/20 63 / 74


Method of Moments Estimation

Example: Consider the following data set consisting of n = 10 observations that we have obtained from a Beta distribution.

0.86 0.77 0.84 0.38 0.83 0.54 0.77 0.94 0.37 0.40

We immediately have X̄ = 0.67 and S² = 0.04971. Then the MoM estimators are

$$b \doteq \frac{(1-\bar{X})^2\,\bar{X}}{S^2} - 1 + \bar{X} = 1.1377,$$

and then

$$a \doteq \frac{b\bar{X}}{1 - \bar{X}} = 2.310. \quad \square$$

ISYE 6739 — Goldsman 7/12/20 64 / 74
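
A short Python check of this arithmetic (numpy is an assumed tool; the data are exactly the 10 observations above):

```python
import numpy as np

# The n = 10 observations from the example.
x = np.array([0.86, 0.77, 0.84, 0.38, 0.83, 0.54, 0.77, 0.94, 0.37, 0.40])

xbar = x.mean()        # 0.67
S2   = x.var(ddof=1)   # about 0.04971

b_hat = (1 - xbar)**2 * xbar / S2 - 1 + xbar   # about 1.1377
a_hat = b_hat * xbar / (1 - xbar)              # about 2.310
print(xbar, S2, b_hat, a_hat)
```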


Sampling Distributions

Outline

1 Introduction to Descriptive Statistics

2 Summarizing Data

3 Candidate Distributions

4 Introduction to Estimation

5 Unbiased Estimation

6 Mean Squared Error

7 Maximum Likelihood Estimation

8 Trickier MLE Examples

9 Invariance Property of MLEs

10 Method of Moments Estimation

11 Sampling Distributions

ISYE 6739 — Goldsman 7/12/20 65 / 74

Sampling Distributions

Introduction and Normal Distribution

Goal: Talk about some distributions we'll need later to do "confidence intervals" (CIs) and "hypothesis tests": Normal, χ², t, and F.

Definition: Recall that a statistic is just a function of the observations X1, . . . , Xn from a random sample. The function does not depend explicitly on any unknown parameters.

Example: X̄ and S² are statistics, but (X̄ − µ)/σ is not.

Since statistics are RVs, it's useful to figure out their distributions. The distribution of a statistic is called a sampling distribution.

Example: If X1, . . . , Xn are iid Nor(µ, σ²), then X̄ ∼ Nor(µ, σ²/n).

The normal is used to get CIs and do hypothesis tests for µ.

ISYE 6739 — Goldsman 7/12/20 66 / 74
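
The sampling distribution of X̄ is easy to see by simulation; here is a minimal Python sketch (numpy, the seed, and the parameter choices are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 10.0, 2.0, 25, 100_000

# Each row is one sample of size n; take the mean of each sample.
xbars = rng.normal(mu, sigma, size=(reps, n)).mean(axis=1)

# The sample means should have mean mu and variance sigma^2 / n.
print(xbars.mean(), xbars.var(), sigma**2 / n)
```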


Sampling Distributions

χ² Distribution

Definition/Theorem: If Z1, . . . , Zk are iid Nor(0, 1), then $Y \equiv \sum_{i=1}^{k} Z_i^2$ has the chi-squared distribution with k degrees of freedom (df), and we write Y ∼ χ²(k).

The term "df" informally corresponds to the number of "independent pieces of information" you have. For example, if you have RV's X1, . . . , Xn such that $\sum_{i=1}^{n} X_i = c$, a known constant, then you might have n − 1 df, since knowledge of any n − 1 of the Xi's gives you the remaining Xi.

We also informally "lose" a degree of freedom every time we have to estimate a parameter. For instance, if we have access to n observations, but have to estimate two parameters µ and σ², then we might only end up with n − 2 df.

In reality, df corresponds to the number of dimensions of a certain space (not covered in this course)!

ISYE 6739 — Goldsman 7/12/20 67 / 74
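
A simulation sketch of the definition, summing k squared standard normals and comparing against the χ²(k) distribution (numpy/scipy, the seed, and k = 5 are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
k, reps = 5, 200_000

# Y = Z_1^2 + ... + Z_k^2 with Z_i iid Nor(0,1) should be chi-squared(k).
y = (rng.standard_normal(size=(reps, k))**2).sum(axis=1)

print(y.mean(), y.var())                                 # empirical moments
print(stats.chi2(df=k).mean(), stats.chi2(df=k).var())   # theoretical moments
```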


Sampling Distributions

The pdf of the chi-squared distribution is

$$f_Y(y) = \frac{1}{2^{k/2}\,\Gamma\!\left(\frac{k}{2}\right)}\, y^{\frac{k}{2}-1} e^{-y/2}, \quad y > 0.$$

Fun Facts: Can show that E[Y] = k, and Var(Y) = 2k.

The exponential distribution is a special case of the chi-squared distribution. In fact, χ²(2) ∼ Exp(1/2).

Proof: Just plug k = 2 into the pdf. □

For k > 2, the χ²(k) pdf is skewed to the right. (You get an occasional "large" observation.)

For large k, the χ²(k) is approximately normal (by the CLT).

ISYE 6739 — Goldsman 7/12/20 68 / 74
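
The χ²(2) ∼ Exp(1/2) fact can be checked numerically with scipy (an assumed tool; note scipy parameterizes the exponential by scale = 1/rate, so rate 1/2 means scale 2):

```python
import numpy as np
from scipy import stats

# chi2(2) has pdf (1/2) * exp(-y/2), which is Exp with rate 1/2 (mean 2).
y = np.linspace(0.1, 10, 50)
pdf_chi2 = stats.chi2(df=2).pdf(y)
pdf_exp  = stats.expon(scale=2).pdf(y)   # scipy uses scale = 1/rate

print(np.allclose(pdf_chi2, pdf_exp))    # True
```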


Sampling Distributions

Definition: The (1 − α) quantile of a RV X is that value xα such that P(X > xα) = 1 − F(xα) = α. Note that xα = F⁻¹(1 − α), where F⁻¹(·) is the inverse cdf of X.

Notation: If Y ∼ χ²(k), then we denote the (1 − α) quantile with the special symbol $\chi^2_{\alpha,k}$ (instead of xα). In other words, $P(Y > \chi^2_{\alpha,k}) = \alpha$.

You can look up $\chi^2_{\alpha,k}$, e.g., in a table at the back of the book or via the Excel function CHISQ.INV(1 − α, k).

Example: If Y ∼ χ²(10), then

$$P(Y > \chi^2_{0.05,10}) = 0.05,$$

where we can look up $\chi^2_{0.05,10} = 18.31$. □

ISYE 6739 — Goldsman 7/12/20 69 / 74
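
The same quantile can be pulled from scipy (an assumed tool, playing the role of the table or CHISQ.INV):

```python
from scipy import stats

# The (1 - alpha) quantile chi^2_{alpha, k}: here alpha = 0.05, k = 10.
q = stats.chi2.ppf(0.95, df=10)
print(q)   # about 18.307, matching the table value 18.31
```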


Sampling Distributions

Theorem: χ²'s add up. If Y1, . . . , Yn are independent with Yi ∼ χ²(di) for all i, then $\sum_{i=1}^{n} Y_i \sim \chi^2\!\left(\sum_{i=1}^{n} d_i\right)$.

Proof: Just use mgf's. We won't go through it here. □

So where does the χ² distribution come up in statistics?

It usually arises when we try to estimate σ².

Example: If X1, . . . , Xn are iid Nor(µ, σ²), then, as we'll show in the next module,

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2 \;\sim\; \frac{\sigma^2\,\chi^2(n-1)}{n-1}. \quad \square$$

ISYE 6739 — Goldsman 7/12/20 70 / 74
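
A simulation sketch of this last fact, checking that (n − 1)S²/σ² behaves like a χ²(n − 1) random variable (numpy/scipy, the seed, and the parameter choices are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, reps = 0.0, 3.0, 8, 100_000

# Sample variance S^2 for many Nor(mu, sigma^2) samples of size n.
samples = rng.normal(mu, sigma, size=(reps, n))
S2 = samples.var(axis=1, ddof=1)

# (n-1) S^2 / sigma^2 should look like chi-squared with n-1 df.
w = (n - 1) * S2 / sigma**2
print(w.mean(), w.var())   # should be near n-1 and 2(n-1)
print(stats.chi2(df=n - 1).mean(), stats.chi2(df=n - 1).var())
```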


Sampling Distributions

t Distribution

Definition/Theorem: Suppose that Z ∼ Nor(0, 1), Y ∼ χ²(k), and Z and Y are independent. Then $T \equiv Z/\sqrt{Y/k}$ has the Student t distribution with k degrees of freedom, and we write T ∼ t(k).

The pdf is

$$f_T(x) = \frac{\Gamma\!\left(\frac{k+1}{2}\right)}{\sqrt{\pi k}\;\Gamma\!\left(\frac{k}{2}\right)}\left(\frac{x^2}{k} + 1\right)^{-\frac{k+1}{2}}, \quad x \in \mathbb{R}.$$

Fun Facts: The t(k) looks like the Nor(0, 1), except the t has fatter tails.

The k = 1 case gives the Cauchy distribution, which has really fat tails.

As the number of degrees of freedom k becomes large, t(k) → Nor(0, 1).

Can show that E[T] = 0 for k > 1, and Var(T) = k/(k − 2) for k > 2.

ISYE 6739 — Goldsman 7/12/20 71 / 74
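
Here is a simulation sketch of the construction T = Z/√(Y/k), comparing the empirical mean and variance with 0 and k/(k − 2) (numpy/scipy, the seed, and k = 5 are illustrative assumptions):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
k, reps = 5, 200_000

# T = Z / sqrt(Y/k) with Z ~ Nor(0,1), Y ~ chi2(k), Z and Y independent.
z = rng.standard_normal(reps)
y = rng.chisquare(k, size=reps)
t = z / np.sqrt(y / k)

print(t.mean(), t.var())                       # near 0 and k/(k-2)
print(stats.t(df=k).mean(), stats.t(df=k).var())
```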

Sampling Distributions

Notation: If T ∼ t(k), then we denote the (1− α) quantile by tα,k.

In other words, P (T > tα,k) = α.

Example: If T ∼ t(10), then P (T > t0.05,10) = 0.05, where we find t0.05,10 = 1.812 in the back of the book or via the Excel function T.INV(1 − α, k). □

Remarks: So what do we use the t distribution for in statistics?

It’s used when we find confidence intervals and conduct hypothesis tests for the mean µ. Stay tuned.

By the way, why did I originally call it the Student t distribution?

“Student” is the pseudonym of the guy (William Gosset) who first derived it. Gosset was a statistician at the Guinness Brewery.

ISYE 6739 — Goldsman 7/12/20 72 / 74
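
Aside (not from the slides): if you prefer software to the tables or Excel, the same quantile lookup can be done in Python via scipy.stats (an assumed tool, used here only to mirror T.INV).

from scipy import stats

alpha, k = 0.05, 10
t_crit = stats.t.ppf(1 - alpha, k)   # (1 - alpha) quantile of t(k), like Excel's T.INV(1 - alpha, k)
print(round(t_crit, 3))              # 1.812, so P(T > 1.812) = 0.05 when T ~ t(10)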

Sampling Distributions

F Distribution

Definition/Theorem: Suppose that X ∼ χ2(n), Y ∼ χ2(m), and X and Y are independent. Then F ≡ (X/n)/(Y/m) = mX/(nY) has the F distribution with n and m df, denoted F ∼ F(n, m).

The pdf is

fF(x) = [Γ((n + m)/2) / (Γ(n/2) Γ(m/2))] (n/m)^(n/2) x^(n/2 − 1) / [(n/m)x + 1]^((n+m)/2), x > 0.

Fun Facts: The F(n, m) is usually a bit skewed to the right.

Note that you have to specify two df’s.

Can show that E[F] = m/(m − 2) (m > 2), and Var(F) = blech.

t distribution is a special case — can you figure out which?

ISYE 6739 — Goldsman 7/12/20 73 / 74
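
Aside (not from the slides): here is a minimal Python/numpy sketch (assumed tooling, illustration only) that builds F = (X/n)/(Y/m) from its definition and checks the stated mean m/(m − 2); the choices of n, m, and the replication count are arbitrary.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, m, reps = 5, 10, 200_000              # arbitrary df's and replication count

X = rng.chisquare(n, size=reps)          # X ~ chi^2(n)
Y = rng.chisquare(m, size=reps)          # Y ~ chi^2(m), generated independently of X
F = (X / n) / (Y / m)                    # F ~ F(n, m) by the definition above

print(F.mean(), m / (m - 2))             # should be near m/(m-2) = 1.25
print(stats.kstest(F, 'f', args=(n, m)).pvalue)   # K-S check against the F(n, m) cdf

The printed mean should settle near 1.25 for these choices; note that E[F] depends only on the denominator df m.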

Sampling Distributions

Notation: If F ∼ F (n,m), then we denote the (1− α) quantile by Fα,n,m.

That is, P (F > Fα,n,m) = α.

Tables can be found in the back of the book for various α, n, m, or you can use the Excel function F.INV(1 − α, n, m).

Example: If F ∼ F(5, 10), then P (F > F0.05,5,10) = 0.05, where we find F0.05,5,10 = 3.326. □

Remarks: It can be shown that F1−α,m,n = 1/Fα,n,m. Use this fact if you have to find something like F0.95,10,5 = 1/F0.05,5,10 = 1/3.326 ≈ 0.301.

So what do we use the F distribution for in statistics?

It’s used when we find confidence intervals and conduct hypothesis tests for the ratio of variances from two different processes. Details later.

ISYE 6739 — Goldsman 7/12/20 74 / 74
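
Aside (not from the slides): the same lookups can be done in Python via scipy.stats (an assumed tool, mirroring F.INV), which also makes the reciprocal identity easy to verify.

from scipy import stats

alpha, n, m = 0.05, 5, 10
F_upper = stats.f.ppf(1 - alpha, n, m)   # F_{0.05,5,10}, like Excel's F.INV(0.95, 5, 10)
F_lower = stats.f.ppf(alpha, m, n)       # F_{0.95,10,5}, i.e., the 0.05 quantile of F(10, 5)
print(round(F_upper, 3))                 # 3.326
print(round(F_lower, 4), round(1 / F_upper, 4))   # both about 0.3007, confirming F_{1-a,m,n} = 1/F_{a,n,m}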
