

Week 9: Regression in the Social Sciences

Brandon Stewart¹

Princeton

November 14 and 19, 2018

¹ These slides are heavily influenced by Matt Blackwell, Justin Grimmer, Jens Hainmueller, Erin Hartman and Kosuke Imai.

Stewart (Princeton) Week 9: Regression in the Social Sciences November 14 and 19, 2018 1 / 66


Where We’ve Been and Where We’re Going...

• Last Week
    • diagnostics

• This ‘Week’
    • Wednesday:
        • making an argument in social sciences
        • causal inference
    • Monday:
        • more causal inference
        • visualization

• Next Week
    • selection on observables

• Long Run
    • probability → inference → regression → causal inference

Questions?


1 Making Arguments

2 Potential Outcomes

3 Average Treatment Effects

4 Graphical Models

5 Fun With A Bundle of Sticks

6 Visualization


Why Are We Doing All of This Again?

We are all here because we are trying to do some social science; that is, we are in the business of knowledge production.

Quantitative methods are an increasingly big part of that, so whether you are reading or actively doing quantitative analysis, it is going to be there.

So why all the math? We are taking a future-oriented approach. We want to prepare you for the next big thing.

Methods that became popular in the social sciences since I took the equivalent of this class: machine learning, text-as-data, Bayesian nonparametrics, design-based inference, DAG-based causal inference, deep learning.

A technical foundation prepares you to learn new methods for the rest of your career. Trust me, now is the time to invest.

Knowing how methods work also makes you a better reader of work.


Quantitative Social Science

Three components of quantitative social science:
    1. Argument
    2. Research Design
    3. Presentation

In these two classes we will focus on:
    • identification and causal inference (argument, design)
    • visualization and quantities of interest (argument, presentation)

My core argument: to have a hope of success we need to be clear about the estimand. The implicit estimand is often (but not always) causal.

We will mostly talk about statistical methods here (it is a statistics class!), but the best work is a combination of substantive and statistical theory.


What is Causal Inference?

A causal inference is a statement about counterfactuals: it is a statement about the difference between what did and didn’t happen.

The core puzzle of causal inference is how you get the information about what didn’t happen.

The difference between prediction and causal inference is the intervention on the system under study.

Like it or not, social science theories are almost always expressed as causal claims, e.g. “an increase in X causes an increase in Y” (or something more opaque meaning the same thing).

The study of causal inference helps us understand the assumptions we need to make this kind of claim.
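To make the prediction/intervention distinction concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration (the variable names, the coefficients, and the data-generating process are hypothetical and not from the slides): a common cause Z drives both X and Y, while X has no causal effect on Y at all.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

def slope(x, y):
    """OLS slope from a regression of y on x (with an intercept)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Hypothetical system: Z drives both X and Y; X has NO causal effect on Y.
z = rng.normal(size=n)
x_observed = z + rng.normal(size=n)
y_observed = 2.0 * z + rng.normal(size=n)

# Intervention: we set X ourselves, independently of Z.
# The equation for Y is unchanged because X never entered it.
x_intervened = rng.normal(size=n)
y_after_intervention = 2.0 * z + rng.normal(size=n)

slope_observational = slope(x_observed, y_observed)
slope_interventional = slope(x_intervened, y_after_intervention)

print(f"observational slope:  {slope_observational:.2f}")
print(f"interventional slope: {slope_interventional:.2f}")
```

In this setup the observational slope is close to 1, so X is a perfectly good predictor of Y. Once we intervene on X, the slope is close to 0: the observed association said nothing about what would happen if we changed X.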


Identification

A quantity of interest is identified when (given stated assumptions) access to infinite data would result in the estimate taking on only a single value.

For example, a linear model that includes an intercept and a dummy variable for every category is not statistically identified, because the dummies sum to the intercept column and so cannot be distinguished from it.

Causal identification is about what we can learn about a causal effect from available data.

If an effect is not identified, no estimation method will recover it.

This means the relevant question is “what’s your identification strategy?”, or: what is the set of assumptions that lets you claim you will be able to estimate a causal effect from the observed data?

As we will see, this is not a conversation about estimation: in other words, if someone answers “regression” they have made a category error.
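The dummy-variable example above can be checked numerically. The sketch below uses made-up data (six units in three hypothetical categories) and compares a design matrix with an intercept plus all dummies against one that drops a baseline category.

```python
import numpy as np

# Hypothetical data: six units spread over three categories.
category = np.array([0, 0, 1, 1, 2, 2])
n_categories = 3

intercept = np.ones((category.size, 1))
dummies = np.eye(n_categories)[category]  # one indicator column per category

# Intercept plus ALL dummies: the dummy columns sum to the intercept column.
X_all = np.hstack([intercept, dummies])
# Intercept plus all but one dummy (category 0 becomes the baseline).
X_drop = np.hstack([intercept, dummies[:, 1:]])

rank_all = np.linalg.matrix_rank(X_all)
rank_drop = np.linalg.matrix_rank(X_drop)

# Two different coefficient vectors that give identical fitted values:
# moving 1 from the intercept onto every dummy changes nothing.
beta_a = np.array([1.0, 0.0, 2.0, 5.0])
beta_b = np.array([0.0, 1.0, 3.0, 6.0])
same_fit = np.allclose(X_all @ beta_a, X_all @ beta_b)

print(rank_all, X_all.shape[1])    # rank is smaller than the number of columns
print(rank_drop, X_drop.shape[1])  # rank equals the number of columns
print(same_fit)
```

Because two distinct coefficient vectors produce exactly the same fitted values, no amount of data can choose between them. Dropping one category restores full column rank, and each remaining coefficient is then a contrast with the baseline.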


Identification vs. Estimation

Identification: How much can you learn about the estimand if you have an infinite amount of data?

Estimation: How much can you learn about the estimand from a finite sample?

Identification precedes estimation.

The role of assumptions:

Often identification requires (hopefully minimal) assumptions.

Even when identification is possible, estimation may impose additional assumptions (e.g. that the linear approximation to the CEF is good enough).

Law of Decreasing Credibility (Manski): The credibility of inference decreases with the strength of the assumptions maintained.
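One way to see the difference between an identification problem and an estimation problem is a toy missing-data example. This is a sketch under assumptions of my own choosing, not something taken from the slides: the outcome is binary, about 30% of outcomes are missing, and the function name `summarize` and all parameter values are hypothetical. With no assumptions about the missing values we can only bound the mean; assuming the data are missing completely at random gives a single number.

```python
import numpy as np

rng = np.random.default_rng(1)

def summarize(n, p_observed=0.7, true_mean=0.4):
    """Return (lower, upper, mcar_estimate) for the mean of a binary outcome."""
    y = rng.binomial(1, true_mean, size=n)
    # In this simulation missingness really is random; in real data
    # that is an assumption we could not check.
    observed = rng.random(n) < p_observed
    share_observed = observed.mean()
    mean_among_observed = y[observed].mean()
    # No assumptions: every missing outcome could be 0 (lower) or 1 (upper).
    lower = mean_among_observed * share_observed
    upper = lower + (1.0 - share_observed)
    # Stronger assumption (missing completely at random): a single number.
    return lower, upper, mean_among_observed

small = summarize(1_000)
large = summarize(1_000_000)
width_small = small[1] - small[0]
width_large = large[1] - large[0]

print(f"n = 1,000:     bounds width {width_small:.3f}, MCAR estimate {small[2]:.3f}")
print(f"n = 1,000,000: bounds width {width_large:.3f}, MCAR estimate {large[2]:.3f}")
```

The width of the no-assumption bounds stays near 0.3 however large the sample gets: that is an identification problem, and more data does not help. The point estimate under the stronger assumption does sharpen with n, but it is only as credible as the assumption behind it, which is the trade-off the Law of Decreasing Credibility describes.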


Identification vs. Estimation

Identification: How much can you learn about the estimand if youhave an infinite amount of data?

Estimation: How much can you learn about the estimand from afinite sample?

Identification precedes estimation

The role of assumptions:

Often identification requires (hopefully minimal) assumptions

Even when identification is possible, estimation may imposeadditional assumptions (i.e. that the linear approximation to the CEFis good enough)

Law of Decreasing Credibility (Manski): The credibility of inferencedecreases with the strength of the assumptions maintained

Stewart (Princeton) Week 9: Regression in the Social Sciences November 14 and 19, 2018 9 / 66

Page 38: Week 9: Regression in the Social Sciences...Week 9: Regression in the Social Sciences Brandon Stewart1 Princeton November 14 and 19, 2018 1These slides are heavily in uenced by Matt

Identification vs. Estimation

Identification: How much can you learn about the estimand if youhave an infinite amount of data?

Estimation: How much can you learn about the estimand from afinite sample?

Identification precedes estimation

The role of assumptions:

Often identification requires (hopefully minimal) assumptions

Even when identification is possible, estimation may imposeadditional assumptions (i.e. that the linear approximation to the CEFis good enough)

Law of Decreasing Credibility (Manski): The credibility of inferencedecreases with the strength of the assumptions maintained

Stewart (Princeton) Week 9: Regression in the Social Sciences November 14 and 19, 2018 9 / 66

Page 39: Week 9: Regression in the Social Sciences...Week 9: Regression in the Social Sciences Brandon Stewart1 Princeton November 14 and 19, 2018 1These slides are heavily in uenced by Matt

Brief History of Potential Outcomes and Causal Inference

Introduction of potential outcomes in randomized experiments by Neyman (1923)

I Super-population inference and confidence intervals

Introduction of randomization as the “reasoned basis” for inference by Fisher (1925)

I p-values and permutation inference

Causal effects defined at the unit level, allowing for effects to be defined without a known assignment mechanism by Rubin (1974)

Potential outcomes expanded to observational studies by Rubin (1974)

Formalization of the assignment mechanism in potential outcomes by Rubin (1975, 1978)

Pearl (1995) develops graphical models for causal inference

Causation

1 Making Arguments

2 Potential Outcomes

3 Average Treatment effects

4 Graphical Models

5 Fun With A Bundle of Sticks

6 Visualization

Potential Outcomes

Definitions:

$T_i$: Dichotomous treatment assignment for unit $i$ (multi-valued treatments just mean more potential outcomes for each unit)

$$T_i = \begin{cases} 1 & \text{unit is assigned to treatment} \\ 0 & \text{unit is not assigned to treatment} \end{cases}$$

$Y_i$: Outcome for unit $i$

Potential outcomes for unit $i$:

$$Y_i(T_i) = \begin{cases} Y_i(1) & \text{potential outcome for unit } i \text{ with treatment} \\ Y_i(0) & \text{potential outcome for unit } i \text{ without treatment} \end{cases}$$

Pre-treatment covariates $X_i$

$\tau_i$: The treatment effect for unit $i$

$$\tau_i = Y_i(1) - Y_i(0)$$
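
As a minimal sketch of these definitions, the code below stores both potential outcomes for each unit and computes the unit-level effect. The class name `Unit` and the numerical values are hypothetical; in real data only one of the two potential outcomes is ever seen for a given unit.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Unit:
    """One unit's potential outcomes (hypothetical values, for illustration)."""
    y1: float  # Y_i(1): outcome if treated
    y0: float  # Y_i(0): outcome if not treated

    @property
    def tau(self) -> float:
        """Unit-level treatment effect tau_i = Y_i(1) - Y_i(0)."""
        return self.y1 - self.y0

units = [Unit(y1=3.0, y0=1.0), Unit(y1=2.0, y0=2.0), Unit(y1=0.0, y0=4.0)]
effects = [u.tau for u in units]
print(effects)  # [2.0, 0.0, -4.0]
```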

Potential Outcomes – Aspirin Example

Definitions:

$T_i$: Unit assigned to:

$$T_i = \begin{cases} 1 & \text{receive aspirin} \\ 0 & \text{receive placebo} \end{cases}$$

$Y_i$: Outcome for unit $i$: patient has a headache ($Y_i = 1$) or not ($Y_i = 0$)

Potential outcomes for unit $i$:

$$Y_i(T_i) = \begin{cases} Y_i(1) & \text{headache (or not) for unit } i \text{ with aspirin} \\ Y_i(0) & \text{headache (or not) for unit } i \text{ with placebo} \end{cases}$$

Pre-treatment covariates $X_i$

Illustrated potential outcomes here and later courtesy of Erin Hartman

What is random in the potential outcomes framework?

Note that potential outcomes are thought of as fixed, and that they, and the difference between them, can vary by arbitrary amounts for each unit $i$. There is some true distribution of potential outcomes across the population.

Treatment assignment is the source of randomness
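
One way to see this is to hold a small set of potential outcomes fixed and enumerate every possible assignment. The four units and their values below are hypothetical. The only thing that changes from one pass of the loop to the next is which units are treated, yet the difference-in-means estimate changes with it.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical fixed potential outcomes for four units: (Y_i(1), Y_i(0)).
potential = [(3, 1), (2, 2), (0, 4), (5, 1)]
n = len(potential)
true_ate = Fraction(sum(y1 - y0 for y1, y0 in potential), n)

estimates = []
for treated in combinations(range(n), 2):  # every way to treat exactly 2 of the 4 units
    treated_mean = Fraction(sum(potential[i][0] for i in treated), 2)
    control_mean = Fraction(
        sum(potential[i][1] for i in range(n) if i not in treated), 2
    )
    estimates.append(treated_mean - control_mean)

print(sorted(set(estimates)))                       # the estimate varies with the assignment
print(sum(estimates) / len(estimates) == true_ate)  # True: it averages to the true ATE
```

Exact fractions are used so the comparison with the true average effect is not affected by floating-point rounding.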

Causal Inference is a Missing Data Problem

Definition: Observed Outcome

$$Y_i = T_i \cdot Y_i(1) + (1 - T_i) \cdot Y_i(0)$$

Because we cannot observe unit $i$ under both treatment and control, we only observe $Y_i$, so causal inference inherently suffers from a missing data problem.

No methodology allows us to simultaneously observe both potential outcomes, $Y_i(1)$ and $Y_i(0)$, making $\tau_i$ unobservable, and unidentifiable without additional assumptions (the Fundamental Problem of Causal Inference; Holland 1986).
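
As a worked instance of the observed-outcome definition, take a hypothetical unit with $Y_i(1) = 0$ and $Y_i(0) = 1$, so that $\tau_i = -1$ (these values are assumed for illustration):

```latex
% Hypothetical unit with Y_i(1) = 0 and Y_i(0) = 1.
\begin{align*}
T_i = 1:&\quad Y_i = 1 \cdot Y_i(1) + (1 - 1) \cdot Y_i(0) = Y_i(1) = 0 \\
T_i = 0:&\quad Y_i = 0 \cdot Y_i(1) + (1 - 0) \cdot Y_i(0) = Y_i(0) = 1
\end{align*}
```

In either case only one of the two terms survives, so $\tau_i$ cannot be computed from $Y_i$ alone.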

Example: Aspirin’s Impact on Headaches

Step 1: (Randomly) sample units

Step 2: Randomly assign treatment

Step 3: Measure revealed potential outcome

| Patient $i$ | Pill $T_i$ | Headache status $Y_i(0)$ (placebo) | Headache status $Y_i(1)$ (aspirin) | Headache status $Y_i$ | Age $X_{1i}$ | Academic $X_{2i}$ |
|---|---|---|---|---|---|---|
| 1 | 1 | 0 | 0 | 0 | 25 | Y |
| 2 |   | 0 | 1 |   | 55 | N |
| 3 |   | 1 | 1 |   | 62 | Y |
| 4 | 0 | 1 | 1 | 1 | 80 | N |
| 5 | 1 | 0 | 1 | 1 | 32 | Y |
| 6 |   | 1 | 0 |   | 45 | N |
| ... | ... | ... | ... | ... | ... | ... |
| n | 0 | 0 | 0 | 0 | 71 | N |
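
To make the missing-data point concrete, the sketch below takes the rows of the table above that have an assigned treatment (patients 1, 4, 5 and n) and shows what an analyst would actually see. The function name `analyst_view` and the use of `None` for a missing cell are assumptions of this sketch.

```python
# Rows with an assigned treatment in the table above: (patient, T, Y(0), Y(1)).
rows = [("1", 1, 0, 0), ("4", 0, 1, 1), ("5", 1, 0, 1), ("n", 0, 0, 0)]

def analyst_view(t, y0, y1):
    """Return (Y(0), Y(1), Y) as seen by the analyst; None marks the missing cell."""
    if t == 1:
        return (None, y1, y1)   # treated: Y(0) is never revealed
    return (y0, None, y0)       # control: Y(1) is never revealed

for patient, t, y0, y1 in rows:
    # tau_i needs both potential outcomes, so it is computable here only
    # because this illustrative table lists the full set of values.
    print(patient, t, analyst_view(t, y0, y1), "tau_i =", y1 - y0)
```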

Built in Assumptions

The notation implies three assumptions:

No simultaneity

No interference

I We are implicitly stating that the potential outcomes for unit $i$ are unaffected by the treatment status of other units

I If this is not true, the number of potential outcomes for unit $i$ grows

I Ex: in an experiment with 3 units, if the potential outcomes for unit $i$ depend on the treatment assignment of units $j$ and $k$, the potential outcomes for unit $i$ are defined by $Y_{ijk}$, where the subscripts give the treatment assignments of units $i$, $j$, and $k$:

$$\begin{array}{cc} Y_{100} & Y_{000} \\ Y_{110} & Y_{010} \\ Y_{101} & Y_{001} \\ Y_{111} & Y_{011} \end{array}$$

Same version of the treatment
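
The growth in the number of potential outcomes under interference can be enumerated directly. This is a minimal sketch; the function name `potential_outcome_labels` is an assumption. With three units whose assignments all matter, each unit has 2^3 = 8 potential outcomes, matching the eight entries listed above.

```python
from itertools import product

def potential_outcome_labels(n_units):
    """Label every potential outcome of one unit when all n_units assignments matter."""
    return [
        "Y" + "".join(str(t) for t in assignment)
        for assignment in product((0, 1), repeat=n_units)
    ]

labels = potential_outcome_labels(3)
print(len(labels))  # 8
print(labels)

# Without the no-interference assumption the count doubles with every added unit.
for n in (1, 3, 10, 20):
    print(n, 2 ** n)
```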

How do we proceed?

Combined, the previous assumptions give us

Stable Unit Treatment Value Assumption (SUTVA)

Potential violations:

I feedback effects

I spill-over effects, carry-over effects

I different treatment administration

We also need to assume Positivity: $0 < p(T_i = 1) < 1 \;\; \forall i$
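
Positivity is easy to check when the assignment probabilities are known, as in a designed experiment. The sketch below flags units whose probability of treatment is exactly 0 or 1. The function name `violates_positivity` and the probabilities are hypothetical.

```python
def violates_positivity(assignment_probs):
    """Return the indices of units whose treatment probability is 0 or 1."""
    return [i for i, p in enumerate(assignment_probs) if not (0.0 < p < 1.0)]

# Hypothetical assignment probabilities p(T_i = 1) for five units.
probs = [0.5, 0.2, 1.0, 0.9, 0.0]
print(violates_positivity(probs))  # [2, 4]
```

Units 2 and 4 (zero-indexed) could only ever be seen in one condition, so the data carry no information about their other potential outcome.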

Identification by assumption

Homogeneity

If the potential outcomes are constant across all $i$ (i.e. $Y_i(1)$ and $Y_i(0)$ are the same for all individuals), then cross-sectional comparisons of treatment and control groups will recover $\tau_i$

If the potential outcomes are constant across time (i.e. $Y_i(1)$ and $Y_i(0)$ are time-invariant for all individuals), then pre-/post-treatment comparisons will recover $\tau_i$

This is highly implausible in most social science
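To see why the cross-sectional version works, here is a short sketch. The symbols $y_1$ and $y_0$ are introduced only for illustration and stand for the common values of the potential outcomes, so that $Y_i(1) = y_1$ and $Y_i(0) = y_0$ for every $i$. The first equality uses the fact that a treated unit reveals $Y_i(1)$ and a control unit reveals $Y_i(0)$:

$$
E[Y_i \mid T_i = 1] - E[Y_i \mid T_i = 0]
= E[Y_i(1) \mid T_i = 1] - E[Y_i(0) \mid T_i = 0]
= y_1 - y_0
= \tau_i
$$

No assumption about how treatment was assigned is needed here, which is exactly why the homogeneity assumption is so strong.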


How do we proceed?

Identification by randomization:

If treatment is randomized, then treatment is unrelated to any and all underlying characteristics, observed and unobserved

Randomization therefore means treatment assignment is independent of the potential outcomes $Y_i(1)$ and $Y_i(0)$, i.e.

$\{Y_i(0), Y_i(1)\} \perp\!\!\!\perp T_i$

This is sometimes called unconfoundedness or ignorability

Another way of thinking of it: The distributions of the potential outcomes $(Y_i(1), Y_i(0))$ are the same for the treatment and control group.

Yet another way of thinking of it: The treatment and control group are exchangeable, or balanced (on observables and unobservables) on average
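A small simulation can make the role of randomization concrete. The sketch below is illustrative only: the data-generating process (normal potential outcomes, an average effect of 2, and a self-selection rule in which units with a positive $Y_i(0)$ opt into treatment) is an assumption made up for this example, as are all function names.

```python
import random
from statistics import mean

def diff_in_means(y0, y1, t):
    """Observed difference in means: unit i reveals y1[i] if treated, else y0[i]."""
    treated = [y1[i] for i in range(len(t)) if t[i] == 1]
    control = [y0[i] for i in range(len(t)) if t[i] == 0]
    return mean(treated) - mean(control)

def simulate(n=20000, seed=0):
    rng = random.Random(seed)
    # Hypothetical potential outcomes with heterogeneous effects around 2.
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y1 = [v + 2.0 + rng.gauss(0.0, 0.5) for v in y0]
    sate = mean([b - a for a, b in zip(y0, y1)])
    # (a) Coin-flip assignment, independent of the potential outcomes.
    t_random = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
    # (b) Self-selection: units with high Y(0) take the treatment.
    t_selected = [1 if v > 0 else 0 for v in y0]
    return sate, diff_in_means(y0, y1, t_random), diff_in_means(y0, y1, t_selected)

sate, est_random, est_selected = simulate()
print(f"SATE: {sate:.3f}")
print(f"randomized estimate: {est_random:.3f}")
print(f"self-selected estimate: {est_selected:.3f}")
```

Under these assumptions the randomized difference in means should land close to the sample average treatment effect, while the self-selected comparison should overstate it, because treated units would have had higher outcomes even without treatment.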


How do we proceed?

Identification by conditional independence:

If treatment is not randomized, then treatment may be related to underlying characteristics, observed and unobserved, which are related to the potential outcomes

Therefore, we need to assume that treatment assignment is independent of the potential outcomes $Y_i(1)$ and $Y_i(0)$, conditional on some pre-treatment characteristics $X$, i.e.

$\{Y_i(0), Y_i(1)\} \perp\!\!\!\perp T_i \mid X_i$

Conditioning set should yield $Y_i(0), Y_i(1)$ and $T_i$ conditionally independent
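One simple way to use conditional independence with a discrete $X$ is to compare treated and control units within each stratum of $X$ and then average the stratum-level differences, weighting by stratum size. The sketch below assumes a single binary covariate and uses made-up observed data; the function names are hypothetical. It also needs both treated and control units in every stratum, otherwise a stratum mean is undefined.

```python
from statistics import mean

# Hypothetical observed data: (x, t, y). Treatment is rare when x = 0 and
# common when x = 1, and outcomes are higher overall when x = 1.
data = [
    (0, 1, 2.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 8.0), (1, 1, 8.0), (1, 1, 8.0), (1, 0, 5.0),
]

def diff_in_means(rows):
    """Treated mean minus control mean of the observed outcome."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def stratified_estimate(rows):
    """Size-weighted average of within-stratum differences in means."""
    n = len(rows)
    total = 0.0
    for x in sorted({r[0] for r in rows}):
        stratum = [r for r in rows if r[0] == x]
        total += len(stratum) / n * diff_in_means(stratum)
    return total

naive = diff_in_means(data)              # pools strata, mixes in the effect of x
stratified = stratified_estimate(data)   # compares like with like
```

In this toy data the within-stratum differences are 1 (for x = 0) and 3 (for x = 1), so the stratified estimate is 2, whereas the pooled comparison gives 4.5 because treated units are concentrated in the high-outcome stratum.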


Some Estimands of Interest

Sample average treatment effect (SATE): $\frac{1}{n}\sum_{i=1}^{n}\left(Y_i(1) - Y_i(0)\right)$

Population average treatment effect (PATE): $\frac{1}{N}\sum_{i=1}^{N}\left(Y_i(1) - Y_i(0)\right)$

Population average treatment effect for the treated (PATT): $E(Y_i(1) - Y_i(0) \mid T_i = 1)$

Population conditional average treatment effect (CATE): $E(Y_i(1) - Y_i(0) \mid X_i = x)$

Treatment effect heterogeneity: Zero ATE doesn't mean zero effect for everyone
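The definitions are easier to compare on a tiny table in which both potential outcomes are written down for every unit. Such a table is never available in real data, since only one potential outcome is observed per unit; the numbers below are hypothetical and the code computes sample analogues of the estimands, not estimates from observed data.

```python
from statistics import mean

# Hypothetical table of both potential outcomes: (x, t, y0, y1).
units = [
    ("a", 1, 2.0, 5.0),
    ("a", 1, 1.0, 4.0),
    ("b", 1, 6.0, 3.0),
    ("b", 0, 7.0, 4.0),
]

def effect(unit):
    """Unit-level effect Y_i(1) - Y_i(0)."""
    _, _, y0, y1 = unit
    return y1 - y0

# Average effect over all units in the sample.
sate = mean([effect(u) for u in units])
# Average effect among the treated units only (sample analogue of the ATT).
att = mean([effect(u) for u in units if u[1] == 1])
# Average effect within each covariate group (sample analogue of the CATE).
cate = {x: mean([effect(u) for u in units if u[0] == x]) for x in ("a", "b")}
```

Here the unit-level effects are +3 in group "a" and -3 in group "b", so the sample average effect is 0 even though no unit has a zero effect. The average among the treated is 1, because the treated group contains more units from group "a".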


Three Big Assumptions

1 SUTVA

2 Positivity

3 (Conditional) Ignorability


The Selection Problem

Why is this difficult? Selection bias.

The core idea is that the people who get treatment might look different from those who get control and thus they are not good counterfactuals for each other.

Let's look at what we get from a naive difference in means with a binary treatment (here $D_i$ denotes the binary treatment indicator):

$$
\begin{aligned}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \\
&= E[Y_i(1) \mid D_i = 1] - E[Y_i(0) \mid D_i = 1] + E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0] \\
&= \underbrace{E[Y_i(1) - Y_i(0) \mid D_i = 1]}_{\text{Average Treatment Effect on Treated}} + \underbrace{E[Y_i(0) \mid D_i = 1] - E[Y_i(0) \mid D_i = 0]}_{\text{selection bias}}
\end{aligned}
$$

Naive estimator = Average Treatment Effect on Treated + Selection Bias

Selection bias: how different the treated and control groups are in terms of their potential outcome under control.
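The decomposition can be checked by hand on a small table. The numbers below are hypothetical, and both potential outcomes are listed for every unit only so that each term can be computed; in practice the selection-bias term is not observable.

```python
from statistics import mean

# Hypothetical units: (d, y0, y1). Treated units have higher y0 to begin with.
units = [
    (1, 4.0, 6.0),
    (1, 6.0, 8.0),
    (0, 1.0, 2.0),
    (0, 3.0, 4.0),
]
treated = [u for u in units if u[0] == 1]
control = [u for u in units if u[0] == 0]

# Naive comparison uses only what is observed: Y(1) for treated, Y(0) for control.
naive = mean([y1 for _, _, y1 in treated]) - mean([y0 for _, y0, _ in control])

# Average effect among the treated units.
att = mean([y1 - y0 for _, y0, y1 in treated])

# Difference in Y(0) between the treated and control groups.
selection_bias = mean([y0 for _, y0, _ in treated]) - mean([y0 for _, y0, _ in control])
```

With these numbers the naive difference is 5, the effect on the treated is 2, and the selection bias is 3, so the naive comparison more than doubles the true effect on the treated.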


Selection Makes Us Care About Assignment Mechanisms

Assignment Mechanism

“The process that determines which units receive which treatments, hence which potential outcomes are realized and thus can be observed, and, conversely, which potential outcomes are missing.” (Imbens and Rubin, 2015, p. 31)

Key Assumptions:

Individualistic assignment: Limits the dependence of a particular unit’s assignment probability on the values of the covariates and potential outcomes for other units

Probabilistic assignment: Requires the assignment mechanism to imply a non-zero probability for each treatment value, for every unit

Unconfounded assignment: Disallows dependence of the assignment mechanism on the potential outcomes


1 Making Arguments

2 Potential Outcomes

3 Average Treatment effects

4 Graphical Models

5 Fun With A Bundle of Sticks

6 Visualization


What Gets to Be a Cause?

We can imagine a world where individual i is assigned to treatment and control conditions

What is the Hypothetical Experiment?

Problem: Immutable (or difficult to change) characteristics

- Effect of gender on promotion

- Effect of race on income

Consider causal effect of gender on promotion:

- Do we mean gender reassignment surgery?

- Do we mean randomly assigning at birth? (a lot of other stuff different)

- One idea: manipulate perceptions, e.g. women evaluated differently on paper

No Causation Without Manipulation

Caveats and Implications

- Does not dismiss the legitimacy of claims of discrimination on immutable characteristics

- Pervasive effects of racism/sexism in society

- Suggests: we need a different empirical strategy to evaluate claims

- What facet of institutionalized racism (or its consequences) causes racial disparities?

- Correlation problem (1):

- Regression models can estimate coefficients for immutable characteristics

- But these are necessarily imprecise: what do scholars have in mind in these models?

- Design Principle:

- Pretend you're God designing an experiment

- If that experiment does not exist, be concerned about interpretation

Average Treatment Effects

Move the goal posts:

Focus on estimating Average Treatment Effect (ATE)

Suppose we have N observations in population (i = 1, ..., N)

$$
\text{ATE} = \frac{1}{N} \sum_{i=1}^{N} \left( Y_i(1) - Y_i(0) \right) = E[Y(1) - Y(0)]
$$

Average over population!!!

- Population parameter

- It is fixed and unchanging

Estimating ATE under Random Assignment

Estimator for ATE:

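$$
\begin{aligned}
\widehat{\text{ATE}} &= \text{Average(Treated Units)} - \text{Average(Control Units)} \\
&= \frac{\sum_{i=1}^{N} Y_i(1)\, T_i}{\sum_{i=1}^{N} T_i} - \frac{\sum_{i=1}^{N} Y_i(0)\,(1 - T_i)}{\sum_{i=1}^{N} (1 - T_i)} \\
&= \sum_{i=1}^{N} \left[ \frac{Y_i(1)\, T_i}{n_t} - \frac{Y_i(0)\,(1 - T_i)}{n_c} \right] \\
&= E[Y(1) \mid T = 1] - E[Y(0) \mid T = 0]
\end{aligned}
$$

Here $n_t = \sum_{i=1}^{N} T_i$ and $n_c = \sum_{i=1}^{N} (1 - T_i)$ are the numbers of treated and control units.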
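
The difference-in-means estimator above can be checked with a small simulation. This is only a sketch under stated assumptions: the potential outcomes are made-up draws (not data from the lecture), half of the units are treated by complete randomization, and the names `difference_in_means` and `simulate` are hypothetical helpers introduced for illustration.

```python
import random


def difference_in_means(y1, y0, t):
    """Average observed outcome of treated units minus that of control units.

    For a treated unit (t[i] == 1) we observe y1[i]; for a control unit we
    observe y0[i]. The other potential outcome is never used.
    """
    treated = [y1[i] for i in range(len(t)) if t[i] == 1]
    control = [y0[i] for i in range(len(t)) if t[i] == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def simulate(n_units=200, n_draws=2000, seed=0):
    rng = random.Random(seed)
    # Hypothetical potential outcomes, fixed across all re-randomizations.
    y0 = [rng.gauss(0, 1) for _ in range(n_units)]
    y1 = [y + rng.gauss(2, 1) for y in y0]
    true_ate = sum(a - b for a, b in zip(y1, y0)) / n_units

    estimates = []
    for _ in range(n_draws):
        # Complete randomization: exactly half of the units are treated.
        t = [1] * (n_units // 2) + [0] * (n_units - n_units // 2)
        rng.shuffle(t)
        estimates.append(difference_in_means(y1, y0, t))
    return true_ate, sum(estimates) / n_draws


true_ate, mean_estimate = simulate()
print(true_ate, mean_estimate)
```

Averaged over many random assignments, the estimate should sit very close to the population ATE, even though any single assignment can miss it.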

Average Treatment Effect

Imagine a study population with 4 units:

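| i | D_i | Y_1i | Y_0i | τ_i |
|---|-----|------|------|-----|
| 1 | 1 | 10 | 4 | 6 |
| 2 | 1 | 1 | 2 | -1 |
| 3 | 0 | 3 | 3 | 0 |
| 4 | 0 | 5 | 2 | 3 |

What is the ATE?

$$
E[Y_{1i} - Y_{0i}] = \frac{1}{4} \times (6 + (-1) + 0 + 3) = 2
$$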

Note: Average effect is positive, but τi are negative for some units!
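
The arithmetic for the four-unit example can be reproduced in a few lines. The values come from the table on this slide; the variable names (`units`, `tau`, `ate`) are illustrative choices, not notation from the lecture.

```python
# Rows of the four-unit table: (i, D_i, Y_1i, Y_0i)
units = [
    (1, 1, 10, 4),
    (2, 1, 1, 2),
    (3, 0, 3, 3),
    (4, 0, 5, 2),
]

# Unit-level effects tau_i = Y_1i - Y_0i
tau = [y1 - y0 for _, _, y1, y0 in units]

# The ATE is the plain average of the unit-level effects
ate = sum(tau) / len(tau)

print(tau)  # [6, -1, 0, 3]
print(ate)  # 2.0
```

Note that this calculation uses both potential outcomes for every unit, which is only possible in a constructed example like this one.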

Average Treatment Effect on the Treated

Imagine a study population with 4 units:

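| i | D_i | Y_1i | Y_0i | τ_i |
|---|-----|------|------|-----|
| 1 | 1 | 10 | 4 | 6 |
| 2 | 1 | 1 | 2 | -1 |
| 3 | 0 | 3 | 3 | 0 |
| 4 | 0 | 5 | 2 | 3 |

What are the ATT and ATC?

$$
\text{ATT} = E[Y_{1i} - Y_{0i} \mid D_i = 1] = \frac{1}{2} \times (6 + (-1)) = 2.5
$$

$$
\text{ATC} = E[Y_{1i} - Y_{0i} \mid D_i = 0] = \frac{1}{2} \times (0 + 3) = 1.5
$$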
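
As a quick check of the two conditional averages, the sketch below recomputes them from the same four-unit table and confirms that the ATE is their weighted average, with weights equal to the shares of treated and control units. The helper `mean` and the variable names are assumptions made for illustration.

```python
# Rows of the four-unit table: (i, D_i, Y_1i, Y_0i)
units = [
    (1, 1, 10, 4),
    (2, 1, 1, 2),
    (3, 0, 3, 3),
    (4, 0, 5, 2),
]


def mean(values):
    return sum(values) / len(values)


# Average effect among treated units (D_i = 1) and control units (D_i = 0)
att = mean([y1 - y0 for _, d, y1, y0 in units if d == 1])
atc = mean([y1 - y0 for _, d, y1, y0 in units if d == 0])

# The ATE is the treated-share-weighted average of ATT and ATC
share_treated = mean([d for _, d, _, _ in units])
ate = share_treated * att + (1 - share_treated) * atc

print(att, atc, ate)  # 2.5 1.5 2.0
```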

Naive Comparison: Difference in Means

Comparisons between observed outcomes of treated and control units can often be misleading.

Units which select treatment may not be like units which select control.

i.e., selection into treatment is often associated with the potential outcomes

This means we have violated the assumption of unconfoundedness: (Y(1), Y(0)) ⊥ D
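
To see how the naive comparison can mislead, the sketch below applies it to the four-unit example from the earlier slides, where units 1 and 2 are treated. It also checks the standard decomposition of the naive difference into the ATT plus a selection-bias term, E[Y0 | D = 1] - E[Y0 | D = 0]; that decomposition is not stated on this slide and is added here as an illustration, as are the variable names.

```python
# Rows of the four-unit table: (i, D_i, Y_1i, Y_0i)
units = [
    (1, 1, 10, 4),
    (2, 1, 1, 2),
    (3, 0, 3, 3),
    (4, 0, 5, 2),
]


def mean(values):
    return sum(values) / len(values)


# Naive comparison: uses only the outcome we would actually observe
# (Y_1i for treated units, Y_0i for control units).
naive = (
    mean([y1 for _, d, y1, _ in units if d == 1])
    - mean([y0 for _, d, _, y0 in units if d == 0])
)

# ATT, and the baseline (Y_0) gap between treated and control units
att = mean([y1 - y0 for _, d, y1, y0 in units if d == 1])
selection_bias = (
    mean([y0 for _, d, _, y0 in units if d == 1])
    - mean([y0 for _, d, _, y0 in units if d == 0])
)

print(naive)                 # 3.0
print(att + selection_bias)  # 3.0
```

Here the naive difference (3.0) overstates the ATE (2) because the treated units differ from the control units both in their treatment effects and in their untreated outcomes.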

Selection Bias

Example: Church Attendance and Political Participation

Churchgoers are likely to differ from non-churchgoers on a range of background characteristics (e.g. civic duty)

Given these differences, turnout for churchgoers could be higher than for non-churchgoers even if church had zero mobilizing effect

Example: Gender Quotas and Redistribution Towards Women

Countries with gender quotas are likely countries where women are politically mobilized.

Given this difference, policies targeted towards women would be more common in quota countries even if these countries had not adopted quotas.
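
The church attendance example can be illustrated with a small simulation. This is a sketch under invented assumptions, not data from any study: the probabilities (0.5, 0.7/0.3, 0.8/0.4) and the names `simulate` and `diff_in_means` are hypothetical. Church attendance has no effect on turnout by construction, yet civic duty drives both.

```python
import random


def simulate(n=40000, seed=0):
    """Return rows of (civic_duty, attends_church, votes).

    Assumed data-generating process (hypothetical numbers):
    civic duty raises both church attendance and turnout,
    while church attendance itself has zero effect on turnout.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        civic = rng.random() < 0.5
        church = rng.random() < (0.7 if civic else 0.3)
        votes = rng.random() < (0.8 if civic else 0.4)  # does not depend on church
        rows.append((civic, church, votes))
    return rows


def diff_in_means(rows):
    """Turnout among churchgoers minus turnout among non-churchgoers."""
    treated = [votes for _, church, votes in rows if church]
    control = [votes for _, church, votes in rows if not church]
    return sum(treated) / len(treated) - sum(control) / len(control)


rows = simulate()
naive = diff_in_means(rows)
# Compare churchgoers and non-churchgoers with the same level of civic duty.
within = {
    civic: diff_in_means([r for r in rows if r[0] == civic])
    for civic in (True, False)
}

print(f"naive difference in turnout: {naive:.3f}")
for civic, gap in within.items():
    print(f"difference among civic_duty={civic}: {gap:.3f}")
```

Under these assumed probabilities the naive gap is about 0.16 in expectation (0.68 versus 0.52), while the gap within each civic duty group is zero in expectation. The naive comparison picks up selection, not a mobilizing effect.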

The Assignment Mechanism

Since missing potential outcomes are unobservable we must make assumptions to fill in, i.e. estimate missing potential outcomes.

In the causal inference literature, we typically make assumptions about the assignment mechanism to do so.

Types of Assignment Mechanisms

random assignment

selection on observables

selection on unobservables

Most statistical models of causal inference attain identification of treatment effects by restricting the assignment mechanism in some way.
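
The three assignment mechanisms can be compared in a simulation where the true effect is known. Everything here is an illustrative assumption: the covariates `x` (observed) and `u` (unobserved), the constant treatment effect of 1.0, the assignment probabilities, and the helper names `make_population`, `assign`, `naive`, and `stratified` are all hypothetical.

```python
import random

TRUE_EFFECT = 1.0  # assumed constant effect, chosen only to keep the sketch simple


def make_population(n, rng):
    """Each unit has an observed covariate x, an unobserved u, and both potential outcomes."""
    units = []
    for _ in range(n):
        x = int(rng.random() < 0.5)
        u = int(rng.random() < 0.5)
        y0 = float(x + u)
        units.append({"x": x, "u": u, "y0": y0, "y1": y0 + TRUE_EFFECT})
    return units


def assign(units, mechanism, rng):
    """Assign treatment and reveal only the potential outcome that matches it."""
    data = []
    for unit in units:
        if mechanism == "random":
            p = 0.5
        elif mechanism == "observables":
            p = 0.8 if unit["x"] else 0.2
        elif mechanism == "unobservables":
            p = 0.8 if unit["u"] else 0.2
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        d = int(rng.random() < p)
        y = unit["y1"] if d else unit["y0"]
        data.append((unit["x"], d, y))  # u is never recorded
    return data


def naive(data):
    """Difference in mean observed outcomes, treated minus control."""
    treated = [y for _, d, y in data if d == 1]
    control = [y for _, d, y in data if d == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def stratified(data):
    """Difference in means within each level of x, weighted by stratum size."""
    total = 0.0
    for x in (0, 1):
        stratum = [row for row in data if row[0] == x]
        total += len(stratum) / len(data) * naive(stratum)
    return total


rng = random.Random(1)
units = make_population(40000, rng)
results = {}
for mechanism in ("random", "observables", "unobservables"):
    data = assign(units, mechanism, rng)
    results[mechanism] = (naive(data), stratified(data))
    print(f"{mechanism:>13}: naive={results[mechanism][0]:.2f} "
          f"stratified on x={results[mechanism][1]:.2f}")
```

In this setup random assignment lets the naive estimator recover the effect, selection on the observed `x` is repaired by stratifying on `x`, and selection on the unobserved `u` leaves both estimators biased. That is the sense in which identification comes from restrictions on the assignment mechanism.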

No causation without manipulation?

Always ask: what is the experiment I would run if I had infinite resources and power?

Causal Inference Workflow

[Figure: workflow diagram linking Causal Relationship, Ideal Experiment, Quantity of Interest, Identification Strategy, Data, Estimator, and Estimate]

Summing Up: Neyman-Rubin causal model

Useful for studying the “effects of causes”, less so for the “causes of effects”.

No assumption of homogeneity, allows for causal effects to vary unit by unit

- No single “causal effect”, thus the need to be precise about the target estimand.

Distinguishes between observed outcomes and potential outcomes.

Causal inference is a missing data problem: we typically make assumptions about the assignment mechanism to go from descriptive inference to causal inference.
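
A toy table makes the missing data framing concrete. The four units and their potential outcomes below are invented for illustration; in real data only one potential outcome per unit is ever observed, so the "true" quantities computed here are not available to an analyst.

```python
# Hypothetical units: (name, D, Y(0), Y(1)). Effects vary unit by unit.
units = [
    ("A", 1, 2, 5),
    ("B", 1, 3, 4),
    ("C", 0, 1, 1),
    ("D", 0, 2, 0),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


# What the analyst sees: the potential outcome not matching D is missing ("?").
print("unit  D  Y(0)  Y(1)  observed Y")
for name, d, y0, y1 in units:
    shown_y0 = "?" if d == 1 else y0
    shown_y1 = y1 if d == 1 else "?"
    observed = y1 if d == 1 else y0
    print(f"{name:>4}  {d}  {shown_y0!s:>4}  {shown_y1!s:>4}  {observed:>10}")

# Different estimands give different answers when effects are heterogeneous.
ate = mean(y1 - y0 for _, _, y0, y1 in units)            # all units
att = mean(y1 - y0 for _, d, y0, y1 in units if d == 1)  # treated units only
atc = mean(y1 - y0 for _, d, y0, y1 in units if d == 0)  # control units only

# The naive comparison uses only observed outcomes.
naive = (mean(y1 for _, d, _, y1 in units if d == 1)
         - mean(y0 for _, d, y0, _ in units if d == 0))

print(f"ATE={ate}, ATT={att}, ATC={atc}, naive difference={naive}")
```

Here the average effect over all units, over the treated, and over the controls are three different numbers, and the naive difference matches none of them, which is why the target estimand has to be stated precisely.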

Summary: Observational Studies and Causal Inference

Experimental studies:

- Treatment under control of analyst

Observational studies:

- Units (people, countries) control their treatment status

- Selection: treatment and control groups differ systematically

- $E[Y(1) \mid T = 1] \neq E[Y(1) \mid T = 0]$, $E[Y(0) \mid T = 0] \neq E[Y(0) \mid T = 1]$

- Observables: things we can see, measure, and use in our study

- Unobservables: not observables (big problem)

- Naive difference in means will be biased

- Many, many, potential strategies for limiting bias

Avoiding Common Areas of Confusion

‘a causal claim is a statement about what didn’t happen’

contribution not attribution: we care about whether T makes a difference, which doesn’t make it the main reason, doesn’t make T the reason Y happened, doesn’t mean that T is responsible for Y, and doesn’t imply a morality claim

T can ‘cause’ Y even if it is neither necessary nor sufficient

If you know that on average A causes B and B causes C, this doesn’t mean you know that A causes C (example: A→B for one subgroup, B→C for a second subgroup, still no A→C)

estimation of causal effects does not require identical treatment and control groups

you need a clear counterfactual to have a well-defined causal effect (hence no causation without manipulation). For example, ‘the recession was caused by Wall Street’ may make intuitive sense, but is it well-defined?

http://egap.org/methods-guides/10-things-you-need-know-about-causal-inference
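
The point about A, B, and C can be checked with a deterministic toy model. The two equally sized subgroups and the functions `b_given_a` and `c_given_b` are hypothetical constructions for this sketch: in subgroup 1 only the A→B link exists, and in subgroup 2 only the B→C link exists.

```python
def b_given_a(group, a):
    """Value of B when A is set to a: A moves B only in subgroup 1."""
    return a if group == 1 else 0


def c_given_b(group, b):
    """Value of C when B is set to b: B moves C only in subgroup 2."""
    return b if group == 2 else 0


def average(values):
    values = list(values)
    return sum(values) / len(values)


groups = [1, 2]  # assumed to be equally sized

# Average effect of setting A on B, and of setting B on C.
effect_a_on_b = average(b_given_a(g, 1) - b_given_a(g, 0) for g in groups)
effect_b_on_c = average(c_given_b(g, 1) - c_given_b(g, 0) for g in groups)

# Average effect of setting A on C, letting B respond to A within each subgroup.
effect_a_on_c = average(
    c_given_b(g, b_given_a(g, 1)) - c_given_b(g, b_given_a(g, 0))
    for g in groups
)

print(f"A on B: {effect_a_on_b}, B on C: {effect_b_on_c}, A on C: {effect_a_on_c}")
```

Both average effects along the chain are positive, yet A has no effect on C for any unit, because no single subgroup has both links.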

Stewart (Princeton) Week 9: Regression in the Social Sciences November 14 and 19, 2018 42 / 66

Page 194: Week 9: Regression in the Social Sciences...Week 9: Regression in the Social Sciences Brandon Stewart1 Princeton November 14 and 19, 2018 1These slides are heavily in uenced by Matt

Avoiding Common Areas of Confusion

‘a causal claim is a statement about what didnt happen’

contribution not attribution: we care about a difference which doesntmake it the main reason, nor does it imply a morality claim, it doesntmake T the reason it happened, it doesnt mean that T is responsiblefor Y

T can ‘cause’ Y if it is neither necessary nor sufficient

If you know that on average A causes B and B causes C this doesntmean you know that A causes C (example A→B for one subgroup,B→C for second subgroup, still no A→C)

estimation of causal effects does not require identical treatment andcontrol groups

you need a clear counterfactual to have a well-defined causal effect(hence no causation without manipulation). For example of ‘therecession was caused by Wall Street’ may make intuitive sense but isit well-defined?

http://egap.org/methods-guides/10-things-you-need-know-about-causal-inference

Stewart (Princeton) Week 9: Regression in the Social Sciences November 14 and 19, 2018 42 / 66

Page 195: Week 9: Regression in the Social Sciences...Week 9: Regression in the Social Sciences Brandon Stewart1 Princeton November 14 and 19, 2018 1These slides are heavily in uenced by Matt

Avoiding Common Areas of Confusion

‘a causal claim is a statement about what didnt happen’

contribution not attribution: we care about a difference which doesntmake it the main reason, nor does it imply a morality claim, it doesntmake T the reason it happened, it doesnt mean that T is responsiblefor Y

T can ‘cause’ Y if it is neither necessary nor sufficient

If you know that on average A causes B and B causes C this doesntmean you know that A causes C (example A→B for one subgroup,B→C for second subgroup, still no A→C)

estimation of causal effects does not require identical treatment andcontrol groups

you need a clear counterfactual to have a well-defined causal effect(hence no causation without manipulation). For example of ‘therecession was caused by Wall Street’ may make intuitive sense but isit well-defined?

http://egap.org/methods-guides/10-things-you-need-know-about-causal-inference

Stewart (Princeton) Week 9: Regression in the Social Sciences November 14 and 19, 2018 42 / 66

Page 196: Week 9: Regression in the Social Sciences...Week 9: Regression in the Social Sciences Brandon Stewart1 Princeton November 14 and 19, 2018 1These slides are heavily in uenced by Matt

Avoiding Common Areas of Confusion

‘a causal claim is a statement about what didnt happen’

contribution not attribution: we care about a difference which doesntmake it the main reason, nor does it imply a morality claim, it doesntmake T the reason it happened, it doesnt mean that T is responsiblefor Y

T can ‘cause’ Y if it is neither necessary nor sufficient

If you know that on average A causes B and B causes C this doesntmean you know that A causes C (example A→B for one subgroup,B→C for second subgroup, still no A→C)

estimation of causal effects does not require identical treatment andcontrol groups

you need a clear counterfactual to have a well-defined causal effect(hence no causation without manipulation). For example of ‘therecession was caused by Wall Street’ may make intuitive sense but isit well-defined?

http://egap.org/methods-guides/10-things-you-need-know-about-causal-inference

Stewart (Princeton) Week 9: Regression in the Social Sciences November 14 and 19, 2018 42 / 66

Page 197: Week 9: Regression in the Social Sciences...Week 9: Regression in the Social Sciences Brandon Stewart1 Princeton November 14 and 19, 2018 1These slides are heavily in uenced by Matt

Avoiding Common Areas of Confusion

‘a causal claim is a statement about what didnt happen’

contribution not attribution: we care about a difference which doesntmake it the main reason, nor does it imply a morality claim, it doesntmake T the reason it happened, it doesnt mean that T is responsiblefor Y

T can ‘cause’ Y if it is neither necessary nor sufficient

If you know that on average A causes B and B causes C this doesntmean you know that A causes C (example A→B for one subgroup,B→C for second subgroup, still no A→C)

estimation of causal effects does not require identical treatment andcontrol groups

you need a clear counterfactual to have a well-defined causal effect(hence no causation without manipulation). For example of ‘therecession was caused by Wall Street’ may make intuitive sense but isit well-defined?

http://egap.org/methods-guides/10-things-you-need-know-about-causal-inference


1 Making Arguments

2 Potential Outcomes

3 Average Treatment effects

4 Graphical Models

5 Fun With A Bundle of Sticks

6 Visualization


Graphical Models

A general framework for representing causal relationships based on directed acyclic graphs (DAGs)

The work we discuss here comes out of developments by Judea Pearl and others

Particularly useful for thinking through issues of identification.

Provides a graphical representation of the models and a set of rules (do-calculus) for identifying the causal effect.

Nice software that takes the graph and returns an identification strategy: DAGitty at http://dagitty.net


Components of a DAG

[Figure: example DAG with nodes T, X, Z, U, Y, M]

nodes → variables (unobserved variables are typically called U or V)

(directed) arrows → causal effects

absence of nodes → no common causes of any pair of variables

absence of arrows → no causal effect

positioning conveys no mathematical meaning, but graphs are often oriented left-to-right in causal order for readability

dashed lines are used in context-dependent ways

all relationships are non-parametric
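One way to make these components concrete is to write a DAG down as a list of directed edges. The sketch below uses the node names from the figure, but the edge set is an assumption made for illustration (the arrows of the original figure are not recoverable here), and `has_arrow` is a hypothetical helper.

```python
# Hypothetical edge set: each pair (cause, effect) is one directed arrow.
edges = [
    ("X", "T"), ("X", "Y"),   # X is a common cause of T and Y
    ("T", "M"), ("M", "Y"),   # T affects Y only through M
    ("U", "T"), ("U", "Y"),   # U plays the role of an unobserved variable
    ("Z", "T"),               # Z affects T only
]

# Nodes are the variables that appear in at least one arrow.
nodes = sorted({v for edge in edges for v in edge})

parents = {v: set() for v in nodes}
children = {v: set() for v in nodes}
for cause, effect in edges:
    parents[effect].add(cause)
    children[cause].add(effect)


def has_arrow(cause, effect):
    """True if the graph asserts a direct causal effect of cause on effect."""
    return effect in children[cause]


print(nodes)
print(parents["Y"])
print(has_arrow("T", "Y"))  # a missing arrow encodes 'no direct effect'
```

Note that the assumptions live in the arrows that are left out: here the graph claims that Z has no direct effect on Y and that T affects Y only via M.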


Relationships in a DAG

[Figure: example DAG with nodes T, X, Z, U, Y, M]

Parents (Children): directly causing (caused by) a node

Ancestors (Descendants): directly or indirectly causing (caused by) a node

Path: a route that connects the variables (a path is causal when all arrows point the same way)

Acyclic implies that there are no cycles and a variable can’t cause itself

Causal Markov assumption: conditional on its direct causes, a variable is independent of its non-descendants.

We will talk in depth about two types of relationships: confounders and colliders
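These definitions translate directly into graph traversals. The sketch below is illustrative only: the edge set is assumed (the figure's arrows are not recoverable here), and `adjacency`, `reachable` and `is_acyclic` are hypothetical helper names. Acyclicity is checked with Kahn's topological-sort algorithm.

```python
from collections import deque

# Hypothetical edges (cause, effect) on the node names from the figure.
edges = [("X", "T"), ("X", "Y"), ("T", "M"), ("M", "Y"),
         ("U", "T"), ("U", "Y"), ("Z", "T")]


def adjacency(edges):
    """Return (parents, children) lookup tables for an edge list."""
    nodes = {v for edge in edges for v in edge}
    parents = {v: set() for v in nodes}
    children = {v: set() for v in nodes}
    for cause, effect in edges:
        parents[effect].add(cause)
        children[cause].add(effect)
    return parents, children


def reachable(start, step):
    """Follow `step` repeatedly: parents give ancestors, children give descendants."""
    seen, stack = set(), [start]
    while stack:
        for nxt in step[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def is_acyclic(edges):
    """Kahn's algorithm: a graph is acyclic iff every node can be peeled off."""
    parents, children = adjacency(edges)
    indegree = {v: len(ps) for v, ps in parents.items()}
    queue = deque(v for v, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for child in children[v]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return removed == len(indegree)


parents, children = adjacency(edges)
ancestors_of_y = reachable("Y", parents)
descendants_of_x = reachable("X", children)
print(ancestors_of_y, descendants_of_x, is_acyclic(edges))
```

Adding an arrow from Y back to X would make Y an ancestor of itself, which is exactly what the acyclicity requirement rules out.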


Confounders

[Figure: DAG with nodes X, T, Y; X is a common cause of T and Y]

X is a confounder (or common cause)

Even without a causal effect or directed edge between T and Y, they will have a marginal associational relationship

Conditional on X, T and Y are unrelated in this graph.

We can think of conditioning on a confounder as blocking the flow of association.
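A small exact calculation illustrates both claims. The probabilities below are invented for this sketch (binary X, T and Y, with T and Y each depending only on X); nothing here comes from the lecture's data.

```python
from fractions import Fraction
from itertools import product

# Assumed data-generating process: X -> T and X -> Y, no T -> Y arrow.
p_x1 = Fraction(1, 2)


def p_one_given_x(x):
    """P(T=1 | X=x) and P(Y=1 | X=x) share the same made-up values."""
    return Fraction(4, 5) if x == 1 else Fraction(1, 5)


joint = {}
for x, t, y in product((0, 1), repeat=3):
    px = p_x1 if x == 1 else 1 - p_x1
    pt = p_one_given_x(x) if t == 1 else 1 - p_one_given_x(x)
    py = p_one_given_x(x) if y == 1 else 1 - p_one_given_x(x)
    joint[(x, t, y)] = px * pt * py


def prob(event):
    return sum(p for outcome, p in joint.items() if event(*outcome))


def p_y1(given):
    """P(Y=1 | given), where `given` is a predicate on (x, t)."""
    return prob(lambda x, t, y: y == 1 and given(x, t)) / prob(lambda x, t, y: given(x, t))


# Difference in P(Y=1) between T=1 and T=0, marginally and within strata of X.
marginal_gap = p_y1(lambda x, t: t == 1) - p_y1(lambda x, t: t == 0)
gap_x1 = p_y1(lambda x, t: t == 1 and x == 1) - p_y1(lambda x, t: t == 0 and x == 1)
gap_x0 = p_y1(lambda x, t: t == 1 and x == 0) - p_y1(lambda x, t: t == 0 and x == 0)

print(marginal_gap, gap_x1, gap_x0)
```

The marginal gap is nonzero even though T has no effect on Y, and it vanishes within each stratum of X, which is the sense in which conditioning blocks the flow of association.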


Colliders

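[Figure: DAG with nodes X, T, Y; arrows from T and from Y both point into X]

X is now a collider because two arrows point into it

In this scenario T and Y are not marginally associated

If we control for X, T and Y become associated: conditioning on the collider creates a connection between them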


Colliders are scary because you can induce dependence
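The induced dependence can be seen in an exact toy calculation. All numbers are assumptions chosen for illustration: T and Y are independent fair coins, and the collider X is more likely to equal 1 the larger T + Y is.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)


def p_x1(t, y):
    """Assumed P(X=1 | T=t, Y=y): 1/5, 1/2 or 4/5 as t + y goes 0, 1, 2."""
    return Fraction(1, 5) + Fraction(3, 10) * (t + y)


# Joint distribution for T -> X <- Y with T and Y independent.
joint = {}
for t, y, x in product((0, 1), repeat=3):
    px = p_x1(t, y) if x == 1 else 1 - p_x1(t, y)
    joint[(t, y, x)] = half * half * px


def prob(event):
    return sum(p for outcome, p in joint.items() if event(*outcome))


def p_y1(given):
    """P(Y=1 | given), where `given` is a predicate on (t, x)."""
    return prob(lambda t, y, x: y == 1 and given(t, x)) / prob(lambda t, y, x: given(t, x))


marginal_gap = p_y1(lambda t, x: t == 1) - p_y1(lambda t, x: t == 0)
gap_given_x1 = p_y1(lambda t, x: t == 1 and x == 1) - p_y1(lambda t, x: t == 0 and x == 1)

print(marginal_gap, gap_given_x1)
```

Marginally the gap is exactly zero, but among units with X = 1 a treated unit is less likely to have Y = 1: knowing that T already 'explains' X makes Y less needed as an explanation.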


From Confounders to Back-Door Paths

[Figure: DAG with nodes X, T, Z, Y containing the two paths listed below]

Identify causal effect of T on Y by conditioning on X, Z, or X and Z

We can formalize this logic with the idea of a back-door path

A back-door path is “a path between any causally ordered sequence of two variables that begins with a directed edge that points to the first variable.” (Morgan and Winship 2013)

Two paths from T to Y here:

1 T → Y (directed or causal path)

2 T ← X → Z → Y (back-door path)

Observed marginal association between T and Y is a composite of these two paths and thus does not identify the causal effect of T on Y

We want to block the back-door path to leave only the causal effect
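The two paths can also be found mechanically. The sketch below encodes the four arrows implied by the listed paths and enumerates every simple path between T and Y while ignoring arrow direction; the helper names (`simple_paths`, `render`, `is_causal`, `is_back_door`) are hypothetical.

```python
# Arrows (cause, effect) implied by the two paths on the slide.
edges = {("T", "Y"), ("X", "T"), ("X", "Z"), ("Z", "Y")}


def neighbors(node):
    """Nodes adjacent to `node`, ignoring the direction of the arrow."""
    for a, b in edges:
        if a == node:
            yield b
        elif b == node:
            yield a


def simple_paths(start, end, path=None):
    """Yield every path from start to end that visits no node twice."""
    path = path or [start]
    if path[-1] == end:
        yield path
        return
    for nxt in neighbors(path[-1]):
        if nxt not in path:
            yield from simple_paths(start, end, path + [nxt])


def render(path):
    """Write a path with its arrows, e.g. 'T ← X → Z → Y'."""
    out = path[0]
    for a, b in zip(path, path[1:]):
        out += (" → " if (a, b) in edges else " ← ") + b
    return out


def is_causal(path):
    """All arrows point forward along the path."""
    return all((a, b) in edges for a, b in zip(path, path[1:]))


def is_back_door(path):
    """The first edge points into the first variable."""
    return (path[1], path[0]) in edges


paths = sorted(simple_paths("T", "Y"), key=len)
for p in paths:
    print(render(p), "causal" if is_causal(p) else "back-door" if is_back_door(p) else "other")
```

Enumerating all simple paths is fine for slide-sized graphs; the number of paths can grow very quickly in larger graphs, which is one reason tools such as DAGitty are convenient.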


Colliders and Back-Door Paths

[Figure: DAG with nodes T, Y, Z, V, U; Z is a collider]

Z is a collider and it lies along a back-door path from T to Y

Conditioning on a collider on a back-door path does not help and in fact causes new associations

Here we are fine unless we condition on Z, which opens a path T ← V ↔ U → Y (this particular case is called M-bias)

So how do we know which back-door paths to block?


D-Separation

Graphs provide us a way to think about conditional independence statements. Consider disjoint subsets of the vertices A, B and C

A is D-separated from B by C if and only if C blocks every path from a vertex in A to a vertex in B

A path p is said to be blocked by a set of vertices C if and only if at least one of the following conditions holds:

1 p contains a chain structure a → c → b or a fork structure a ← c → b where the node c is in the set C

2 p contains a collider structure a → y ← b where neither y nor its descendants are in C

If A is not D-separated from B by C we say that A is D-connected to B by C
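The definition can be turned into a direct, path-by-path check. This is a minimal sketch that follows the two blocking conditions literally; it is not how DAGitty or other software implements the test, and it enumerates all simple paths, so it only suits small graphs. The M-bias edge set (V → T, V → Z, U → Z, U → Y) is an assumed encoding of the collider example, with no arrow from T to Y.

```python
from itertools import product


def children_of(edges):
    kids = {}
    for a, b in edges:
        kids.setdefault(a, set()).add(b)
        kids.setdefault(b, set())
    return kids


def descendants(node, kids):
    seen, stack = set(), [node]
    while stack:
        for child in kids[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def simple_paths(edges, start, end, path=None):
    """All paths from start to end, ignoring arrow direction, no repeated nodes."""
    path = path or [start]
    if path[-1] == end:
        yield path
        return
    for a, b in edges:
        if path[-1] in (a, b):
            nxt = b if a == path[-1] else a
            if nxt not in path:
                yield from simple_paths(edges, start, end, path + [nxt])


def blocked(path, edges, kids, cond):
    """Apply conditions 1 and 2 to each interior node of the path."""
    for left, mid, right in zip(path, path[1:], path[2:]):
        is_collider = (left, mid) in edges and (right, mid) in edges
        if is_collider:
            # Condition 2: collider with neither it nor a descendant in cond.
            if mid not in cond and not (descendants(mid, kids) & cond):
                return True
        elif mid in cond:
            # Condition 1: chain or fork whose middle node is in cond.
            return True
    return False


def d_separated(edges, A, B, C):
    kids, cond = children_of(edges), set(C)
    return all(blocked(p, edges, kids, cond)
               for a, b in product(A, B)
               for p in simple_paths(edges, a, b))


confounder = {("X", "T"), ("X", "Y")}
m_bias = {("V", "T"), ("V", "Z"), ("U", "Z"), ("U", "Y")}

print(d_separated(confounder, {"T"}, {"Y"}, set()))   # fork left open
print(d_separated(confounder, {"T"}, {"Y"}, {"X"}))   # fork blocked by X
print(d_separated(m_bias, {"T"}, {"Y"}, set()))       # collider Z blocks
print(d_separated(m_bias, {"T"}, {"Y"}, {"Z"}))       # conditioning on Z opens
print(d_separated(m_bias, {"T"}, {"Y"}, {"Z", "V"}))  # V closes it again
```

The last three calls restate the M-bias point: the path is blocked by default, opened by conditioning on Z, and blocked again once a non-collider on the path (here V) is also conditioned on.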


Backdoor Criterion

Generally we want to know if we can nonparametrically identify the average effect of T on Y given a set of possible conditioning variables X

Backdoor Criterion for X

1 No node in X is a descendant of T (i.e. don’t condition on post-treatment variables!)

2 X D-separates every path between T and Y that has an incoming arrow into T (backdoor path)

In essence, we are trying to block all non-causal paths, so we can estimate the causal path.

Backdoor criterion is just one way to identify the effect, but it’s the most popular approach in the social sciences and what we are trying to do 99% of the time.

We will see some other approaches late in the semester.
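When X satisfies the backdoor criterion and is discrete, one standard way to use it is to compare treated and control units within each level of X and then average those differences using the distribution of X. The sketch below illustrates this on simulated data. The data-generating process (a single binary confounder X with X → T, X → Y and T → Y), the coefficients and the function names are all hypothetical choices, not taken from the slides.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate_confounded(n=50000, tau=2.0, seed=1):
    """Simulate rows (x, t, y) with X -> T, X -> Y and T -> Y (assumed DGP)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        # Units with x = 1 are much more likely to be treated.
        t = 1 if rng.random() < (0.8 if x else 0.2) else 0
        y = tau * t + 3.0 * x + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows

def naive_difference(rows):
    """Difference in mean outcomes, ignoring X (back-door path left open)."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return mean(treated) - mean(control)

def backdoor_adjusted(rows):
    """Sum over x of P(X = x) * (E[Y | T=1, X=x] - E[Y | T=0, X=x]).

    Assumes every stratum contains both treated and control units.
    """
    n = len(rows)
    estimate = 0.0
    for level in sorted({x for x, _, _ in rows}):
        stratum = [(t, y) for x, t, y in rows if x == level]
        treated = [y for t, y in stratum if t == 1]
        control = [y for t, y in stratum if t == 0]
        estimate += (len(stratum) / n) * (mean(treated) - mean(control))
    return estimate

rows = simulate_confounded()
print("naive   :", round(naive_difference(rows), 3))
print("adjusted:", round(backdoor_adjusted(rows), 3))
```

In this setup the naive difference overstates the effect because the path T ← X → Y is open, while the stratified estimate should land close to the simulated effect `tau`.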


Thoughts on DAGs and Potential Outcomes

Two very different languages for talking about and thinking about causal inferences

Potential outcomes is very focused on thinking about the treatment assignment mechanism.

Potential outcomes is also less of a “foreign language” for most statisticians, but in my experience lumps together a lot of identification assumptions in opaque ignorability conditions.

Graphical Models with DAGs are very visually appealing but the operations on the graph can be challenging

DAGs are very helpful for thinking through identification and the entire causal process

Note that both are about non-parametric identification and not estimation. This is good and bad.

- Good: provides a very general framework that applies in non-linear scenarios and interactions

- Bad: identification results hold only when the variable is completely controlled for (which may be difficult!)
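The last point can be illustrated with a small simulation. This is a sketch under assumed conditions (a continuous confounder X, a threshold rule for treatment, and a linear outcome, all hypothetical): stratifying on a coarse version of X removes only part of the confounding, while finer bins come closer to controlling for X completely.

```python
import random

def mean(values):
    return sum(values) / len(values)

def simulate_continuous_confounder(n=20000, tau=1.0, seed=2):
    """Simulate rows (x, t, y) with a continuous confounder X (assumed DGP)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        t = 1 if x + rng.gauss(0, 1) > 0 else 0
        y = tau * t + 2.0 * x + rng.gauss(0, 1)
        rows.append((x, t, y))
    return rows

def stratified_estimate(rows, n_bins):
    """Difference in means within equal-sized bins of X, weighted by bin size.

    n_bins = 1 is the unadjusted difference in means. Bins that lack
    either treated or control units are skipped.
    """
    ordered = sorted(rows)  # tuples sort by x first
    n = len(ordered)
    weighted_sum = 0.0
    total_weight = 0
    for b in range(n_bins):
        chunk = ordered[b * n // n_bins:(b + 1) * n // n_bins]
        treated = [y for _, t, y in chunk if t == 1]
        control = [y for _, t, y in chunk if t == 0]
        if treated and control:
            weighted_sum += len(chunk) * (mean(treated) - mean(control))
            total_weight += len(chunk)
    return weighted_sum / total_weight

rows = simulate_continuous_confounder()
for bins in (1, 2, 50):
    print(bins, "bin(s):", round(stratified_estimate(rows, bins), 3))
```

With the simulated effect set to 1, the estimate using a median split of X should still be clearly biased, and the estimate using many narrow bins should be much closer to the truth.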


Fun with a Bundle of Sticks

Sen and Wasow (2016) “Race as a Bundle of Sticks: Designs that Estimate Effects of Seemingly Immutable Characteristics” Annual Review of Political Science.


No Causation Without Manipulation

One of the difficulties that students have with causal inference is the need for manipulation or an ideal experiment.

In many areas the key variables are immutable, such as race or gender

Sen and Wasow argue that we can improve our empirical work on this by seeing race/ethnicity as a composite variable or ‘a bundle of sticks’ which can be manipulated separately


The Trouble with Race As Treatment

There are three problems with race as a treatment in the causal inference sense

1 Race cannot be manipulated
- without the capacity to manipulate, the question is arguably ill-posed and the estimand is unidentified

2 Everything else is post-treatment
- everything else comes after race, which is perhaps unsatisfying
- this also presumes we are only interested in the total effect

3 Race is unstable
- there is substantial variance across treatments, which is a SUTVA violation


The Bundle of Sticks


Design 1: Exposure Studies

Approach

a) “one or more elements of race is identified as a relevant cue”
b) “subjects are treated by exposure to the racial cue”
c) “unit of analysis is the individual or institution being exposed”

Examples
  - Psychology (Steele 1997 on stereotype threat)
  - Audit/Correspondence Studies (Pager 2003, Bertrand and Mullainathan 2004)
  - Survey Experiments with Racial Cues (Mendelberg 2001)
  - Field Experiments with Racial Cues (Green 2004, Enos 2011)
  - Observational Studies (Greiner and Rubin 2010, Wasow 2012)
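
As a rough illustration of the exposure design, the sketch below simulates a correspondence-style study in which units are randomly exposed to a racial cue and the estimate is a simple difference in mean outcomes. Everything here is hypothetical: the function names, the callback rates, and the size of the effect are invented for the example and are not taken from any of the studies listed above. Note that the estimand is the effect of exposure to the cue on the exposed unit, not the effect of race itself.

```python
import random


def simulate_exposure_study(n_units=2000, base_rate=0.10, cue_effect=-0.03, seed=0):
    """Simulate a hypothetical exposure study.

    Each unit (e.g. an employer) is randomly exposed to a racial cue or not,
    and a binary outcome (e.g. a callback) is recorded. All rates are made up.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n_units):
        exposed = rng.random() < 0.5  # randomized exposure to the cue
        p = base_rate + (cue_effect if exposed else 0.0)
        outcome = 1 if rng.random() < p else 0
        records.append((exposed, outcome))
    return records


def difference_in_means(records):
    """Estimate the average effect of exposure as a difference in outcome rates."""
    treated = [y for d, y in records if d]
    control = [y for d, y in records if not d]
    return sum(treated) / len(treated) - sum(control) / len(control)


records = simulate_exposure_study()
print(f"Estimated effect of cue exposure: {difference_in_means(records):.3f}")
```

Because exposure is randomized, the difference in means is an unbiased estimate of the average effect of the cue in this simulated setting; the observational designs in the list above need additional assumptions.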

Design 2: Within-Group Studies

Approach: identify variation within the racial group along a constitutive element.

Example: Sharkey (2010) exploits temporal variation in local homicides in Chicago to identify a significant neighborhood effect of proximity to violence on the cognitive performance of African-American children

Concluding Thoughts

We can study race with causal inference; it just takes very careful design.

1 Making Arguments

2 Potential Outcomes

3 Average Treatment effects

4 Graphical Models

5 Fun With A Bundle of Sticks

6 Visualization

An Intro Motivation

Visualization

Visualization is hard but ultimately extremely important

It is absurd that we spend months collecting data, weeks analyzing it and five minutes slapping it into an unreadable table.

Visualization can be used for many purposes
  - drawing people into a topic/dataset
  - presenting evidence
  - exploration/model checking

Three steps involved
  1. clearly define the goal
  2. estimate quantities of interest
  3. present those quantities in a compelling way

Good design involves thinking carefully about the audience

I strongly recommend Kieran Healy’s new visualization book, a great summary of the fundamentals plus R code.
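
The three steps above can be sketched in a few lines of code. This is a minimal illustration, not the approach used for the figures in these slides: the toy data, the group labels, and the helper names `estimate` and `text_dot_plot` are all hypothetical, and a plain-text dot-and-whisker chart stands in for a real plotting library so that the example needs only the Python standard library.

```python
import math
import statistics

# Hypothetical toy data (not from the slides): minutes per day for two groups.
toy_minutes = {
    "group A": [30, 45, 50, 40, 35],
    "group B": [60, 75, 65, 70, 80],
}


def estimate(values):
    """Step 2: estimate the quantity of interest (a mean) with a rough 95% interval."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - 1.96 * se, mean + 1.96 * se


def text_dot_plot(estimates, width=40, upper=100.0):
    """Step 3: present the estimates as a dot-and-whisker chart rather than a table."""

    def position(value):
        # Map a value on [0, upper] to a column index, clamped to the plot area.
        return min(width, max(0, round(value / upper * width)))

    lines = []
    for label, (mean, low, high) in estimates.items():
        row = [" "] * (width + 1)
        for col in range(position(low), position(high) + 1):
            row[col] = "-"  # interval whisker
        row[position(mean)] = "o"  # point estimate
        lines.append(f"{label:>8} |{''.join(row)}|")
    return "\n".join(lines)


# Step 1: the goal here is to compare average minutes across the two groups.
estimates = {label: estimate(values) for label, values in toy_minutes.items()}
print(text_dot_plot(estimates))
```

Separating estimation from presentation keeps the quantities of interest explicit, so the same estimates can be redrawn for a different audience without redoing the analysis.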

Examples

[Figure: People spend more time with parents on Christmas. x-axis: Date (Dec 01 to Feb 01); y-axis: Minutes spent awake with parents (ticks at 0, 50, 100). Source: American Time Use Survey, pooling years 2003−2017 for people of all ages, unweighted; N = 303 on Christmas.]

Source: Ian Lundberg

Examples

[Figure: two panels, "Weekday time with kids" and "Weekend day time with kids". x-axis: Mean occupational weekend day hours (0 to 4); y-axis: Hours with own children under 18 (0 to 6); point size: Fathers in occupation (500 to 2000). Points are labeled by occupation group, from Management occupations to Transportation and material moving occupations.]

Source: Ian Lundberg

Examples

Source: New York Times

Examples

Source: Olivia Walch

Examples

Source: The Pudding

Examples

Source: Kieran Healy

Reading

Angrist and Pischke Chapter 2 (The Experimental Ideal), Chapter 3 (Regression and Causality)

Morgan and Winship Chapters 3-4 (Causal Graphs and Conditioning Estimators)

Hernan and Robins Chapter 3 Observational Studies

Optional: Hernan (2018) “The C-word: Scientific euphemisms do not improve causal inference from observational data” American Journal of Public Health.

Optional: Elwert and Winship (2014) “Endogenous selection bias: The problem of conditioning on a collider variable” Annual Review of Sociology

Optional: Morgan and Winship Chapter 11 Repeated Observations and the Estimation of Causal Effects
