Soc400/500: Applied Social Statistics

Week 1: Introduction and Probability

Brandon Stewart¹

Princeton

August 31-September 4, 2020

¹ These slides are heavily influenced by Matt Blackwell and Adam Glynn, with contributions from Justin Grimmer and Matt Salganik. Illustrations by Shay O’Brien.

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 1 / 70

Where We’ve Been and Where We’re Going...

Last Week
  • living that class-free, quarantine life

This Week
  • course structure
  • core ideas
  • introduction to probability
  • three big ideas in probability

Next Week
  • random variables
  • joint distributions

Long Run
  • probability → inference → regression → causal inference

1 Course Structure
  • Overview
  • Ways to Learn
  • Final Details

2 Core Ideas
  • What is Statistics?
  • Preview: Connecting Theory and Evidence

3 Introduction to Probability
  • What is Probability?
  • Sample Spaces and Events
  • Probability Functions

4 Three Big Ideas in Probability
  • Marginal, Joint and Conditional Probability
  • Bayes’ Rule
  • Independence

Welcome and Introductions

The tale of two classes: Soc400/Soc500 Applied Social Statistics

I
  • ... am an Assistant Professor in Sociology.
  • ... am trained in political science and statistics
  • ... do research in methods and statistical text analysis
  • ... love doing collaborative research
  • ... talk very quickly

Your Preceptors
  • Emily Cantrell
  • Alejandro Schugurensky

Overview

Goal: train you in statistical thinking.

Fundamentally a graduate course for sociologists, but also useful for research in other fields, policy evaluation, industry, etc.

Difficult course but with many resources to support you.

When we are done you will be able to teach yourself many things

Syllabus is a useful resource including philosophy of the class.

Specific Goals

critically read and reason about quantitative social science using linear regression techniques.

conduct, interpret, and communicate results from analysis using multiple regression.

explain the limitations of observational data for making causal claims and distinguish between identification and estimation.

understand the logic and assumptions of several modern designs for making causal claims.

write clean, reusable, and reliable R code in tidyverse style.

feel empowered working with data

Why R?

It will give you super powers (but not at first)

It is free and open source

It is the de facto standard in many applied statistical fields

Artwork by @allison_horst

Why RMarkdown?

Artwork by @allison_horst

Mathematical Prerequisites

No formal prerequisites.

Balance of rigor and intuition.
  • no rigor for rigor’s sake.
  • we will tell you why you need the math, but also feel free to ask.
  • course focus on how to reason about statistics, not just memorize guidelines.

We will teach you any math you need as we go along.

Crucially though, this class is not about innate statistical aptitude; it is about effort.

We all come from very different backgrounds. Please have patience with yourself and with others.

Ways to Learn

Pre-Recorded Lectures: learn broad topics (4–8 videos a week, ≈2.5 hours)

Pre-Recorded Precept: learn data analysis skills, get targeted help on assignments

Perusall: an annotation platform for videos

Course Meetings: come together and discuss material

Ed: ask questions of us and your classmates

Office Hours: ask us even more questions, but (sort of) in-person

Problem Sets: reinforce understanding of material, practice

Problem Sets

Schedule (due Friday at 5 PM Eastern)

Grading and solutions

Collaboration policy

You may find these difficult. Start early and seek help!

Most important part of the class

Ways to Learn

Pre-Recorded Lectures

Pre-Recorded Precept

Perusall

Course Meetings

Ed

Office Hours

Problem Sets

Instructor Office Hours

Final Exam Prep

External Consulting

Individual and Group Tutoring

Your Job: work hard and get help when you need it!

Staying in Touch

A Note on Reading

Think of the lecture slides as primary reading.

If you want material to read, come talk to me about recommendations.

Suggested Books (more in the syllabus!):
  - Angrist and Pischke. 2008. Mostly Harmless Econometrics
  - Aronow and Miller. 2019. Foundations of Agnostic Statistics
  - Blitzstein and Hwang. 2019. Introduction to Probability

A somewhat obvious tip: don’t skip the math!

Advice from Prior Generations

Ask questions if you don’t know what’s going on!

Investing a considerable amount of time in getting familiar with R and its various tools will pay off in the long run!

Go over the lecture slides each week. This can be hard when you feel like you’re treading water and just staying afloat, but I wish I had done this regularly.

It’s challenging but very doable and rewarding if you put the time in. There are plenty of resources to take advantage of for help.

I found it helpful to read through the lecture slides again after I had opened the problem set. It made it easier to create connections between what we went through and how to do it.

Go over your psets and the pset solutions the moment they are graded as a habit and figure out what you don’t know.

Outline of Topics

Outline in reverse order:

Causal Inference: inferring counterfactual effects given associations.

Regression: estimating associations.

Inference: estimating things we don’t know from data.

Probability: learning what data we would expect if we did know the truth.

Probability → Inference → Regression → Causal Inference

Attribution and Thanks

My philosophy on teaching: don’t reinvent the wheel; customize, refine, improve.

Huge thanks to those who have provided slides, particularly: Matt Blackwell, Adam Glynn, Justin Grimmer, Jens Hainmueller, Erin Hartman, Kevin Quinn.

Also thanks to those who have discussed with me at length, including Dalton Conley, Chad Hazlett, Gary King, Kosuke Imai, Matt Salganik and Teppei Yamamoto.

Previous generations of preceptors have also been incredibly important: Clark Bernier, Elisha Cohen, Ian Lundberg, Simone Zhang, Alex Kindel, Ziyao Tian, Shay O’Brien.

Thanks to Shay O’Brien for many hand-drawn illustrations.

Welcome To Class!

Be sure to read the syllabus for more details.

Where We’ve Been and Where We’re Going...

Last Week
  - living that class-free, quarantine life

This Week
  - course structure
  - core ideas
  - introduction to probability
  - three big ideas in probability

Next Week
  - random variables
  - joint distributions

Long Run
  - probability → inference → regression → causal inference

1 Course Structure
  - Overview
  - Ways to Learn
  - Final Details

2 Core Ideas
  - What is Statistics?
  - Preview: Connecting Theory and Evidence

3 Introduction to Probability
  - What is Probability?
  - Sample Spaces and Events
  - Probability Functions

4 Three Big Ideas in Probability
  - Marginal, Joint and Conditional Probability
  - Bayes’ Rule
  - Independence

What is Statistics?

Branch of mathematics studying collection and analysis of data

The name statistic comes from the word state

The arc of developments in statistics

1) an applied scholar has a problem
2) they solve the problem by inventing a specific method
3) statisticians generalize and export the best of these methods

Relatively recent field (started at end of 19th century)

Goal: principled guesses based on stated assumptions.

In practice, an essential part of research, policy making, political campaigns, selling people things...

Why study probability?

It enables inference.

In Picture Form

[Diagram: Data generating process → (probability) → Observed data; Observed data → (inference) → Data generating process]

Statistical Thought Experiments

We start with probability.

Allows us to contemplate the world under hypothetical scenarios.
  - hypotheticals let us ask: is the observed relationship happening by chance or is it systematic?
  - it tells us what the world would look like under a certain assumption.

Most of the probability material is in the first two weeks, but we will return to these ideas periodically through the semester.
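One way to make such a thought experiment concrete is to simulate it. The sketch below is illustrative only: the coin-flip scenario, the function name `share_at_least`, and the numbers (60 heads in 100 flips) are assumptions made for the example, not part of the course materials. It builds many hypothetical worlds in which the coin is fair and asks how often chance alone produces a result at least as extreme as the one observed.

```python
import random


def share_at_least(observed_heads, n_flips, n_sims=10_000, seed=0):
    """Share of simulated fair-coin experiments with >= observed_heads heads."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    extreme = 0
    for _ in range(n_sims):
        # One hypothetical world: flip a fair coin n_flips times.
        heads = sum(rng.random() < 0.5 for _ in range(n_flips))
        if heads >= observed_heads:
            extreme += 1
    return extreme / n_sims


# Hypothetical observation: 60 heads in 100 flips.
share = share_at_least(observed_heads=60, n_flips=100)
print(f"Share of fair-coin worlds with at least 60 heads: {share:.3f}")
```

If the share is small, the "happening by chance" explanation looks less plausible under the stated assumption of a fair coin. The simulation describes only what that assumption implies; it does not say what actually generated the data.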

Example: The Lady Tasting Tea

The Story Setup (lady discerning about tea)

The Experiment (perform a taste test)

The Hypothetical (count possibilities)

The Result (boom she was right)

This became the Fisher Exact Test.
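The "count possibilities" step can be sketched in a few lines of Python. The slide does not state the design, so this assumes the commonly told version of the story: eight cups, four with milk poured first, and the lady is told to pick the four milk-first cups. The function name `prob_correct_by_chance` is hypothetical. Under pure guessing every set of four cups is equally likely, so the probability of a given number of correct picks is a ratio of counts.

```python
from math import comb


def prob_correct_by_chance(n_cups=8, n_milk_first=4, n_correct=4):
    """P(exactly n_correct milk-first cups are picked) under pure guessing.

    Assumes the taster selects exactly n_milk_first cups in total.
    """
    total_ways = comb(n_cups, n_milk_first)  # all possible selections
    # Choose n_correct of the true milk-first cups, and fill the rest of
    # the selection with tea-first cups.
    favorable = comb(n_milk_first, n_correct) * comb(
        n_cups - n_milk_first, n_milk_first - n_correct
    )
    return favorable / total_ways


p_all_correct = prob_correct_by_chance()  # 1 way out of comb(8, 4) = 70
p_at_least_three = prob_correct_by_chance(n_correct=3) + p_all_correct

print(f"P(all four correct by guessing) = {p_all_correct:.4f}")
print(f"P(three or more correct by guessing) = {p_at_least_three:.4f}")
```

Under these assumed numbers, a perfect score by guessing happens in 1 of 70 equally likely selections, which is why a correct answer counts as evidence against "she is just guessing".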

A Note on Fisher and the History of Statistics

The statistician in that story was Sir Ronald Fisher, arguably the most influential statistician of the 20th century.

Besides founding key areas of statistics, Fisher was also one of the founders of population genetics.

He was also a eugenicist and a racist.

Statistics has been used intermittently as a force for progress and a force against progress.

Preview: Connecting Theory and Evidence

“[Variables] empirically perform as theoretically predicted, by displaying statistically significant effects net of other variables in the right direction”

Lundberg, Johnson, and Stewart. Setting the Target: Precise Estimands and the Gap Between Theory and Empirics

The target tautology:

Research goals are defined by hypotheses about model coefficients

The goal is only defined within the statistical model

It becomes impossible to reason about other estimation strategies

Solution:

State the research goal separately from the estimation strategy

Our diagnosis for the source of many methodological problems
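A small sketch may help show why separating the goal from the estimation strategy matters. Everything here is a hypothetical illustration rather than the authors' method: the simulated data, the variable names, and the choice of target (a difference in mean outcomes between two groups) are assumptions. Once the goal is stated on its own, two different estimation strategies can be compared against it.

```python
import random
from statistics import mean

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: x marks group membership (0 or 1), y is an outcome.
x = [1 if rng.random() < 0.5 else 0 for _ in range(500)]
y = [1.0 + 2.0 * xi + rng.gauss(0, 1) for xi in x]

# Research goal, stated without reference to any model:
# the difference in mean outcome between group 1 and group 0.

# Estimation strategy A: difference in sample means.
diff_in_means = mean(yi for xi, yi in zip(x, y) if xi == 1) - mean(
    yi for xi, yi in zip(x, y) if xi == 0
)

# Estimation strategy B: OLS slope from regressing y on x.
x_bar, y_bar = mean(x), mean(y)
ols_slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)

print(f"Difference in means: {diff_in_means:.4f}")
print(f"OLS slope:           {ols_slope:.4f}")
```

With a single binary predictor the two strategies agree up to floating-point error. That comparison is only possible because the target was defined independently of the regression coefficient.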

Connecting Theory and Evidence

Theory or general goal → Theoretical estimands → Empirical estimands → Estimation strategies

1) Set a specific target (by argument). Example tools: Target population, Causal contrast
2) Link to observable data (by assumption). Example tools: Directed Acyclic Graphs, Potential outcomes
3) Learn how to estimate from data we observe (by data). Example tools: OLS regression, Machine learning

Evidence points to new questions

We Covered...

Statistics as a field (the good, the bad and the ugly)

The probability and inference loop

Connecting theory and evidence through estimands

See you next time!

Where We’ve Been and Where We’re Going...

Last Week
  - living that class-free, quarantine life

This Week
  - course structure
  - core ideas
  - introduction to probability
  - three big ideas in probability

Next Week
  - random variables
  - joint distributions

Long Run
  - probability → inference → regression → causal inference

1 Course Structure
- Overview
- Ways to Learn
- Final Details

2 Core Ideas
- What is Statistics?
- Preview: Connecting Theory and Evidence

3 Introduction to Probability
- What is Probability?
- Sample Spaces and Events
- Probability Functions

4 Three Big Ideas in Probability
- Marginal, Joint and Conditional Probability
- Bayes’ Rule
- Independence

From ‘Probably’ to Probability

Why Probability?

Helps us envision hypotheticals

Describes uncertainty in how the data is generated

Estimates probability that something will happen

Thus: we need to know how probability gives rise to data

Intuitive Definition of Probability

While there are several interpretations of what probability is, most modern (post 1935 or so) researchers agree on an axiomatic definition of probability.

3 Axioms (Intuitive Version):

1 The probability of any particular event must be non-negative.

2 The probability of anything occurring among all possible events must be 1.

3 The probability of one of many mutually exclusive events happening is the sum of the individual probabilities.

All the rules of probability can be derived from these axioms. To state them formally, we first need some definitions.

Sample Spaces

To define probability we need to define the set of possible outcomes.

The sample space is the set of all possible outcomes, and is often written as S.

For example, if we flip a coin twice, there are four possible outcomes, written here as ordered pairs (first flip, second flip):

$S = \{(\text{heads}, \text{heads}), (\text{heads}, \text{tails}), (\text{tails}, \text{heads}), (\text{tails}, \text{tails})\}$

Thus the table in Lady Tasting Tea was defining the sample space. (Note that we defined illogical guesses to have probability 0.)
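The two-flip sample space can also be enumerated by machine. The following is a minimal sketch in Python (a language choice made here for illustration; the slides contain no code), treating each outcome as an ordered pair so that the order of the flips matters. The names `flip` and `sample_space` are illustrative.

```python
from itertools import product

# Possible results of a single coin flip.
flip = ("heads", "tails")

# Sample space for two flips: every ordered pair of single-flip results.
sample_space = set(product(flip, repeat=2))

# Order matters, so ("heads", "tails") and ("tails", "heads") are distinct outcomes.
for outcome in sorted(sample_space):
    print(outcome)
print(len(sample_space))  # 4
```

Storing outcomes as tuples rather than sets is what keeps "heads then tails" separate from "tails then heads".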

A Running Visual Metaphor

Imagine that we sample one apple from a bag. Looking in the bag we see:

[Illustration of the apples in the bag; not reproduced]

The sample space is:

S = Ω = { [illustration: the four apples in the bag] }

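Events

Events are subsets of the sample space.

For example, if

S = Ω = { [illustration: the four apples in the bag] }

then

{ [illustration: three of the apples] } and { [illustration: one apple] }

are both events.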

Events Are a Kind of Set

Sets are collections of things, in this case collections of outcomes.

One way to define an event is to describe the common property that all of the outcomes share. We write this as

$\{\omega \mid \omega \text{ satisfies the property they share}\}$.

Example:

If $A = \{\omega \mid \omega \text{ has a leaf}\}$:

[Illustration: apples labeled A; not reproduced]
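Defining an event by a shared property maps directly onto a set comprehension. The sketch below is only illustrative: the `Apple` type, its `color` and `has_leaf` fields, and the contents of the bag are assumptions made for this example, since the original illustration is not available in the text.

```python
from typing import NamedTuple


class Apple(NamedTuple):
    """Hypothetical stand-in for one illustrated apple."""
    color: str
    has_leaf: bool


# Assumed bag contents (not taken from the slides).
sample_space = {
    Apple("red", True),
    Apple("red", False),
    Apple("green", True),
    Apple("green", False),
}

# A = {ω | ω has a leaf}: keep every outcome that satisfies the property.
A = {omega for omega in sample_space if omega.has_leaf}

# Any event built this way is a subset of the sample space.
print(A <= sample_space)  # True
```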

Complement

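The complement of event A, denoted $A^c$, is also a set.

$A^c$ is everything in the sample space that is not in A.

{ [illustration: three of the apples] } and { [illustration: the remaining apple] }

are complements.

$A^c = \{\omega \in S \mid \omega \notin A\}$.

Important complement: $S^c = \emptyset$, where $\emptyset$ is the empty set.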
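In code, the complement is a set difference taken relative to the sample space. This is a minimal sketch that assumes a single roll of a six-sided die as the sample space (an example chosen here, not one from the slides); the helper name `complement` is hypothetical.

```python
# Assumed sample space: one roll of a six-sided die.
S = frozenset(range(1, 7))


def complement(event, sample_space):
    """Return the outcomes in sample_space that are not in event."""
    return sample_space - event


A = frozenset({2, 4, 6})   # the roll is even
A_c = complement(A, S)     # the roll is odd
S_c = complement(S, S)     # complement of the whole sample space

print(sorted(A_c))         # [1, 3, 5]
print(S_c == frozenset())  # True: the complement of S is the empty set
```

Note that the complement is only defined once the sample space is fixed, which is why `complement` takes it as an argument.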

Unions and Intersections

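The union of two events A and B is the event that A or B (or both) occurs:

$A \cup B = \{\omega \mid \omega \in A \text{ or } \omega \in B\}$.

[Illustration: the union of two apple events, a set of three apples]

The intersection of two events A and B is the event that both A and B occur:

$A \cap B = \{\omega \mid \omega \in A \text{ and } \omega \in B\}$.

[Illustration: the intersection of two apple events, a set containing one apple]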
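Python's built-in set operators mirror these two definitions. The sketch below again assumes a six-sided die as the sample space; the events A and B are illustrative choices.

```python
# Assumed sample space: one roll of a six-sided die.
S = set(range(1, 7))

A = {w for w in S if w % 2 == 0}  # the roll is even: {2, 4, 6}
B = {w for w in S if w >= 4}      # the roll is at least 4: {4, 5, 6}

union = A | B          # outcomes in A or B (or both)
intersection = A & B   # outcomes in both A and B

print(sorted(union))         # [2, 4, 5, 6]
print(sorted(intersection))  # [4, 6]
```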

Operations on Events

We say that two events A and B are disjoint or mutually exclusive if they don’t share any elements, that is, if $A \cap B = \emptyset$.

An event A and its complement $A^c$ are by definition disjoint.

Sample spaces can contain infinitely many events, which we will often write using subscripts of the same letter: $A_1, A_2, \ldots, A_\infty$ (e.g., imagine events defined by the count of some object).
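A quick way to test mutual exclusivity in code is to check that the intersection is empty. The sketch below uses an assumed die-roll sample space, plus a hypothetical generator `count_events` that lazily yields the indexed events A_1, A_2, ... with A_k meaning "the count equals k".

```python
from itertools import combinations, count, islice

S = set(range(1, 7))  # assumed sample space: one roll of a six-sided die
A = {1, 2}
B = {5, 6}
C = {2, 3}


def mutually_exclusive(x, y):
    """True when the two events share no outcomes (their intersection is empty)."""
    return x.isdisjoint(y)


print(mutually_exclusive(A, B))      # True
print(mutually_exclusive(A, C))      # False: both contain 2
print(mutually_exclusive(A, S - A))  # True: an event and its complement


def count_events():
    """Yield A_1, A_2, ... where A_k = {k} is the event 'the count equals k'."""
    for k in count(1):
        yield frozenset({k})


# Only finitely many events can be inspected, so look at the first five.
first_five = list(islice(count_events(), 5))
pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(first_five, 2))
print(pairwise_disjoint)  # True
```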

Probability FunctionA probability function P(·) is a function defined over all subsets of asample space S that satisfies the following three axioms:

1) P(A) ≥ 0 for all A in the setof all events. nonnegativity

2) P(S) = 1 normalization

3) if events A1,A2, . . . aremutually exclusive thenP(⋃∞

i=1 Ai) =∑∞

i=1 P(Ai).

additivity

1. P( ) = -.5

2. P( ) ={ }, , , 1

3. P( ) = P( ) P( )+when and aremutually exclusive.

All the rules of probability can be derived from these axioms.(See Blitzstein & Hwang, Def 1.6.1.)

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 43 / 70

Page 201: Soc400/500: Applied Social Statistics Week 1: Introduction ...Soc400/500: Applied Social Statistics Week 1: Introduction and Probability Brandon Stewart1 Princeton August 31-September

Probability FunctionA probability function P(·) is a function defined over all subsets of asample space S that satisfies the following three axioms:

1) P(A) ≥ 0 for all A in the setof all events. nonnegativity

2) P(S) = 1 normalization

3) if events A1,A2, . . . aremutually exclusive thenP(⋃∞

i=1 Ai) =∑∞

i=1 P(Ai).additivity

1. P( ) = -.5

2. P( ) ={ }, , , 1

3. P( ) = P( ) P( )+when and aremutually exclusive.

All the rules of probability can be derived from these axioms.(See Blitzstein & Hwang, Def 1.6.1.)

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 43 / 70

Page 202: Soc400/500: Applied Social Statistics Week 1: Introduction ...Soc400/500: Applied Social Statistics Week 1: Introduction and Probability Brandon Stewart1 Princeton August 31-September

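Probability Function

A probability function P(·) is a function defined over all subsets of a sample space S that satisfies the following three axioms:

1) P(A) ≥ 0 for all A in the set of all events. (nonnegativity)

2) P(S) = 1 (normalization)

3) If events A_1, A_2, ... are mutually exclusive, then $P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$. (additivity)

[Illustrations, one per axiom; the icons did not survive extraction:
1. A probability of −.5 for an event violates nonnegativity.
2. The probability of the set containing every outcome is 1.
3. The probability that one or the other of two events occurs is the sum of their probabilities when the two events are mutually exclusive.]

All the rules of probability can be derived from these axioms. (See Blitzstein & Hwang, Def 1.6.1.)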
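As a minimal sketch of what the three axioms require, the Python snippet below builds a probability function on a finite sample space and checks each axiom by enumerating every event. The fair six-sided die and the names `prob`, `all_events`, and `satisfies_axioms` are illustrative assumptions, not part of the slides.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: a fair six-sided die as the sample space S.
S = frozenset(range(1, 7))
weights = {outcome: Fraction(1, 6) for outcome in S}


def prob(event):
    """P(event): sum of the weights of the outcomes in the event (a subset of S)."""
    return sum((weights[o] for o in event), Fraction(0))


def all_events(sample_space):
    """Every subset of the sample space, i.e. the set of all events."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]


def satisfies_axioms(sample_space):
    """Return (nonnegativity, normalization, additivity) as booleans."""
    events = all_events(sample_space)
    nonnegativity = all(prob(a) >= 0 for a in events)
    normalization = prob(sample_space) == 1
    # On a finite space it is enough to check additivity on disjoint pairs.
    additivity = all(
        prob(a | b) == prob(a) + prob(b)
        for a in events
        for b in events
        if not (a & b)
    )
    return nonnegativity, normalization, additivity


print(satisfies_axioms(S))
```

Using `Fraction` keeps the equality checks exact; with floating-point weights the normalization and additivity checks would need a tolerance.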


A Brief Word on Interpretation

Massive debate on interpretation:

Subjective Interpretation
  - Example: The probability of drawing 5 red cards out of 10 drawn from a deck of cards is whatever you want it to be. But...
  - If you don’t follow the axioms, a bookie can beat you.
  - There is a correct way to update your beliefs given your assumptions about the data generating process.

Frequency Interpretation
  - Probability is the relative frequency with which an event would occur if the process were repeated a large number of times under similar conditions.
  - Example: The probability of drawing 5 red cards out of 10 drawn from a deck of cards is the frequency with which this event occurs in repeated samples of 10 cards.
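The frequency interpretation of the card example can be sketched with a short simulation. The code below assumes a standard 52-card deck with 26 red cards and draws made without replacement; the slides only say "a deck of cards", so both are assumptions, as are the function names. It compares the exact probability with the relative frequency over many repeated samples of 10 cards.

```python
import random
from math import comb


def exact_prob_five_red():
    """Exact P(exactly 5 red among 10 cards drawn without replacement).

    Assumes a standard deck: 26 red and 26 black cards.
    """
    return comb(26, 5) * comb(26, 5) / comb(52, 10)


def simulated_freq_five_red(n_draws=100_000, seed=0):
    """Relative frequency of 'exactly 5 red out of 10' over repeated samples."""
    rng = random.Random(seed)
    deck = ["red"] * 26 + ["black"] * 26
    hits = 0
    for _ in range(n_draws):
        hand = rng.sample(deck, 10)  # 10 cards without replacement
        if hand.count("red") == 5:
            hits += 1
    return hits / n_draws


print(exact_prob_five_red())
print(simulated_freq_five_red())
```

As the number of repeated samples grows, the simulated frequency should settle near the exact value, which is the sense in which the frequency interpretation defines the probability.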


We Covered...

Events and Sample Spaces

Probability Functions and Three Axioms

Next: Three Big Ideas derived from the axioms that provide the rules of working with probability.

See you next time!


Where We’ve Been and Where We’re Going...

Last Week
  - living that class-free, quarantine life

This Week
  - course structure
  - core ideas
  - introduction to probability
  - three big ideas in probability

Next Week
  - random variables
  - joint distributions

Long Run
  - probability → inference → regression → causal inference


Three Big Ideas

Marginal, joint, and conditional probabilities

Bayes’ rule

Independence


Marginal and Joint Probability

So far we have only considered situations where we are interested in the probability of a single event A occurring. We’ve denoted this P(A). P(A) is sometimes called a marginal probability.

Suppose we are now in a situation where we would like to express the probability that an event A and an event B occur. This quantity is written as P(A ∩ B), P(B ∩ A), P(A, B), or P(B, A) and is the joint probability of A and B.

[Illustration: an equation relating joint-probability expressions written with icons; the icons did not survive extraction.]


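[Illustrated example; the picture and icons did not survive extraction.]

Joint probability: P(·, ·) = 4/10

Marginal probability: P(·) = 7/10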


Conditional Probability

The “soul of statistics”

If P(A) > 0 then the probability of B conditional on A is

$P(B \mid A) = \frac{P(A, B)}{P(A)}$

This implies that

$P(A, B) = P(A)\,P(B \mid A) = P(B)\,P(A \mid B)$

Hopefully this second formulation is intuitive!
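A small worked case may make the definition and the product rule concrete. The sketch below assumes a standard 52-card deck with one card drawn at random, takes A = "the card is a face card" and B = "the card is a king", and computes P(B | A) by counting. The events and the helper names `prob` and `cond_prob` are illustrative choices, not from the slides.

```python
from fractions import Fraction
from itertools import product

# Hypothetical setup: one card drawn uniformly at random from a standard deck.
ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))


def prob(event):
    """P(event) under equally likely cards; `event` maps a card to True/False."""
    return Fraction(sum(1 for card in deck if event(card)), len(deck))


def cond_prob(b, a):
    """P(B | A) = P(A, B) / P(A), defined only when P(A) > 0."""
    p_a = prob(a)
    if p_a == 0:
        raise ValueError("P(A) must be positive to condition on A")
    return prob(lambda card: a(card) and b(card)) / p_a


def is_face(card):
    return card[0] in {"J", "Q", "K"}


def is_king(card):
    return card[0] == "K"


# P(king | face card): 4 kings among 12 face cards.
print(cond_prob(is_king, is_face))

# Product rule both ways: P(A)P(B|A) and P(B)P(A|B) both equal P(A, B).
print(prob(is_face) * cond_prob(is_king, is_face))
print(prob(is_king) * cond_prob(is_face, is_king))
```

Conditioning on A shrinks the set of cards under consideration from 52 to the 12 face cards, which is why the joint probability is divided by P(A).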


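Conditional Probability: A Visual Example

[Illustration: the formula $P(\cdot \mid \cdot) = \frac{P(\cdot, \cdot)}{P(\cdot)}$ written with icons; the icons did not survive extraction.]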


Law of Total Probability (LTP)

With 2 Events:

$P(B) = P(B, A) + P(B, A^c)$

$\phantom{P(B)} = P(B \mid A)\,P(A) + P(B \mid A^c)\,P(A^c)$

[Illustration: the same two lines restated with icons; the icons did not survive extraction.]

In general, if {A_i : i = 1, 2, 3, ...} forms a partition of the sample space, then

$P(B) = \sum_i P(B, A_i)$

$\phantom{P(B)} = \sum_i P(B \mid A_i)\,P(A_i)$
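One way to see where the general formula comes from is to trace it back to the additivity axiom and the definition of conditional probability. The derivation below is a sketch that assumes each P(A_i) > 0, so that every conditional probability is defined.

```latex
\begin{align*}
B &= \bigcup_i (B \cap A_i),
  \qquad (B \cap A_i) \cap (B \cap A_j) = \emptyset \text{ for } i \neq j
  && \text{(the } A_i \text{ partition the sample space)} \\
P(B) &= \sum_i P(B \cap A_i)
  && \text{(additivity)} \\
     &= \sum_i P(B \mid A_i)\, P(A_i)
  && \text{(definition of conditional probability)}
\end{align*}
```

The two-event case is the special case in which the partition is {A, A^c}.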


Example: Voter MobilizationSuppose that we have put together a voter mobilization campaignand we want to know what the probability of voting is after thecampaign: P(vote). We know the following:

P(vote|mobilized) = 0.75

P(vote|not mobilized) = 0.15

P(mobilized) = 0.6 and so P(not mobilized) = 0.4

Note that mobilization partitions the data. Everyone is eithermobilized or not. Thus, we can apply the LTP:

P(vote) =P(vote|mobilized)P(mobilized)+

P(vote|not mobilized)P(not mobilized)

=0.75× 0.6 + 0.15× 0.4

=.51

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 53 / 70

Page 250: Soc400/500: Applied Social Statistics Week 1: Introduction ...Soc400/500: Applied Social Statistics Week 1: Introduction and Probability Brandon Stewart1 Princeton August 31-September

Example: Voter MobilizationSuppose that we have put together a voter mobilization campaignand we want to know what the probability of voting is after thecampaign: P(vote). We know the following:

P(vote|mobilized) = 0.75

P(vote|not mobilized) = 0.15

P(mobilized) = 0.6 and so P(not mobilized) = 0.4

Note that mobilization partitions the data. Everyone is eithermobilized or not. Thus, we can apply the LTP:

P(vote) =P(vote|mobilized)P(mobilized)+

P(vote|not mobilized)P(not mobilized)

=0.75× 0.6 + 0.15× 0.4

=.51

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 53 / 70

Page 251: Soc400/500: Applied Social Statistics Week 1: Introduction ...Soc400/500: Applied Social Statistics Week 1: Introduction and Probability Brandon Stewart1 Princeton August 31-September

Example: Voter MobilizationSuppose that we have put together a voter mobilization campaignand we want to know what the probability of voting is after thecampaign: P(vote). We know the following:

P(vote|mobilized) = 0.75

P(vote|not mobilized) = 0.15

P(mobilized) = 0.6 and so P(not mobilized) = 0.4

Note that mobilization partitions the data. Everyone is eithermobilized or not.

Thus, we can apply the LTP:

P(vote) =P(vote|mobilized)P(mobilized)+

P(vote|not mobilized)P(not mobilized)

=0.75× 0.6 + 0.15× 0.4

=.51

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 53 / 70

Page 252: Soc400/500: Applied Social Statistics Week 1: Introduction ...Soc400/500: Applied Social Statistics Week 1: Introduction and Probability Brandon Stewart1 Princeton August 31-September

Example: Voter MobilizationSuppose that we have put together a voter mobilization campaignand we want to know what the probability of voting is after thecampaign: P(vote). We know the following:

P(vote|mobilized) = 0.75

P(vote|not mobilized) = 0.15

P(mobilized) = 0.6 and so P(not mobilized) = 0.4

Note that mobilization partitions the data. Everyone is eithermobilized or not. Thus, we can apply the LTP:

P(vote) =P(vote|mobilized)P(mobilized)+

P(vote|not mobilized)P(not mobilized)

=0.75× 0.6 + 0.15× 0.4

=.51

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 53 / 70

Page 253: Soc400/500: Applied Social Statistics Week 1: Introduction ...Soc400/500: Applied Social Statistics Week 1: Introduction and Probability Brandon Stewart1 Princeton August 31-September

Example: Voter MobilizationSuppose that we have put together a voter mobilization campaignand we want to know what the probability of voting is after thecampaign: P(vote). We know the following:

P(vote|mobilized) = 0.75

P(vote|not mobilized) = 0.15

P(mobilized) = 0.6 and so P(not mobilized) = 0.4

Note that mobilization partitions the data. Everyone is eithermobilized or not. Thus, we can apply the LTP:

P(vote) =P(vote|mobilized)P(mobilized)+

P(vote|not mobilized)P(not mobilized)

=0.75× 0.6 + 0.15× 0.4

=.51

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 53 / 70

Page 254: Soc400/500: Applied Social Statistics Week 1: Introduction ...Soc400/500: Applied Social Statistics Week 1: Introduction and Probability Brandon Stewart1 Princeton August 31-September

Example: Voter MobilizationSuppose that we have put together a voter mobilization campaignand we want to know what the probability of voting is after thecampaign: P(vote). We know the following:

P(vote|mobilized) = 0.75

P(vote|not mobilized) = 0.15

P(mobilized) = 0.6 and so P(not mobilized) = 0.4

Note that mobilization partitions the data. Everyone is eithermobilized or not. Thus, we can apply the LTP:

P(vote) =P(vote|mobilized)P(mobilized)+

P(vote|not mobilized)P(not mobilized)

=0.75× 0.6 + 0.15× 0.4

=.51

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 53 / 70

Page 255: Soc400/500: Applied Social Statistics Week 1: Introduction ...Soc400/500: Applied Social Statistics Week 1: Introduction and Probability Brandon Stewart1 Princeton August 31-September

Example: Voter MobilizationSuppose that we have put together a voter mobilization campaignand we want to know what the probability of voting is after thecampaign: P(vote). We know the following:

P(vote|mobilized) = 0.75

P(vote|not mobilized) = 0.15

P(mobilized) = 0.6 and so P(not mobilized) = 0.4

Note that mobilization partitions the data. Everyone is eithermobilized or not. Thus, we can apply the LTP:

P(vote) =P(vote|mobilized)P(mobilized)+

P(vote|not mobilized)P(not mobilized)

=0.75× 0.6 + 0.15× 0.4

=.51

Stewart (Princeton) Week 1: Introduction and Probability August 31-September 4, 2020 53 / 70

Page 256: Soc400/500: Applied Social Statistics Week 1: Introduction ...Soc400/500: Applied Social Statistics Week 1: Introduction and Probability Brandon Stewart1 Princeton August 31-September

Three Big Ideas

Marginal, joint, and conditional probabilities

Bayes’ rule

Independence

Bayes’ Rule

Often we have information about P(B | A), but want P(A | B).

When this happens, always think: Bayes’ rule.

Bayes’ rule: if P(B) > 0, then:

$$P(A \mid B) = \frac{P(B \mid A) P(A)}{P(B)}$$
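Bayes’ rule follows in two steps from the definition of conditional probability. The short derivation below is a sketch added for reference; it assumes P(A) > 0 as well as P(B) > 0, so that both conditional probabilities are defined.

```latex
\begin{aligned}
P(A \mid B) &= \frac{P(A, B)}{P(B)}
  && \text{definition of conditional probability} \\
            &= \frac{P(B \mid A)\, P(A)}{P(B)}
  && \text{since } P(A, B) = P(B \mid A)\, P(A)
\end{aligned}
```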

Bayes’ Rule Mechanics

[Figure: Bayes’ rule written with illustrations in place of the events]

Example: Race and Names

Note that the Census collects information on the distribution of names by race.

For example, Washington is the most common last name among African-Americans in America:

P(AfAm) = 0.132

P(not AfAm) = 1 − P(AfAm) = 0.868

P(Washington | AfAm) = 0.00378

P(Washington | not AfAm) = 0.000061

We can now use Bayes’ Rule:

$$P(\text{AfAm} \mid \text{Wash}) = \frac{P(\text{Wash} \mid \text{AfAm}) P(\text{AfAm})}{P(\text{Wash})}$$

Note we don’t have the probability of the name Washington.

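Remember that we can calculate it from the LTP since the sets African-American and not African-American partition the sample space:

$$
\begin{aligned}
P(\text{AfAm} \mid \text{Wash}) &= \frac{P(\text{Wash} \mid \text{AfAm}) P(\text{AfAm})}{P(\text{Wash})} \\
&= \frac{P(\text{Wash} \mid \text{AfAm}) P(\text{AfAm})}{P(\text{Wash} \mid \text{AfAm}) P(\text{AfAm}) + P(\text{Wash} \mid \text{not AfAm}) P(\text{not AfAm})} \\
&= \frac{0.132 \times 0.00378}{0.132 \times 0.00378 + 0.868 \times 0.000061} \\
&\approx 0.9
\end{aligned}
$$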
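The same arithmetic can be reproduced in a few lines of Python. This is a minimal sketch rather than anything from the original slides; the helper name `bayes_posterior` and its parameter names are assumptions made for illustration. It expands the denominator P(B) with the law of total probability over A and not A, exactly as in the derivation.

```python
def bayes_posterior(likelihood, prior, likelihood_complement):
    """Return P(A | B) using Bayes' rule with a two-cell partition.

    likelihood            = P(B | A)
    prior                 = P(A)
    likelihood_complement = P(B | not A)
    (All three parameter names are illustrative.)
    """
    numerator = likelihood * prior
    # Denominator via the LTP: P(B) = P(B | A) P(A) + P(B | not A) P(not A)
    evidence = numerator + likelihood_complement * (1.0 - prior)
    if evidence == 0:
        raise ValueError("P(B) must be positive for P(A | B) to be defined")
    return numerator / evidence


# Numbers from the slide: A = African-American, B = last name Washington.
p_afam_given_wash = bayes_posterior(
    likelihood=0.00378,              # P(Wash | AfAm)
    prior=0.132,                     # P(AfAm)
    likelihood_complement=0.000061,  # P(Wash | not AfAm)
)
print(round(p_afam_given_wash, 3))  # 0.904
```

Even though the name is rare in both groups, the posterior is high because the name is roughly 60 times more likely among African-Americans than among everyone else, which outweighs the 0.132 prior.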

Three Big Ideas

Marginal, joint, and conditional probabilities

Bayes’ rule

Independence

Independence

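Intuitive Definition

Events A and B are independent if knowing whether A occurred provides no information about whether B occurred.

Formal Definition

$$P(A, B) = P(A) P(B) \implies A \perp\!\!\!\perp B$$

With all the usual > 0 restrictions (here P(A) > 0 and P(B) > 0, so that the conditional probabilities are defined), this implies

$$P(A \mid B) = P(A)$$

$$P(B \mid A) = P(B)$$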
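The step from the product rule to the two conditional statements is a one-line substitution. The derivation below is a sketch added for reference, assuming P(A) > 0 and P(B) > 0 and the usual definition of conditional probability:

```latex
\begin{aligned}
P(A \mid B) &= \frac{P(A, B)}{P(B)} = \frac{P(A)\, P(B)}{P(B)} = P(A), \\
P(B \mid A) &= \frac{P(A, B)}{P(A)} = \frac{P(A)\, P(B)}{P(A)} = P(B).
\end{aligned}
```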

Conditional Independence

$$P(A, B \mid C) = P(A \mid C) P(B \mid C) \implies A \perp\!\!\!\perp B \mid C$$

Independence is a massively important concept in statistics.
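The product rule can be checked directly on a small sample space. The sketch below is not from the slides: the two-coin-flip experiment and the helper names `prob` and `independent` are assumptions chosen for illustration. It uses exact fractions so that the equality test is not affected by floating-point rounding.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips; all four outcomes are equally likely.
# (This toy experiment is an illustrative assumption, not from the slides.)
outcomes = list(product("HT", repeat=2))


def prob(event):
    """Probability of an event, given as a predicate over outcomes."""
    favorable = [o for o in outcomes if event(o)]
    return Fraction(len(favorable), len(outcomes))


def independent(event_a, event_b):
    """Check the product rule P(A, B) == P(A) * P(B)."""
    joint = prob(lambda o: event_a(o) and event_b(o))
    return joint == prob(event_a) * prob(event_b)


def first_heads(o):
    return o[0] == "H"


def second_heads(o):
    return o[1] == "H"


def any_heads(o):
    return "H" in o


print(independent(first_heads, second_heads))  # True
print(independent(first_heads, any_heads))     # False
```

The second check fails because knowing the first flip is heads guarantees that at least one flip is heads: the joint probability is 1/2, while the product of the marginals is 1/2 × 3/4 = 3/8.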

Independence, the Heroic Assumption

Deploy with Caution!

Advanced Example: Building a Spam Filter

Suppose we have an email $i$ ($i = 1, \dots, N$), which we represent with a series of $J$ indicators for whether or not it contains a set of words:

$$\mathbf{x}_i = (x_{1i}, x_{2i}, \dots, x_{Ji})$$

We want to classify these into one of two categories, spam or not:

$$\{C_{\text{spam}}, C_{\text{not}}\}$$

We have a set of labeled documents $Y = (Y_1, Y_2, \dots, Y_N)$ where $Y_i \in \{C_{\text{spam}}, C_{\text{not}}\}$.

Goal: Use what we’ve learned to build a model which can classify emails into spam and not spam.
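As a rough illustration of this representation, the sketch below turns raw text into the J word indicators. The vocabulary, the example emails, and the helper name `to_indicators` are all hypothetical, and splitting on whitespace is a deliberately crude stand-in for real text preprocessing.

```python
# Hypothetical vocabulary of J = 3 words; a real filter would use many more.
VOCAB = ["free", "winner", "meeting"]

def to_indicators(text, vocab=VOCAB):
    """Return x_i: one 0/1 flag per vocabulary word (1 if the email contains it)."""
    words = set(text.lower().split())
    return tuple(int(w in words) for w in vocab)

# Made-up labeled emails: X holds the indicator vectors, Y the labels.
emails = ["Free prize for every winner", "Meeting moved to noon", "free free free"]
X = [to_indicators(e) for e in emails]
Y = ["spam", "not", "spam"]

print(X)  # [(1, 1, 0), (0, 0, 1), (1, 0, 0)]
```

Note that the indicators record only presence or absence, so an email repeating "free" three times gets the same vector as one mentioning it once.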

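Example: Building a Spam Filter

For each document, we will get to see $\mathbf{x}_i$ (the words in the document), and we would like to infer the category.

In other words, what we want is $P(C_{\text{spam}} \mid \mathbf{x}_i)$.

Let’s use Bayes’ Rule!

$$
\begin{aligned}
P(C_{\text{spam}} \mid \mathbf{x}_i) &= \frac{P(\mathbf{x}_i \mid C_{\text{spam}})\,P(C_{\text{spam}})}{P(\mathbf{x}_i)} \\
&= \frac{P(\mathbf{x}_i \mid C_{\text{spam}})\,P(C_{\text{spam}})}{P(\mathbf{x}_i \mid C_{\text{spam}})\,P(C_{\text{spam}}) + P(\mathbf{x}_i \mid C_{\text{not spam}})\,P(C_{\text{not spam}})}
\end{aligned}
$$

We used the law of total probability to work out the denominator.

Now there are only 4 pieces we need (2 for spam, 2 for not): $P(\mathbf{x}_i \mid C)$ and $P(C)$ for each category.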
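The Bayes' Rule expression above can be sketched as a small function of its four pieces. The function name `posterior_spam` and the numbers plugged in are illustrative assumptions, not estimates from real data.

```python
def posterior_spam(lik_spam, lik_not, prior_spam):
    """P(C_spam | x) from P(x | C_spam), P(x | C_not spam), and P(C_spam)."""
    prior_not = 1.0 - prior_spam              # complement rule
    numerator = lik_spam * prior_spam
    # Law of total probability: P(x) summed over both categories.
    evidence = numerator + lik_not * prior_not
    return numerator / evidence

# Hypothetical inputs: the observed words are 20x likelier under spam,
# and 30% of all email is spam.
print(round(posterior_spam(0.2, 0.01, 0.3), 3))  # 0.896
```

When the two likelihoods are equal, the words carry no information and the function returns the prior unchanged.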

Estimating the Baseline Prevalence

Let’s plug in some estimates based on our labeled emails.

Intuitively, $P(C_{\text{spam}})$ is the probability that a randomly chosen email will be spam.

$$P(C_{\text{spam}}) = \frac{\text{No. Spam Emails}}{\text{No. Emails}}$$

Because ‘not spam’ is the complement of spam we know that:

$$P(C_{\text{not spam}}) = 1 - P(C_{\text{spam}})$$

Note: this estimate is only good if our labeled emails are a random sample of all emails! More on this in future weeks.
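A minimal sketch of this counting estimate, assuming the labels are stored as plain strings. The helper name `estimate_prior` and the five labels are made up for illustration.

```python
def estimate_prior(labels, spam_label="spam"):
    """Estimate P(C_spam) as (number of spam emails) / (number of emails)."""
    n_spam = sum(1 for y in labels if y == spam_label)
    return n_spam / len(labels)

# Hypothetical labeled sample of five emails.
labels = ["spam", "not", "not", "spam", "not"]

p_spam = estimate_prior(labels)
p_not = 1.0 - p_spam  # complement rule gives P(C_not spam)
print(p_spam)  # 0.4
```

The caveat on the slide still applies: if the labeled emails over-represent spam, this proportion will overstate the true prevalence.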

Estimating the Language Model

Now we need $P(\mathbf{x}_i \mid C_{\text{spam}})$, which we call the language model because it represents the probability of seeing any combination of the $J$ words that we are counting from the emails.

Can we use the same strategy as before (just counting up emails)? No! Remember $\mathbf{x}_i$ is a vector of $J$ word indicators, so there are $2^J$ possible combinations!

We will make the heroic assumption of conditional independence:

$$P(\mathbf{x}_i \mid C_{\text{spam}}) = \prod_{j=1}^{J} P(x_{ji} \mid C_{\text{spam}})$$

Intuition: count the proportion of spam emails containing each word.

Called the Naïve Bayes classifier because the conditional independence assumption is naïve.
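The product formula can be sketched as follows. One assumption goes beyond the slide: each indicator is treated as a Bernoulli draw, so a word that is absent contributes one minus its rate. The function names and the toy data are hypothetical.

```python
import math
from itertools import product

def word_rates(X, Y, label):
    """Proportion of emails with the given label that contain each word."""
    rows = [x for x, y in zip(X, Y) if y == label]
    return [sum(col) / len(rows) for col in zip(*rows)]

def likelihood(x, rates):
    """P(x | C) as a product over the J words (conditional independence).

    Assumes a Bernoulli model: present words contribute the rate p,
    absent words contribute 1 - p.
    """
    return math.prod(p if xj == 1 else 1.0 - p for xj, p in zip(x, rates))

# Made-up spam emails over J = 3 words.
X = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (1, 1, 0)]
Y = ["spam", "spam", "spam", "spam"]

rates = word_rates(X, Y, "spam")
print(rates)                         # [0.75, 0.75, 0.25]
print(likelihood((1, 0, 0), rates))  # 0.140625

# Only J numbers were estimated, yet they define a distribution over
# all 2**J patterns, which sums to one.
total = sum(likelihood(x, rates) for x in product((0, 1), repeat=3))
```

A word that never appears in the spam emails would get a rate of exactly zero here and wipe out the whole product, which is one practical reason to treat these raw proportions with caution.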

Estimating the Naïve Bayes Classifier

$$P(C_{\text{spam}} \mid \mathbf{x}_i) = \frac{P(\mathbf{x}_i \mid C_{\text{spam}})\,P(C_{\text{spam}})}{P(\mathbf{x}_i \mid C_{\text{spam}})\,P(C_{\text{spam}}) + P(\mathbf{x}_i \mid C_{\text{not spam}})\,P(C_{\text{not spam}})}$$

The Naïve Bayes Procedure:

Learn what spam emails look like to create a function that lets us plug in an email and get out a probability, $P(\mathbf{x}_i \mid C_{\text{spam}})$

Guess how much spam there is overall, $P(C_{\text{spam}})$

Plug in values of $\mathbf{x}_i$ for new emails to score them by whether they are spam or not.

. . . Profit?
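The procedure above can be sketched end to end in a few lines. This is an illustrative toy under stated assumptions, not the course's implementation: the eight labeled emails are made up, `fit` and `score` are hypothetical names, and absent words are modeled with one minus the word rate.

```python
import math

def fit(X, Y):
    """Steps 1 and 2: estimate P(C) and per-word rates P(x_j = 1 | C) by counting."""
    model = {}
    for c in set(Y):
        rows = [x for x, y in zip(X, Y) if y == c]
        rates = [sum(col) / len(rows) for col in zip(*rows)]
        model[c] = (len(rows) / len(Y), rates)
    return model

def score(x, model, target="spam"):
    """Step 3: P(target | x) via Bayes' rule with the naive product likelihood."""
    joint = {
        c: prior * math.prod(p if xj else 1.0 - p for xj, p in zip(x, rates))
        for c, (prior, rates) in model.items()
    }
    # Denominator from the law of total probability; assumes it is nonzero.
    return joint[target] / sum(joint.values())

# Made-up training data over three hypothetical words: free, winner, meeting.
X = [(1, 1, 0), (1, 0, 0), (0, 1, 1), (1, 1, 0),
     (0, 0, 1), (1, 0, 1), (0, 0, 1), (0, 1, 0)]
Y = ["spam"] * 4 + ["not"] * 4

model = fit(X, Y)
print(round(score((1, 1, 0), model), 3))  # 0.964
print(round(score((0, 0, 1), model), 3))  # 0.036
```

An email containing "free" and "winner" but not "meeting" scores close to 1, while one containing only "meeting" scores close to 0. Both scores rest entirely on the conditional independence assumption.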

Example: Building a Spam Filter

This was a really advanced example (it is okay if you didn’t follow all of it!).

Draws on all the probabilistic concepts we have introduced:
- Bayes’ Rule
- Law of Total Probability
- Conditional Independence

Shares the basic structure of many models, particularly in its use of conditional independence.

This Week in Review

Course logistics

Core ideas in statistics

Foundations of probability

Three big probability concepts

Going Deeper:

Blitzstein, Joseph K., and Hwang, Jessica. (2019). Introduction to Probability. CRC Press. http://stat110.net/

Next week: random variables!
