(AN INTRODUCTION) REGRESSION ANALYSIS FOR DATA-JOURNALISM Camila Salazar School of Data Fellow @milamila07

Skillshare - Regression Analysis for Data Journalism


Page 1: Skillshare - Regression Analysis for Data Journalism

(AN INTRODUCTION)

REGRESSION ANALYSIS FOR DATA-JOURNALISM

Camila Salazar

School of Data Fellow

@milamila07

Page 2: Skillshare - Regression Analysis for Data Journalism

Outline

1. Target audience
2. A step beyond descriptive statistics
3. What is regression analysis?
4. Example: the effect of education on wages
5. Other types of regression analysis useful in data journalism
6. Using regression models in data journalism

Page 3: Skillshare - Regression Analysis for Data Journalism

TARGET AUDIENCE

Page 4: Skillshare - Regression Analysis for Data Journalism

Target audience

• Data journalists

• School of Data Fellows

• People with basic knowledge of statistics

• Journalism students

Page 5: Skillshare - Regression Analysis for Data Journalism

A STEP BEYOND DESCRIPTIVE STATISTICS

Page 6: Skillshare - Regression Analysis for Data Journalism

So you are in the newsroom...

There’s a big debate in your country about the importance of education.

Your editor asks you to write a story about the importance of education.

Page 7: Skillshare - Regression Analysis for Data Journalism

First step: descriptive statistics

You find data about education in your country and start calculating the descriptive statistics.

Page 8: Skillshare - Regression Analysis for Data Journalism

Descriptive statistics

With descriptive statistics you find:

- How many people have a college degree.

- The unemployment rate at each level of education.
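As a rough sketch of this first step, the two figures can be computed with a few lines of Python. The records below are made up for illustration (they are not real survey data), and the names `people`, `college_share` and `unemployment_rate` are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: (education level, is the person unemployed?).
# These eight rows are invented for illustration only.
people = [
    ("high school", True),
    ("high school", False),
    ("high school", False),
    ("high school", True),
    ("college", False),
    ("college", False),
    ("college", True),
    ("college", False),
]

# Share of people with a college degree.
college_share = sum(1 for level, _ in people if level == "college") / len(people)

# Unemployment rate within each education level.
totals = defaultdict(int)
unemployed = defaultdict(int)
for level, is_unemployed in people:
    totals[level] += 1
    unemployed[level] += int(is_unemployed)

unemployment_rate = {level: unemployed[level] / totals[level] for level in totals}

print(f"College share: {college_share:.0%}")
for level, rate in unemployment_rate.items():
    print(f"Unemployment ({level}): {rate:.0%}")
```

These numbers describe the data, but they don't say how much one extra year of schooling changes anything.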

Page 9: Skillshare - Regression Analysis for Data Journalism

And...

You interview young people who are still in high school and don’t want to go to college. You want your story to show them how much their future earnings could improve if they go to college.

You can’t answer this question using descriptive statistics :(

Page 10: Skillshare - Regression Analysis for Data Journalism

But...

You can calculate how much an extra year of schooling increases wages using regression analysis!

Page 11: Skillshare - Regression Analysis for Data Journalism

WHAT IS REGRESSION ANALYSIS?

Page 12: Skillshare - Regression Analysis for Data Journalism

What is regression analysis?

Regression analysis is a statistical tool for investigating relationships between variables.

Page 13: Skillshare - Regression Analysis for Data Journalism

What is regression analysis?

It helps you explain how the value of a dependent variable (Y) changes when an independent variable (X) is varied, holding all other variables fixed.

Page 14: Skillshare - Regression Analysis for Data Journalism

What is regression analysis?

For example:

Health (Y): dependent variable

Vegetable consumption (X), exercise (X), sleep (X): independent variables

Page 15: Skillshare - Regression Analysis for Data Journalism

The linear regression

It’s a method for modeling the linear relationship between a dependent variable Y and one or more explanatory variables.

$$Y = BX + u$$

Y: dependent variable. X: independent variable. B: coefficient. u: error term.

We are interested in estimating B (the coefficient). It captures the effect X has on Y, holding all other factors fixed.
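To make the idea of estimating the coefficient concrete, here is a minimal sketch of ordinary least squares with a single explanatory variable. It assumes the fitted line also has an intercept, which the slide does not mention. The function name `simple_ols` and the data are made up for illustration.

```python
def simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented data: y grows by roughly 2 for every extra unit of x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = simple_ols(x, y)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

The slope plays the role of the coefficient: it is the estimated change in Y for a one-unit change in X.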

Page 16: Skillshare - Regression Analysis for Data Journalism

The linear regression

For example, you want to explain the effect of education on wages.

[Diagram relating Wage to Education and Experience. Labels: "Variation in wage that has to do with education" and "Variation in wage that has to do with experience".]

Page 17: Skillshare - Regression Analysis for Data Journalism

What is a linear regression?

• You have to formulate a hypothesis about the relationships of interest.
• Have some theory behind your assumptions.
• There are some essential assumptions and statistical properties of the regression that you have to consider.

Page 18: Skillshare - Regression Analysis for Data Journalism

EXAMPLE: THE EFFECT OF EDUCATION ON WAGES

Page 19: Skillshare - Regression Analysis for Data Journalism

Example

• Database with 994 observations.
• 3 variables: wage (in dollars), experience, years of education.
• The equation to estimate:

$$\text{Wage} = B_0 + B_1\,\text{Education} + B_2\,\text{Experience} + u$$
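A regression of wage on education and experience can be sketched in plain Python by solving the normal equations. This is only an illustration under stated assumptions: the seven rows below are invented (they are not the 994 observations in the database), the wages are generated from made-up coefficients with no noise so the fit is easy to check, and `fit_ols` is a hypothetical helper, not the tool used for the slides.

```python
def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # prepend a column of ones for the intercept
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Invented data: (years of education, years of experience).
rows = [(8, 20), (10, 5), (12, 12), (12, 3), (16, 8), (16, 1), (18, 10)]
# Wages built from made-up coefficients: 100 + 160 * education + 15 * experience.
wages = [100 + 160 * educ + 15 * exper for educ, exper in rows]

coefficients = fit_ols(rows, wages)
print([round(b, 2) for b in coefficients])
```

In practice you would use a statistics package rather than hand-written algebra; the point here is only that the coefficients are the numbers that make the equation fit the data best.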

Page 20: Skillshare - Regression Analysis for Data Journalism

Example


Page 21: Skillshare - Regression Analysis for Data Journalism

Example: coefficients


An additional year of education increases wage by $161.68, holding all other factors fixed.

An additional year of experience increases wage by $16.54, holding all other factors fixed.
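To read the coefficients in a story, you can compare two people who differ only in education. The sketch below uses the two coefficients reported above. The intercept is not reported on the slide, so `INTERCEPT` is a placeholder assumption; it cancels out when you take the difference.

```python
B_EDUCATION = 161.68   # dollars per extra year of education (from the slide)
B_EXPERIENCE = 16.54   # dollars per extra year of experience (from the slide)
INTERCEPT = 0.0        # assumption: the slide does not report the intercept


def predicted_wage(education_years, experience_years):
    """Predicted wage from the estimated linear equation."""
    return INTERCEPT + B_EDUCATION * education_years + B_EXPERIENCE * experience_years


# Same experience (5 years), 16 versus 12 years of education.
gain = predicted_wage(16, 5) - predicted_wage(12, 5)
print(f"Predicted gain from 4 extra years of education: ${gain:.2f}")  # $646.72
```

Because experience is held at 5 years in both cases, the gain is just 4 × 161.68.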

Page 22: Skillshare - Regression Analysis for Data Journalism

Example: p-value


P-Value

But, what is the p-value?

Page 23: Skillshare - Regression Analysis for Data Journalism

Example: p-value


With statistics you can’t be 100% certain.

A relatively simple way to interpret p-values is to think of them as how likely a result at least this strong would be if only chance were at work.

Page 24: Skillshare - Regression Analysis for Data Journalism

Example: p-value


Null hypothesis: a hypothesis that the researcher tries to disprove, reject or nullify.

“Education has NO explanatory power over wages”

“Men are NOT taller than women on average”

To test the null hypothesis we use the p-value.

Page 25: Skillshare - Regression Analysis for Data Journalism

Example: p-value


The p-value is the probability of seeing a result at least as extreme as yours if the null hypothesis were true.

If your p-value is small (< 0.05), you have strong evidence to reject the null hypothesis.

“Men are significantly taller than women, p=0.01.” That means that if men were NOT actually taller than women, a difference this large would show up only 1% of the time because of random chance.
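One way to get a feel for a p-value is to simulate the null hypothesis yourself. The sketch below is a permutation test, which is a different technique from the test behind a regression table but shares the same logic: shuffle the group labels many times and count how often chance alone produces a difference at least as large as the observed one. The heights are made up and `permutation_p_value` is a hypothetical helper.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided p-value for the claim mean(group_a) > mean(group_b)."""
    rng = random.Random(seed)
    observed = sum(group_a) / len(group_a) - sum(group_b) / len(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels are interchangeable, so shuffle them.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / n_permutations


# Invented heights in centimeters.
men = [178, 182, 175, 180, 185, 177, 181, 179]
women = [165, 170, 162, 168, 172, 166, 169, 171]

p_value = permutation_p_value(men, women)
print(f"p-value: {p_value}")
```

A small value means that shuffled (chance-only) data almost never shows a gap as big as the real one.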

Page 26: Skillshare - Regression Analysis for Data Journalism

Example


P-Value

It tells you if the coefficient is statistically significant.

With a low p-value (less than 10%, 5% or 1%) you can reject the null hypothesis that the coefficient is equal to zero (it has no explanatory power). In this case, the coefficients are significant. That means that education and experience have explanatory power over wage.

Page 27: Skillshare - Regression Analysis for Data Journalism

Example


R-squared: This indicates how well the explanatory variables explain the variability of the dependent variable.

In this case: 33.8% of the variability of wage is explained by the years of education and years of experience.
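R-squared can be computed by hand as one minus the ratio of unexplained variation to total variation. The sketch below assumes you already have the model's predictions; the numbers and the helper name `r_squared` are made up for illustration.

```python
def r_squared(actual, predicted):
    """Share of the variability in `actual` accounted for by `predicted`."""
    mean_actual = sum(actual) / len(actual)
    # Variation the model leaves unexplained (sum of squared residuals).
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total variation of the dependent variable around its mean.
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


# Invented values: observed outcomes and a model's predictions for them.
actual = [10, 12, 15, 19, 24]
predicted = [11, 12, 16, 18, 23]

r2 = r_squared(actual, predicted)
print(f"R-squared: {r2:.3f}")
```

An R-squared of 0.338, as in the wage example, would mean the residual variation is about two thirds of the total.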

Page 28: Skillshare - Regression Analysis for Data Journalism

OTHER TYPES OF REGRESSION ANALYSIS

Page 29: Skillshare - Regression Analysis for Data Journalism

The logistic regression


Imagine you want to estimate the probability that a person with a college degree is employed.

The linear regression wouldn’t be very useful.

Page 30: Skillshare - Regression Analysis for Data Journalism

The logistic regression


It is a regression model where the dependent variable (Y) is categorical. For example (binary):

1 = unemployed, 0 = employed

It is used to estimate the probability of a binary response based on one or more independent variables.

Page 31: Skillshare - Regression Analysis for Data Journalism

The logistic regression


Explanatory variables:

- Age
- Education
- Family income
- Occupation

[Diagram: the explanatory variables go into the logistic regression, whose outcome is Employed or Unemployed.]

The model would tell you, for example, that a person with a college degree is three times more likely to be employed than a person who only went to high school.

Page 32: Skillshare - Regression Analysis for Data Journalism

The logistic regression


• The coefficients cannot be interpreted as the rate of change in the dependent variable.

• You check the sign of the coefficients.

• You can calculate marginal effects or odds ratios (logit).
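Here is a minimal sketch of how a logit coefficient turns into an odds ratio and a marginal effect. Everything in it is assumed for illustration: the outcome is coded 1 = employed, the only explanatory variable is a college-degree dummy, and `INTERCEPT` and `B_COLLEGE` are made-up values (the coefficient is chosen so that the odds ratio comes out at 3).

```python
import math


def logistic(z):
    """Map a linear score z to a probability between 0 and 1."""
    return 1 / (1 + math.exp(-z))


# Hypothetical logit estimates (not from any real model).
INTERCEPT = 0.4
B_COLLEGE = math.log(3)

# The coefficient is positive, so a degree raises the probability of employment.
odds_ratio = math.exp(B_COLLEGE)

# Predicted probabilities of being employed without and with a degree.
p_no_college = logistic(INTERCEPT)
p_college = logistic(INTERCEPT + B_COLLEGE)

# Marginal effect of the dummy: the change in predicted probability.
marginal_effect = p_college - p_no_college

print(f"Odds ratio: {odds_ratio:.2f}")
print(f"P(employed | no degree) = {p_no_college:.3f}")
print(f"P(employed | degree)    = {p_college:.3f}")
print(f"Marginal effect: {marginal_effect:.3f}")
```

An odds ratio of 3 means the odds of being employed triple; the probability itself rises by less than a factor of three, which is why the marginal effect is often easier to explain to readers.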

Page 33: Skillshare - Regression Analysis for Data Journalism

USING REGRESSION MODELS IN DATA JOURNALISM

Page 34: Skillshare - Regression Analysis for Data Journalism

Some examples

"Does School Pay Off? How Much?" - El Financiero (Costa Rica), winner of the Data Journalism Awards 2014.

http://www.elfinancierocr.com/gnfactory/especiales/2015/calculadorasalarial/


Page 36: Skillshare - Regression Analysis for Data Journalism

Some advice

• Statistical analysis can be complex. If you’re not sure, seek advice from an expert!
• Be transparent with your methodology.
• Study a lot!
• https://www.coursera.org/ Free courses!


Page 37: Skillshare - Regression Analysis for Data Journalism

References

-Wooldridge (2010). Introductory Econometrics

-Long (1997). Regression models for categorical and limited dependent variables

-Costa Rica National Survey of Income and Spending (2004).


Page 38: Skillshare - Regression Analysis for Data Journalism

THANKS :) @milamila07

schoolofdata.org