Statistics for Social and Behavioral Sciences Session #5: The
Regression Line (Agresti and Finlay, Chapter 9) Prof. Amine
Ouazad
Slide 2
Statistics Course Outline
Part I. Introduction and Research Design (Week 1)
Part II. Describing Data (Weeks 2-4)
Part III. Drawing Conclusions from Data: Inferential Statistics (Weeks 5-9)
Part IV. Correlation and Causation: Regression Analysis (Weeks 10-14)
This is where we talk about Zmapp and Ebola! Firenze or
Lebanese Express? Where we are right now! Describing associations
between two variables
Slide 3
Last Session
Descriptive statistics summarize data, to make it easier to
assimilate the information.
Measuring the distribution of a variable: Mean, median. Range,
standard deviation. Applies both to bell-shaped and non-bell-shaped
distributions (e.g. the superstar distribution).
Bell-shaped distributions: the Empirical Rule applies!
Measuring associations: Contingency table. Scatter plot.
Slide 4
Outline
1. Scatter plot, linear relationship: Unemployment and Crime
2. The regression line: What is the relationship between height
and weight?
3. Warning: Correlation is not causation. Spurious relationships
Next session: Bivariate analysis. Chapter 9 of A&F, continued
Slide 5
Unemployment Crime ? Is there really a link? ATLANTIC CITY -
"With the layoffs the city is going to have, we'll have to expect
that increase in crime." With an increase in unemployment and crime
typically going hand-in-hand, Atlantic City PBA President Paul
Barbere believes a challenging time lies ahead for the Atlantic
City Police Department. "That's going to require us to respond to more
calls for service, more calls for services requires more time out
of service for our patrol units, with fewer patrol units, it's
going to be difficult," said Barbere. This potential spike in crime
comes during a time when Barbere says the department is already
short-handed. "With the police department, we're running about 30
men and women short of what the ordinance calls for."
Slide 6
Unemployment Crime ? On the boardwalk, the potential for more
crime has the valuable tourists the city relies on questioning what
lies ahead. "They should have a plan designed for that, because
they certainly don't want to dissuade people from coming here,"
said Yvette Dilworth of Queens, New York. "I don't know what
Atlantic City is going to do to prepare for that but obviously when
you're losing jobs the crime rate could come up," said Chris
Mascioli of Camden County. "So yeah I'm concerned about it." In
addition to the potential increase in calls stemming from
unemployment, police will also have to keep an eye on the newly
vacant casinos. "We'll have to maintain a certain staff to keep
mechanicals going and to ensure the integrity and safety of the
buildings themselves. That's not to say people won't try to break
in," said Barbere. And even with fewer officers and more unemployment
in the city, Barbere is confident the department is capable of
rising to the challenge. "The men and women of the Atlantic City
Police Department are well trained and have been dealing with this
staffing for some time now," said Barbere. "So it's nothing they can't
handle."
Slide 7
United States data
Data set: County Characteristics 2000-2007.
Observation: County. Number of observations?
Variables: Unemployed persons, 2005. Number of murders reported to
police, 2004.
Comments?
Self Check
Observational data / Experimental data
Unemployed persons: Categorical variable / Quantitative variable
Unemployed persons: Discrete variable / Continuous variable
Number of murders: Categorical variable / Quantitative variable
Number of murders: Discrete variable / Continuous variable
Survey data / Online data / Administrative data
Slide 8
Scatter plot
Number of murders reported to police: Number of observations: 2,957.
Mean: 5.07. Median: 0. Std. Dev.: 28.30. Min: 0. Max: 1,038.
P25: 0. P75: 2.
Unemployed persons: Number of observations: 3,133. Mean: 2,414.56.
Median: 665. Std. Dev.: 7,985. Min: 4. Max: 256,236. P25: 285.
P75: 1,683.
Which is the response variable and which is the explanatory
variable?
Slide 9
Distribution of Murders
Kind of distribution: Bell shaped / Superstar distribution (Spotify)
The Empirical Rule applies: True / False
County Name            Murders in 2004
Los Angeles County     1,038
Wayne County           415
Harris County          346
Philadelphia County    330
Maricopa County        281
Dallas County          278
Baltimore city         276
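A quick way to approach the self-check on the Empirical Rule is to see what the rule would predict from the summary statistics reported on the scatter plot slide (mean 5.07, standard deviation 28.30, minimum 0, median 0). The sketch below is only an illustration; the function name is made up for this example.

```python
# Summary statistics for murders per county, copied from the scatter plot slide.
MEAN = 5.07
STD_DEV = 28.30
MINIMUM = 0
MEDIAN = 0


def interval_around_mean(mean, std_dev, k):
    """Return the interval from mean - k*sd to mean + k*sd."""
    return mean - k * std_dev, mean + k * std_dev


# For a bell-shaped variable, about 95% of observations fall in this interval.
low, high = interval_around_mean(MEAN, STD_DEV, 2)
print(f"mean +/- 2 sd: [{low:.2f}, {high:.2f}]")

# Warning sign 1: the interval reaches far below the smallest possible count.
interval_goes_below_minimum = low < MINIMUM
# Warning sign 2: the mean sits above the median, which points to right skew.
mean_exceeds_median = MEAN > MEDIAN
print(interval_goes_below_minimum, mean_exceeds_median)
```

A lower bound well below zero for a count that cannot be negative, together with a mean above the median, suggests the distribution is not bell shaped, so the Empirical Rule is a poor guide here.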
Slide 10
Scatter plot
Number of murders reported to police: Number of observations: 2,957.
Mean: 5.07. Median: 0. Std. Dev.: 28.30. Min: 0. Max: 1,038.
P25: 0. P75: 2.
Unemployed persons: Number of observations: 3,133. Mean: 2,414.56.
Median: 665. Std. Dev.: 7,985. Min: 4. Max: 256,236. P25: 285.
P75: 1,683.
Slide 11
Linear Relationship?
$y = \alpha + \beta x$
$\text{Murders} = \alpha + \beta \times \text{Unemployed}$
[Plot annotations: + 20,000 unemployed, + 20,000 unemployed]
An increasing relationship, $\beta > 0$
Slide 12
What a Linear Relationship Implies
An increase in the number of unemployed raises the number of
murders by $\beta$ (the slope) times the increase. A decline in the
number of unemployed lowers the number of murders by $\beta$ times
the decline.
An increase in the number of unemployed by, say, 10,000 raises the
number of murders by the same amount regardless of whether there
were initially 0 murders or 300 murders. No gang formation?
A decline in the number of unemployed by, say, 10,000 lowers the
number of murders by the same amount regardless of whether there
were initially 0 murders or 300 murders. Shouldn't it be tougher to
lower the number of murders than to raise it?
This is a model, a simplification of the world.
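The constant-effect property of a straight line can be checked numerically. The sketch below assumes a line of the form murders = intercept + slope × unemployed; the two coefficient values are hypothetical, chosen only for illustration, and are not estimated from the county data.

```python
import math

# Hypothetical coefficients, for illustration only (not fitted to the county data).
ALPHA = 1.0    # intercept: predicted murders when nobody is unemployed
BETA = 0.002   # slope: predicted extra murders per additional unemployed person


def predicted_murders(unemployed, alpha=ALPHA, beta=BETA):
    """Value of the straight line alpha + beta * unemployed."""
    return alpha + beta * unemployed


def change_in_prediction(start, delta):
    """Change in predicted murders when unemployment moves from start to start + delta."""
    return predicted_murders(start + delta) - predicted_murders(start)


rise_from_low = change_in_prediction(1_000, 10_000)
rise_from_high = change_in_prediction(150_000, 10_000)
fall_from_high = change_in_prediction(150_000, -10_000)

# Same rise whatever the starting level, and a fall of the same size as the rise.
print(math.isclose(rise_from_low, rise_from_high))
print(math.isclose(fall_from_high, -rise_from_high))
```

Both comparisons hold for any straight line, which is exactly the simplification the slide questions: the model leaves no room for effects that grow with the starting level or that differ between increases and declines.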
Slide 13
What we can do with a linear relationship
Extrapolate: Predict. With more local data (census block, census
tract, ZIP code level). With individual data (Minority Report
style, possible with Danish or Swedish data).
Interpolate: Fill in the gaps when data is missing.
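One common way to read the two terms is that interpolation predicts inside the range of unemployment values seen in the data, while extrapolation predicts outside it. The sketch below takes the observed range (4 to 256,236 unemployed persons) from the summary statistics slide; the intercept and slope are hypothetical values used only for illustration.

```python
# Observed range of the explanatory variable, from the summary statistics slide.
X_MIN, X_MAX = 4, 256_236

# Hypothetical coefficients, for illustration only (not fitted to the county data).
ALPHA, BETA = 1.0, 0.002


def predict(unemployed):
    """Return (predicted murders, kind of prediction) for one county."""
    inside_observed_range = X_MIN <= unemployed <= X_MAX
    kind = "interpolation" if inside_observed_range else "extrapolation"
    return ALPHA + BETA * unemployed, kind


for unemployed in (50_000, 400_000):
    murders, kind = predict(unemployed)
    print(f"{unemployed:>7} unemployed -> {murders:.1f} predicted murders ({kind})")
```

Predictions flagged as extrapolation deserve more caution, because nothing in the data shows that the straight line still holds that far out.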
Slide 14
The Los Angeles Police Department, like many urban police
forces today, is both heavily armed and thoroughly computerised.
The Real-Time Analysis and Critical Response Division in downtown
LA is its central processor. Rows of crime analysts and
technologists sit before a wall covered in video screens stretching
more than 10 metres wide. Multiple news broadcasts are playing
simultaneously, and a real-time earthquake map is tracking the
region's seismic activity. Half-a-dozen security cameras are focused
on the Hollywood sign, the city's icon. In the centre of this video
menagerie is an oversized satellite map showing some of the most
recent arrests made across the city: a couple of burglaries, a few
assaults, a shooting. On a slightly smaller screen the division's
top official, Captain John Romero, mans the keyboard and zooms in
on a comparably micro-scale section of LA. It represents just 500
feet by 500 feet. Over the past six months, this sub-block section
of the city has seen three vehicle burglaries and two property
burglaries, an atypical concentration. And, according to a new
algorithm crunching crime numbers in LA and dozens of other cities
worldwide, it's a sign that yet more crime is likely to occur right
here in this tiny pocket of the city. The algorithm at play is
performing what's commonly referred to as predictive policing. Using
years and sometimes decades' worth of crime reports, the algorithm
analyses the data to identify areas with high probabilities for
certain types of crime, placing little red boxes on maps of the
city that are streamed into patrol cars. "Burglars tend to be
territorial, so once they find a neighbourhood where they get good
stuff, they come back again and again," Romero says. "And that
assists the algorithm in placing the boxes." The dashboard for New
York Police Department's 'Domain Awareness System'. Photograph:
Shannon Stapleton/Reuters
Slide 15
Outline
1. Scatter plot, linear relationship: Back to height and weight.
2. The regression line: What is the relationship between height
and weight?
3. Warning: Correlation is not causation. Spurious relationships
Next session: Bivariate analysis. Chapter 9 of A&F, continued
Slide 16
Finding the regression line Any line is imperfect
Slide 17
Finding the regression line
Which line is the right one? A line is entirely determined by the
choice of $\alpha$ and $\beta$. An essential formula.
Notice the difference between $b$ and $\beta$, between $a$ and
$\alpha$: $a$ and $b$ are estimates computed from the data, while
$\alpha$ and $\beta$ are the unknown values they estimate.
x is the explanatory variable. y is the response variable.
If y increases when x increases, then b > 0.
If y decreases when x increases, then b < 0.
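One standard way to pick the line, assuming the least squares method covered in Agresti and Finlay's Chapter 9, is to choose $a$ and $b$ so that the sum of squared vertical distances between the points and the line is as small as possible. That gives $b = \sum (x - \bar{x})(y - \bar{y}) / \sum (x - \bar{x})^2$ and $a = \bar{y} - b\bar{x}$. The sketch below applies these formulas to a small made-up height and weight sample; the numbers are invented for illustration and are not course data.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least squares line y_hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator and denominator of the least squares slope.
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Made-up sample for illustration only: heights in cm, weights in kg.
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [50, 58, 66, 77, 84]

a, b = least_squares_line(heights_cm, weights_kg)
print(f"weight_hat = {a:.2f} + {b:.2f} * height")

# Residuals: observed weight minus the weight predicted by the line.
residuals = [y - (a + b * x) for x, y in zip(heights_cm, weights_kg)]
print(f"sum of residuals: {sum(residuals):.6f}")
```

Here $b$ comes out positive because weight rises with height in the sample, and the residuals sum to zero up to rounding, a property of every least squares line that includes an intercept.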