Collaborative Online Social Media Observatory: Crime Sensing Through Social Media Matt Williams & Luke Sloan Pete Burnap, Omer Rana, William Housley, Adam Edwards, Jeffrey Morgan, Vincent Knight, Rob Procter, Alex Voss Cardiff University, University of Warwick, University of St Andrews tweet: @ cosmos_project web: www.cosmosproject.net

Collaborative Online Social Media Observatory: Crime Sensing Through Social Media


Page 1: Collaborative Online Social Media Observatory: Crime Sensing Through Social Media

Collaborative Online Social Media Observatory: Crime Sensing Through Social Media

Matt Williams & Luke Sloan

Pete Burnap, Omer Rana, William Housley, Adam Edwards, Jeffrey Morgan, Vincent Knight, Rob Procter, Alex Voss

Cardiff University, University of Warwick, University of St Andrews

tweet: @cosmos_project web: www.cosmosproject.net

Page 2: Outline

• COSMOS overview & update

• Project Objectives

• Key Literature

• Research Questions

• Data & Sampling

• Sensing Crime & Disorder

• Exploring Relationships
– Time
– Space

• Modelling Strategy

• Methodological Considerations

Page 3: What is COSMOS?

• Aim to establish a coordinated interdisciplinary UK response to “Big Social Data”

• Collaboration between the universities of Cardiff, Warwick and St. Andrews

• Additional input from Edinburgh, UCL, Leeds, Manchester and Wolverhampton

• Brings together social, computer, political, health and mathematical scientists to study the methodological, theoretical, and empirical dimensions of Big Data in technical, social and policy contexts

• Developing a research programme to address next-generation research questions that focus upon the challenges posed by big social data to government, digital economy and civil society

• Development of new methodological tools and technical/data solutions for UK academia

Page 4: What is COSMOS?

• COSMOS has attracted 17 research grants amounting to over £1.25M in funding from the ESRC/EPSRC/AHRC/JISC and £500K from the public and private sectors (DoH/FSA/HPC Wales).

• A significant proportion of these funds has been awarded to collect and analyse social media data in the contexts of tension, hate speech, crime, urban safety, security and suicide

Page 5: Selection of research projects

Digital Social Research Tools, Tension Indicators and Safer Communities: A demonstration of COSMOS (ESRC DSR)

COSMOS: Supporting Empirical Social Scientific Research with a Virtual Research Environment (JISC)

Small items of research equipment at Cardiff University (EPSRC)

Hate Speech and Social Media: Understanding Users, Networks and Information Flows (ESRC Google)

Social Media and Prediction: Crime Sensing, Data Integration and Statistical Modelling (ESRC NCRM)

Understanding the Role of Social Media in the Aftermath of Youth Suicides (Department of Health)

Scaling the Computational Analysis of “Big Social Data” & Massive Temporal Social Media Datasets (HPC Wales)

Digital Wildfire: (Mis)information flows, propagation and responsible governance (ESRC Global Uncertainties)

Public perceptions of the UK food system: public understanding and engagement, and the impact of crises and scares (ESRC/FSA)

Timeline: 2011 to 2016

Page 6: COSMOS

Page 7: COSMOS

Page 8: COSMOS

Page 9: COSMOS

Page 10: COSMOS Infrastructure

COSMOS Desktop

• Small local datasets

• Users’ API credentials

• Local analysis

• Sept ‘14 launch

COSMOS Cloud

• Scalable storage

• Massive datasets

• Scalable compute

• On-demand nodes

• Fast search & retrieve

• Fast analysis

• Workflow management

• Collaboration support

• 2015 launch

Page 11: Project Objectives

• To evaluate the utility of crime and disorder related tweets in predicting patterns of crime in six London boroughs

• To develop an automated machine classifier for identifying tweets containing crime and disorder terms

• To develop statistical models that take into account temporal and spatial variation

• To compare conventional predictive models of crime with models containing social media derived data

Page 12: Literature I

• The interoperability afforded by COSMOS through spatial linkage enables us to identify associations between online and offline phenomena

• Social media is already being used as a preferential means of updating the public about crime in the US and Europe (Johnson 2012, Crookes 2010, Danef 2012, Philips 2011, Rawlinson 2012)

• Allowing the reporting of emergencies on Twitter is being considered in the UK

• A near ten-fold rise in crime related communication in 2012 (Warrell 2012)

• Complaints originating from social media make up "at least half" of calls passed on to front-line officers (College of Policing, 2014)

• 6,000 officers currently being trained to deal with social media evidence

Page 13: Literature II

• Social and computational researchers have already begun to ‘repurpose’ social media data in their ‘predictive’ efforts

• Tumasjan et al. (2010) measured Twitter sentiment in relation to candidates in the German general election, concluding that this source of data was as accurate at predicting voting patterns as traditional polls

• Asur & Huberman (2010) correlated frequency and sentiment related to movies on Twitter with their revenue, claiming that this method of prediction was more accurate than the Hollywood Stock Exchange

• Sakaki et al. (2010) found that the analysis of Twitter data produced estimates of the centres of earthquakes more accurately than conventional methods

• These studies illustrate how social media generates naturally occurring data that can be used to complement and augment conventional curated and administrative data

Page 14: Literature III

• Another notable example is the association of social media and crime, such as the riots during August 2011 (Procter et al. 2013a)

• Malleson and Andresen (2014) use Twitter to estimate changing population densities as an alternative to the Census for identification of violent crime hotspots

• Gerber (2014) looks at the relationship in the US between reported crime and the prevalence of multiple topics on Twitter

Page 15: Research Questions

• Can crime and disorder related content on Twitter enhance our understanding of and our ability to predict crime patterns?

• If so, is Twitter content a better predictor of certain major crime types than others?

• Can this form of data be used as an alternative measure of feelings of insecurity in local communities?

Page 16: Data & Sampling

• Comparative case study of London and Cardiff (this presentation focuses on London):
– Recorded crime (lat/long, HO crime category), split by month, Aug 2013 to Aug 2014
– Collecting 100% of geotagged UK tweets (approx. 500k per day)
– Census data including ethnic composition, educational attainment, employment, income, health, religiosity (ONS API)

NOTE: The COSMOS archive contains all UK tweets since Sept 2011 (not all of which are geotagged), but there is potential for identification of higher (mundane) geographies…

Page 17: Sensing Crime & Disorder I

• We need to identify tweets in our sample that relate to signatures of crime and disorder using key-word detection of ordinary language

• 500K tweets a day means that it is unfeasible to do this manually

• Develop machine classifier to identify tweets referencing crime and disorder

• References to anxiety, environmental deterioration, anti-social behaviour, night-time establishments etc.

• Use crowd-sourcing and human coders to develop a lexicon and algorithm…

Page 18: Sensing Crime & Disorder II

INPUT: All UK geocoded tweets

Reduce sample of UK tweets to London & Cardiff

Take random subsample (every nth tweet) and send for crowd-sourced human coding

Human coders identify tweets that contain (and do not contain) crime/disorder terms

Use 50% of human-annotated dataset to train classifier through machine learning

Validate classifier using remaining 50% of dataset (test precision and recall)

Run classifier over whole London and Cardiff dataset

OUTPUT: All London and Cardiff tweets with crime/disorder flag
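The train/validate steps of this pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the labelled tweets are invented, a tiny multinomial Naive Bayes stands in for the project's classifier (the slides do not name one), and the 50/50 split simply takes the first and second halves of the list rather than a random draw.

```python
import math
from collections import Counter

# Hypothetical hand-labelled tweets (1 = crime/disorder, 0 = other).
# Invented for illustration; these are not project data.
LABELLED = [
    ("house down the road was burgled last night", 1),
    ("lovely sunny day in the park", 0),
    ("someone smashed the bus stop again", 1),
    ("great coffee near the station", 0),
    ("car broken into on our street", 1),
    ("watching the football tonight", 0),
    ("gang fighting outside the pub", 1),
    ("new film was brilliant", 0),
    ("my bike was stolen from the station", 1),
    ("sunny walk along the river", 0),
    ("shop window smashed on the high street", 1),
    ("coffee and cake with friends tonight", 0),
]


def tokens(text):
    return text.lower().split()


def train(examples):
    """Fit per-class word counts for a multinomial Naive Bayes model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(tokens(text))
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(model, text):
    """Return 1 if the crime/disorder class scores higher, else 0."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in tokens(text):
            if word in vocab:  # words never seen in training are ignored
                # add-one smoothing avoids log(0) for unseen class/word pairs
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return 1 if scores[1] > scores[0] else 0


def precision_recall(model, examples):
    tp = fp = fn = 0
    for text, label in examples:
        guess = predict(model, text)
        tp += guess == 1 and label == 1
        fp += guess == 1 and label == 0
        fn += guess == 0 and label == 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Half of the annotated data trains the classifier, the other half validates it.
half = len(LABELLED) // 2
train_set, test_set = LABELLED[:half], LABELLED[half:]
model = train(train_set)
precision, recall = precision_recall(model, test_set)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Once precision and recall on the held-out half are acceptable, `predict` would be run over every London and Cardiff tweet to produce the crime/disorder flag.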

Page 19: Exploring Relationships

• Simple correlation between tweets about crime/disorder and occurrence of recorded crime is too simplistic

• At what spatial and temporal level can social media be used to inform operational decision making?

• At what spatial and temporal level do we try to match tweets and crime?

• How to integrate existing curated data?

Page 20: Exploring Relationships – Time I

• Certain variables are fixed (e.g. socio-economic characteristics of areas)

• Crime and tweets are in constant motion (by the second!)

• Investigate relationship between tweets and crime/disorder at different levels of time:
– Annual?
– Monthly?
– Days of the week?
– Time of day?
– Seasonal variations in crime type?
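As a rough illustration of what matching at different levels of time involves, the sketch below counts the same timestamped events under each level listed on this slide. The timestamps are invented, the `LEVELS` mapping and `aggregate` helper are hypothetical names, and the season rule assumes Northern-hemisphere meteorological seasons.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event timestamps (tweets or recorded crimes); invented for illustration.
EVENTS = [
    datetime(2013, 8, 3, 23, 15),
    datetime(2013, 8, 3, 23, 50),
    datetime(2013, 8, 10, 2, 5),
    datetime(2013, 12, 24, 14, 30),
    datetime(2014, 6, 14, 23, 10),
]

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
# Assumption: meteorological seasons (Dec-Feb is winter, and so on).
SEASONS = ["winter", "spring", "summer", "autumn"]

# One key function per temporal level.
LEVELS = {
    "annual": lambda t: t.year,
    "monthly": lambda t: (t.year, t.month),
    "day_of_week": lambda t: DAY_NAMES[t.weekday()],
    "time_of_day": lambda t: t.hour,
    "season": lambda t: SEASONS[(t.month % 12) // 3],
}


def aggregate(events, level):
    """Count events per bucket at the chosen temporal level."""
    return Counter(LEVELS[level](t) for t in events)


for level in LEVELS:
    print(level, dict(aggregate(EVENTS, level)))
```

The same events give quite different pictures depending on the bucket chosen, which is why the level at which tweets and crimes are matched has to be decided before any model is fitted.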

Page 21: Exploring Relationships – Time II

The simple frequency of reported crime commencement times varies by time of day (June 2013 data)…

Page 22: Exploring Relationships – Time III

Type of crime, as proportion of all crime, differs by time of day…

Page 23: Exploring Relationships – Time IV

Greater variability across time of day for some crimes than others (June 2013 data)…

Page 24: Exploring Relationships – Time V

• Clearly time of day is important

• More tweets during daytime might mean that we can more accurately predict daytime crime

• Likely that Twitter data is better for predicting some crime types than others (explicit and hidden)

• How to account for ‘lag’ e.g. ‘the house down the road was burgled last night’
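One simple way to probe the lag question is to correlate the crime series with the tweet series shifted by a number of days and see which shift fits best. The sketch below is an assumption-laden illustration, not the project's method: the daily counts are invented so that tweets echo crimes exactly one day later, and `lagged_correlation` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(crimes, tweets, lag):
    """Correlate crimes on day d with tweets on day d + lag (lag >= 0)."""
    if lag == 0:
        return pearson(crimes, tweets)
    return pearson(crimes[:-lag], tweets[lag:])


# Invented daily counts: TWEETS[d + 1] == CRIMES[d], i.e. a one-day lag.
CRIMES = [2, 5, 1, 7, 3, 6, 2, 8]
TWEETS = [4, 2, 5, 1, 7, 3, 6, 2]

best_lag = max(range(4), key=lambda k: lagged_correlation(CRIMES, TWEETS, k))
print("best lag in days:", best_lag)
```

With real data the peak would be far less clean, and the lag may differ by crime type, but the same scan over candidate lags applies.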

Page 25: Exploring Relationships – Space I

• Size of London results in huge internal variance in crime type and rates

• Crime and tweets are point data that can be located in any geography (from OA to LA)

• Investigate relationship between tweets and crime/disorder at different levels of space:
– City wide
– Boroughs
– Wards
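Because crimes and tweets are points, locating them in a geography comes down to a point-in-polygon test against that geography's boundaries. The sketch below uses a standard ray-casting test; the two rectangular "wards" and the sample points are invented, and real work would use official boundary files and a GIS library rather than this hand-rolled helper.

```python
from collections import Counter

# Hypothetical ward boundaries as (x, y) vertex lists; invented for illustration.
WARDS = {
    "ward_a": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "ward_b": [(2, 0), (5, 0), (5, 2), (2, 2)],
}


def contains(polygon, point):
    """Ray-casting test: count boundary crossings of a ray going right from the point."""
    x, y = point
    inside = False
    edges = zip(polygon, polygon[1:] + polygon[:1])
    for (x1, y1), (x2, y2) in edges:
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through the point
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def count_by_area(points, areas):
    """Assign each point to the first area containing it and count per area."""
    counts = Counter()
    for point in points:
        for name, polygon in areas.items():
            if contains(polygon, point):
                counts[name] += 1
                break
    return counts


# Invented point locations (tweets or crimes); the last one falls outside both wards.
POINTS = [(1, 1), (0.5, 1.5), (3, 1), (9, 9)]
print(dict(count_by_area(POINTS, WARDS)))
```

Swapping `WARDS` for borough or output-area polygons re-aggregates the same points at a different spatial level without touching the point data.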

Page 26: Exploring Relationships – Space II

Borough level geography is too high, variance largely due to population density (plus commuter and tourism movement)

Page 27: Exploring Relationships – Space III

Knightsbridge & Hyde Park

Soho & Covent Garden

Page 28: Exploring Relationships – Space III

Red = >14% ‘never worked’ or ‘long-term unemployed’

Dark Green = <5% ‘never worked’ or ‘long-term unemployed’

Page 29: Exploring Relationships – Space IV

• Commuter and tourism patterns matter, although more people = more crimes = more tweets?

• Reduction in social media use for those living in deprived areas? Less likely to tweet about crime despite being more likely to know about it?

• Could go down to OA, but number of tweets and reported crimes per case/unit is cut

Page 30: Modelling Strategy I

• A ward-based example:
– One month of data
– Treated as cross-sectional
– Crime and tweets aggregated over month
– Single time point allows inclusion of ward characteristics
– One ward = single case

• Use existing known predictors of crime to specify model, measure success

• Add tweet data to model and see if ‘prediction’ rate is significantly higher

• i.e. does social media data [x] enable better explanation of variance in crime rates [y]?
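One way to write this comparison down, offered as a sketch rather than the authors' specification, is as a pair of nested linear models. The symbols are assumptions introduced here: $y_w$ is the crime rate in ward $w$, $x_{kw}$ are the $p$ known predictors, $T_w$ is the count of crime/disorder tweets, and $n$ is the number of wards.

```latex
% Baseline model: known predictors only
y_w = \beta_0 + \sum_{k=1}^{p} \beta_k x_{kw} + \varepsilon_w

% Augmented model: baseline plus the tweet measure
y_w = \beta_0 + \sum_{k=1}^{p} \beta_k x_{kw} + \gamma T_w + \varepsilon_w

% Gain in explained variance and the nested-model F test for the one added term
\Delta R^2 = R^2_{\text{aug}} - R^2_{\text{base}},
\qquad
F = \frac{\mathrm{RSS}_{\text{base}} - \mathrm{RSS}_{\text{aug}}}
         {\mathrm{RSS}_{\text{aug}} / (n - p - 2)}
```

A significant $F$ (equivalently, $\gamma \neq 0$) would indicate that the tweet measure explains variance in crime rates beyond the existing predictors. For a binary or count outcome the same nested logic applies with a likelihood-ratio test in place of $F$.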

Page 31: Modelling Strategy II

• Simple logit model ignores temporal order and spatial data

• Fixed effects model would account for changes over time (but fixed factors such as ward demographics would be excluded)

• Random effects model would enable inclusion of non-time variant predictors (but stringent assumptions)

• Spatial point data allows us to take into account spatial correlation (kernel density estimation?)

• Multilevel model would account for both ward and borough level variance
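For the multilevel option, a minimal two-level random-intercept specification with wards nested in boroughs might look as follows. This is an illustrative assumption, not a model taken from the slides: $y_{wb}$ is the crime rate in ward $w$ of borough $b$ and $T_{wb}$ is the tweet measure.

```latex
% Wards (level 1) nested in boroughs (level 2), random intercept per borough
y_{wb} = \beta_0 + \beta_1 T_{wb} + u_b + e_{wb},
\qquad
u_b \sim N(0, \sigma_u^2), \quad e_{wb} \sim N(0, \sigma_e^2)

% Share of variance lying between boroughs (intraclass correlation)
\rho = \frac{\sigma_u^2}{\sigma_u^2 + \sigma_e^2}
```

If $\rho$ is close to zero, little variance sits at borough level and a single-level model may suffice; a sizeable $\rho$ is the signal that the borough level has to be modelled explicitly.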

Page 32: Modelling Strategy II

• Could control for time and space through dummy variables

• p-values and standard errors can be poorly estimated for dummy variables in single level models (Snijders & Bosker 2012)

• Not feasible to have a dummy variable for every hour of the day

• Suggested way forward:
– Test for spatial variance (MLM)
  • Ward and borough level
– Test for temporal variance (FE/RE)
  • Time of day, day of week and month

• If amount of spatial and temporal variance is significant then it must be accounted for in a multi-level longitudinal model (Yu et al. 2010)

Page 33: Methodological Considerations

• Asynchronous relationship between tweeting about crime/disorder and experiencing/witnessing it?

• Commencement and finish time of a crime are rarely the same (e.g. events)

• Difference between when something happened and when it was reported

Page 34: Discussion
