Data Science in Government

Public policy in the ‘big data’ age: Martin Ralphs presentation

Government Data Science Partnership

Raise awareness of data science potential

Embed new approaches and skills, and improve existing capability

Engage with departments to understand opportunities and issues

Build and support a cross-government data science community to share expertise

Break down technical barriers and understand ethical issues

Learning by doing

Government Innovation Group

Government Digital Service

What is data science?

Data science: volume, velocity, variety; new approach; new technology

New approach: a ‘data first’ mindset; exploring the data to find insights & potential improvements using new & innovative techniques

New technology: new, low-priced storage in the cloud, with unrestricted technology capable of running software that can deliver insights quickly

How can data science improve government policy and operations?

Data visualisation: interactive web apps, real-time feeds, personalisation

New data sets and collection methods: social media, web scraping, real-time data

Machine learning: prediction, clustering, unstructured data

Data sources for official statistics

Surveys – e.g. of businesses and households

Census – every 10 years

Administrative data – by-product of government processes

Big Data?

“Data that is difficult to collect, store or process within the conventional systems of statistical organizations. Either, their volume, velocity, structure or variety requires the adoption of new statistical software processing techniques and/or IT infrastructure to enable cost-effective insights to be made.”

(UNECE, 2013)

Big data sources

Social media: posts, pictures and videos

Purchase transaction records

Mobile phone GPS and cell tower signals

High-volume administrative & transactional records

Sensors gathering information (e.g. climate, traffic, internet of things)

Digital satellite images

New data sets and collection methods: social media, web scraping, real-time data

Web scraping supermarket prices

● Price collection is currently manual
● Web scraping offers more detailed, more frequent data at lower cost
● Web-scraped data provides an opportunity to gain experience in processing high-volume price data
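The scrapers themselves are not shown in the deck. As a rough illustration of the parsing step only, the sketch below pulls product names and prices out of a page with Python's standard-library `html.parser` rather than Scrapy. The markup (`span` elements with classes `name` and `price`), the helper names `PriceQuoteParser` and `to_pence`, and the example products are all hypothetical; real supermarket pages are structured differently and change often.

```python
from decimal import Decimal
from html.parser import HTMLParser


def to_pence(price_text):
    """Convert a displayed price such as '£1,234.05' to integer pence (exact, via Decimal)."""
    cleaned = price_text.strip().lstrip("£").replace(",", "")
    return int(Decimal(cleaned) * 100)


class PriceQuoteParser(HTMLParser):
    """Collect (product name, price in pence) pairs from HYPOTHETICAL listing markup.

    Assumes each product appears as <span class="name">...</span> followed by
    <span class="price">...</span>; real supermarket pages will differ.
    """

    def __init__(self):
        super().__init__()
        self.quotes = []      # (name, pence) tuples, in page order
        self._field = None    # "name" or "price" while inside a matching span
        self._current = {}    # fields seen so far for the current product

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        classes = (dict(attrs).get("class") or "").split()
        for field in ("name", "price"):
            if field in classes:
                self._field = field

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        text = data.strip()
        if self._field is None or not text:
            return  # ignore whitespace and text outside name/price spans
        self._current[self._field] = text
        if "name" in self._current and "price" in self._current:
            self.quotes.append((self._current["name"], to_pence(self._current["price"])))
            self._current = {}


# Hypothetical page fragment; &pound; is decoded to "£" by the parser.
SAMPLE_HTML = """
<ul>
  <li><span class="name">Example Apple Juice 1 Litre</span> <span class="price">£1.50</span></li>
  <li><span class="name">Example Dessert Apples 6 Pack</span> <span class="price">&pound;2.00</span></li>
</ul>
"""

parser = PriceQuoteParser()
parser.feed(SAMPLE_HTML)
parser.close()
print(parser.quotes)  # [('Example Apple Juice 1 Litre', 150), ('Example Dessert Apples 6 Pack', 200)]
```

Storing prices as integer pence via `Decimal` avoids floating-point rounding when many quotes are later aggregated.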

Prototype web scrapers

● 3 supermarkets
● 35 CPI/RPI item categories
● Written using Python (Scrapy)
● Daily collection (around 6,500 price quotes)
● Item counts monitored daily
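Daily monitoring of item counts could work along these lines: compare each category's count today with the median of recent days and flag large drops, which often point to a changed page layout or a broken scraper. This is a minimal sketch under assumptions: the `flag_count_drops` function, the 0.8 threshold and all counts are hypothetical, not the project's actual monitoring rules.

```python
from statistics import median


def flag_count_drops(history, today, threshold=0.8):
    """Flag item categories whose count today is below `threshold` times their recent median.

    history:   dict mapping category -> list of item counts from previous days
    today:     dict mapping category -> today's item count (missing categories count as 0)
    threshold: HYPOTHETICAL tolerance; 0.8 flags drops of more than 20%
    Returns a sorted list of (category, today's count, baseline median).
    """
    flagged = []
    for category, counts in history.items():
        if not counts:
            continue  # no baseline yet for a new category
        baseline = median(counts)
        observed = today.get(category, 0)
        if observed < threshold * baseline:
            flagged.append((category, observed, baseline))
    return sorted(flagged)


# Hypothetical counts for three item categories.
history = {
    "onions": [210, 205, 212, 208],
    "fruit juice (not orange)": [150, 148, 152, 151],
    "whisky": [95, 97, 96, 94],
}
today = {"onions": 207, "fruit juice (not orange)": 60}  # nothing came back for whisky

for category, observed, baseline in flag_count_drops(history, today):
    print(f"{category}: {observed} items today vs median {baseline}")
```

Using the median rather than yesterday's count keeps one unusual day from setting the baseline.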

Classification challenge

“This is a dessert apple”

“This is fruit juice (not orange)”

“This is fruit juice (not orange)” and not a dessert apple!

Tesco Mango Juice Drink 1ltr

Tesco Pure Apple Juice 2 Litre

Training Set

Supervised machine learning
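One way to tackle this classification challenge with supervised learning is a multinomial naive Bayes classifier over the words in each product name. The sketch below is an illustration, not the method used in the project: apart from the two Tesco products on the slide, the training names are hypothetical, and treating the mango juice drink as a labelled ‘fruit juice (not orange)’ example is an assumption. Because ‘juice’ and ‘litre’ appear only in the juice examples, while ‘apple’ appears just once among the dessert-apple names (the plural ‘apples’ is a different token here), the apple juice lands in the juice class.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text):
    """Lower-case a product name and split it into alphanumeric words (no stemming)."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes over product-name words, with add-one (Laplace) smoothing."""

    def fit(self, names, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for name, label in zip(names, labels):
            self.word_counts[label].update(tokens(name))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.totals = {label: sum(self.word_counts[label].values()) for label in self.label_counts}
        return self

    def predict(self, name):
        n_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.label_counts.items():
            score = math.log(n_label / n_docs)  # log prior
            denom = self.totals[label] + len(self.vocab)
            for word in tokens(name):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# HYPOTHETICAL training set; only the Tesco mango juice name comes from the slide.
TRAINING = [
    ("Gala Apples 6 Pack", "dessert apple"),
    ("Braeburn Apple Loose", "dessert apple"),
    ("Pink Lady Apples 4 Pack", "dessert apple"),
    ("Tesco Mango Juice Drink 1ltr", "fruit juice (not orange)"),
    ("Pineapple Juice 1 Litre", "fruit juice (not orange)"),
    ("Cranberry Juice Drink 1 Litre", "fruit juice (not orange)"),
]
names, labels = zip(*TRAINING)
model = NaiveBayesClassifier().fit(names, labels)

print(model.predict("Tesco Pure Apple Juice 2 Litre"))  # fruit juice (not orange)
print(model.predict("Granny Smith Apples 6 Pack"))      # dessert apple
```

Working in log space avoids underflow when multiplying many small word probabilities.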

Price quote distributions (charts): whisky and onions

Price Indices Publication 1st September 2015

http://bit.ly/1PRKMGx

“The real finding of the initial research was not that inflation is too high, but the method of collecting prices matters rather a lot”

Paul Johnson, IFS

Smart meters

Rationale: use smart-type electricity meter data to model occupancy or household composition from energy use profiles

Support more efficient field operations (in 2011, £6.6m was spent trying to enumerate vacant properties)

Data from smart meter trials in Great Britain and Republic of Ireland

A range of potential methods identified

Significant issues around privacy and ethics

Electricity: smart meters

Half-hourly electricity consumption over 7 days at one meter, across 28 consecutive 7-day periods.
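With one reading every half hour, a 7-day period holds 48 × 7 = 336 readings, so 28 consecutive periods amount to 28 × 336 = 9,408 readings at the meter. The sketch below slices such a series into weeks and averages each half-hour slot into a ‘typical week’ profile, one simple summary an occupancy model might start from. The function names, the use of watt-hours and the synthetic readings are assumptions for illustration.

```python
SLOTS_PER_DAY = 48                    # one reading every 30 minutes
SLOTS_PER_WEEK = SLOTS_PER_DAY * 7    # 336 readings per 7-day period
N_WEEKS = 28                          # consecutive 7-day periods


def weekly_matrix(readings, n_weeks=N_WEEKS):
    """Split a flat half-hourly series into n_weeks rows of SLOTS_PER_WEEK readings each."""
    expected = n_weeks * SLOTS_PER_WEEK
    if len(readings) != expected:
        raise ValueError(f"expected {expected} readings, got {len(readings)}")
    return [readings[w * SLOTS_PER_WEEK:(w + 1) * SLOTS_PER_WEEK] for w in range(n_weeks)]


def typical_week(matrix):
    """Average each half-hour slot across all weeks (column means)."""
    return [sum(slot) / len(slot) for slot in zip(*matrix)]


def slot_label(index):
    """Turn a slot index 0..335 into 'day D HH:MM' (day 1 is the first day of the series)."""
    day, slot = divmod(index, SLOTS_PER_DAY)
    hour, half = divmod(slot, 2)
    return f"day {day + 1} {hour:02d}:{30 * half:02d}"


# Synthetic readings in watt-hours: 100 Wh per half hour before 07:00, 400 Wh otherwise.
readings = [100 if i % SLOTS_PER_DAY < 14 else 400 for i in range(N_WEEKS * SLOTS_PER_WEEK)]
profile = typical_week(weekly_matrix(readings))

print(len(readings), len(profile))              # 9408 336
print(slot_label(profile.index(max(profile))))  # day 1 07:00
```

Keeping the full 336-slot week, rather than a single daily curve, preserves weekday/weekend differences that may say something about who lives in a household.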

Twitter

Rationale: using geo-located Tweets to explore mobility and migration

7 months of geo-located Tweets within Great Britain (about 100 million data points)

Can infer place of usual residence

Significant issues around privacy and ethics

Geo-located Tweet penetration rate by local authority
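The slide says place of usual residence can be inferred but not how. A common heuristic, shown here only as an assumed sketch, is to take each user's most frequent location among night-time Tweets. The night window (22:00 to 06:00), the minimum of three night-time Tweets, the area codes, and the definition of penetration rate as inferred resident users per 1,000 population are all hypothetical choices, as are the example Tweets and populations.

```python
from collections import Counter, defaultdict
from datetime import datetime

NIGHT_HOURS = set(range(22, 24)) | set(range(0, 6))  # HYPOTHETICAL "likely at home" window
MIN_NIGHT_TWEETS = 3                                  # HYPOTHETICAL minimum evidence per user


def infer_usual_residence(tweets):
    """Assign each user the area where most of their night-time Tweets were posted.

    tweets: iterable of (user_id, timestamp, area_code), where area_code is the local
    authority already derived from the Tweet's coordinates. Users with fewer than
    MIN_NIGHT_TWEETS night-time Tweets are left out.
    """
    night_areas = defaultdict(Counter)
    for user, timestamp, area in tweets:
        if timestamp.hour in NIGHT_HOURS:
            night_areas[user][area] += 1
    return {
        user: areas.most_common(1)[0][0]
        for user, areas in night_areas.items()
        if sum(areas.values()) >= MIN_NIGHT_TWEETS
    }


def penetration_rate(residence, population):
    """Inferred resident users per 1,000 population in each area (HYPOTHETICAL definition)."""
    users = Counter(residence.values())
    return {area: 1000 * users[area] / pop for area, pop in population.items()}


# Hypothetical Tweets, area codes and populations.
tweets = [
    ("u1", datetime(2014, 8, 15, 23, 10), "LA1"),
    ("u1", datetime(2014, 8, 16, 1, 5), "LA1"),
    ("u1", datetime(2014, 8, 16, 13, 0), "LA2"),   # daytime: ignored
    ("u1", datetime(2014, 8, 17, 22, 45), "LA1"),
    ("u2", datetime(2014, 8, 15, 23, 30), "LA2"),
    ("u2", datetime(2014, 8, 16, 2, 0), "LA2"),
    ("u2", datetime(2014, 8, 16, 5, 15), "LA1"),
    ("u3", datetime(2014, 8, 18, 23, 0), "LA2"),   # too few night-time Tweets
]
residence = infer_usual_residence(tweets)
print(residence)                                               # {'u1': 'LA1', 'u2': 'LA2'}
print(penetration_rate(residence, {"LA1": 2000, "LA2": 4000})) # {'LA1': 0.5, 'LA2': 0.25}
```

The minimum-evidence rule trades coverage for reliability: fewer users get a residence, but each assignment rests on repeated observations.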

Demographics and Twitter data

Geo-located Tweet volumes by device type, Great Britain, 15 August to 31 October 2014

Predicting Norovirus (ahead of lab reports) using social media
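A first check when using social media to anticipate lab reports is whether the social signal leads the lab series, and by how much. The sketch below finds the weekly lag at which the two series correlate best (Pearson correlation). It is not the model behind the slide, and both weekly series are synthetic, built so that lab reports trail the social-media counts by exactly two weeks.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def best_lead(social, lab, max_lag=4):
    """Find the lag (in periods) at which social-media counts best correlate with later lab reports.

    For each lag L, compares social[t] with lab[t + L]; returns (L, r) with the highest r.
    """
    candidates = []
    for lag in range(max_lag + 1):
        x = social[: len(social) - lag]
        y = lab[lag:]
        candidates.append((pearson(x, y), lag))
    r, lag = max(candidates)
    return lag, r


# SYNTHETIC weekly series: lab reports trail the social signal by two weeks.
social = [5, 8, 20, 45, 30, 12, 6, 4, 9, 25, 40, 18]
lab = [2, 2] + [2 * count for count in social[:-2]]

lag, r = best_lead(social, lab)
print(lag, round(r, 3))  # 2 1.0
```

A strong correlation at a positive lag only suggests the social signal could give early warning; an operational forecast would also need to handle trend, seasonality and noise in the social data.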

Machine learning: prediction, clustering, unstructured data

Segmentation
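Clustering for segmentation can be illustrated with k-means (Lloyd's algorithm), one common clustering method; the slide does not say which algorithm was used. In this minimal sketch the two-dimensional points are hypothetical feature values (for instance, two usage measures per household), and the first k points serve as starting centroids to keep the example deterministic; in practice k-means++ initialisation or several random restarts are usual.

```python
import math


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Uses the first k points as starting centroids so results are reproducible."""
    centroids = [tuple(p) for p in points[:k]]
    assignment = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid (Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed cluster
        assignment = new_assignment
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return assignment, centroids


# HYPOTHETICAL 2-D feature values, e.g. two standardised usage measures per household.
points = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (8.5, 9.0), (1.0, 0.5), (9.0, 8.0)]
labels, centres = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Each resulting cluster is a candidate segment; its centroid summarises the typical member and can be used to label new records.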

Thank you

Twitter: @GoodPracticeMR

ONS Big Data Project web page: http://bit.ly/1OZAOzs

GDS Data Science blog: http://bit.ly/1QCT5Xs

Government Data Programme

Policies and Governance

Modern Data Infrastructure

Data Science

Open Data

Data Leaders Network

Data Steering Group

Inter-Ministerial Group for Digital Transformation

National Information Infrastructure

Common Technology Services

Platforms and Standards

Registers

Digital Services

Departmental Transformation

Government as a Platform