A “data revolution” for development? Giulio Quaggiotto UN Global Pulse Lab Manager, Jakarta @gquaggiotto


A “data revolution” for development?

Giulio Quaggiotto UN Global Pulse

Lab Manager, Jakarta @gquaggiotto


• The promise
• The challenge
• “New data” as a practice


www.transportbuzz.com


A radically different view…

http://www.youtube.com/watch?v=ONZcjs1Pjmk&noredirect=1


An example in action


“The future is already here, it is just not very evenly distributed”


4 ways to data innovation

1. Funding and investment for national statistical capacity, particularly in developing countries.
2. Exploring new data sources, including those sourced from individual citizens.
3. Harnessing advanced technologies, like visualization tools that make data more understandable.
4. “Liberating” data to “unleash the analytical creativity of users” and hold policymakers accountable.

U.N. Deputy Secretary-General Jan Eliasson


The challenge


Those who have done it

Those who talk about it

New data as a practice

Dev’t sector


Quiz time


GLOBAL PULSE: A NETWORK OF LABS

Pulse Lab NYC, Est. 2010
Pulse Lab Jakarta, Est. 2012
Pulse Lab Kampala, Est. 2013


New data as a practice


BIG DATA FOR DEVELOPMENT: 3 OPPORTUNITIES

1. Situational Awareness: Real-time trend analysis of population activities and dynamics can inform the design and targeting of programmes and policies.
2. Early Warning: Predictive analysis and detection of anomalous events allow rapid response to crises.
3. Programme Evaluation: Real-time feedback from citizens, and measurement of behavior change, allow for adaptive course corrections in programmes and policies.


Types of data

Example data sources Global Pulse works with:
• Social media data (blogs, forums, social media streams)
• Mobile network data (CDRs, top-ups)
• Radio feeds
• News media content
• Online search
• Postal data
• GPS data
• …

We gain access to this type of data through partnerships with private sector or academia.


People tweet about immunization

• People share information on immunization and vaccinations with their connected friends

Type | No. of tweets | Content
Shared links | 1453 | Is it dangerous to have fever, swelling and pain after vaccines?
Shared links | 962 | Is it true MMR vaccine can lead to autism in children?
Shared links | 800+ | MoH launched Pentavalent vaccine and booster
Retweets | 983 | UNICEF reports that 30K-40K children are infected by measles every year in Indonesia.
Retweets | 772 | Fever, the initial symptom of measles, will increase within the first five days and then skin rash starts.
Retweets | 755 | Polio is a contagious disease, theoretically, which can infect all ages but children are most vulnerable.


People express opinions

[Chart: number of tweets per day, 2012-01-01 to 2013-12-01 (monthly ticks); y-axis 0 to 2500]

Media reports that prompted spikes in tweets:
[2013/12/01] Debates about assurance of halal products.
[2013/12/03] Uncertainty whether some drugs contain pig substance.
[2013/12/06] MoH starts consultations related to halal certification.
[2013/12/07] Debates over halal certificates of food.
[2013/12/12] Confirmation that some drugs and vaccines may contain haram substance.
[2013/12/12] MUI urges pharmacologists to replace haram process.


Situational awareness

[Chart: y-axis 0 to 3000; x-axis labels June 2012, Oct 2012, Apr 2013, Dec 2013]

Rank | 2012-06-20 | 2012-10-08 | 2013-04-28 | 2013-12-23
1 | Autism (213) | Death (1030) | Fever (1498) | Death (224)
2 | Death (5) | Fever (14) | Swelling (1494) | Fever (3)
3 | Sick (4) | Sick (4) | Pain (1491) | Crying (1)
4 | Fever (2) | Crying (3) | Autism (1011) | Autism (1)
5 | Crying (1) | Fever (3) | Fever (4) | -

Example tweets:
“There are some autism cases after MMR vaccine”
“A baby suddenly died after vaccine”
“Is it dangerous to have fever, swelling, pain, after vaccine?”
“China is investigating death cases of babies”
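A per-date ranking of concerns like the one on this slide could be computed roughly as follows. This is only a minimal sketch: it assumes tweets arrive as (date, text) pairs and that concerns are matched by simple keyword lookup. The function name `rank_concerns`, the keyword list, and the sample tweets are illustrative assumptions, not the lab's actual taxonomy or data.

```python
from collections import Counter, defaultdict

# Hypothetical keyword list; a real taxonomy would be larger and multilingual.
CONCERN_KEYWORDS = ["autism", "death", "fever", "swelling", "pain", "sick", "crying"]


def rank_concerns(tweets, keywords=CONCERN_KEYWORDS, top_n=5):
    """Return {date: [(keyword, count), ...]} with the top_n keywords per date.

    A tweet counts once for each keyword it mentions
    (case-insensitive substring match).
    """
    counts = defaultdict(Counter)
    for date, text in tweets:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                counts[date][kw] += 1
    return {date: c.most_common(top_n) for date, c in counts.items()}


# Made-up sample data, for illustration only.
sample = [
    ("2012-06-20", "There are some autism cases after MMR vaccine"),
    ("2012-06-20", "Autism and vaccines: is it true?"),
    ("2012-06-20", "My baby has a fever after the vaccine"),
    ("2012-10-08", "A baby suddenly died after vaccine, death reported"),
]
ranking = rank_concerns(sample)
```

Substring matching is the simplest possible choice here; it would miss word variants such as "died", which is one reason a curated taxonomy matters in practice.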


Early warning and rapid response

Early warning: Detect people concerned about death after vaccine from Twitter

[Chart: number of tweets mentioning ‘death’; y-axis 0 to 1200]

Rapid response with actionable plan: Disseminate correct information through Twitter via influential users (@dr_piprim, @dirgarambe, @blogdoktor, …)
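The early-warning step, spotting an unusual jump in tweets about a topic, can be sketched with a simple trailing z-score. This is an assumed, generic technique chosen for illustration, not necessarily the method used in the project; the function name `detect_spikes`, the window and threshold values, and the daily counts are all hypothetical.

```python
from statistics import mean, stdev


def detect_spikes(daily_counts, window=7, threshold=3.0):
    """Return indices of days whose count exceeds the trailing mean by
    more than `threshold` trailing standard deviations."""
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any increase at all is treated as a spike.
            if daily_counts[i] > mu:
                spikes.append(i)
            continue
        if (daily_counts[i] - mu) / sigma > threshold:
            spikes.append(i)
    return spikes


# Made-up daily counts of tweets mentioning 'death'; index 8 is the jump.
counts = [10, 12, 9, 11, 10, 13, 12, 11, 1030, 40, 15]
spike_days = detect_spikes(counts)
```

A trailing window keeps the check usable in real time, since each day is compared only with the days before it.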


Gender in the workplace

Finding extracts

Workflow: Topic Brainstorming, Taxonomy Design, Feasibility Analysis

Feasibility Study Results

Promising topics (enough tweet volume AND sufficient relevant tweets; chart scale 2000, 4000)
-Permission to work
-Perception: appropriateness of work
-Discriminatory job requirement
-Double burdens of working women

Less promising topics (enough tweet volume OR sufficient relevant tweets; chart scale 100, 200)
-Cost to access employment
-Lack of skills or education
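The feasibility rule on this slide (both criteria met versus only one) can be written down as a small decision function. This is a sketch under stated assumptions: the slide gives no cut-off values, so `min_volume` and `min_relevant` are hypothetical parameters, and the "not feasible" label for topics that meet neither criterion is an added assumption.

```python
def classify_topic(volume, relevant, min_volume, min_relevant):
    """Classify a topic from its tweet volume and its count of relevant tweets.

    min_volume and min_relevant are hypothetical thresholds; the slide
    does not state the actual values used.
    """
    enough_volume = volume >= min_volume
    enough_relevant = relevant >= min_relevant
    if enough_volume and enough_relevant:
        return "promising"
    if enough_volume or enough_relevant:
        return "less promising"
    return "not feasible"  # assumed label, not from the slide


# Made-up numbers, for illustration only.
label = classify_topic(volume=5000, relevant=3000, min_volume=1000, min_relevant=500)
```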


Nowcasting Food Prices and Understanding Coping Mechanisms Project

Finding extracts

Workflow: Data Collection & Refinement, Information Extraction, Correlation Study

Results with Initial Methodologies

Nowcasting food prices
Correlation study between (a) real-world commodity price and (b) commodity price from Twitter.
We confirm that people discuss commodity prices in social media, and we test the feasibility of the extracted information.

Understanding coping mechanism
Pattern discovery from (a) commodity price changes and (b) consumption patterns.
Cooking oil directly affects people’s lives because, unlike ‘condensed milk’, it is a basic commodity in Indonesia.
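The correlation study between official prices and prices extracted from tweets can be illustrated with a plain Pearson correlation. This is a minimal sketch, assuming both series are already aligned by date; the function name `pearson` and the two price series are made up for illustration and are not project data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length, at least 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


# Made-up prices for one commodity: official vs. extracted from tweets.
official = [10000, 10200, 10500, 10400, 10900]
from_twitter = [9800, 10100, 10600, 10300, 11000]
r = pearson(official, from_twitter)
```

A value of `r` close to 1 would suggest that the Twitter-derived series tracks the official one; it says nothing about which series moves first.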


Sinabung Eruption (15th Sep, 2013)

Infographics
• Location: Karo regency, North Sumatra
• Elevation: 2,460 m above sea level
• Victims: BNPB (Indonesian National Board of Disaster Management) reported that 15 people died and more than 30,000 people were evacuated

Volume Dynamics from Twitter
• Period: 14/9/2013 to 10/2/2014
• Total Twitter Posts: 151,448
• Relevant Posts: 117,436 (78%)
• More than 10K tweets at the first eruption


Visualizing Displacement Due to Floods through Mobile Data
Partners: WFP, Govt. of Mexico, Univ. of Madrid, Telefonica
Project: Visual analytics to support improved targeting of humanitarian assistance during emergencies


CDRs population estimate vs census, state of Tabasco, Mexico

Source: Telefonica


A real-time map of poverty in Cote d’Ivoire?

a) Abidjan
b) Liberian border
c) Roads to Mali and Burkina Faso
d) Road to Ghana

Ref: arxiv.org/abs/1309.4496: Evaluating Socio-Economic State Of A Country Analyzing Airtime Credit And Mobile Phone Datasets


Luminosity as a proxy for GDP output

Chen & Nordhaus, Using luminosity data as a proxy for economic statistics, 2011


Understanding labour market flows

Source: Using social media to measure labour market flows, March 2014


Mashing big data with sensors…


… and citizen generated data


Evaluating policies in real time?

A mobility index to evaluate H1N1 response in Mexico City

Telefonica Research, 2011 (http://www.unglobalpulse.org/publicpolicyandcellphonedata)


Real-time evaluation of advocacy

Finding extracts


Big postal data

[Chart: International Postal Traffic: Daily Weight (kg); y-axis 200000 to 800000; series: letter-post, parcel-post, EMS, each with an MA line and a trend line]

• 1+ billion records per year
• Real-time scans since early 2010
• Traffic between 150 countries:
  • Letters
  • Packages
  • MoneyGram
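The chart legend pairs each postal series with an "MA" line and a trend line. Assuming "MA" stands for moving average and the trend is a straight-line fit (neither is stated on the slide), the two smoothers could be sketched as below. The function names and sample weights are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; positions without a full window are None."""
    if window < 1:
        raise ValueError("window must be at least 1")
    out = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        out.append(running / window if i >= window - 1 else None)
    return out


def linear_trend(values):
    """Least-squares (slope, intercept) of values against index 0..n-1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points")
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    slope = num / den
    return slope, y_mean - slope * x_mean


# Made-up daily weights in kg, for illustration only.
daily_kg = [1, 2, 3, 4, 5]
smoothed = moving_average(daily_kg, window=3)
slope, intercept = linear_trend([1, 3, 5, 7])
```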


Making it happen: New data as a practice


Those who have done it

Those who talk about it

New data as a practice

Dev’t & Gov’t


Anatomy of a big data project

Questions → Data dive → Prototype → Pilot → Real-time tools


Project portfolio

1. Social media for social protection

2. Social media to understand public perception of immunization

3. Signals of discrimination in the workplace

4. Nowcasting food prices and understanding coping mechanisms

Exploration

Active

Category Status Names

Research

projects

Ad-hoc

5. Mapping socio economic vulnerability

6. Maternal health

7. Disaster response/resilience

8. Universal health coverage/public service monitoring

9. Deforestation

10. Providing Real-Time Insights on Indonesian Post-2015 Priorities


Skillset of a data team


Leveraging Partnerships

Big Data Access
• Twitter (global, 500 million messages/day)
• Orange France Telecom (Ivory Coast, Senegal)
• Telenor (Bangladesh – mobile money data)
• Telefonica (Mexico, Guatemala)
• MTN (Uganda)
• Real Impact (Cote d’Ivoire, Rwanda, Zambia)
• Universal Postal Union (global postal flow data)

Data Mining & Analysis Technologies
• Amazon Web Services (supercomputing)
• DataSift (data filtering)
• SAS (analytics & data visualization)
• Crimson Hexagon (data analysis)

Data Science Expertise
• Université catholique de Louvain (call records analysis)
• Institut des Systèmes Complexes de Paris Ile-de-France (news media mining & filtering)
• Universidad Politécnica de Madrid (call records analysis)
• Stockholm University (research fellow)
• Karolinska Institutet (call records analysis)
• University of Sheffield (speech-to-text tools)
• Microsoft Research (social media analysis)
• Google (search query analysis)


Infrastructure example: Geolocation Augmentation

AWS S3

GeoNames database

UNGP Code

Hadoop MapReduce

+ Plain

tweets

HDFS / Impala

Geolocated tweets

Plain tweets
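The core of this pipeline, turning plain tweets into geolocated tweets by looking up place names in a gazetteer, can be sketched in plain Python. This is a hedged illustration only: it does not use Hadoop, S3, or the real GeoNames database. The tiny `GAZETTEER` dictionary (with approximate coordinates), the function names, and the sample tweets are all assumptions; in a MapReduce setting, `geolocate` is roughly what a mapper would apply to each tweet.

```python
import re

# Hypothetical stand-in for a GeoNames lookup; coordinates are approximate.
GAZETTEER = {
    "jakarta": (-6.21, 106.85),
    "medan": (3.59, 98.67),
    "surabaya": (-7.26, 112.75),
}


def geolocate(text, gazetteer=GAZETTEER):
    """Return the first gazetteer place mentioned in the text, or None."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in gazetteer:
            lat, lon = gazetteer[token]
            return {"place": token, "lat": lat, "lon": lon}
    return None


def augment(tweets, gazetteer=GAZETTEER):
    """Attach a 'geo' field (possibly None) to each plain tweet."""
    return [{"text": t, "geo": geolocate(t, gazetteer)} for t in tweets]


# Made-up tweets, for illustration only.
plain_tweets = ["Banjir di Jakarta pagi ini", "Harga minyak goreng naik"]
geolocated = augment(plain_tweets)
```

Because each tweet is processed independently, the lookup parallelizes naturally, which is presumably why a MapReduce-style job suits it.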


Easy-to-use tools for real-time awareness


Working with us

Horizon scanning

Trainings/capacity building

Secondments & residencies

Joint prototyping

Full research project


http://www.unglobalpulse.org

Giulio Quaggiotto UN Global Pulse

Lab Manager, Jakarta @gquaggiotto


3 roles for NSOs and big data

1. 3rd party to certify statistical quality of new sources
2. Issue statistical “best practices” in the use of non-traditional sources and the mining of “big data”
3. Use non-traditional sources to augment (and perhaps replace) official series

Source: Andrew Wyckoff, OECD


4 big data illusions

1. Uncannily accurate results
2. “N = all”: statistical sampling is obsolete
3. Causation is passé
4. Scientific or statistical models aren’t needed: “the end of theory”

Source: Tim Harford, FT


Big data technologies

• Big data requires different technologies and infrastructure!
– Lots of interconnected computers for data storage
– Distributed processing for analytics

One of Google’s data centers (source: http://www.google.com/about/datacenters)


Big data applications

• Private Sector
– Item recommendation by Amazon
– Movie recommendation by Netflix
– Friend recommendation by Facebook
– Predicting loss of customers by mobile phone companies
– Optimizing Human Resources/hiring practices

• Public Policy
– Tax fraud detection
– Twitter earthquake monitoring
– Understanding perceptions and implications of policy decisions
– “Now-casting” food prices and inflation rates

Big data applications are already all around us!


Data science

• Data science: a new discipline to analyze big data
• Data scientist: a practitioner of data science with various skills to analyze big data to find patterns, generate insights, and visualize results for communication with others

http://upload.wikimedia.org/wikipedia/commons/4/44/DataScienceDisciplines.png