A “data revolution” for development?
Giulio Quaggiotto UN Global Pulse
Lab Manager, Jakarta @gquaggiotto
• The promise • The challenge • “New data” as a practice
www.transportbuzz.com
A radically different view…
http://www.youtube.com/watch?v=ONZcjs1Pjmk&noredirect=1
An example in action
“The future is already here, it is just not very evenly distributed”
4 ways to data innovation
1. Funding and investment for national statistical capacity, particularly in developing countries.
2. Exploring new data sources, including those sourced from individual citizens.
3. Harnessing advanced technologies, like visualization tools that make data more understandable.
4. “Liberating” data to “unleash the analytical creativity of users” and hold policymakers accountable.
U.N. Deputy Secretary-General Jan Eliasson
The challenge
Those who have done it
Those who talk about it
New data as a practice
Dev’t sector
Quiz time
GLOBAL PULSE: A NETWORK OF LABS
Pulse Lab NYC Est. 2010
Pulse Lab Jakarta Est. 2012
Pulse Lab Kampala Est. 2013
New data as a practice
1. Situational Awareness: Real-time trend analysis of population activities and dynamics can inform the design and targeting of programmes and policies.
2. Early Warning: Predictive analysis and detection of anomalous events allows rapid response to crises.
3. Programme Evaluation: Real-time feedback from citizens, and measurement of behavior change, allows for adaptive course corrections in programmes and policies.
BIG DATA FOR DEVELOPMENT: 3 OPPORTUNITIES
Types of data
Example data sources Global Pulse works with:
• Social media data (blogs, forums, social media streams)
• Mobile network data (CDRs, top-ups)
• Radio feeds
• News media content
• Online search
• Postal data
• GPS data
• …
We gain access to this type of data through partnerships with private sector or academia.
People tweet about immunization
• People share information on immunization and vaccinations with their connected friends
Type | No. of tweets | Contents
Shared links | 1453 | Is it dangerous to have fever, swelling and pain after vaccines?
Shared links | 962 | Is it true MMR vaccine can lead to autism in children?
Shared links | 800+ | MoH launched Pentavalent vaccine and booster
Retweets | 983 | UNICEF reports that 30K-40K children are infected by measles every year in Indonesia.
Retweets | 772 | Fever, the initial symptom of measles, will increase within the first five days and then skin rash starts.
Retweets | 755 | Polio is a contagious disease, theoretically, which can infect all ages but children are most vulnerable.
People express opinions
[Chart: number of tweets per day, 2012-01-01 to 2013-12-01; y-axis 0 to 2500]
Media reports that prompted spikes in tweets:
• [2013/12/01] Debates about assurance of halal products.
• [2013/12/03] Uncertainty over whether some drugs contain pig-derived substances.
• [2013/12/06] MoH starts consultations related to halal certification.
• [2013/12/07] Debates over halal certificates for food.
• [2013/12/12] Confirmation that some drugs and vaccines may contain haram substances.
• [2013/12/12] MUI urges pharmacologists to replace haram processes.
Situational awareness
Rank 2012-06-20 2012-10-08 2013-04-28 2013-12-23
1 Autism (213) Death(1030) Fever (1498) Death (224)
2 Death (5) Fever (14) Swelling (1494) Fever (3)
3 Sick (4) Sick (4) Pain (1491) Crying (1)
4 Fever (2) Crying (3) Autism (1011) Autism (1)
5 Crying (1) Fever (3) Fever (4) -
Sample tweets (chart labels: June 2012, Oct 2012, Apr 2013, Dec 2013):
• “There are some autism cases after MMR vaccine”
• “A baby suddenly died after vaccine”
• “Is it dangerous to have fever, swelling, pain, after vaccine?”
• “China is investigating death cases of babies”
Early warning and rapid response
Early warning: Detect people concerned about death after vaccine from Twitter
Rapid response with actionable plan: Disseminate correct information through Twitter via influential users (@dr_piprim, @dirgarambe, @blogdoktor, …)
[Chart: number of tweets mentioning ‘death’; y-axis 0 to 1200]
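One way to picture the early-warning step is a simple spike detector over daily tweet counts. The sketch below is only an illustration and not the method used by the lab: the function name `find_spikes`, the 7-day trailing window, the threshold of 3 standard deviations, and the sample counts are all assumptions made for this example.

```python
from statistics import mean, stdev


def find_spikes(daily_counts, window=7, k=3.0):
    """Return indices of days whose count exceeds the trailing mean
    by more than k standard deviations (window and k are assumptions)."""
    spikes = []
    for i in range(window, len(daily_counts)):
        history = daily_counts[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        # Floor sigma at 1.0 so a perfectly flat history does not
        # flag a tiny change as a spike.
        threshold = mu + k * max(sigma, 1.0)
        if daily_counts[i] > threshold:
            spikes.append(i)
    return spikes


# Made-up daily counts of tweets mentioning 'death' (illustrative only).
counts = [12, 15, 11, 14, 13, 12, 16, 14, 950, 40, 15]
print(find_spikes(counts))
```

A flagged day would then be the trigger for the rapid-response step, for example reviewing the tweets behind the spike before any correction is disseminated.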
Finding extracts
Workflow: Topic Brainstorming → Taxonomy Design → Feasibility Analysis
Feasibility Study Results
Promising topics (enough tweet volume AND sufficient relevant tweets):
• Permission to work
• Perception: appropriateness of work
• Discriminatory job requirement
• Double burdens of working women
Chart values: 2000, 4000
Less promising topics (enough tweet volume OR sufficient relevant tweets):
• Cost to access employment
• Lack of skills or education
Chart values: 100, 200
Gender in the workplace
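The AND/OR rule from the feasibility study can be written down as a small decision function. This is a sketch under stated assumptions: the thresholds `min_volume` and `min_relevant` are hypothetical parameters (the slides do not give cut-off values), and the third label, "not feasible", is added here only to cover the case where neither condition holds.

```python
def classify_topic(tweet_volume, relevant_tweets, min_volume, min_relevant):
    """Label a topic using the AND/OR rule from the feasibility study.

    min_volume and min_relevant are hypothetical thresholds chosen by
    the analyst; they are not taken from the slides.
    """
    enough_volume = tweet_volume >= min_volume
    enough_relevant = relevant_tweets >= min_relevant
    if enough_volume and enough_relevant:
        return "promising"
    if enough_volume or enough_relevant:
        return "less promising"
    # Assumed extra category for topics that meet neither condition.
    return "not feasible"


# Illustrative call with made-up numbers.
print(classify_topic(tweet_volume=5000, relevant_tweets=300,
                     min_volume=1000, min_relevant=200))
```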
Nowcasting Food Prices and Understanding Coping Mechanisms Project
Finding extracts
Workflow: Data Collection & Refinement → Information Extraction → Correlation Study
Results with Initial Methodologies
Nowcasting food prices: Correlation study between (a) real-world commodity prices and (b) commodity prices extracted from Twitter. We confirm that people discuss commodity prices on social media and test the feasibility of using the extracted information.
Understanding coping mechanisms: Pattern discovery from (a) commodity price changes and (b) consumption patterns. Cooking oil directly affects people’s lives because it is a basic commodity in Indonesia, unlike ‘condensed milk’.
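The correlation study compares two price series for the same commodity. A minimal sketch of that comparison, assuming the two series are already aligned by date, is a Pearson correlation coefficient. The function name `pearson` and the sample prices are hypothetical; the slides do not say which correlation measure the project used.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Made-up weekly prices for one commodity (illustrative only).
official_price = [10500, 10700, 11000, 11400, 11200]
twitter_price = [10400, 10800, 11100, 11500, 11100]
print(round(pearson(official_price, twitter_price), 3))
```

A coefficient close to 1 would suggest that prices quoted in tweets move together with official prices, which is the precondition for nowcasting.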
Sinabung Eruption (15th Sep, 2013)
Infographics
• Location: Karo regency, North Sumatra
• Elevation: 2,460 m above sea level
• Victims: BNPB (Indonesian National Board of Disaster Management) reported that 15 people died and more than 30,000 people were evacuated
Volume Dynamics from Twitter
• Period: 14/9/2013 to 10/2/2014
• Total Twitter Posts: 151,448
• Relevant Posts: 117,436 (78%)
• More than 10K tweets at the first eruption
Visualizing Displacement Due to Floods through Mobile Data
Partners: WFP, Govt. of Mexico, Univ. of Madrid, Telefonica
Project: Visual analytics to support improved targeting of humanitarian assistance during emergencies
CDRs population estimate vs census - state of Tabasco, Mexico
Source: Telefonica
a) Abidjan
b) Liberian border
c) Roads to Mali and Burkina Faso
d) Road to Ghana
Ref: arxiv.org/abs/1309.4496: Evaluating Socio-Economic State Of A Country Analyzing Airtime Credit And Mobile Phone Datasets
A real-time map of poverty in Cote d’Ivoire?
Luminosity as a proxy for GDP output
Chen & Nordhaus, Using luminosity data as a proxy for economic statistics, 2011
Understanding labour market flows
Source: Using social media to measure labour market flows, March 2014
Mashing big data with sensors…
… and citizen generated data
A mobility index to evaluate H1N1 response in Mexico City
Telefonica Research, 2011 (http://www.unglobalpulse.org/publicpolicyandcellphonedata)
Evaluating policies real time?
Real time evaluation of advocacy Finding extracts
Big postal data
[Chart: International Postal Traffic: Daily Weight (kg); y-axis 200000 to 800000; series: letter-post, parcel-post and EMS, each with an MA line and a trend line]
• 1+ billion records per year
• Real-time scans since early 2010
• Traffic between 150 countries:
– Letters
– Packages
– MoneyGram
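The postal chart labels some series “MA”. Assuming this stands for a moving average (the slides do not spell it out), the smoothing can be sketched as follows. The function name `moving_average`, the window length and the sample weights are all made up for illustration.

```python
def moving_average(values, window):
    """Simple moving average; returns one value per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


# Made-up daily letter-post weights in kg (illustrative only).
daily_weight_kg = [510000, 480000, 530000, 620000, 590000, 300000, 280000]
print(moving_average(daily_weight_kg, window=3))
```

Smoothing of this kind makes weekly cycles less prominent so that the longer-term trend in traffic is easier to read.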
Making it happen: New data as a practice
Those who have done it
Those who talk about it
New data as a practice
Dev’t & Gov’t
Questions → Data dive → Prototype → Pilot → Real-time tools
Anatomy of a big data project
Project portfolio
1. Social media for social protection
2. Social media to understand public perception of immunization
3. Signals of discrimination in the workplace
4. Nowcasting food prices and understanding coping mechanisms
Columns: Category, Status, Names
Categories: Research projects, Ad-hoc
Statuses: Exploration, Active
5. Mapping socio economic vulnerability
6. Maternal health
7. Disaster response/resilience
8. Universal health coverage/public service monitoring
9. Deforestation
10. Providing Real-Time Insights on Indonesian Post-2015 Priorities
Skillset of a data team
Big Data Access
• Twitter (global, 500 million messages/day)
• Orange France Telecom (Ivory Coast, Senegal)
• Telenor (Bangladesh – mobile money data)
• Telefonica (Mexico, Guatemala)
• MTN (Uganda)
• Real Impact (Cote d’Ivoire, Rwanda, Zambia)
• Universal Postal Union (global postal flow data)
Data Mining & Analysis Technologies
• Amazon Web Services (supercomputing)
• DataSift (data filtering)
• SAS (analytics & data visualization)
• Crimson Hexagon (data analysis)
Data Science Expertise
• Université catholique de Louvain (call records analysis)
• Institut des Systèmes Complexes de Paris Ile-de-France (news media mining & filtering)
• Universidad Politécnica de Madrid (call records analysis)
• Stockholm University (research fellow)
• Karolinska Institutet (call records analysis)
• University of Sheffield (speech-to-text tools)
• Microsoft Research (social media analysis)
• Google (search query analysis)
Leveraging Partnerships
Infrastructure example: Geolocation Augmentation
[Diagram: plain tweets + GeoNames database → UNGP code on Hadoop MapReduce → geolocated tweets. Components shown: AWS S3, HDFS / Impala]
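The core idea of geolocation augmentation, matching place names in tweet text against a gazetteer, can be sketched in a few lines. This is a single-machine illustration only, not the UNGP code and not a Hadoop job. The dictionary `GAZETTEER` is a tiny hypothetical stand-in for the GeoNames database with approximate coordinates, and `geolocate` is a name invented for this example.

```python
import re

# Hypothetical stand-in for a gazetteer such as GeoNames:
# lowercase place name -> approximate (latitude, longitude).
GAZETTEER = {
    "jakarta": (-6.2, 106.8),
    "medan": (3.6, 98.7),
    "kampala": (0.3, 32.6),
}


def geolocate(tweet_text, gazetteer=GAZETTEER):
    """Attach coordinates to a tweet if it mentions a known place name."""
    tokens = re.findall(r"[a-z]+", tweet_text.lower())
    for token in tokens:
        if token in gazetteer:
            lat, lon = gazetteer[token]
            return {"text": tweet_text, "place": token, "lat": lat, "lon": lon}
    # No known place name found: keep the tweet but leave location empty.
    return {"text": tweet_text, "place": None, "lat": None, "lon": None}


print(geolocate("Banjir di Jakarta hari ini"))
```

In a distributed setting the same per-tweet function would be the body of a map step, since each tweet can be processed independently of the others.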
Easy-to-use tools for real-time awareness
Working with us
Horizon scanning
Trainings/capacity building
Secondments & residencies
Joint prototyping
Full research project
http://www.unglobalpulse.org
Giulio Quaggiotto UN Global Pulse
Lab Manager, Jakarta @gquaggiotto
3 roles for NSOs and big data
1. 3rd party to certify statistical quality of new sources
2. Issue statistical “best practices” in the use of non-traditional sources and the mining of “big data”
3. Use non-traditional sources to augment (and perhaps replace) official series
Source: Andrew Wyckoff, OECD
4 big data illusions
1. Uncannily accurate results
2. “N = all”: statistical sampling is obsolete
3. Causation is passé
4. Scientific or statistical models aren’t needed: “the end of theory”
Source: Tim Harford, FT
Big data technologies
• Big data requires different technologies and infrastructure!
– Lots of interconnected computers for data storage
– Distributed processing for analytics
One of Google’s data centers (source: http://www.google.com/about/datacenters)
Big data applications
• Private Sector
– Item recommendation by Amazon
– Movie recommendation by Netflix
– Friend recommendation by Facebook
– Predicting loss of customers by mobile phone companies
– Optimizing Human Resources/hiring practices
• Public Policy
– Tax fraud detection
– Twitter earthquake monitoring
– Understanding perceptions and implications of policy decisions
– “Now-casting” food prices and inflation rates
Big data applications are already all around us!
Data science
• Data science: a new discipline for analyzing big data
• Data scientist: a practitioner of data science with the varied skills needed to analyze big data, find patterns, generate insights, and visualize results to communicate them to others
http://upload.wikimedia.org/wikipedia/commons/4/44/DataScienceDisciplines.png