Professor Andy Tatem: Population statistics using new forms of data

Data Science Campus launch - Population statistics using new forms of data


This presentation provides an overview of geospatial methods, drawing on big data and other sources, for complementing traditional sources of demographic data in building evidence bases for monitoring progress towards the SDGs at subnational scales and for designing strategies. The examples come from work being undertaken by the WorldPop project (www.worldpop.org) and the Flowminder Foundation (www.flowminder.org), which both focus on improving the spatial demographic evidence bases for low and middle income countries. They partner with national statistical agencies, UN organizations, mobile network operators and data providers to add value to traditional data sources and integrate them with novel data sources to fill gaps.

Key partners and donors

WorldPop: Research program in Dept Geography & Environment focused on methods for improving the demographic evidence base in low/middle income countries

Flowminder: Non-profit foundation working with data providers and international/government agencies to operationalize and scale research in support of vulnerable populations and sustainable development in low/middle income countries

50+ staff; focus on data science and integration in fields of spatial demography and mobility to support disease, disaster and development applications

DFID, DLR, ORNL, UNICEF, ICF

Population statistics: what do we need?

Mapping the denominator

Governments are reliant on accurate and up-to-date data on population numbers and distributions for planning

-What is the catchment population of my health facility?
-How many people will be affected by development x?

Targeting vulnerable populations

To improve the health of the poorest and most vulnerable and target interventions to lift them out of poverty, we need to know where they are

-How much vaccine is needed for this ward?
-Do the poorest have access to financial support here?

Designing policies around mobility

Data on migration, seasonal movements and commuting are valuable for effective policy and service delivery strategy design

-How many people commute from x to y?
-Where should services for mobile populations be best placed?

Emergency response

Effective response requires rapid and ongoing assessments of numbers of vulnerable people affected and future risks

-How many people were impacted by the flood?
-Which people were displaced, and where to?


The challenge

[Timeline figure: 2010 to 2030, with three censuses and unknown data availability in the years between them]

-Census data are valuable, but expensive and collected once a decade
-Increasing need for more timely and detailed data
-Registry, administration data can help fill gaps
-Not available in many low/middle-income settings
-Challenge of tracking progress towards development goals
-In some countries even the most basic census data are lacking

Years since last census

Here the challenges of relying solely on census data are laid out. How do we measure and monitor in the inter-censal years? It can often take a year or more for census data to appear, and many countries have not had reliable censuses for many years. How do we define a baseline, and measure change subnationally and regularly from this baseline?

What do we have to help us?

[Timeline figure: 2010 to 2030, with three censuses, and with surveys, satellite data and mobile phone data available in the years between them]

-Geolocated household surveys
-Satellite and GIS data
-Mobile phone data

New geospatial data sources can help complement existing traditional sources to fill gaps and keep a subnational focus. Three types of data will be introduced in the following slides, each of which has the valuable feature of being available more regularly than census data, in inter-censal periods, and each of which is already finding operational use in low and middle income settings. While these data sources are valuable in providing regular measurements and subnational detail, they all contain biases and gaps. Integration with each other and with more traditional data sources is therefore required to draw on the strengths of each and overcome some of these biases.

Geolocated Surveys

53.1% of women literate

[Map legend: proportion of women who are literate, from low to high]

Increasingly, national surveys are using Global Positioning Systems (GPS) to record the precise locations of the clusters where household surveys were undertaken, and the coordinates are made available, e.g. Demographic and Health Surveys (DHS), Living Standards Measurement Study (LSMS). This enables moving from national or broad subnational regions in SDG target metrics (as in the Nigeria national and provincial female literacy example) to a much more fine-grained picture that is provided by the GPS-located survey cluster data, highlighting heterogeneities that are masked in aggregate statistics. The close up for Nigeria shows how integrating with satellite maps of urban areas immediately highlights how literacy is higher in urban areas and decreases with greater remoteness. This links to the next slide on satellite/GIS layers, which shows how such factors can be measured in detail. The global image shows all the GPS-located clusters that are publicly available through the DHS program, highlighting the potential to measure SDG target metrics at fine spatial scales for most of the low/middle income world (when integrated with other datasets in appropriate statistical models to account for small sample sizes, uncertainties etc.; see later slides).

Satellite and GIS data

Massive increases in geospatial data collection, processing and availability have occurred over the past decade, with volumes, coverage, variety and quality all on the rise. The examples here show settlements mapped from 0.5m spatial resolution satellite imagery using computer-automated methods, OpenStreetMap buildings constructed through volunteers mapping from detailed satellite imagery, refugee camps mapped digitally with estimated population sizes integrated, topography and landuse data derived from satellite image processing, the density of geolocated tweets in Indonesia providing an indication of population densities, and health facilities/roads mapped in Kenya. These all provide relatively complete and consistent coverage across countries, often with regular updates. They also cover factors associated with poverty, demographics etc., and so offer opportunities for integration with geolocated survey data in models to produce predictive maps of key SDG target metrics.

Mobile phone call detail records

-User makes a call from location X
-Call routed through nearest tower
-Network operator records time and tower of call for billing
-User travels to Y and makes a call

Mobility: Changing densities, flows, seasonal/permanent migration

Social networks: Number of contacts, calling patterns

Consumption: Credit purchase frequencies, top-up amounts

Protecting confidentiality

-Aggregate summaries
-Regulator approval
-Raw data never leaves operator

Finally, mobile phone ownership has rocketed in the past decade, with the vast majority of the population of even the lowest income countries owning and regularly using cellphones. The last 5 years have seen a rise in the use of call detail records (CDRs). These are records, collected anonymously, of the time, the type of communication sent/received, and the cell tower that the communication was routed through. The example shows how a user making a call in one location has this routed through the nearest tower and the event recorded. The user later moves to a different location and makes another call. This is routed through a different tower and the communication event is again recorded, showing that the user moved from the vicinity of one tower to the vicinity of another: evidence of movement. Across the millions of customers and billions of communications, detailed aggregate patterns of changing population densities, mobility patterns, social networks and spending on credit can be observed without being able to identify individuals or compromise confidentiality. CDRs provide valuable complementary data on population mobility, social networks and consumption in near real time.
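
The tower-to-tower logic described in these notes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (user, timestamp, tower), the sample rows and the function names `tower_flows` and `suppress_small` are all hypothetical, and real operator pipelines will differ. The small-count suppression step is one simple way of producing aggregate summaries, not necessarily the one used in practice.

```python
from collections import Counter, defaultdict

# Hypothetical, already-anonymised CDR rows: (user_id, timestamp, tower_id).
records = [
    ("u1", "2016-03-01T08:00", "X"),
    ("u1", "2016-03-01T18:30", "Y"),
    ("u2", "2016-03-01T09:15", "X"),
    ("u2", "2016-03-01T20:00", "Y"),
    ("u3", "2016-03-01T07:45", "Y"),
    ("u3", "2016-03-01T12:10", "Y"),
]


def tower_flows(rows):
    """Count moves between consecutive towers, summed over all users."""
    by_user = defaultdict(list)
    for user, timestamp, tower in rows:
        by_user[user].append((timestamp, tower))
    flows = Counter()
    for events in by_user.values():
        events.sort()  # ISO 8601 strings sort chronologically
        for (_, origin), (_, destination) in zip(events, events[1:]):
            if origin != destination:
                flows[(origin, destination)] += 1
    return flows


def suppress_small(flows, min_count):
    """Drop flows below a minimum count so that small groups are not exposed."""
    return {pair: n for pair, n in flows.items() if n >= min_count}


flows = tower_flows(records)
print(suppress_small(flows, min_count=2))
```

Only the aggregated origin-destination counts leave the function; the per-user event lists stay internal, which mirrors the idea that raw data never leaves the operator.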

Data integration


Thinking in grids

Grids: consistent and comparable format and a framework for integrating differing data types

100 x 100m gridded estimates of population density
Administrative unit population totals
Grids: flexibility in summarization to any administrative unit level

Comprehensive Emergency Obstetric and Neonatal Care (CEmONC) facilities overlaid on grid of women of childbearing age
Percentage of women of childbearing age per woreda within 50km of a CEmONC
Grids: flexibility in analysis and data integration
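
As a rough illustration of why grids are flexible, the sketch below sums a gridded population layer to administrative units and then computes the share of each unit's population within a fixed distance of a facility. Everything here is an assumption made for the example: the tiny 4x4 grid, the 25 km cell size, the facility location and the function names `zonal_sum` and `share_within`. Distances are straight-line distances between cell centres on a projected grid, which is a simplification.

```python
import numpy as np

# Hypothetical 4x4 grid: people per cell and an administrative unit id per cell.
pop = np.array([
    [10, 20,  5,  0],
    [30, 40,  5,  5],
    [ 0, 10, 50, 60],
    [ 0,  0, 70, 80],
], dtype=float)
unit = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [2, 2, 3, 3],
])


def zonal_sum(values, zones):
    """Sum gridded values within each zone id (ids are 0..n-1)."""
    return np.bincount(zones.ravel(), weights=values.ravel())


def share_within(pop, zones, facility_rc, cell_km, max_km):
    """Fraction of each zone's population within max_km of any facility."""
    rows, cols = np.indices(pop.shape)
    near = np.zeros(pop.shape, dtype=bool)
    for r, c in facility_rc:
        dist_km = np.hypot(rows - r, cols - c) * cell_km
        near |= dist_km <= max_km
    covered = zonal_sum(pop * near, zones)
    total = zonal_sum(pop, zones)
    return np.divide(covered, total, out=np.zeros_like(total), where=total > 0)


totals = zonal_sum(pop, unit)
coverage = share_within(pop, unit, facility_rc=[(0, 0)], cell_km=25.0, max_km=50.0)
print(totals, coverage)
```

The same two functions work unchanged for any other zoning layer on the same grid, which is the practical meaning of "summarization to any administrative unit level".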


Census data disaggregation

Aggregate census counts
100x100m gridded counts

Census counts generally aggregated at coarse, irregular administrative unit level, making integration and comparisons with other data challenging

Women of childbearing age per 1x1km 2015

Integration with satellite/GIS data related to human population distribution patterns to disaggregate counts to regular grids

Mapping populations per grid square provides a detailed and consistent dataset for integration with other data sources, e.g. population-environment interactions, healthcare access. Here, we show how aggregated census counts can be disaggregated to a grid square representation by exploiting the relationships between population densities and the range of satellite and GIS datasets presented earlier. The outputs are 100x100m estimates of total numbers of residents and their breakdowns by age and sex, as well as full metadata describing methods, input datasets and accuracy assessments. They are freely available from www.worldpop.org, and the computer code used to produce them is also freely available. As the approach disaggregates counts from census data linked to administrative boundaries, the use of reliable census numbers enumerated at a detailed administrative unit level remains a key determinant of output mapping accuracy. The output datasets have been used in multiple international reports by UN agencies and others, and have seen substantial government usage within countries. This is being scaled globally to produce freely available gridded age-sex structured population data.
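
A minimal sketch of the disaggregation step, assuming a per-cell weighting layer has already been derived from satellite and GIS covariates. How that weighting layer is produced is not covered here; the `weights` array, the toy zones and the function name `disaggregate` are hypothetical. The key property is that each administrative unit's gridded values sum back to its census total.

```python
import numpy as np


def disaggregate(unit_totals, zones, weights):
    """Spread each unit's census total across its cells in proportion to weights."""
    weight_sums = np.bincount(
        zones.ravel(), weights=weights.ravel(), minlength=len(unit_totals)
    )
    if np.any((weight_sums == 0) & (unit_totals > 0)):
        raise ValueError("a populated unit has no weighted cells")
    safe_sums = np.where(weight_sums > 0, weight_sums, 1.0)
    # Index the per-unit scaling factor by zone id to get one factor per cell.
    return weights * (unit_totals / safe_sums)[zones]


# Hypothetical inputs: two administrative units on a 2x3 grid.
zones = np.array([[0, 0, 1],
                  [0, 1, 1]])
weights = np.array([[1.0, 3.0, 0.0],
                    [0.0, 2.0, 2.0]])
census = np.array([400.0, 100.0])

grid = disaggregate(census, zones, weights)
print(grid)
```

Cells with zero weight (for example, water or unsettled land in the weighting layer) receive no population, while the unit totals are preserved exactly.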

www.worldpop.org


But...

Years since last census

Population estimation in the absence of national census

Mapping buildings and settlements from satellite imagery

150 microcensus training clusters and validation sets in Kano State

Population estimates per 90x90m grid square

However, the census data disaggregation approach is not ideal where there is a lack of reliable census data. In some countries, the last census was a decade or more ago and substantial changes have occurred since, so the disaggregation of census counts will not provide an accurate evidence base for contemporary population distributions. Here, very high resolution satellite imagery (50x50cm grid squares) is processed to extract buildings and settlements. In urban areas, different types of neighbourhoods that correspond to different population densities can also be automatically recognised by computer algorithms and mapped. Through integration of these building/settlement extractions with small area sample microcensuses, population numbers can be predicted across entire countries, with quantification of uncertainty in the outputs. This work, done in collaboration with the Gates Foundation and Oak Ridge National Labs, is now being used to direct vaccination efforts in Nigeria, and similar work is being started in Afghanistan with the government and UNFPA.
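
To make the idea concrete, here is a deliberately simple sketch: a people-per-building-area ratio is estimated from microcensus clusters and applied to mapped building area elsewhere, with a bootstrap over clusters giving a rough uncertainty interval. This is an assumption-laden stand-in, not the model used in the work described; the cluster values, the single settlement type, and the function names `density` and `predict_with_interval` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical microcensus clusters: mapped building area (m^2) and people counted.
area = np.array([1200.0, 800.0, 1500.0, 600.0, 2000.0, 1000.0])
people = np.array([60.0, 44.0, 70.0, 33.0, 96.0, 52.0])


def density(area, people):
    """People per square metre of building area (ratio estimator)."""
    return people.sum() / area.sum()


def predict_with_interval(area, people, grid_area, n_boot=1000, rng=rng):
    """Predict total population for grid cells, with a bootstrap 95% interval."""
    target = grid_area.sum()
    point = density(area, people) * target
    # Resample clusters with replacement and recompute the ratio each time.
    idx = rng.integers(0, len(area), size=(n_boot, len(area)))
    boot = people[idx].sum(axis=1) / area[idx].sum(axis=1) * target
    low, high = np.percentile(boot, [2.5, 97.5])
    return point, low, high


# Hypothetical building area extracted from imagery for four unsampled grid cells.
grid_area = np.array([500.0, 0.0, 250.0, 1250.0])
point, low, high = predict_with_interval(area, people, grid_area)
print(point, low, high)
```

In practice the ratio would be estimated separately for each neighbourhood type recognised in the imagery, since the notes point out that different settlement types correspond to different population densities.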

People don't stay still.


Modelling changing populations through integration of census and mobile network data

The previous slides present approaches that still just provide a static picture of population distributions, and can be laborious to update. Cellphone data offer solutions here: the integration of mobile data enables dynamics to be measured. Through calibration against census data to translate densities of SIM card communications to population totals, mobile data can be used to produce population distribution maps and measure changes dynamically across the course of days, weeks, months or years. This slide shows a short video of the results of doing this for France. See this case study for more details: http://flowminder.herokuapp.com/case-studies/dynamic-population-mapping-using-mobile-phone-data
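
One way the calibration step could look is sketched below. It assumes, purely for illustration, that population density relates to SIM density through a power law fitted on a log-log scale, and that estimates are then rescaled to a known national total. The functional form, the synthetic numbers and the function names `fit_power_law` and `estimate_population` are assumptions, not a description of the method used for France.

```python
import numpy as np


def fit_power_law(sim_density, census_density):
    """Fit census_density ~ alpha * sim_density**beta by log-log least squares."""
    beta, log_alpha = np.polyfit(np.log(sim_density), np.log(census_density), 1)
    return np.exp(log_alpha), beta


def estimate_population(sim_density, area_km2, alpha, beta, national_total):
    """Convert SIM densities to people per region, rescaled to a known total."""
    raw = alpha * sim_density ** beta * area_km2
    return raw * national_total / raw.sum()


# Hypothetical calibration regions: night-time SIM density vs census density.
sim_night = np.array([5.0, 20.0, 80.0, 320.0])
census_density = 3.0 * sim_night ** 0.9  # synthetic, so the fit is exact
alpha, beta = fit_power_law(sim_night, census_density)

# Apply the fitted relationship to SIM densities observed at another time.
sim_day = np.array([10.0, 10.0, 100.0, 200.0])
area_km2 = np.full(4, 10.0)
pop_day = estimate_population(sim_day, area_km2, alpha, beta, national_total=1_000_000)
print(alpha, beta, pop_day)
```

Rescaling to a fixed total means the output describes how a known population is redistributed over time, rather than estimating the total itself.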

Pop density change per square km

Namibia Pop: 2.3 mill
MTC active subscriptions: 2.1 mill

How is this useful?

The France example provided a good validation test, but within WorldPop/Flowminder we are more focussed on low/middle income countries, where data gaps and limitations are more likely to exist. Here the same approach was adapted to Namibia in collaboration with the National Vector Borne Disease Control Program, who were particularly interested in understanding how health facility catchment populations change, in order to estimate fluctuations in resource demands. The slides show population density change per square kilometer: huge changes are evident at many times of year, with substantial seasonal variability. WorldPop/Flowminder are working on extending these methods to many other low and middle income countries in collaboration with mobile operators, governments and international agencies.

Dynamic facility catchment populations


Measuring migration

[Scatter plot: annual migration flow from census data against annual migration flow from cellphone data; R2 = 0.92]

The value of mobile phone data is further shown in measuring migration. Using Namibia as an example again, migration flows derived from the change-of-residence question in the national census were compared with changes of residence inferred for mobile phone users over the same period. The two show strong correlations, which highlights strong potential for ongoing production of migration statistics using mobile data.
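
A comparison of this kind can be summarised with a coefficient of determination. The sketch below computes it as the squared Pearson correlation between paired flows; the flow values are invented for illustration and are not the Namibia data, and whether the original analysis used raw or log-transformed flows is not stated in the slides.

```python
import numpy as np


def r_squared(x, y):
    """Squared Pearson correlation between two sets of paired flows."""
    r = np.corrcoef(x, y)[0, 1]
    return r ** 2


# Hypothetical annual flows between the same region pairs from the two sources.
census_flows = np.array([120.0, 450.0, 80.0, 900.0, 300.0])
phone_flows = np.array([100.0, 500.0, 95.0, 850.0, 340.0])

print(round(r_squared(census_flows, phone_flows), 3))
```

A high value indicates that the two sources rank and scale flows similarly, but it says nothing about systematic over- or under-counting, so a calibration step would still be needed before phone-derived flows are used as migration statistics.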

Population dynamics across scales

Responding to outbreaks, disasters etc. and building resilient systems to mitigate their impacts requires understanding and measuring the dynamics of populations and how they respond to events. Mobile data can provide unprecedented insights into this across a range of spatial and temporal scales. The maps show temporal changes in population densities across the course of a week (for Accra and the surrounding area in Ghana), for Kenyan regions across seasons, and for Chinese New Year.

Resilience, Risk reduction

Crisis response

Here are some recent examples of mobile data usage for responding to outbreaks (Ebola in West Africa, cholera in Haiti) and natural disasters (Nepal earthquake displacements). As seen in the previous slides, mobile data can be used to measure population connectivity through movements and communications. Many examples in the literature have shown the value of incorporating human mobility data from mobile data in predicting disease spread, and the West Africa and Haiti examples shown are just two of these. In Nepal, ongoing data feeds enabled monitoring and rapid updating of population displacements after the recent earthquakes, with reports delivered to agencies to guide response efforts. Ongoing data feeds are being set up in these regions and many others to ensure that resilient response systems are in place and that rapid responses can be provided, as was done in Nepal.

Data integration for mapping population characteristics

Population characteristics measured in household surveys can be strongly related to features we can measure everywhere

-Greater mobility = lower poverty
-Wider social network = lower poverty
-Large, regular credit top up = lower poverty

-Increasing distance from major roads = increasing poverty
-Increasing urbanicity = decreasing poverty

[Map legend: poverty rate, from high to low]

Improving mapping of socioeconomic indicators

GPS-located survey cluster data

Cellphone and satellite data are collected 24/7 = Potential for ongoing monitoring

Here is an example for Bangladesh. It shows the types of data we have available through GPS-located surveys, satellite/GIS and cellphone-derived indicators for exploring relationships with cluster-level socioeconomic metrics and predicting in unsampled locations.
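
The general pattern of relating cluster-level survey metrics to covariates that are available everywhere, and then predicting at unsampled locations, can be sketched as follows. Plain ordinary least squares stands in here for whatever statistical model is actually used, which would also need to handle spatial correlation, small sample sizes and uncertainty. The two covariates, the synthetic poverty rates and the function names `fit_linear` and `predict` are assumptions made for the example.

```python
import numpy as np


def fit_linear(covariates, outcome):
    """Ordinary least squares with an intercept; returns the coefficient vector."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    coef, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coef


def predict(coef, covariates):
    """Predict a poverty rate, clipped to the valid range for a proportion."""
    design = np.column_stack([np.ones(len(covariates)), covariates])
    return np.clip(design @ coef, 0.0, 1.0)


# Hypothetical survey clusters: [distance to major road (km), mean top-up amount].
X = np.array([[1.0, 8.0], [5.0, 6.0], [10.0, 4.0], [20.0, 2.0], [15.0, 3.0]])
# Synthetic poverty rates: rising with remoteness, falling with top-up size.
poverty = 0.2 + 0.02 * X[:, 0] - 0.01 * X[:, 1]

coef = fit_linear(X, poverty)
unsampled = np.array([[2.0, 7.0], [18.0, 2.5]])
pred = predict(coef, unsampled)
print(coef, pred)
```

Because the cellphone and satellite covariates are refreshed continuously, the prediction step can be rerun between survey rounds, which is where the potential for ongoing monitoring comes from.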

Maybe these are crazy ivory-tower academic ideas that will never find use?


Datasets, methods already in use

Children under 5yrs

Syria population distribution

Malawi live births

Malaria prevalence 2000-15

Are all of these new datasets and methods just crazy academic ideas? No, they're already being used to produce health metrics and situation reports, and to provide development indicators. A small subset of these examples is provided here, with census disaggregation gridded population datasets used to provide the numbers presented in the World Malaria report, humanitarian needs assessments and World Bank reports. Maps of births and pregnancies were used to provide statistics in recent UNFPA midwifery reports, and mapping from GPS-located survey data was used to measure and map changing malaria prevalence in a recent UNICEF-WHO report.

Datasets, methods already in use

Hurricane response

Earthquake-induced displacements

These data and methods are also already being used to guide vaccination planning, measure financial access and support disaster response. The bottom-up population mapping approaches have been used to underpin polio vaccination planning in northern Nigeria, while the detailed poverty maps have formed part of the Gates Foundation's efforts in measuring and mapping access to financial services for the poor. Finally, the Nepal earthquake population displacement analyses fed into reports distributed to all aid and response agencies.

Datasets, methods already in use

Population living under $2 a day
www.fspmaps.com

Children under 5yrs

National population estimation for election planning


Building expertise

-Data science skills and capacity are increasing everywhere

-Open data and software are accelerating uptake

-Making full use of existing traditional data and complementing these with new dataset integration is more cost-effective than new data collection

-Local ownership and analysis are key to sustainable implementation

Key messages

-Increasing pressure to produce statistics at finer spatial scales and more regularly
-Statisticians and traditional data sources are more valuable than ever
-But to produce estimates for different geographical scales and time periods, the integration of multiple types of data to complement traditional sources is often required
-Novel datasets (e.g. phones, satellite) are prone to biases, but each has advantages over census data in terms of the frequency of measurement and spatial precision
-Methods to account for biases, reporting uncertainties and providing clear metadata/documentation to inform users are all important
-Great potential in complementing traditional sources to build strong demographic databases


Further information

www.worldpop.org
www.flowminder.org

@WorldPopProject

@Flowminder

E-mail: [email protected]