researchICTsolutions
Better data: combining survey, administrative and big data
- Dr. Christoph Stork
Big Data = combining various data sets, including large data
‣ Large Data (private sector and administrative data):
- n -> all (n approaches N)
- all bank accounts
- all mobile subscribers
- all tax payers
- all driver licence holders
‣ Survey Data:
- Quarterly LFS South Africa 2016: n = 30,000
Large Data
‣ Administrative data:
- medical and tax records
- driver licences, civil registry
- social security, crime statistics, electricity consumption, educational statistics
‣ Commercial transactions data:
- stock exchange data, FX
- bank, credit card and supermarket transactions
- insurance records, loyalty card records…
‣ Sensors and tracking devices: sensors, M2M, satellite, GPS devices…
‣ Online activities / social media: web scraping of online search activity, online page views, blogs / FB / Twitter
Example: Population density changes in Colombo region: weekday/weekend, based on 100 million Call Detail Records per day generated by Sri Lanka mobile operators
[Figure: density maps for Weekday and Sunday at 06:30, 12:30 and 18:30; legend: Decrease in Density / Increase in Density]
by Sriganesh Lokanathan [email protected]
Big Data
‣ Admin data: Ministry of Finance, Social Security, Civil Registry
‣ Survey data: Labour Force Survey, National Household Income and Expenditure Survey, Economic Surveys (informal sector)
‣ Private sector data: Banks, Mobile Operators, Other Companies
Surveys are snapshots - Big Data is a movie
Example of Namibia surveys (timeline from 2000): 2001 Census, 2003/4 NHIES, 2006 DHS, 2006/7 NHIES, 2011 Census, 2012 LFS, 2013 LFS, 2014 LFS, 2016 LFS
Can Big Data be used to:
‣ Fill gaps (interpolate key statistics)?
‣ Reduce frequency of surveys?
‣ Make statistics more accurate?
‣ Reduce sample size (census 4% of population)?
Governments need reliable data
‣ Survey data can be misleading if sampling is not done properly… (e.g. US election polling)
‣ Big data can be misleading if what is being measured is not well understood
- Mobile operator data does not include info on non-users
- Bank data does not generate informal sector income data
‣ Social media as an early indicator of unemployment?
- Self-reinforcing trends
- Fake news
Digital and Financial Divide
Finscope 2015 Survey for South Africa
Big and Admin Data may overlook these
Finscope 2015 Survey for South Africa
75% of South Africa may leave only a thin digital trace
Zambia: urban bias
ZICTA 2015 ICT Survey: Individuals 10+
‣ Active mobile users: Zambia 51%, Urban 68%, Rural 39%
‣ Smartphone owners among mobile phone owners: Zambia 14%, Urban 18%, Rural 7%
Zambia: male bias
ZICTA 2015 ICT Survey: Households with working…
Finscope 2015: Individual 16+
‣ Financially included: Male 61%, Female 57%
‣ Access to mobile phone: Male 82%, Female 78%
‣ Access to computer: Male 15%, Female 8%
‣ Access to Internet: Male 21%, Female 11%
Zambia -90 db
Informal Business Surveys
A listing is compiled for each EA (enumeration area). The listings serve as sample frames for the simple random selection of households and businesses.
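A minimal sketch of drawing a simple random selection from one EA's listing. The listing contents, the sample size of 12 and the helper name `draw_sample` are illustrative assumptions, not details from the survey itself.

```python
import random


def draw_sample(listing, sample_size, seed=None):
    """Simple random selection (without replacement) from one EA's listing."""
    if sample_size > len(listing):
        raise ValueError("sample size exceeds the number of listed units")
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(listing, sample_size)


# Hypothetical listing of 120 households in a single enumeration area.
ea_listing = [f"household-{i:03d}" for i in range(1, 121)]

# Every listed household has the same chance of being selected.
selected = draw_sample(ea_listing, sample_size=12, seed=42)
print(selected)
```

Fixing the seed is a design choice for auditability: the same listing and seed reproduce the same selection.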
Large data itself not enough for policy purposes
‣ Large Data available for the rich or middle class
‣ Large data for Internet of Things
‣ Little available for the informal sector or poor
- Outside of coverage areas
- Not using tech
- Not having bank account and only using cash
- No health insurance
- No permanent address
‣ Informal sector makes up a large share of our societies
Generating Big Data through Triangulation
[Diagram: overlap of ADMIN DATA, SURVEY DATA and PRIVATE SECTOR DATA]
Estimating Income
‣ Source: Banks
- Frequency: Continuous
- Type: Private Sector Large Data
- What: Salaries received (formal)
- Data: address, age, gender, ID
- Enrichment: years of work experience, education, skills, sector
‣ Source: Ministry of Finance
- Frequency: Continuous
- Type: Admin Large Data
- What: Pay as you earn tax (formal salaries), income from self employment
- Data: address, age, gender, ID
- Enrichment: years of work experience, education, skills
‣ Source: Labour Force Survey
- Frequency: Annual / Occasional
- Type: Survey
- What: Income of formally employed, income of informally employed, income of self-employed
- Data: age, gender, work experience, education, location, sector, type of employment (full-time, part-time or occasional)
- Enrichment: having a bank account, receiving salary in bank account, paying Social Security
Income Data Triangulation
‣ 2015 LFS Formal Income = 2015 Total salaries declared for tax = 2015 Total salaries received in bank accounts
‣ 2015 LFS Informal Income = 2015 LFS total Income - Total salaries declared for tax
‣ 2015 LFS: estimate relationship
‣ Between surveys: interpolate income data and number of people receiving a formal salary from monthly or quarterly PAYE tax and salary transactions data
‣ 2020 LFS: re-calibrate
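The triangulation identities can be sketched as a cross-check between the three sources. The function name, the 5% tolerance and all figures below are made up for illustration.

```python
def triangulate_income(lfs_total, lfs_formal, tax_salaries, bank_salaries,
                       tolerance=0.05):
    """Cross-check formal income across sources and derive informal income.

    Returns (consistent, informal_income). `consistent` is True when the
    three formal-income figures agree within `tolerance` (an assumed
    threshold, expressed relative to the largest of the three).
    """
    sources = [lfs_formal, tax_salaries, bank_salaries]
    spread = (max(sources) - min(sources)) / max(sources)
    consistent = spread <= tolerance
    # Informal income = LFS total income - total salaries declared for tax.
    informal_income = lfs_total - tax_salaries
    return consistent, informal_income


# Illustrative 2015 totals in an arbitrary currency unit (not real data).
consistent, informal = triangulate_income(
    lfs_total=100.0, lfs_formal=61.0, tax_salaries=60.0, bank_salaries=59.0
)
print(consistent, informal)
```

If the sources disagree by more than the tolerance, the discrepancy itself is informative and would need explaining before any interpolation.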
Understanding Informal Sector better
‣ Large / Big Data does not know much about the informal sector
‣ Estimate the link between the informal and formal sector based on detailed LFSs
‣ Interpolating labour force statistics based on large data may then also allow you to interpolate informal sector statistics
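One very simple reading of this idea, assuming the informal-to-formal ratio observed in the benchmark LFS stays constant between surveys. That constancy, the function name and the counts are assumptions for illustration; a real link would likely be estimated with more detail.

```python
def informal_from_formal(benchmark_informal, benchmark_formal, formal_series):
    """Scale formal-sector counts by the ratio observed in the benchmark LFS."""
    ratio = benchmark_informal / benchmark_formal
    return [round(ratio * formal) for formal in formal_series]


# Hypothetical benchmark LFS: 400,000 informal and 800,000 formal workers.
# Hypothetical formal counts for later periods taken from large data.
formal_counts = [800_000, 820_000, 810_000]
informal_estimates = informal_from_formal(400_000, 800_000, formal_counts)
print(informal_estimates)
```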
Detecting Strikes
‣ Bank transaction data classified as salary or wage payments can be screened for temporary interruptions by location
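A sketch of such a screen, assuming salary payments have already been counted per location and period. The 50% threshold, the use of the median as baseline and the town data are illustrative assumptions.

```python
from statistics import median


def flag_interruptions(payments_by_location, drop_threshold=0.5):
    """Flag periods where salary payments dip far below a location's norm.

    Returns {location: [period indexes]} for periods whose payment count
    is below drop_threshold times that location's median count.
    """
    flagged = {}
    for location, counts in payments_by_location.items():
        baseline = median(counts)
        dips = [i for i, count in enumerate(counts)
                if count < drop_threshold * baseline]
        if dips:
            flagged[location] = dips
    return flagged


# Hypothetical number of salary payments per period in two towns.
payments = {
    "Town A": [1000, 1020, 990, 310, 1005, 1010],
    "Town B": [500, 510, 495, 505, 498, 502],
}
interruptions = flag_interruptions(payments)
print(interruptions)
```

A flagged dip is only a candidate: a public holiday or a payroll delay would look the same, so the flags need checking against other information.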
Matching Jobs to skills
‣ Source: Universities, VTCs (SAQA)
- Frequency: Annual
- Type: Admin Data
- What: Supply of graduates by field and date
‣ Source: Company Online Portal, Company Survey, Skills Audit, tracer studies
- Frequency: Occasional
- Type: Survey / Demand driven
- What: Vacancies by skill, address, date
‣ Source: Labour Force Survey
- Frequency: Annual / Occasional
- Type: Survey
- What: Number of unemployed by age, gender, work experience, skills, education and location
- Enrichment: paying Social Security
‣ Source: Social Security
- Frequency: Continuous
- Type: Admin Data
- What: Number of unemployed
- Data details: address, age, gender, years of work experience, education, skills
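A sketch of comparing graduate supply with vacancies, assuming both sources have been mapped to the same skill labels. That shared taxonomy, the function name and the counts are assumptions for illustration.

```python
from collections import Counter


def skills_gap(graduates_by_field, vacancies_by_skill):
    """Vacancies minus graduates per skill; positive values mean a shortage."""
    supply = Counter(graduates_by_field)
    demand = Counter(vacancies_by_skill)
    skills = sorted(set(supply) | set(demand))
    # Counter returns 0 for skills missing from one of the two sources.
    return {skill: demand[skill] - supply[skill] for skill in skills}


# Hypothetical annual figures.
graduates = {"nursing": 300, "accounting": 500, "welding": 50}
vacancies = {"nursing": 450, "accounting": 200, "software": 120}
gap = skills_gap(graduates, vacancies)
print(gap)
```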
Combining Social Security with Labour Force Survey Data
‣ 2015 LFS: estimate relationship
‣ Quarterly social security data: interpolate employment and unemployment numbers
‣ 2020 LFS: re-calibrate
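One way to sketch the estimate / interpolate / re-calibrate cycle once both benchmark surveys are available: compute the survey-to-admin ratio at each benchmark and let it drift linearly in between. The linear drift, the function name and the numbers are assumptions, not the method from the slides.

```python
def interpolate_with_recalibration(admin_series, benchmark_start, benchmark_end):
    """Scale an admin series so it matches survey benchmarks at both ends.

    admin_series: admin counts per period; the first and last period
    coincide with the two survey benchmarks.
    """
    steps = len(admin_series) - 1
    if steps < 1:
        raise ValueError("need at least two periods")
    k_start = benchmark_start / admin_series[0]   # relationship at first survey
    k_end = benchmark_end / admin_series[-1]      # re-calibrated at second survey
    estimates = []
    for t, admin in enumerate(admin_series):
        k_t = k_start + (k_end - k_start) * t / steps
        estimates.append(k_t * admin)
    return estimates


# Hypothetical social security counts (thousands) and LFS benchmarks.
social_security = [100.0, 110.0, 120.0, 130.0, 140.0]
estimates = interpolate_with_recalibration(social_security, 200.0, 350.0)
print(estimates)
```

Before the second survey exists, only `k_start` is known, so estimates would be projected with the first ratio and revised once the re-calibration is possible.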
Impact of raising minimum wages in SA
‣ Asking businesses? They will always say it will reduce jobs
‣ Triangulation:
- Bank and Tax Data: once enriched with demographic info, profile the lower salary spectrum
- Labour Force Survey Data: profile informal wage earners close to the minimum threshold
- Determine the gap between the 2 profiles
- Who are those that are below the minimum wage?
- How likely is it that a higher minimum wage would make them formal?
- Who would benefit?
- How strong will the incentive be to remain informal?
‣ Big Data allows the impact to be measured and a quick reversal if harmful
Can Big Data be used to fill gaps (interpolate key statistics)?
Yes
Can Big Data be used to reduce frequency of surveys?
Can Big Data be used to make statistics more accurate?
Can Big Data be used to reduce sample size?
We will only know once we try
All this assumes access can be negotiated
Thank you
Dr. Christoph Stork
[email protected] www.researchictsolutions.com