Big Data vs. Official Statistics
Yu gyung Kang, Director, Statistical Information Portal Division
Statistics Korea
Directors General of the National Statistical Institutes Meeting
25~27 September 2013 / The Hague, Netherlands
Contents
Technology Assessment (TA) in Korea
Big Data Use in Private Sector
• Market Analysis
• Suicide Warning System
On-going Projects by KOSTAT
• Pilot Project for Mining and Manufacture Survey
• E-household Account System
• Pilot Project for Price Statistics
Future Challenges
Technology Assessment (1)
…Conducted by MSIP of Korea in 2012, under Article 14 of the Framework Act on Science and Technology
• What is big data?
– Data with 3Vs characteristics + Data Management Technology
* Gartner’s 3Vs: Volume, Variety and Velocity
– Volume: from GB/TB up to PB, EB, ZB
– Variety: from structured data (customer data, sales data, stock data, finance data) to unstructured data (video, music, messages; SNS, GPS, BBS)
– Velocity: from low speed (hours to weeks) to high speed (mins. to seconds)
Technology Assessment (2)
• Expected Impact: Private Sector / Public Sector / Individuals
– Source of new value creation
– Supporting efficient decision-making
– Providing business chances and jobs
– Improving public service and its efficiency
– Real-time response to social issues
– Creating new industry and job opportunities
– Improving quality of life with individually tailored service
– Increasing trust in public policies and service
– Aggravating economic inequality
– Possibility of wasting money due to careless massive investment
– Social problems caused by unethical use of data
– Increasing risk of leaking gov’t secrets
– ‘Big Brother’
– Misuse of big data with error and its negative impact on gov’t policies
– Increase of privacy and security issues
Technology Assessment (3)
Policy Recommendations
a. Localize Core Technologies related to big data through gov’t-led R&D
b. Establish a Legal and Institutional Basis for standardizing the management, sharing and trading of big data
c. Foster a pool of Big Data Analysts and Experts through interdisciplinary undergraduate and graduate programs
d. Take a Step-By-Step Approach by Setting Priorities in the sectors where benefits to the public will be visible
e. Make Strategies to Protect Privacy
Big Data Use in Private Sector
Case 1 : Market Analysis by
Which Business would you like to open?
Floating Population
Consumer Type
Sales Information
Real Estate
Business Cycle
Real Estate 411
…
Korean Statistical Information Service
Big Data Use in Private Sector
Case 2 : Suicide Warning System
Weather Forecast
Why not Suicide forecast?
• social factors
• weather factors
• Werther Effect
• personal emotion
OECD (2012), OECD Health Statistics
• Training Set (2008-2009) & Test Set (2010)
– Total number of suicide incidents
– Economic and weather data
  • CPI, unemployment rate, KOSPI (Korea Composite Stock Price Index), daylight hours and temperature
– 150 million posts from about 5 million blogs on NAVER (incl. SNS posts)
  • Var1 (# of posts including “suicide”)
  • Var2 (# of posts including “dysphoria”, “be tired”, “be painful”, or “be exhausted”)
• Model
– Dependent Variable: No. of suicides in a given period (3 days)
– Independent Variables
  • CPI, unemployment rate, KOSPI, daylight hours, temperature
  • Two variables obtained from the posts
  • Celebrity suicide (control variable)
  • No. of suicides from the previous period
What should NSOs do?
Scientifically collected data vs. huge amount of data: Challenge!
Sample Surveys
• Established theoretical basis
• Representativeness of target population
• Relatively slow
• Expensive data collection
Big Data
• Quantity beats quality
• Lack of representativeness of target population
• MORE TIMELY
• Data already there
KOSTAT tried…
Seminars
October 2012~March 2013
Organized seminars once or twice a month, inviting outside big data experts
Aimed to raise awareness of big data and its impact on producing official statistics
Pilot Project
December 2012~April 2013
A pilot project on the use of big data in the process of editing existing national statistics
Used media data to examine outliers when producing the Index of Industrial Production (IIP)
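One way such an editing step could look is sketched below. The slides do not describe the actual procedure, so this is only an assumed workflow: flag unusual month-over-month changes with a robust (median/MAD) z-score, then look for media reports that mention the flagged unit. The establishment names, figures, headlines and the 3.5 threshold are all hypothetical.

```python
from statistics import median


def robust_z_scores(values):
    """Modified z-scores based on the median and the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_outliers(changes, threshold=3.5):
    """Return the units whose change is unusually far from the median change."""
    names = list(changes)
    scores = robust_z_scores([changes[n] for n in names])
    return [n for n, s in zip(names, scores) if abs(s) > threshold]


def review_with_media(flagged, headlines):
    """For each flagged unit, collect the headlines that mention it."""
    return {
        name: [h for h in headlines if name.lower() in h.lower()]
        for name in flagged
    }


# Made-up month-over-month output changes (%) reported by establishments.
changes = {
    "Plant A": 1.2,
    "Plant B": -0.8,
    "Plant C": 0.5,
    "Plant D": -45.0,
    "Plant E": 2.0,
    "Plant F": 38.0,
}
# Made-up media headlines for the same month.
headlines = [
    "Fire halts production at Plant D for three weeks",
    "Exports rise in the chemical sector",
]

flagged = flag_outliers(changes)
review = review_with_media(flagged, headlines)
```

Here a flagged value with supporting headlines (Plant D) would likely be a genuine change, while a flagged value with no media trace (Plant F) would be a candidate for follow-up with the respondent.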
KOSTAT is doing…
1. E-Diary System (Household Account System)
• Currently, about 48.5% of sample households have adopted the e-Diary system
• Respondents can import their expenditure information online from the transaction records of banks, credit card companies and major retail stores.
Using big data for the convenience of respondents
KOSTAT is doing…
2. Pilot Project of Price Index
KOSTAT is currently preparing a pilot project on compiling a price index using big data for a specific manufactured product.
Please select specific domains (or items) that can clearly show the difference between big data and existing statistics, e.g. TV or electronic products
Prof. Roberto Rigobon
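As an illustration of what compiling a price index from online prices could involve, the sketch below computes a Jevons elementary index (the geometric mean of price relatives) over products observed in both periods. The slides do not state which index formula KOSTAT planned to use, so the choice of Jevons is an assumption, and the product identifiers and prices are made up.

```python
from math import exp, log


def jevons_index(base_prices, current_prices):
    """Jevons index (base period = 100) over products priced in both periods."""
    matched = [p for p in base_prices if p in current_prices]
    if not matched:
        raise ValueError("no products are priced in both periods")
    mean_log_relative = sum(
        log(current_prices[p] / base_prices[p]) for p in matched
    ) / len(matched)
    return 100.0 * exp(mean_log_relative)


# Made-up online prices for TV models in two periods.
base = {"tv-a": 1000.0, "tv-b": 2000.0, "tv-c": 1500.0}
current = {"tv-a": 900.0, "tv-b": 2000.0, "tv-d": 1200.0}

# Only tv-a and tv-b are matched; tv-c disappeared and tv-d is new.
index = jevons_index(base, current)
```

Matching only products seen in both periods sidesteps the question of how to treat models that appear or disappear, which is exactly the kind of limitation a pilot on fast-changing items such as TVs would need to examine.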
Future Challenges
Can we ignore big data just because of its representativeness issue, in spite of strengths such as timeliness?
Can KOSTAT prohibit more than 380 statistical agencies from producing official statistics with big data?
Maybe not!
We will have to make use of big data in producing statistics at some point in the future, as was the case with the transition from survey data to administrative data.
We need to identify the limitations of big data through pilot projects, and to learn the techniques and know-how needed to refine big-data-based statistics into official statistics.
Thank you very much!