View
23
Download
0
Category
Tags:
Preview:
DESCRIPTION
USING THE METADATA IN STATISTICAL PROCESSING CYCLE – THE PRODUCTION TOOLS PERSPECTIVE. Matjaž Jug, Pavle Kozjek, Tomaž Špeh Statistical Office of the Republic of Slovenia. Overview. Current statistical production cycle in SORS Using the metadata in B laise applications - PowerPoint PPT Presentation
Citation preview
USING THE METADATA IN STATISTICAL PROCESSING CYCLE – THE PRODUCTION
TOOLS PERSPECTIVE
Matjaž Jug, Pavle Kozjek, Tomaž ŠpehStatistical Office of the Republic of
Slovenia
Overview
Current statistical production cycle in SORS
Using the metadata in Blaise applications The role of metadata in automatic editing
system in SAS Metadata connected with the data in
Oracle data warehouse Lessons learnt Questions
Current statistical production cycle
Entry and micro editing (Blaise) Macro and statistical editing (SAS) Storing and analysis (Oracle) Dissemination (PC-Axis) Central metadata stores (Klasje &
Metis)
Using the metadata in Blaise applications
Generation of (high speed) data-entry applications using Gentry (using by non-IT personnel)
Metadata-based transformations between different data structures (EXTRA-FAT, FAT, THIN)
Gentry – tool for generation of the Blaise data-entry application
Questionnaire structure and layout (name, blocks, tables, routing etc.)
Field characteristics (length, data type, constants, other parameters)
Field characteristics
Data type
Gentry – example of generated application
section
header
Data entry for table 12
Transformations
PROVIDER Industry SizeClass QuantityProduct1 ValueEURProduct1 QuantityProduct2 ValueEURProduct2Provider 1 A big 150 300 200 400
PROVIDER PRODUCT Industry SizeClass Quantity ValueEURProvider 1 Product 1 A big 150 300Provider 1 Product 2 A big 200 400
All data for one unit (provider) in one row (EXTRA FAT): suitable for micro editing
PROVIDER PRODUCT Industry SizeClass ContVariables ContObservationsProvider 1 Product 1 A big Quantity 150Provider 1 Product 2 A big ValueEUR 300Provider 1 Product 1 A big Quantity 200Provider 1 Product 2 A big ValueEUR 400
Classification and continuous variables in the columns (FAT): suitable for analysis
Classification variables in the columns and continuous variables in the rows (THIN)
Metadata-based transformation in Blaise
Metadata-based transformation in SAS
The role of metadata in automatic editing system in SAS
General system for automated editing
Process metadata
The role of metadata in automatic editing system in SAS
In order to be general the tool must be able to:- recognize the data which are due to be
subjected to editing and/or imputation; - recognize which editing method should be
applied,- and with what parameters
Process indicators – level 1
Mode of data collection - 1 data provided directly by reporting unit- 2 data from administrative source- 3 data computed from original values- 4 imputed data – imputation of non-response- 5 imputed data – imputation due to invalid
values detected through the editing process- 6 data missing because the unit is not
eligible for the item (logical skip)
Process indicators – level 2
Data status - 1 original value- 2 corrected value
Process indicators – level 3
Method of data correction - 11 correction after telephone contact- 12 data reported at a later stage
Process indicators – level 3
Reporting methods - 11 reporting by mail questionnaire- 12 computer assisted telephone interview(CATI)- 13 telephone interview without computer
assistance- 14 paper assisted personal interview (PAPI)- 15 computer assisted personal interview (CAPI)- 16 paper assisted self interviewing - 17 computer assisted self interviewing- 18 web reporting
Process indicators – level 3
Imputation methods - 10 method of zero values- 11 logical imputation- 12 historical data imputation- 13 mean values imputation- 14 nearest neighbour imputation- 15 hot-deck imputation- 16 cold-deck imputation- 17 regression imputation- 18 method of the most frequent value- 19 estimation of anual value based on infraanual data- 21 stochastic hot-deck (random donor) - 22 regression imputation with random residuals- 23 multiple imputation
Process indicators examples - xy.zz
11.15 means:1 - data provided
directly by reporting unit
11 - original value11.15 - computer
assisted personal interview (CAPI)
42.19 means:4 - imputed data –
imputation of non-response
42 - corrected value42.19 - estimation of
anual value based on infraanual data
Statistical process
Key responders
Other units
SAS
Blaise
OracleSAS
Blaise
Metadata connected with the data in Oracle data warehouse
On-line access to:- Historical data- Data from different phases (not only final
data)- Data for multiple surveys (not only data
marts)- Statistical (variables & classifications) and
process (time stamps, status indicators...) metadata connected with the data
...accessible for third-party tools
Conceptual star scheme for SBS
...
CLASSIFICATION DIMENSIONSMETADATA DIMENSIONS
FACTS_SBS
OBS_VALUEWEIGHT
DIM_NACE
DIM_NUTS
DIM_ORG_FORM
DIM_VARIABLE
DIM_SOURCE
DIM_INDICATORDIM_OBS_UNIT
DIM_TIME
THIN table design
Loading
Transactionalstar
Editingtable
Input tables
Oracle Discoverer
Analytical cubes
Imputations
Uncorrect data
Imputation table
Input data
Editing
Cleandata
Corrected data
Automaticcorrections
Editing form
Manualcorrections
Classificationserver
Business register
Searching
Imputeddata
Imputations
Classifications
Metadataserver
VariablesSources...
Extractions
Analyticaldata
Extracted data
Analysis Results
Controlquery
Analyticalquery
Parameterquery
Lessons learnt
The role of central repositories for metadata- Natural source of conceptual metadata- Metadata have to be exact, complete and consistant- Process metadata should be connected with the
data
Harmonisation of metadata concepts- Local metadata vs. global metadata- The cultural change is needed
Technical considerations- The possibilities for metadata exchange and system
integration are good (XML, SQL)
Questions
Recommended