Overview
• Introduction
• Related projects
• Combining data sources
• Selective editing – data sources and tools
• Selective editing in the SDWH framework
• Proposed case studies
• Deliverable outcomes and recommendations
Introduction
• Selective editing options for a Statistical Data Warehouse – including options for weighting the importance of different outputs
• UK and Italy
• Review or quality assure – Sweden (SELEKT)
• Q1: Would you like to review and give comments? (Yes/No)
Statistical Data Warehouse (SDWH)
• Benefits:
– Decreased cost of data access and analysis
– Common data model
– Common tools
– Drives increased use of administrative data
– Faster and more automated data management and dissemination
Statistical Data Warehouse (SDWH)
• Drawbacks:
– Can have a high cost – maintenance and implementing changes
– Tools may need to be developed for statistical processes
– Methodological issues of the SDWH framework – covered by WP2
• Phase 1 (SGA-1): “work in progress” for most NSIs
Combining data sources
• Many NSIs use admin data or registers to produce statistics
• Advantages include:
– reduction in data collection and statistical production costs;
– large amount of data available;
– re-use of data to reduce respondent burden.
• Drawbacks include:
– different unit types (statistical and legal);
– timeliness;
– variable definition discrepancies.
• A mixed-source approach is usually required
Editing
• UNECE Glossary of Terms on Statistical Data Editing:
– “an activity that involves assessing and understanding data, and the three phases of detection, resolving, and treating anomalies…”
• Large amount of literature on:
– editing business surveys
– editing administrative data
Aims and related projects
• This deliverable aims to add value by investigating how to apply selective editing when combining sources
• Mapping with other projects:
– ESSnet on Data Integration
– ESSnet on Administrative Data
– MEMOBUST
– EDIMBUS project (2007)
– EUREDIT project (2000–2003)
– BLUE-ETS
• Q2: Do you know of any other relevant projects? (Yes/No)
Editing combined data sources
• The SDWH will combine survey, register and admin data sources
• Editing is required for:
– maintaining the business register and its quality;
– a specific output and its integrated sources;
– improving the statistical system.
• Part of quality control in the SDWH
• Split processes for different data sources? (e.g. France)
Combined sources - Questions…
• Q3: Do you currently combine data sources?
– A. Yes; B. No; C. Unsure.
• Q4: Do you have separate editing processes for each data source?
– A. Only survey data edited (admin data not edited);
– B. Data sources edited separately;
– C. Data sources edited separately, but units/variables in both sources edited for coherence;
– D. Other.
Selective editing
• Editing is traditionally time consuming and expensive
• Selective / significance editing:
– Prioritises records using a score function that expresses the impact of their potential error on estimates
– The score should consist of risk (suspicion) and influence (potential impact) components
– Divides anomalies into a critical and a non-critical stream for possible clerical or manual resolution (possibly including follow-up)
– Gives a more efficient editing process
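The risk × influence score and the critical/non-critical split can be sketched as follows; the field names, the predicted (e.g. historical) value and the threshold are illustrative assumptions, not any particular NSI's implementation:

```python
# Sketch of a selective editing score: score = risk x influence.
# Field names, the predictor and the threshold are illustrative assumptions.

def selective_editing_score(reported, predicted, weight, domain_total):
    """Score a reported value against a predicted (e.g. historical) value."""
    risk = abs(reported - predicted) / max(abs(predicted), 1.0)       # suspicion
    influence = weight * abs(reported - predicted) / domain_total     # impact on estimate
    return risk * influence

def split_streams(records, threshold=0.05):
    """Divide records into a critical (manual review) and non-critical stream."""
    critical, non_critical = [], []
    for rec in records:
        score = selective_editing_score(
            rec["reported"], rec["predicted"], rec["weight"], rec["domain_total"]
        )
        (critical if score >= threshold else non_critical).append((rec["id"], score))
    return critical, non_critical

records = [
    {"id": "A", "reported": 1200.0, "predicted": 400.0, "weight": 10.0, "domain_total": 50000.0},
    {"id": "B", "reported": 405.0, "predicted": 400.0, "weight": 10.0, "domain_total": 50000.0},
]
critical, non_critical = split_streams(records)
# Unit A (large, influential deviation) goes to the critical stream;
# unit B (small deviation) goes to the non-critical stream.
```

Only the critical stream would then be routed to clerical or manual resolution, which is where the efficiency gain comes from.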
Selective editing – Survey and Admin data
• Use admin data as auxiliary data in the selective editing score function for survey data (e.g. UK, Italy)
• Use a score of the differences between data sources to determine which records need manual intervention (e.g. France)
• Use scores based on historical data
• Apply selective editing to admin data using the same score function as survey data, but with weights = 1 (e.g. the French SBS system)
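A minimal sketch of the “score of differences” idea: the relative-difference form and the tolerance below are assumptions for illustration, not the French implementation:

```python
# Sketch: flag units whose survey and admin values for the same variable
# disagree by more than a tolerance, prioritised by the size of the discrepancy.
# The tolerance and the relative-difference form are illustrative assumptions.

def discrepancy_score(survey_value, admin_value):
    """Relative difference between two sources for the same unit/variable."""
    denom = max(abs(survey_value), abs(admin_value), 1.0)
    return abs(survey_value - admin_value) / denom

def flag_for_intervention(units, tolerance=0.10):
    """Return (unit id, score) pairs above the tolerance, largest score first."""
    scored = [(u["id"], discrepancy_score(u["survey"], u["admin"])) for u in units]
    flagged = [(uid, s) for uid, s in scored if s > tolerance]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

units = [
    {"id": "U1", "survey": 500.0, "admin": 495.0},   # small difference: ignored
    {"id": "U2", "survey": 500.0, "admin": 900.0},   # large difference: flagged
]
flagged = flag_for_intervention(units)
```

Only the flagged units would be sent for manual intervention; the rest are accepted as coherent across sources.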
Selective editing – question
• Q5: Is selective editing used in the processing of admin/register data at your organisation?
– A. No;
– B. No, but admin data are used as auxiliary data for selective editing of survey data;
– C. No, but a score function is used to compare data sources;
– D. Yes, selective editing is applied to admin data;
– E. Not sure.
Selective editing – tools
• SELEMIX – ISTAT
• SELEKT – Statistics Sweden
• Significance Editing Engine (SEE) – ABS
• SLICE – Statistics Netherlands
• Q6: Are you aware of any other selective editing tools?
– A. Yes, I can provide documentation;
– B. Yes;
– C. No.
Selective editing in SDWH
• Methodological issues:
– Survey weights are not meaningful in the SDWH
• Weight = 1?
• Several sets of weights tailored for different uses?
– Selective editing of data “without purpose”
• Importance weight covering all potential uses?
• Alternative editing approach?
– Scores to compare data sources
• Should score functions be used, or should all discrepancies be followed up, or automatically corrected?
– Selective editing of admin data – manual intervention?
• Is selective editing appropriate if manual intervention is not possible?
• Should automatic correction be applied to admin data identified as suspicious?
Any solutions? …
• Survey weights used in the selective editing score are not meaningful
– Q7: What do you think would be the best option?
• A. Everything in the SDWH represents itself, and therefore weights = 1
• B. Calculate several survey weights for all known uses of a unit’s data item and incorporate them into one global score
• C. Calculate separate scores for all outputs, and combine them (max, average, sum)
• D. Other – discuss!
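Option C can be illustrated as follows; the three combination rules (max, average, sum) come from the slide, while the per-output scores are made-up values:

```python
# Sketch of option C: a unit gets one local score per output that uses it,
# and the local scores are combined into a single global score.
# The per-output scores below are made-up illustrative values.

def combine_scores(scores, method="max"):
    """Combine per-output selective editing scores into one global score."""
    if not scores:
        return 0.0
    if method == "max":
        return max(scores)            # conservative: driven by the worst output
    if method == "average":
        return sum(scores) / len(scores)
    if method == "sum":
        return sum(scores)            # grows for units used in many outputs
    raise ValueError(f"unknown method: {method}")

per_output = [0.02, 0.40, 0.09]       # e.g. local scores for three known outputs
global_max = combine_scores(per_output, "max")        # 0.40
global_avg = combine_scores(per_output, "average")    # ~0.17
global_sum = combine_scores(per_output, "sum")        # ~0.51
```

The choice of rule changes which units cross the editing threshold: `max` protects each individual output, while `sum` prioritises units that feed many outputs.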
Any solutions? …
• Selective editing of data “without purpose”
– Q8: Is selective editing appropriate if the data will be used multiple times, with unknown purpose at collection?
• A. No;
• B. No, another editing approach would be better;
• C. Yes, we would use key known/likely outputs to calculate the score;
• D. Yes, I can suggest/recommend a solution;
• E. Not sure.
Any solutions? …
• Scores to compare data sources
– Q9: Should score functions be used to compare sources, or should all discrepancies be followed up, or automatically corrected?
• A. All discrepancies need to be investigated by a data expert;
• B. All discrepancies need to be flagged, and can then be corrected automatically;
• C. Scores should be used to flag only significant/influential discrepancies, which should be investigated by a data expert;
• D. Scores should be used to flag only significant/influential discrepancies, which can then be corrected automatically;
• E. Other – discuss;
• F. Not sure.
Any solutions? …
• Selective editing of admin data
– Q10: Is selective editing appropriate if manual intervention is not possible?
• A. No, only correct fatal errors, systematic errors (e.g. unit errors), and suspicious reporting patterns;
• B. No, identify all errors/suspicious values and automatically correct/impute;
• C. Yes, identify only influential errors, to avoid over-editing/imputing the admin source;
• D. Yes, identify influential errors as well as fatal errors, systematic errors and suspicious reporting patterns;
• E. Other;
• F. Not sure.
Experimental studies
• ISTAT: prototype DWH for SBS
– Uses SELEMIX
– Combines statistical and admin data sources at the micro level to estimate variables on economic accounts, with known domains
– Evaluates the quality of model-based selective editing and automatic correction
– Re-uses available data for other outputs
• ONS: combined sources for STS
– Uses SELEKT
– Monthly business survey and VAT turnover data
– Compares selective editing with traditional editing of admin data (followed by automatic correction), with known domains
– Re-uses available data for other outputs
Deliverable outcome - recommendations
• Draft report on the CROS portal – will include input from this workshop
• Provide recommendations for the methodological issues of using selective editing in the SDWH:
– using best practice from NSIs, and
– outcomes from the experimental studies.
• Metadata checklist
Metadata requirements
• Input to editing:
– Quality indicators (e.g. of the data source)
– Threshold for the selective editing score
– Potential publication domains
– Question number
– Predictor/expected value for the score (e.g. historical data, register data)
– Domain total and/or standard error estimate for the score
– Edit identification
– …
• Output from editing:
– Raw and edited value
– Selective editing score
– Error number/description/type
– Flag if suspicious
– Flag if changed
– …