Editing process and its quality regarding design and production phases using process metadata and calculation modules
Pauli Ollila
15 September 2015
Work Session on Statistical Data Editing
Topic (iv): Evaluation and feedback
Contents
- Different phases of creating and using the editing strategy
- GSBPM and the phases of creating and using the editing strategy
- Generic E&I-process and process step definitions from GSDEM
- Designing editing strategy before production
- One process step with functions, modules and production actions
- Metadata in editing process
- Planning, production and development tasks
- Problematic situations and unexpected tasks in production
- Development tasks at different levels
- Conclusions (in a bit straightforward manner)
DIFFERENT PHASES OF CREATING AND USING THE EDITING STRATEGY
1) DESIGNING THE EDITING STRATEGY
The phase includes a variety of different planning and decision actions for editing, ranging from the construction of the process flow to selecting some principles to carry out edit rules.
2) CREATING THE REALIZATION SYSTEM OF THE STRATEGY
The phase includes the IT choices for carrying out the editing strategy, both containing solutions for different parts in the editing strategy and possibly decisions for interactions between different IT environments and data structures. In principle this phase could also include decisions about non-IT practices belonging to the strategy, e.g. paper questionnaire studies. There might be various ways to carry out the methods, but the corresponding IT solutions for the methods are denoted here as modules.
3) TESTING THE STRATEGY
The phase includes operations in the realization (IT) system of the editing strategy with specific test data sets (unedited, edited), in practice mostly from earlier rounds of the production of statistics. The testing procedure may have a systematic structure, but known statistic-dependent problematic areas can also be studied.
4) APPLYING THE STRATEGY IN PRODUCTION
The phase contains the implementation of the editing strategy with chosen methods and parameterization in the corresponding IT system with data set(s) to be edited in the production process of statistics. Note that editing in production can also happen in a collection phase, i.e. when we are acquiring the data gradually.
The editing process “contains a number of activities or tasks that aim to assess the plausibility of the data, identify potential problems and perform certain selected actions that intend to remedy the identified problems”. In addition to the editing process itself, the editing strategy covers many other things connected to the operations that deal with the editing process in different contexts.
1) Designing the editing strategy
2) Creating the realization system of the strategy
3) Testing the strategy
4) Applying the strategy in production
(4.3 for editing during data acquisition)
GSBPM AND THE PHASES OF CREATING AND USING THE EDITING STRATEGY
Process step definitions in GSDEM
- The editing process can be structured by splitting it up into sub-processes, called process steps, and a process flow that describes the navigation among the process steps during execution.
- An operational data editing process usually contains a considerable number of functions with specified methods that are executed in an organised way. The function types are Review, Selection and Amendment. A function is an instance of one of the three function types that serves a specific purpose in the chain of activities that leads to the edited data.
- Data editing functions specify what action is to be performed in terms of its purpose, but not how it is performed. The latter is specified by the process method. For the purposes of production some computerized methods are parameterized in order to define the process exactly. Methods can be interactive as well.
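As a reading aid, the GSDEM terms above can be sketched as a small Python data structure. All class and field names here (`ProcessFlow`, `ProcessStep`, `EditFunction`, `Method`) and the example steps are illustrative assumptions, not GSDEM or EG EDIT identifiers; only the three function types come from the text.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Any, Dict, List, Optional


class FunctionType(Enum):
    """The three function types named in the text."""
    REVIEW = "Review"
    SELECTION = "Selection"
    AMENDMENT = "Amendment"


@dataclass
class Method:
    """How a function is performed; parameters define the process exactly."""
    name: str
    parameters: Dict[str, Any] = field(default_factory=dict)
    interactive: bool = False


@dataclass
class EditFunction:
    """What is to be performed (its purpose), not how it is performed."""
    purpose: str
    function_type: FunctionType
    method: Optional[Method] = None  # None until a method has been decided


@dataclass
class ProcessStep:
    """A sub-process of the editing process, holding its functions."""
    name: str
    functions: List[EditFunction] = field(default_factory=list)

    def is_fully_specified(self) -> bool:
        # A step is ready only when every function has a method attached.
        return bool(self.functions) and all(
            f.method is not None for f in self.functions
        )


@dataclass
class ProcessFlow:
    """The process steps; navigation order is left out of this sketch."""
    steps: List[ProcessStep] = field(default_factory=list)

    def unspecified_steps(self) -> List[str]:
        return [s.name for s in self.steps if not s.is_fully_specified()]


# Hypothetical two-step flow: one step has a parameterized method,
# the other still lacks a method decision.
flow = ProcessFlow(steps=[
    ProcessStep("error detection", [
        EditFunction("flag values outside limits", FunctionType.REVIEW,
                     Method("range check", {"lower": 0, "upper": 1000})),
    ]),
    ProcessStep("imputation", [
        EditFunction("replace flagged values", FunctionType.AMENDMENT),
    ]),
])
print(flow.unspecified_steps())
```

Keeping the method optional on the function mirrors the distinction in the definitions: the function can be planned first and the method decided later.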
The connection between the four phases of creating and using the editing strategy and GSDEM is expressed through the terms process flow, process steps, functions, methods and parameterization.
DESIGNING EDITING STRATEGY BEFORE PRODUCTION (ideal situation)
[Figure: a process flow (here without order) of process steps, shown at the stages PLANNING, DECIDING and DEVELOPING IT SYSTEM. Labels in the figure: process step (F) with functions defined; a set of methods for functions; process step (M) with methods decided; process step (MP) with methods and parameters; process flow with IT solutions; process step (IT solution) with method decision and parameter definition possibility; method modules; a set of modules for methods.]
NEEDED IN PLANNING: Process flow, Process steps, Functions
NEEDED IN DECIDING: Methods for Functions, Parameterizations for Methods
NEEDED IN IT SOLUTIONS: Modules for Methods, Parameterizations for Modules
ONE PROCESS STEP WITH FUNCTIONS, MODULES AND PRODUCTION ACTIONS
According to the editing process it tells what needs to be done and to be obtained as results.
Includes user actions within the IT environment: parameter definitions, interactive operations, decisions, possibly preparations and updates.
Provides modules to carry out the tasks given in planning with chosen methods. Supported by a metadata system for user definitions and process-based parameter production with metadata for monitoring.
METADATA IN EDITING PROCESS
NOTE: Subclassification not from the task team work, this is only some general description (not “official”)
INPUT METADATA (needed for the editing process, i.e. referential metadata)
- User-made metadata at the production stage (e.g. parameter definitions, rules for data treatment)
- Imported metadata (predefined metadata with e.g. parameters, auxiliary data*, other non-statistical data*)
- Process-made metadata (mainly function indicators and derived variables for further use)
OUTPUT METADATA (produced in the editing process, i.e. paradata)
- Function indicator metadata (may be used as subsequent input metadata but also information about editing history of a unit)
- Process metrics metadata (metrics describing the process and its quality)
- Process information metadata (what happens, what is used, exceptional situations, warnings …)
STEERING THE PROCESS mainly via metadata for parameterization (e.g. edit rules, limits, method-dependent parameters) with some decisions and definitions during the process.
MONITORING AND EVALUATING THE PROCESS mainly via output metadata of process indicators and process information together with experiences and special studies sometimes.
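The split between steering (input metadata) and monitoring (output metadata) can be illustrated with a minimal Python sketch. The function `run_range_check`, the parameter names `lower` and `upper`, and the sample records are hypothetical; the sketch only shows one review function that is steered by parameters and returns the three kinds of output metadata listed above.

```python
from typing import Dict, List, Optional, Tuple


def run_range_check(
    records: Dict[str, Optional[float]],
    parameters: Dict[str, float],
) -> Tuple[Dict[str, int], Dict[str, float], List[str]]:
    """Flag values outside [lower, upper].

    Input metadata: `parameters` (steers the process).
    Output metadata: function indicators per unit, process metrics,
    and process information messages.
    """
    lower, upper = parameters["lower"], parameters["upper"]
    indicators: Dict[str, int] = {}
    information: List[str] = []

    for unit_id, value in records.items():
        if value is None:
            # Exceptional situation: recorded as process information.
            information.append(f"unit {unit_id}: missing value, not checked")
            continue
        indicators[unit_id] = 0 if lower <= value <= upper else 1

    n_checked = len(indicators)
    n_flagged = sum(indicators.values())
    metrics = {
        "n_units": len(records),
        "n_checked": n_checked,
        "n_flagged": n_flagged,
        "flag_rate": n_flagged / n_checked if n_checked else 0.0,
    }
    return indicators, metrics, information


# Hypothetical data and parameterization.
records = {"A": 120.0, "B": 5400.0, "C": None, "D": 80.0}
parameters = {"lower": 0.0, "upper": 1000.0}
indicators, metrics, information = run_range_check(records, parameters)
print(indicators)
print(metrics)
print(information)
```

The indicators could feed a later selection or amendment function as input metadata, while the metrics and messages serve monitoring and evaluation.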
Planning, production and development tasks
PLANNING TASKS
- Process flow creation
- IT system for editing
- Process step creation
- Function creation
- Method module preparations
- Method selections
- Parameterization
As seen in previous slides. In many cases some of these tasks aren’t carried out in a very systematic way.
DEVELOPMENT TASKS
- Changes in process flow
- Changes in systems carrying out process flow
- Changes in process steps
- Changes in functions
- Changes in method modules
- Changes in methods
- Changes in parameterization
Possible development tasks are based on metadata for monitoring and evaluation together with experiences during the process and special studies. A separate slide describes these tasks.
PRODUCTION TASKS
A. PREDEFINED TASKS
- Data preparations*
- Module preparations*
- Parameter definitions
- Submitting modules
- Monitoring process
- User decisions*
- Interactive treatment
B. UNEXPECTED TASKS
The tasks which should be carried out according to the process flow are predefined. Unexpected tasks are caused by varying problematic situations (separate slide).
Problematic situations leading to unexpected tasks in production (examples)
[Figure: the editing process in production and the areas where problematic situations arise: statistical data, auxiliary data, IT environments, input metadata, IT system & modules, definitions, process.]
• Insufficiencies or substance changes → extra studies, further data processing, …
• Method- or process-unsuitable situations in data → reclassification, unit aggregation, method revision, …
• Insufficiencies and errors in modules / programs → error checks, programming, …
• Unsuitable modules for some situations → improving programs / modules (only if possible in the latter), …
• Wrong definitions for process or modules → finding out right ways to define, adjusting instructions, …
• Location, transfer, conversion and/or preparation problems for data sets → various solutions in the corresponding environment
• Conflicts between statistical and auxiliary data → extra studies, further data processing, …
• Problems in data sets in different time points → harmonization attempts, …
• Insufficient or wrong indicator data → changing procedures or definitions behind the indicator data, adjusting the current process …
• Process not ready for some real situations → making additional modules / programs, …
DEVELOPMENT TASKS AT DIFFERENT LEVELS (1)
Changes in process flow
• Rather exceptional, and usually tied to large-scale projects in order to improve the efficiency of the editing process, possibly following the idea of harmonizing the editing process among the statistics in the statistical office.
• The editing project at Statistics Finland opened the possibility to revise the process flow structure in some statistics, especially including the selective editing process step in it.
Changes in systems carrying out process steps
• Substantial changes in the IT system are quite rare, but they are conducted especially when there is a need for a more systematic processing system with an all-covering metadata structure and calculation of indicators at different levels.
• A SAS EG application called EG EDIT was constructed to utilize the properties of the BANFF and SELEKT packages with additional macro modules for the needs in designing and implementing the editing process of various statistics. The package has a metadata system collecting necessary information about the process and definitions for the process.
• The definitional metadata for steering the process (collected automatically from the definitions given in EG EDIT) is available in full during the process. The parameterization of the current process is one part of the definitional metadata. One feature of the process-like project form in EG EDIT is to create process metrics automatically while the application is being run.
DEVELOPMENT TASKS AT DIFFERENT LEVELS (2)
Changes in process steps
• The process steps can change (or new steps can emerge) if there is a need for revision in some part of the process flow.
• A process step added to the process flow in some statistics was the automatic correction of observed fatal errors with exact solutions, mainly dealing with thousand errors. A more complex process step with new functional elements for many statistics was selective editing.
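A thousand error typically arises when a value is reported in units instead of thousands, so it is roughly 1000 times too large. The following Python sketch shows one possible automatic correction; the detection rule (ratio to a reference value such as the previous round's value), the function name `correct_thousand_error` and the bounds `low` and `high` are assumptions made for illustration, not the rule used at Statistics Finland.

```python
from typing import Optional, Tuple


def correct_thousand_error(
    reported: Optional[float],
    reference: Optional[float],
    low: float = 300.0,
    high: float = 3000.0,
) -> Tuple[Optional[float], bool]:
    """Divide `reported` by 1000 when it looks like a thousand error.

    Assumed rule: the ratio reported / reference lies in [low, high].
    Returns (value, corrected_flag); the flag is a function indicator
    that can be stored as output metadata.
    """
    if reported is None or reference is None or reference <= 0:
        # No usable reference: leave the value for other process steps.
        return reported, False
    ratio = reported / reference
    if low <= ratio <= high:
        return reported / 1000.0, True
    return reported, False


# Hypothetical values: previous round 1300 (in thousands).
print(correct_thousand_error(1_250_000, 1_300))  # suspected thousand error
print(correct_thousand_error(1_400, 1_300))      # plausible value, unchanged
print(correct_thousand_error(1_400, None))       # no reference available
```

The bounds are parameters rather than constants, so they can be adjusted between rounds without changing the module itself.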
Changes in functions
• There may be changes in functions appearing in the process steps, when some renewal is carried out in parts of the statistical process. An example of this is the inclusion of the score calculation functions to the statistics in the development project.
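To make the idea of a score calculation function concrete, here is a minimal Python sketch of a commonly described local score for selective editing: the weighted absolute difference between the reported and an anticipated value, relative to an estimated total. The formula choice, the names `local_score` and `select_for_review`, the threshold and the data are all assumptions for illustration; the slides do not say how SELEKT or EG EDIT compute scores.

```python
from typing import Dict, List, Tuple


def local_score(weight: float, reported: float, anticipated: float,
                estimated_total: float) -> float:
    """Weighted deviation from the anticipated value, relative to the total."""
    return weight * abs(reported - anticipated) / estimated_total


def select_for_review(units: List[Dict], estimated_total: float,
                      threshold: float) -> List[Tuple[str, float]]:
    """Return (unit id, score) pairs above the threshold, largest first."""
    scored = [
        (u["id"], local_score(u["weight"], u["reported"],
                              u["anticipated"], estimated_total))
        for u in units
    ]
    selected = [pair for pair in scored if pair[1] > threshold]
    return sorted(selected, key=lambda pair: pair[1], reverse=True)


# Hypothetical units; "anticipated" could be e.g. a previous-round value.
units = [
    {"id": "a", "weight": 10.0, "reported": 500.0, "anticipated": 480.0},
    {"id": "b", "weight": 25.0, "reported": 9000.0, "anticipated": 900.0},
    {"id": "c", "weight": 5.0, "reported": 1200.0, "anticipated": 1000.0},
]
selected = select_for_review(units, estimated_total=100_000.0, threshold=0.005)
print([unit_id for unit_id, _ in selected])
```

Units below the threshold would pass without manual review, which is where the efficiency gain of selective editing comes from.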
Changes in method modules
• The modules (procedures, macros, program codes etc.) for methods are subject to development especially in statistics having non-systematic structure of the IT system. Sometimes the needs for new methods require new module solutions or even tool packages.
• EG EDIT has a set of modules in a constant basic structure and the modules are steered with separate definition blocks of parameters.
• It is not self-evident that the modules will work in all data situations and definitions (e.g. Waste statistics with too detailed waste code levels). Some adjustments are then needed.
DEVELOPMENT TASKS AT DIFFERENT LEVELS (3)
Changes in method selections
• The methods are changed occasionally in the process steps, mainly due to some new reasoning or studies, or when there are new IT modules available for more sophisticated methods.
• Finding suitable types of edit rules was an important part in developing efficient error recognition in statistics. In addition to rules found based on error studies on suspicious phenomena in the statistical data, “tricks” for efficient error recognition in some cases were found.
Changes in parameterization
• Generally at least some changes in parameterization are carried out between different rounds, some of them following the changes in the substance area of the statistics.
• Probably the most common example is to adjust edit rules or parameters of rules (limits, conditions etc.) or to make new rules. When e.g. using simple limits in query edits or parameters for outlier recognition, some monitoring of the development in the field to be studied is considered to be good practice, and there are plans to build this kind of mechanism in the process of some statistics.
• Parameterization work is rather frequent while the editing process of a statistic is being developed, especially the exact definition of the edit rules. This work needs substance knowledge of the statistic in question.
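One simple way to monitor whether the limits of a query edit still fit the data is to track the failure rate of the rule from round to round. The Python sketch below assumes a plain lower/upper limit rule; the names `failure_rate` and `rounds_needing_review`, the tolerated rate `max_rate` and the figures are hypothetical.

```python
from typing import Dict, List


def failure_rate(values: List[float], lower: float, upper: float) -> float:
    """Share of values that fail the limit rule lower <= value <= upper."""
    if not values:
        return 0.0
    failed = sum(1 for v in values if not (lower <= v <= upper))
    return failed / len(values)


def rounds_needing_review(rounds: Dict[str, List[float]], lower: float,
                          upper: float, max_rate: float) -> List[str]:
    """Rounds where the rule fails more often than the tolerated rate."""
    return [
        name for name, values in rounds.items()
        if failure_rate(values, lower, upper) > max_rate
    ]


# Hypothetical rounds: the level of the variable drifts upward over time,
# so a fixed upper limit of 5000 flags more and more units.
rounds = {
    "round 1": [100.0, 250.0, 900.0, 4800.0],
    "round 2": [120.0, 300.0, 5200.0, 4900.0],
    "round 3": [150.0, 5600.0, 6100.0, 5300.0],
}
print(rounds_needing_review(rounds, lower=0.0, upper=5000.0, max_rate=0.25))
```

A rising failure rate does not by itself show that the limits are wrong; it only signals that the parameterization should be reviewed with substance knowledge.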
Conclusions (in a bit straightforward manner, not official)
• Construct your editing process flow well, with sufficiently defined process steps containing the editing functions needed in real statistics production. Don’t leave “holes” in the process. Use the GSDEM suggestions for help in the task.
• Study methods available for functions together with experiences and recommendations.
• Test the method alternatives with real data and adjust parameters to the data situation.
• If possible, utilize existing IT modules/programs/application parts. Connect IT personnel to the development of the process and the metadata environment.
• Try to minimize programming and non-parameterized data preparation and process work in production. Always try to improve the system in order to avoid unexpected tasks in the future. However, process-included interactive treatment is not an unexpected task.
• Monitor the whole process while it is being carried out, and analyze the metadata describing the process together with experiences for further development.
• Test the methodological choices from time to time.