CLASSIFICATIONS – a key element in the process of harmonization
Isabel Valente (isabel.valente@ine.pt)
Statistics Portugal / Metadata Unit
Work Session on Quality management systems (Q2010)
Helsinki, 3-6 May 2010
Fig. 1 Macro Architecture of the Statistical Metadata System (see footnote 1)
Components shown: Variables, Surveys, Concepts, Classifications, Thesaurus, Data Warehouse, Production systems, Dissemination systems
1 In Morgado, Isabel, “Metadata and survey documentation Portuguese NSI experience”, European Conference on Quality and Methodology in Official Statistics (Q2004), 24-26 May 2004, Mainz, Germany.
Integrated System of Statistical Classifications (SINE)
conceptual model developed by the Neuchâtel group
SINE main phases
2002-2004
- development of the consultation application
- replacement of the existing information on classifications in the Portal
2004-2005
- enlargement of the information made available
- start of the gradual incorporation of code lists
- start of the development of the management application
2006-2007
- consolidation of the management application
- small adjustments and improvements in the consultation application
Current phase (2008)
- consolidation and improvement of the existing model
- harmonization of the existing information
SINE main purposes
1. be a reference base for national, Community and international classifications used for statistical purposes
2. be a reference instrument for the management of classifications
3. be an instrument for the harmonization and coordination of classifications
General ideas
Classifications
• more conceptual
• have a formal basis
• complex structures
• large size
• system of codification
• formalized rules about revisions and changes
• versions are defined
Code lists
• less conceptual
• do not have a formal basis
• simple structures
• small size
• may or may not have a system of codification
• do not have formalized rules about revisions and changes
• are not based on the idea of version
• operational lists for internal use by the institution
Marital status
Degree of relationship with the representative of the household
Ranks of turnover
Size classes of persons employed
Sex
Classification structures that are based on:
• Community or national regulations
• Methodological manuals
• Community or international recommendations
are treated as reference structures.
Consequence: the remaining structures (code lists) are, whenever possible, aligned with those reference structures.
Problem encountered
Access to the code lists for the dissemination of data in 1st place
Access to the classification structure which is part of a recommendation or regulation in 2nd place
Another problem
How to distinguish standard classifications or reference structures from code lists?
Solution
Find distinctive elements in the version names
Norms for the writing of names (naming convention)
General form
Constituent elements of the version name:
Main part [+ “, ” + formal qualifier] [+ “ (” + informal qualifier + “)”] [+ “ - ” + variant n]
Qualifier = formal qualifier and/or informal qualifier
Examples:
- Nomenclature of territorial units for statistics, 2002 version
- International standard classification of education, 1997 (levels of education)
- Types of dwellings (4)
Specific form: variant
The variant is always the last part of the name and is formed by: “ - ” + the word “variant” + the variant number
Examples:
- CAE Rev.2 (sections C to E) - variant 1
- Classes of net monthly wages (IEFA, €) - variant 1
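The general form can be read as a simple concatenation rule. The following is a minimal sketch of that reading, not SINE's actual implementation; the function name `build_version_name` and its parameter names are hypothetical.

```python
from typing import Optional


def build_version_name(
    main_part: str,
    formal_qualifier: Optional[str] = None,
    informal_qualifier: Optional[str] = None,
    variant: Optional[int] = None,
) -> str:
    """Assemble a version name from its constituent elements (hypothetical helper)."""
    name = main_part
    if formal_qualifier:
        name += ", " + formal_qualifier          # e.g. ", 2002 version"
    if informal_qualifier:
        name += " (" + informal_qualifier + ")"  # e.g. " (levels of education)"
    if variant is not None:
        name += " - variant " + str(variant)     # the variant is always the last part
    return name


print(build_version_name("Nomenclature of territorial units for statistics", "2002 version"))
print(build_version_name("Types of dwellings", informal_qualifier="4"))
print(build_version_name("CAE Rev.2", informal_qualifier="sections C to E", variant=1))
```

Keeping each element optional mirrors the square brackets in the general form: only the main part is mandatory.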
Rules for the writing of names
Reference structures
• keep the original and official name
• keep the word “nomenclature” or “classification” in the name
• informal qualifiers are added to distinguish national classifications from Community ones
Code lists
• may or may not keep the original name
• cannot have the word “nomenclature” or “classification” in the name
• informal qualifiers are added to distinguish the code lists
• if they are variants of a reference structure, they keep the name or acronym of that classification
• the names should be general
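The rule on the words “nomenclature” and “classification” lends itself to an automatic check. The sketch below is illustrative only: the function names are hypothetical, and it uses a plain substring test, which is an assumption about how strictly the rule is meant to be applied.

```python
RESERVED_WORDS = ("nomenclature", "classification")


def contains_reserved_word(name: str) -> bool:
    """True if the name contains a word reserved for reference structures."""
    lowered = name.lower()
    return any(word in lowered for word in RESERVED_WORDS)


def code_list_name_is_allowed(name: str) -> bool:
    """A code list name must not contain the reserved words."""
    return not contains_reserved_word(name)


print(code_list_name_is_allowed("Types of dwellings (4)"))
print(code_list_name_is_allowed("Classification of dwellings"))
```

A code list that is a variant of a reference structure passes this check as long as it carries the acronym (for example CAE or CPA) rather than the reserved word itself.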
Another problem
Lack of harmonization in the way classifications and code lists are written, and also in their contents
SINE internal rules for writing classification and version names
Names begin with a capital letter, followed by lower-case letters. Exceptions: acronyms, proper names, and words that follow a full stop.
Examples:
• V00011 - Statistical classification of products by activity in the European Economic Community, 2002 version
• V00021 - International standard industrial classification of all economic activities, revision 3.1
The names of code lists should use the plural form
Example:
• V01610 - Types of primary and lower secondary education
Code lists derived from a standard classification must keep the acronym or name of the standard classification in their own names
Examples:
• V01675 - CAE Rev. 3 (total, sections C to N) - variant 2
• V01717 - CPA 2008 (legal services) - variant 7
These code lists must include the word “variant” in their names
Example:
• V02023 - Activity status (IEFA) - variant 4
Cumulative structures must include the expression “cumulative” in their names
Example:
• V02069 - Countries (cumulative - air transport companies)
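Because the names follow fixed rules, entries such as the examples above can be split into their parts mechanically. The following is a minimal sketch under stated assumptions: the “V” plus five digits format of the version code is inferred from the examples shown here, and `VersionEntry` and `parse_entry` are hypothetical names, not part of SINE.

```python
import re
from typing import NamedTuple, Optional

# Version code, " - ", name, and an optional trailing " - variant N".
ENTRY_PATTERN = re.compile(
    r"^(?P<code>V\d{5}) - (?P<name>.+?)(?: - variant (?P<variant>\d+))?$"
)


class VersionEntry(NamedTuple):
    code: str
    name: str
    variant: Optional[int]
    cumulative: bool


def parse_entry(text: str) -> VersionEntry:
    """Split an entry into version code, name, variant number and cumulative flag."""
    match = ENTRY_PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a recognised version entry: {text!r}")
    name = match.group("name")
    variant = match.group("variant")
    return VersionEntry(
        code=match.group("code"),
        name=name,
        variant=int(variant) if variant is not None else None,
        # Cumulative structures carry "(cumulative ..." in the informal qualifier.
        cumulative=re.search(r"\(cumulative\b", name) is not None,
    )


print(parse_entry("V01675 - CAE Rev. 3 (total, sections C to N) - variant 2"))
print(parse_entry("V02069 - Countries (cumulative - air transport companies)"))
```

The variant is only recognised at the very end of the name, which matches the rule that it is always the last part.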
SINE internal rules for writing classification and version names
• Item labels should be written out in full. Abbreviations should be avoided. Exceptions: acronyms or proper names.
• Item labels begin with a capital letter, followed by lower-case letters.
Problems with the names
• People give different names to the same things, depending on the perspective they follow
• We should harmonize the expressions used and avoid naming the same things in different ways
Types of flow
Type of rail freight traffic
Type of movement in port
Type of traffic on the enterprise
Version Code Label
00811 T Total
00811 1 National
00811 2 International
• However, when there are many versions of the same classification, we need elements to distinguish between them.
Lists of countries
• compulsory harmonization of item codes and labels in accordance with the ISO alpha-2 standard (ISO 3166-1).
• the names of countries in Portuguese must be in accordance with the version approved by the Statistical Council.
• groupings of countries used in code lists are created and managed centrally, in order to establish a consistent and harmonized reference base for this purpose.
• codes are always independent of the language used, so they remain unchanged in translations.
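The last point can be pictured as one set of codes shared by every language, with only the labels translated. This is a small illustrative sketch; the dictionary layout and function names are hypothetical, and the labels are ordinary country names, not the list approved by the Statistical Council.

```python
# Labels per language, all keyed by the same ISO alpha-2 codes.
LABELS = {
    "en": {"PT": "Portugal", "ES": "Spain", "FI": "Finland"},
    "pt": {"PT": "Portugal", "ES": "Espanha", "FI": "Finlândia"},
}


def label_for(code: str, language: str) -> str:
    """Look up the label of a country code in the requested language."""
    return LABELS[language][code]


def codes_are_language_independent(labels_by_language: dict) -> bool:
    """True if every language version uses exactly the same set of codes."""
    code_sets = [set(labels) for labels in labels_by_language.values()]
    return all(codes == code_sets[0] for codes in code_sets)


print(label_for("ES", "pt"))
print(codes_are_language_independent(LABELS))
```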
Activities or products code lists
• code lists derived from standard classifications must keep the same codes and labels as the standard classification wherever the items are the same.
• items that differ should have different codes and labels.
• for the aggregation of consecutive categories, codes are connected by a hyphen (e.g. C-D).
• for the aggregation of non-consecutive categories, codes are connected by the “+” sign (e.g. A+C).
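The two aggregation rules can be combined in a single helper. The sketch below is an illustration, not SINE code: `aggregate_code` and `SECTIONS` are hypothetical names, the list of sections A to U is only an example ordering, and the treatment of mixed cases (a single category plus a consecutive run, giving for instance A+C-E) is an assumption that the slides do not state.

```python
SECTIONS = list("ABCDEFGHIJKLMNOPQRSTU")  # example ordered list of section codes


def aggregate_code(codes, ordered_codes):
    """Build the code of an aggregate: '-' joins a consecutive run, '+' joins separate runs."""
    if not codes:
        raise ValueError("at least one category is required")
    positions = sorted(ordered_codes.index(code) for code in set(codes))

    # Group positions into runs of consecutive categories.
    runs = []
    start = previous = positions[0]
    for position in positions[1:]:
        if position != previous + 1:
            runs.append((start, previous))
            start = position
        previous = position
    runs.append((start, previous))

    parts = []
    for first, last in runs:
        if first == last:
            parts.append(ordered_codes[first])
        else:
            parts.append(ordered_codes[first] + "-" + ordered_codes[last])
    return "+".join(parts)


print(aggregate_code(["C", "D"], SECTIONS))
print(aggregate_code(["A", "C"], SECTIONS))
```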
Other code lists
• For code lists that belong to the same classification and have no standard classification for reference, we try to find the most comprehensive structure.
• Once found, that structure becomes the reference structure. New code lists are aligned with it.
V00253 - Activity status, 2005
Code Label
1 Actives
11 Employed
12 Unemployed
121 Unemployed seeking first job
122 Unemployed seeking new job
2 Inactives
21 Pupils/students
22 Homemakers
23 Retired
24 Permanent disabled for work
25 Others
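In this reference structure the hierarchy can be read from the codes themselves: each code appears to extend its parent's code by one digit. The sketch below relies on that reading, which is an assumption drawn from the table above and not a rule stated in the slides; the function names are hypothetical.

```python
ACTIVITY_STATUS = {
    "1": "Actives",
    "11": "Employed",
    "12": "Unemployed",
    "121": "Unemployed seeking first job",
    "122": "Unemployed seeking new job",
    "2": "Inactives",
    "21": "Pupils/students",
    "22": "Homemakers",
    "23": "Retired",
    "24": "Permanent disabled for work",
    "25": "Others",
}


def parent_code(code, code_list):
    """Return the parent code (the code minus its last digit), or None at the top level."""
    candidate = code[:-1]
    return candidate if candidate in code_list else None


def children(code, code_list):
    """Return the codes whose direct parent is the given code, in list order."""
    return [c for c in code_list if parent_code(c, code_list) == code]


print(parent_code("121", ACTIVITY_STATUS))
print(children("2", ACTIVITY_STATUS))
```

A new code list that keeps these codes and labels for the categories it shares can be checked against the reference structure with the same lookups.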
Other code lists
• For other code lists where no standard can be found and whose categories vary little, codes and labels are kept unchanged for the categories that remain unchanged.
• Use of certain codes in code lists for certain situations:
– total coded as T
– residual values preferably coded as 9, or with a code ending in 9
• The use of codes and labels from structures already in SINE is promoted in preference to new codings and formulations.
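The conventions for totals and residual values could be checked as soft warnings, since the slides describe the residual code as a preference and not a strict rule. This is a sketch only: the function name is hypothetical, and recognising residual items by the labels “Others” or “Other” is an assumption made for the example.

```python
def reserved_code_warnings(items, residual_labels=("Others", "Other")):
    """Return warnings for items that depart from the T / 9 conventions.

    items maps each code to its label.
    """
    warnings = []
    for code, label in items.items():
        if label == "Total" and code != "T":
            warnings.append(f"total is coded {code!r}, expected 'T'")
        if label in residual_labels and not code.endswith("9"):
            warnings.append(f"residual item {label!r} is coded {code!r}, preferably ending in 9")
    return warnings


print(reserved_code_warnings({"T": "Total", "1": "National", "2": "International"}))
print(reserved_code_warnings({"0": "Total", "25": "Others"}))
```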
Age groups
• United Nations (UN), Standard international age classification
• five-year and ten-year age groups, with the boundaries generally beginning at multiples of five and ten and ending at four and nine
• ages are separated by a hyphen, preceded and followed by a space, which avoids the use of particles and makes the labels more general
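Labels of this kind can be generated from the group width alone. The following is a minimal sketch; `age_group_labels` is a hypothetical helper, and it assumes whole years and an upper bound that is a multiple of the width.

```python
def age_group_labels(width, upper_bound):
    """Labels for age groups of the given width covering ages 0 to upper_bound - 1."""
    if width <= 0 or upper_bound % width != 0:
        raise ValueError("upper_bound must be a positive multiple of width")
    # Each group starts at a multiple of the width and ends one year before the next start.
    return [f"{start} - {start + width - 1}" for start in range(0, upper_bound, width)]


print(age_group_labels(5, 20))
print(age_group_labels(10, 30))
```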
Other size classes
• consecutive classes should be unambiguous, so the same value should not appear in two different classes
• every item should state explicitly what is being quantified (e.g. years, euro, persons).
• minimum and maximum thresholds should use normalized expressions:
– In the lowest class, “Less than” (e.g. Less than 30 years).
– In the highest class, “and more” immediately after the last value used (e.g. 65 and more years).
– The signs “<”, “>”, “≤” and “≥” should not be used
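These rules can be applied together when labels are generated from a list of thresholds. The sketch below is illustrative: `size_class_labels` is a hypothetical name, it assumes integer values, and the format of the intermediate classes (“30 - 64 years”) is an assumption borrowed from the age group convention.

```python
def size_class_labels(thresholds, unit):
    """Build non-overlapping class labels from integer thresholds, naming the unit in each."""
    if not thresholds:
        raise ValueError("at least one threshold is required")
    bounds = sorted(thresholds)
    labels = [f"Less than {bounds[0]} {unit}"]            # lowest class
    for lower, upper in zip(bounds, bounds[1:]):
        labels.append(f"{lower} - {upper - 1} {unit}")    # no value appears in two classes
    labels.append(f"{bounds[-1]} and more {unit}")        # highest class
    return labels


print(size_class_labels([30, 65], "years"))
```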
Other size classes
• numerical values of a thousand or more must use a space as the thousands separator, to make hundreds, thousands, tens of thousands, millions, etc. easier to read (10 000 000)
• or, alternatively, powers of 10 may be used instead (10^6)
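Both notations are easy to produce when labels are generated. This is a minimal sketch with hypothetical function names; writing the power as “10^6” in plain text is an assumption about the notation.

```python
def with_thousands_spaces(value):
    """Format an integer with a space between groups of three digits."""
    return f"{value:,}".replace(",", " ")


def as_power_of_ten(value):
    """Return '10^n' if the value is an exact power of ten, otherwise None."""
    exponent = len(str(value)) - 1
    if value > 0 and value == 10 ** exponent:
        return f"10^{exponent}"
    return None


print(with_thousands_spaces(10000000))
print(as_power_of_ten(1000000))
```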
Conclusions
• SINE made known what exists about classifications
• widened the term to include code lists
• makes classification structures available:
– in a normalized format
– in an easy way
– at any time
– in accordance with users' needs
• Because of that it was possible:
– to detect and correct writing errors
– to harmonize the way codes and labels are written
– to implement some harmonization procedures and rules
– to improve the clarity and precision of the terms used
– to improve the integration between code lists and standard classifications
– to harmonize codes and labels between code lists
– to reduce the number of code lists needed, by creating generic and cross-cutting structures
– to save time
– to achieve greater integration between the different metadata subsystems
Conclusions
Classification systems are a key element for improving the quality and coherence of:
• the existing metadata
• the existing information