Alessandra Capobianchi and Luisa Franconi, Istat, Division for Information Technology and Methodology, Italy
NTTS 2009, Brussels, 18-20 February 2009
Cell suppression in linked tables from structural business statistics using Tau Argus 3.3.0: a conceptual framework
What are linked tables?
Tables that present data on the same response variable and share some categories of at least one explanatory variable are called “linked tables”.
Such an explanatory variable is called the “linked variable”.
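The definition above can be made concrete with a small sketch. The dictionary layout, the function name `linked_variables` and the toy categories are assumptions introduced only for illustration; they are not part of Tau-Argus or of the SBS regulations.

```python
def linked_variables(table_a, table_b):
    """Return the explanatory variables whose categories overlap in two tables.

    Two tables are treated as linked only if they share the response variable
    and at least one explanatory variable has categories in common.
    """
    if table_a["response"] != table_b["response"]:
        return {}
    shared = {}
    for variable, categories in table_a["explanatory"].items():
        common = categories & table_b["explanatory"].get(variable, set())
        if common:
            shared[variable] = common
    return shared


# Hypothetical toy tables: both report turnover, both use NACE codes.
by_size = {
    "response": "turnover",
    "explanatory": {"nace": {"151", "152"}, "size_class": {"small", "large"}},
}
by_region = {
    "response": "turnover",
    "explanatory": {"nace": {"15", "151"}, "region": {"ITC1", "ITC4"}},
}

print(linked_variables(by_size, by_region))  # {'nace': {'151'}}
```

In this toy case the NACE code is the linked variable, because category 151 appears in both tables.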
Motivation
- Eurostat: Until now, Eurostat was in charge of protecting the tables required by the SBS Regulations and performed a global confidentiality treatment. From 2008, Eurostat will no longer treat the confidentiality of the SBS tables, and each NIS has to take care of the protection process on its own.
- SBS tables: Community Structural Business Statistics (SBS) are a set of hierarchical linked tables whose spanning variables present different levels of the hierarchy in different tables.
- Software: Tau-Argus version 3.3.0 is available at the website of the ESSnet project, http://neon.vb.cbs.nl/casc/..%5Ccasc%5Ctau.htm . Currently it does not deal with hierarchical linked tables that present different levels of the hierarchy.
- Need: To develop a scheme to cope with this problem.
• Community Structural Business Statistics (SBS) are collected within the framework of Council Regulation (EC, EURATOM) No. 58/97 of December 1996.
• Definitions and table breakdowns are specified in a series of Commission and Council Regulations.
• We focus our attention on the first four annexes, covering the “business economy” (Annex 1), industry (Annex 2), distributive trades (Annex 3) and construction (Annex 4).
Aim: Achieve the protection of the Structural Business Statistics deriving from these annexes.
Community Structural Business Statistics
Conceptual scheme
The protection process of the SBS linked tables can be divided into three steps:
1. Translate the legal framework into a set of tables: create a set of tables for Argus.
2. Analyse the links between tables: establish an order in the protection of the tables.
3. Apply Tau-Argus to each table according to the order previously established: maintain coherence in the suppression pattern.
The tables we focus on are those related to:
• annual enterprise statistics (at 4-digit NACE code)
• annual enterprise statistics (at 3-digit NACE code) by size classes
• annual enterprise statistics (at 2-digit NACE code) by region (NUTS2)
The main statistical unit is the enterprise, although some statistics are also produced for the KAU and for the local unit.
Enterprise = the smallest combination of legal units that is an organisational unit producing goods or services
KAU = kind-of-activity unit, which groups all the parts of an enterprise contributing to the performance of an activity at the class level (four digits) of NACE
Local unit = an enterprise or part thereof (e.g. a workshop, factory, warehouse, office, mine or depot) situated in a geographically identified place
1. Translation
However, this general scheme comprising three types of tables presents some relevant differences:
• the first table is replicated with KAU as response unit;
• in the second table the variable size class presents two different classifications for different sectors (C-F and G-K);
• for sector G:
  - the regional table is released at NACE 3 level instead of NACE 2;
  - only for this sector there is an additional table relating to NACE 3 by turnover in classes.
1. Translation: some peculiarities
The tables considered in the general scheme need to be split into several tables that are homogeneous in the level of the classifying variables and response unit.
1. Translation: the set of tables
Definition of spanning variables for each table to be processed by Argus in order to fulfil SBS regulations
Tables processed by Argus | Classifying variable | Response unit | Annex
Tab1.1 | NACE 4 | Enterprise | Annex 1A, Annex 2A, Annex 4A and Annex 3B
Tab1.2 | NACE 4 | KAU | Annex 2E and Annex 4E
Tab2.1 | NACE 3 by size class 1 | Enterprise | Annex 2D and Annex 4D
Tab2.2 | NACE 3 by size class 2 | Enterprise | Annex 3C and Annex 1B
Tab2.3 | NACE 3 by NUTS-2 | Local unit | Annex 3E
Tab2.4 | NACE 3 by turnover in classes | Enterprise | Annex 3C
Tab3.1 | NACE 2 by NUTS-2 | Local unit | Annex 1C, Annex 2F and Annex 4F
Analysing the levels of the hierarchy of the “linked variable” leads to a scheme of relationships that gives the order in which the tables are processed, from the most detailed level of the hierarchy to the most aggregated.
This is because the more detailed cells of one table contribute to the construction of marginal cells in other tables that present a more aggregated level of the hierarchy of the linked variable.
Common cells need to present a coherent suppression pattern.
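A small sketch may help to show why common cells need a coherent pattern. It assumes that NACE codes are stored as digit strings without separators, so that a 3-digit group is the prefix of its 4-digit classes; the function name `aggregate` and all values are hypothetical.

```python
from collections import defaultdict


def aggregate(cells, digits):
    """Sum cell values over NACE codes truncated to the given number of digits."""
    totals = defaultdict(float)
    for code, value in cells.items():
        totals[code[:digits]] += value
    return dict(totals)


# Hypothetical NACE 4 cells (values are invented for illustration).
nace4 = {"1511": 120.0, "1512": 30.0, "1520": 50.0}

# The NACE 3 marginals of the detailed table are cells of the more aggregated table.
nace3 = aggregate(nace4, 3)
print(nace3)  # {'151': 150.0, '152': 50.0}

# If cell 1512 is suppressed in the detailed table but 151 and 1511 are
# published elsewhere, the suppressed value can be recovered by subtraction.
recovered = nace3["151"] - nace4["1511"]
print(recovered)  # 30.0
```

The subtraction at the end is the kind of disclosure that an incoherent suppression pattern across linked tables would allow.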
2. Analysis of the Set of Linked Tables
The most detailed table is Tab1.1, which covers all enterprises classified by NACE class (4-digit codes).
This table will be called the “starting table”.
The next table to be processed should present the hierarchical level of the “linked variable” immediately above that of the starting table.
In SBS Community statistics there are two such tables, Tab2.1 and Tab2.2, which:
- present the 3-digit NACE code as classifying variable;
- relate to different sectors of the economy, so no link exists between the two tables;
- present two different classifications of the variable size class, relative to different sectors of the NACE code.
The next level of the hierarchy of the “linked variable” is the 2-digit NACE code.
In SBS Community statistics, Tab3.1 presents this hierarchical level and relates to sectors C to K, excluding G, of the NACE classification.
- Tab3.1 presents marginals derived from Tab2.1 and Tab2.2 for sectors C to K excluding G.
- The response units are different, but enterprises either coincide with local units or comprise more than one local unit.
Some relevant differences arise for sector G.
- The regional table (Tab2.3) is released at NACE 3 level instead of NACE 2 and is linked only to Tab2.2.
- There is an additional table (Tab2.4) released at NACE 3 level by turnover in classes, linked to Tab3.2.
Also, for sectors C-F the table at NACE 4 level is presented not only for enterprises but also for KAU (Tab1.2).
Tab1.1 and Tab1.2 coincide almost perfectly, so it has been decided to apply the Tab1.1 suppression pattern to Tab1.2.
The order generated by the analysis of the links between the tables, as described in the previous scheme, serves to identify common cells in subsequent tables.
Common cells need to present a coherent suppression pattern.
Tau-Argus allows the user to set a priori information for selected cells. This flexibility can be used to impose coherent suppression patterns on a set of tables.
3. Protection phase: a priori information
This a priori information is organised in a history file.
In the history file the user can, before the protection phase, assign one of the following protection statuses to predetermined cells of the table.
Alphanumeric code | Meaning | Action to be taken by Argus
U | Unsafe | The cell has to be protected.
S | Safe | The cell is not at risk; it can be used as a secondary suppression.
P | Protected | The cell cannot be used as a secondary suppression.
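As a minimal sketch, the a priori information could be assembled as one text line per cell, carrying the codes of the spanning variables followed by the status. The line layout (comma-separated codes, then the status letter), the function name `apriori_lines` and the cell codes are assumptions made for illustration; the exact file format expected by Tau-Argus should be checked in its manual.

```python
VALID_STATUS = {"U", "S", "P"}  # Unsafe, Safe, Protected, as in the table above


def apriori_lines(cell_status, sep=","):
    """Build one history-file line per cell: spanning-variable codes, then status.

    cell_status maps a tuple of spanning-variable codes to "U", "S" or "P".
    The layout is an assumed one, not a confirmed Tau-Argus specification.
    """
    lines = []
    for codes, status in cell_status.items():
        status = status.upper()
        if status not in VALID_STATUS:
            raise ValueError(f"unknown a priori status: {status!r}")
        lines.append(sep.join([*codes, status]))
    return lines


# Hypothetical cells of a NACE 3 by size class table.
example = {("151", "Total"): "U", ("152", "Total"): "P"}
for line in apriori_lines(example):
    print(line)
# 151,Total,U
# 152,Total,P
```

Rejecting unknown status letters early keeps a mistyped code from silently producing a history file that Argus would interpret differently.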
The protection process using secondary cell suppression starts from Tab1.1, the most detailed table.
This table is protected by Argus according to the rules established by the Member State.
To pass the information on the protection of this starting table back to Argus, we ask the software to save the status of each single cell of the protected Tab1.1.
Tau-Argus allows five different output statuses.
3. Protection of a Set of Linked Tables
The second step is to protect the second table of the scheme, Tab2.1.
All the suppressions applied to the common marginal cells in the previous table (Tab1.1) have to be replicated in the current table (Tab2.1).
This is done by creating a history file for the current table (Tab2.1) containing a priori information that imposes on Argus the constraints stemming from the protection of the previous table (Tab1.1).
Different types of constraints may arise for each of the common cells.
Output status of the common cell in the previous table | Meaning | A priori information for the common cell in the current table | Action taken by Argus
1 | Safe | P | This cell will not be selected as a secondary suppression
5 | Unsafe | - | Argus will recognise a primary suppression
11 | Secondary suppression | U | Set to manually unsafe in the current table, i.e. cell to be protected
14 | Missing value | - | -
9 | Manually unsafe | U | Set to manually unsafe in the current table, i.e. cell to be protected
10 | Manually safe | P | This cell will not be selected as a secondary suppression
Output status in the previous table, its meaning, and the corresponding status to be applied in the a priori information for the current table.
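The correspondence in the table above can be sketched as a simple lookup. The numeric codes and the resulting statuses are taken from the table; the function name `apriori_for_common_cells`, the data layout and the example cells are hypothetical, and reading the statuses from an actual Tau-Argus output file is left out.

```python
# Output status in the previous table -> a priori status in the current table.
# None means that no a priori entry is written for the cell.
APRIORI_FROM_OUTPUT = {
    1: "P",    # safe: must not be chosen as secondary suppression
    5: None,   # unsafe: Argus will recognise the primary suppression itself
    11: "U",   # secondary suppression: set to manually unsafe
    14: None,  # missing value
    9: "U",    # manually unsafe
    10: "P",   # manually safe
}


def apriori_for_common_cells(previous_status, common_cells):
    """Translate the statuses of common cells into a priori statuses.

    previous_status: cell key in the previous table -> output status code.
    common_cells: cell key in the previous table -> key of the same cell
                  in the current table.
    """
    result = {}
    for previous_key, current_key in common_cells.items():
        code = previous_status[previous_key]
        if code not in APRIORI_FROM_OUTPUT:
            raise ValueError(f"unexpected output status: {code}")
        status = APRIORI_FROM_OUTPUT[code]
        if status is not None:
            result[current_key] = status
    return result


# Hypothetical statuses saved after protecting the previous table.
previous = {"151": 11, "152": 1, "153": 5, "1511": 5}
# Hypothetical common cells: NACE 3 subtotals match the size-class totals.
common = {
    "151": ("151", "Total"),
    "152": ("152", "Total"),
    "153": ("153", "Total"),
}

print(apriori_for_common_cells(previous, common))
# {('151', 'Total'): 'U', ('152', 'Total'): 'P'}
```

Cell 153 gets no entry because, according to the table, Argus recognises a primary suppression in the current table on its own.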
3. Process is applied following the relationship scheme
Tab1.1
• Apply Tau-Argus
• Create .saf file
• Convert information contained in the .saf file into a priori information for Tab2.1 and Tab2.2
• Select common cells (Tab1.1 with Tab2.1; Tab1.1 with Tab2.2) and create the “History files” (H2.1, H2.2)
Tab2.1
• Apply History file H2.1
• Apply Tau-Argus
• Create .saf file
• Convert information contained in the .saf file into a priori information for Tab3.1
• Select common cells (Tab2.1 with Tab3.1) and create the “History file” (H3.1_1)
Tab2.2
• Apply History file H2.2
• Apply Tau-Argus
• Create .saf file
• Convert information contained in the .saf file into a priori information for Tab3.1
• Select common cells (Tab2.2 with Tab3.1) and create the “History file” (H3.1_2)
Tab3.1
• Apply History files H3.1_1 and H3.1_2
• Apply Tau-Argus
• Create .saf file
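The processing order implied by this scheme can be derived mechanically from the links between tables. The sketch below uses a topological sort from the Python standard library; the dictionary `PREDECESSORS` encodes only the four tables listed above and is an illustrative assumption, not an input format of Tau-Argus. The additional tables for sector G and for KAU are left out.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each table maps to the tables that must be protected before it,
# i.e. the tables whose saved statuses feed its history files.
PREDECESSORS = {
    "Tab1.1": set(),
    "Tab2.1": {"Tab1.1"},
    "Tab2.2": {"Tab1.1"},
    "Tab3.1": {"Tab2.1", "Tab2.2"},
}

# A valid processing order: every table appears after all its predecessors.
order = list(TopologicalSorter(PREDECESSORS).static_order())

for table in order:
    sources = sorted(PREDECESSORS[table])
    if sources:
        print(f"{table}: apply a priori information derived from {', '.join(sources)}")
    else:
        print(f"{table}: starting table, no a priori information")
```

Tab2.1 and Tab2.2 are not linked to each other, so either may be processed first; only their position between Tab1.1 and Tab3.1 is fixed.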
Conclusions and Further Work
Conclusion
• We describe a process to protect linked hierarchical tables from SBS using Tau-Argus.
• We have successfully applied the process to the Italian sample of SBS.
Further work
• With the entry into force of the new SBS regulation, which introduces changes due to the adoption of the new classification of economic activities, NACE Rev. 2, more work needs to be done.
• Protection patterns harmonised between subsequent years need to be studied and carefully tuned, so that coherence is maintained not only within a year but also across successive years.