ECA Standard Operating Procedure for Data Collection, Compilation and Dissemination

Prepared by ACS

Draft, version 3.2.6, 2nd December 2014


Contents

Acronyms and abbreviations
Background and objectives of the document
I. Standard operating procedures
I.1. Actors involved in this SOP and global organization
I.2. Procedure for the creation of the National data working group
I.3. Procedures for data collection
I.4. Procedure for data verification and validation
I.5. Procedures for data dissemination
II. Procedures for data collection, compilation and validation for country statistical notes
III. Procedures for data collection, compilation and validation for ad hoc activities
IV. Procedures for data collection, compilation and validation for regular flagship production
V. Risk analysis on implementing this SOP and mitigating strategies
Annexes
Annex 1: Thematic Assignments
Annex 2: Country Assignments
Annex 3: ECA Data Management Protocol


Acronyms and abbreviations

ACS African Centre for Statistics

AfDB African Development Bank

ECA United Nations Economic Commission for Africa

EDC ECA Data Committee

FAO Food and Agriculture Organization of the United Nations

NDWG National Data Working Group

NSDS National Strategy for Development of Statistics

NSIS National Statistical Information System

NSO National Statistical Office

REC Regional Economic Communities

SOP Standard Operating Procedure

SRDC Sub Regional Data Centre

SRO Sub Regional Office


Background and objectives of the document

This document should be read in conjunction with the Data Management Protocol (DMP), which has defined in general terms the roles and responsibilities for data collection, management and dissemination in ECA. The aim of the Standard Operating Procedures (SOP) document is to expand on those and agree on a workflow to be followed by all the actors and stakeholders. It is a set of instructions and steps to follow to collect, centralize, validate and disseminate statistical data at ECA level.

The data concerned here are statistical and geospatial data used for all ECA publications, reports, briefs and other knowledge products, which shall be stored in a corporate database for one-stop access by all.

This SOP is based on the premise that the data used for all policy and knowledge products in ECA shall be derived from the same corporate database maintained by the African Centre for Statistics (ACS). It aims to ensure that all operations are done consistently in order to maintain quality control of processes and to serve as a reference document on data collection in ECA. It covers data collection, compilation and dissemination for current activities, country statistical notes and ad hoc activities.

I. Standard operating procedures

This section covers the main procedures for data collection, verification, validation and dissemination for current activities, including the updating of the ECA database.

I.1. Actors involved in this SOP and global organization

The data collection, compilation and dissemination for current activities involve many actors, such as countries, SROs, ACS and other development partners.

At country level, we propose to create a National Data Working Group (NDWG) based on FAO’s experience with the CountrySTAT Technical Working Group, which has worked successfully for agricultural statistics, and on the focal points for ECA-AUC-AfDB joint activities, which have principally served for the compilation of the African Statistical Yearbook. These NDWGs will be led by the National Statistical Offices (NSOs), which will provide their secretariats. NDWG members will come from the main actors of the National Statistical System (NSS). If necessary, an NDWG may also involve the UN Country Team and other data providers, such as Regional Economic Communities (RECs).

With ECA’s restructuring, the responsibilities of the sub regional offices have been redefined to emphasize their roles in collecting and disseminating data about their sub regions. The sub regional data centres (SRDCs) will be the main interface between ECA and the national data sources, including the RECs. As such, they will be the focal point at regional level for data collection, compilation and dissemination.

ACS will ensure coordination, monitoring, tracking and evaluation of all activities, including data-related activities of SRDCs and NDWGs, and will ensure data governance, data quality and data comparability.


I.2. Procedure for the creation of the National data working group

This procedure aims at specifying steps for the creation of a national data working group within countries.

SRDCs and ACS will evaluate the existing statistical coordination mechanism at the national level and explore ways of linking the NDWG to the existing coordination structures, which deal mostly with policy and strategic issues.

SRDCs and ACS will map the existing data working groups in the countries and identify one that is best suited to our requirements. It is expected that this will be the one linked to the Steering Group overseeing the implementation of the National Strategy for Development of Statistics (NSDS).

SRDCs will initiate the constitution of the NDWGs in close collaboration with NSOs and FAO to build and link the NDWGs with NSDS.

With the assistance of the ECA-AUC-AfDB data focal points, identify and document sectoral focal points that provide data to the NSOs. This list would be made available as part of the ECA database so that data users who need clarification on sectoral statistics can contact the relevant focal points directly.

Undertake an advocacy mission to the country to meet the senior officials of NSOs and sectoral data providers to explain the NDWG concept and get their support, before dealing with technical staff. During the visit, agree on the membership of the proposed NDWG and fix a date for a national workshop.

Conduct the national consultative workshop and launch the NDWG.

I.3. Procedures for data collection

There are two broad types of data sources. One is datasets published by the National Statistical Systems, which are regarded as official statistics used by the respective countries. The second is datasets released by regional and international organizations, which are harmonized and comparable. The NDWG will be responsible for the collection of official statistics. SRDCs will receive the datasets from the NDWGs and harvest those already available at REC level. ACS will harvest datasets released by international organizations. Wherever possible, the data will be available by various disaggregations such as gender, age, rural/urban, geography etc.

The underlying principle is to harvest data as and when they are available or published in country and other regional sources, which may be available in the form of online downloads, questionnaires, web services and even printed publications.

Procedure 1: Evaluate institutional and technical capacity in data collection

ACS assesses capacity and training needs in SRDCs and countries (in collaboration and coordination with IDEP);

ACS and SRDCs identify REC and country needs in building capacities in the areas related to their activities (standards and classification, concepts and definitions, metadata framework, data dissemination, data quality and validation, geospatial information management) and propose a program to build capacities in collaboration with IDEP;

ACS initiates proposals to SRDCs and countries for capacity building and/or training.

Procedure 2: Evaluate the use of international standards

ACS identifies international standards, classifications, concepts and definitions to be used, which can be considered as reference standards for the project;

SRDCs, in collaboration with ACS, build a matrix of national standards, classifications, concepts and definitions used in statistical activities in countries;


ACS and SRDCs identify gaps for each country and set up a program to encourage countries to use reference and harmonized standards, and in particular to address these gaps.

Procedure 3: Developing data collection tools

ACS develops a metadata framework.

Twice a year, at the beginning and middle of the year, ACS, in collaboration with ECA divisions and SROs, identifies anticipated data needs of ECA users from programme and project work plans.

From the identified needs, ACS, ECA divisions and SROs review the lists of core and non-core indicators, identifying new data needs.

For each indicator, ACS and SRDCs identify all possible data sources, indicating their reliability and lead times for data harvesting.

ACS and SRDCs review existing data harvesting tools such as questionnaires, data entry and data control rules, and identify where revisions or updates are required, and new ones that may be needed.

ACS implements enhancements or updates to existing tools and develops new ones as necessary.

Procedure 4: Produce a data harvesting calendar and include partners’ needs

SRDCs, in collaboration with NDWGs, make an inventory of indicative data release dates at country and regional levels and build a tentative data release calendar for each indicator by source and focal point, with contact details;

SRDCs, in collaboration with NDWGs, document the institutional bulk data users that receive data regularly;

SRDCs, in collaboration with NDWGs, review the quality checks and processing/formatting requirements of the bulk users, and contact them to clarify needs;

ACS, in collaboration with SRDCs and NDWGs, and consulting institutional users as necessary, customizes tools for applying the checks and processing;

SRDCs support NDWGs to prepare data for institutional users as and when released, harvesting for the ECA database at the same time.
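The release calendar described in Procedure 4 could be held in a very simple structure. The following is only a minimal sketch: the class `ReleaseEntry`, the function `due_for_harvest`, the field names and the sample rows are all hypothetical and are not prescribed by this SOP.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReleaseEntry:
    """One row of a tentative data release calendar (hypothetical layout)."""
    indicator: str
    source: str
    focal_point: str
    contact: str
    expected_release: date


def due_for_harvest(calendar, today):
    """Return the entries whose expected release date has passed, oldest first."""
    due = [entry for entry in calendar if entry.expected_release <= today]
    return sorted(due, key=lambda entry: entry.expected_release)


# Illustrative rows only; names, contacts and dates are invented.
calendar = [
    ReleaseEntry("Trade", "REC", "Focal point B", "b@example.org", date(2015, 3, 1)),
    ReleaseEntry("CPI - Monthly", "NSO", "Focal point A", "a@example.org", date(2015, 1, 15)),
]

for entry in due_for_harvest(calendar, date(2015, 2, 1)):
    print(entry.indicator, entry.source, entry.contact)
```

Keeping the focal point and contact details on each row means that the same list can serve both to schedule harvesting and to tell data users whom to contact about an indicator.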

Procedure 5: Data harvesting at country level

The main responsibility for this activity is with the NDWGs under the coordination of each SRDC. Data will be harvested as and when they are available or published in country and other sources. They may be available in the form of online downloads, questionnaires, web services and even printed publications.

NDWG members collect data and metadata from their respective sources using data collection tools sent to the group by the SRDCs;

Periodicity of data collection will be linked to the release calendar.

Procedure 6: Data harvesting at SRO level

This activity is led by the SRDCs, which build relationships with regional institutions that can provide data. Data collection is done using data collection tools developed by ACS. The procedure is as follows:

A generic questionnaire with associated metadata is sent to the focal point, who will fill it in and return it to the corresponding SRDC;


SRDC collects additional datasets directly from the online regional databases, publications or other dissemination formats and media. To facilitate this, cooperation agreements can be established with data providers. In this case, the text is to be prepared by ACS in cooperation with SROs.

Physical datasets are collected with appropriate metadata;

Periodicity of data collected is linked to release calendar.

Procedure 7: Data harvesting at ACS level

The procedure for harvesting data from international sources that are mandated to harmonise and disseminate them, and for non-core indicators for which data are not readily available, is as follows:

Based on the data needs identified during the review of data needs, ACS identifies international data sources and their tentative release calendars.

ACS determines if special cooperation agreements are required for any of the datasets and arranges for such agreements.

Statistical Assistants collect physical datasets with appropriate metadata.

Concerning indicators for which data are not readily available, ACS will work with international agencies, AUC, AfDB and other specialized organizations to develop methodologies for collecting such data, including engaging in pilot projects and, depending on availability of resources, supporting countries with previously planned statistical surveys and censuses or undertaking special surveys.

I.4. Procedure for data verification and validation

Procedure 8: Produce data verification rules

This procedure aims to specify the actors who define and implement data verification rules.

ACS and SRDC professionals provide rules for data quality checks, including redundancy built into data collection tools so that additional quality and integrity checks can be performed at data entry for each indicator and value. These rules will be drawn up in collaboration with domain specialists from ECA’s other substantive divisions;

ACS and SRDC professionals define validation criteria that clearly and systematically confirm whether the data satisfy the requirements of completeness, integrity, arithmetic and congruence, as well as guarantee the overall quality of the data;

ACS Statistical Assistants integrate them in electronic questionnaires and data collection tools.
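The kinds of rules listed in Procedure 8 can be expressed as small, independent checks. The sketch below is illustrative only: the function names, the field names and the tolerances are assumptions, and "congruence" is interpreted here, as an assumption, as plausibility against the previous observation of the same indicator.

```python
def check_completeness(record, required_fields):
    """Return the required fields that are missing or empty in the record."""
    return [field for field in required_fields if record.get(field) in (None, "")]


def check_arithmetic(components, total, tolerance=0.5):
    """True when the components add up to the reported total within the tolerance."""
    return abs(sum(components) - total) <= tolerance


def check_congruence(value, previous, max_relative_change=0.5):
    """True when the value stays within a relative change of the previous value."""
    if previous == 0:
        return value == 0
    return abs(value - previous) / abs(previous) <= max_relative_change


# Hypothetical record as it might arrive from a questionnaire.
record = {"country": "Mali", "indicator": "Population", "year": 2013, "value": None}
required = ["country", "indicator", "year", "value", "source"]

print(check_completeness(record, required))  # ['value', 'source']
print(check_arithmetic([48.0, 52.0], 100.0))  # True
print(check_congruence(300.0, 100.0))  # False
```

Writing each rule as a separate function makes it possible to embed the same check both in an electronic questionnaire at data entry and in later validation passes.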

Procedure 9: Ensuring data quality

The objective of this procedure is to specify who does what in validation and data quality control. This procedure gives the rules to follow to ensure quality and to validate data.

All actors involved in the process run the verification rules and validation criteria defined in the previous procedure on the data harvested at their level;

At ACS level, professional staff will validate data harvested by Statistical Assistants and data provided by NDWGs by verifying the coherence of the information with the realities of the country, as well as by avoiding data discrepancies between indicators.


At SRO level, SRDC staff will validate data harvested from RECs and by NDWGs.

If a dataset fails the verification rules and validation criteria, any change to the collected data should be discussed with the data provider before it is made;

ACS and SRO statistical assistants and professionals keep track of metadata, including classification, methods and concepts used when data are harvested;

If the data for one indicator differ significantly between data sources, official data from the country have to be given primacy and accepted into the database.

ACS shall liaise with subject specialists in other divisions to define and continuously review data validation criteria and implement software tools to check for completeness, consistency, arithmetic integrity and other key checks in data verification and data validation.
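The rule that official country data take primacy when sources disagree can be sketched as follows. This is a hypothetical illustration: the function `resolve_sources`, the `source_type` labels and the 5% tolerance used to decide what counts as "significantly different" are assumptions, and the case where no official figure exists is deliberately left to the validator.

```python
def resolve_sources(observations, tolerance=0.05):
    """Pick the official observation and flag discrepancies with other sources.

    observations: list of dicts with keys 'source_type' and 'value'.
    Returns (chosen, discrepant). chosen is None when no official value exists.
    """
    official = [obs for obs in observations if obs["source_type"] == "official"]
    if not official:
        return None, False
    chosen = official[0]
    limit = tolerance * abs(chosen["value"])
    discrepant = any(
        abs(obs["value"] - chosen["value"]) > limit
        for obs in observations
        if obs is not chosen
    )
    return chosen, discrepant


observations = [
    {"source_type": "international", "value": 10.0},
    {"source_type": "official", "value": 12.0},
]
chosen, discrepant = resolve_sources(observations)
print(chosen["value"], discrepant)  # 12.0 True
```

The discrepancy flag does not change which value enters the database; it only marks the indicator as a candidate for a quality statement or for follow-up with the data provider.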

Procedure 10: Data transmission

This procedure describes data transmission from NDWGs to SROs and from SROs to the ECA corporate database.

After verification and validation by the NDWGs, the datasets have to be transmitted by email or FTP to the SRDCs for a second-level verification.

After verification and validation of datasets at the SRDC level, data are entered into the SRDC workspace in the corporate database.

ACS is automatically notified by the system to run the final validation tests. At this stage, data are not visible to end users, pending final data validation.

ACS runs a final validation test and posts the data for end users.
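The staged transmission in Procedure 10 behaves like a small state machine in which data become visible only after the final ACS validation. The sketch below is an assumption-laden illustration: the `Stage` and `Dataset` names are hypothetical, and the `acs_notified` flag merely stands in for the automatic system notification.

```python
from enum import Enum


class Stage(Enum):
    RECEIVED_BY_SRDC = "received by SRDC"
    IN_SRDC_WORKSPACE = "in SRDC workspace"
    PUBLISHED = "published"


class Dataset:
    """A dataset already verified by an NDWG and transmitted to an SRDC."""

    def __init__(self, name):
        self.name = name
        self.stage = Stage.RECEIVED_BY_SRDC
        self.acs_notified = False

    def srdc_validate(self, passed):
        """Second-level check; on success the data enter the SRDC workspace."""
        if self.stage is not Stage.RECEIVED_BY_SRDC:
            raise ValueError("SRDC validation applies only to newly received datasets")
        if passed:
            self.stage = Stage.IN_SRDC_WORKSPACE
            self.acs_notified = True  # stands in for the automatic notification

    def acs_validate(self, passed):
        """Final check by ACS; on success the data are posted for end users."""
        if self.stage is not Stage.IN_SRDC_WORKSPACE:
            raise ValueError("ACS validation requires a dataset in the SRDC workspace")
        if passed:
            self.stage = Stage.PUBLISHED

    @property
    def visible_to_end_users(self):
        return self.stage is Stage.PUBLISHED


dataset = Dataset("CPI 2014")
dataset.srdc_validate(passed=True)
print(dataset.acs_notified, dataset.visible_to_end_users)  # True False
dataset.acs_validate(passed=True)
print(dataset.visible_to_end_users)  # True
```

A failed check leaves the dataset at its current stage, which matches the requirement elsewhere in this SOP that changes be discussed with the data provider before the data move forward.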

I.5. Procedures for data dissemination

Procedure 11: Disseminating data via the ECA database

This procedure explains how the ECA database is updated with harvested data.

As soon as data are harvested and validated, SRDCs and ACS will ingest them into the ECA database.

ACS ensures the quality of ingested data and metadata and validates them before making them available to the public, based on rules and criteria developed in previous procedures;

Official data from the country have to be given primacy and accepted into the database. There should, however, be provision for making quality statements if necessary;

In consultation with discipline specialists, ACS develops standard reports and dashboards, including infographics and specific analyses, to provide supplemental information for any ECA analysis. Users will be offered a menu of available report products to choose from.

Options will be available for users to request custom dashboards, which will then be added to the menu of available reports for other users to choose from.

Procedure 12: Improving data dissemination via the ECA database

DTS integrates a feedback interface in the databank to collect users’ feedback;

DTS periodically analyses the feedback data and uses the results to improve the whole process;

ACS conducts regular user surveys in order to better understand users’ needs.


II. Procedures for data collection, compilation and validation for country statistical notes

These procedures aim at determining actors and their roles, and the process for producing country statistical notes.

Procedure 13: Production of Country statistical notes

ACS provides an initial template with an initial set of indicators;

ACS revises the template quarterly based on the input from all sections;

ACS assigns countries and thematic areas to professionals;

Statistical Assistants compile and review datasets under the guidance of professionals and section chiefs. This includes discussion on sources, definitions, classification, validation process etc.;

Statistical Assistants report progress on data collection to the central repository at the Data Technology Section (DTS) weekly (on Mondays);

Professionals and Statistical Assistants sign off the dataset submitted to the central repository quarterly;

DTS does basic and automatic validation using macros and scripts;

ACS professionals, in consultation with SROs, prepare explanatory narratives for countries assigned to them;

ACS approves the draft country statistical note;

ACS is in charge of the final production and dissemination of the document;

To be more efficient, these procedures must be organized into milestones by the coordinator of this activity, as designated by the Director of ACS.

III. Procedures for data collection, compilation and validation for ad hoc activities

This section covers the main procedures for data collection, verification, validation and dissemination for ad hoc activities, including specific statistical publications for a particular event.

Procedure 14: Ad hoc activities

The applicant contacts ACS to specify objectives and expected results and to define a deadline;

ACS appoints a project manager and sets up a joint team with the requesting division or office;

The joint team prepares a document template, including proposed indicators and infographics, in collaboration with professionals based on country and thematic assignments;

DTS reviews the template proposed by the joint team and develops the necessary software tools.

ACS reviews the indicators in the template against the indicators database, clarifies definitions and metadata with the requesting division/office, identifies indicators for which data do not exist in the database and determines how to collect them, as already described.

Statistical Assistants compile and review datasets under the guidance of professionals and section chiefs.

Statistical Assistants prepare infographics under the guidance of professionals and the project manager;

Professionals, in collaboration with the SROs concerned, prepare the narrative in their assigned areas.

ACS and the applicant approve the draft of the ad hoc document.

ACS is in charge of the final production and dissemination of the document.


IV. Procedures for data collection, compilation and validation for regular flagship production

This section covers the main procedures for regular flagship production, such as Country Profiles and the Economic Report on Africa.

Procedure 15: Regular flagship production

At the beginning of the year, ACS appoints a project manager (ACS side) for each regular flagship production;

ACS and the leading division set up a joint team for each production, which will follow activities related to data collection and the production of tables, graphs, maps and infographics;

The joint team defines the overall expected output, agenda and milestones;

Based on the flagship concept notes, the joint team specifies templates for tables, graphs and maps and the types of infographics needed;

Based on the flagship concept notes, the project manager evaluates data needs, identifies what is and is not available in ECA’s database and sets up an appropriate data collection process:

o Statistical Assistants compile indicators available at ECA according to the templates adopted;

o For data unavailable in ECA’s database but that can be obtained with some effort, ACS will work with SRDCs and member States to obtain/compile these data from existing surveys, censuses, administrative sources etc.

o Concerning new indicators for which data do not exist, ACS will define the methodology for their production, quality control and validation in collaboration with subject specialists available at ECA and specialized UN organizations, RECs or international agencies if needed. Then, these indicators will be collected or computed by statistical assistants and professionals following the developed methodology;

When necessary, the joint team may call on the knowledge of specialists in some ECA divisions in specific areas, including imputations, forecasts and estimates;

Statistical Assistants prepare infographics, tables, maps and graphics under the guidance of professionals and the project manager;

If needed, narration is prepared by professionals in their respective areas. SROs can be involved if necessary;

ACS’s Director and Section Chiefs validate the prepared output;

The project manager officially sends the output to the division leading the production;

The division leading the production approves the output.

V. Risk analysis on implementing this SOP and mitigating strategies

The following risks have been identified:

Incentives for NDWG (country level) and partners, including UN agencies and RECs;

Financing data collection;

Lack of staff (numbers and profile) in SRDCs;

Lack of specific capacity in RECs and countries;

The use of non-harmonized standards and classifications, concepts and definitions at country level.


The main incentive at country level is that the procedure will help NSOs to play their role as depositary and custodian of national data and facilitate data collection activities in so far as they will no longer receive multiple questionnaires from international institutions. Even if they continue to receive these questionnaires, they will already have data available according to international standards and will only have to select and transfer those data to the partners according to their needs.

To get the support of partners for the SOP, SROs and/or ACS will undertake advocacy missions to countries to obtain buy-in from senior management of NSOs, key national data providers and partners/stakeholders. These advocacy missions also aim at bringing in other UN agencies which regularly collect data in countries, in order to take into account their needs and contributions and therefore minimize the burden of reporting national data.

Concerning funding, the distribution of resources for every activity at ECA will identify data needs, and an appropriate portion of resources will be allocated and pooled into a fund to be managed by the ECA Data Committee.

To minimize the risk of lack of staff in SRDCs, the following measures can be applied:

In the short term, use fellows and interns from universities and regional statistics institutes. Funds are to be secured for this initiative by ECA.

In the long run, P2 recruitment will be more statistics oriented.

Concerning lack of capacity, the following risk management can be applied: ACS and SROs will assess RECs’ and member States’ capacity building needs and develop short and long term statistical capacity building in collaboration with IDEP.

In order to minimize the risk related to the use of non-harmonized standards and classifications, concepts and definitions at country level, ECA will advocate the use of international standards and concepts through collaboration with specialized agencies and RECs.

To coordinate all processes defined in this SOP, the ECA Data Committee (EDC) recommended by Task Force 4 on Databank Architecture will be created with specific missions:

Facilitate the implementation and the update of this SOP;

Oversee data governance and ensure that the databank produces data according to data quality criteria;

Review the anticipated data needs to confirm that they respond to new and emerging issues as political priorities change;

Ensure collaboration between all ECA actors (ACS, SROs and other divisions).

The composition of the Data Committee is as follows:

Permanent Members

o Director of ACS as Chair

o Director of SPOQD

o Director of DoA

o Director of PIKMD

Rotating Members

o One SRO Director

o One Director of Substantive Division. The actual recommendation of the Task Force is for two directors of substantive divisions, but it is considered that one is adequate.

The EDC will meet at least twice a year.


Annexes

Annex 1: Thematic Assignments

Thematic areas                                 Approver  Compiler
Pop and demography                             Fatouma   Gulilat
Education                                      Oumar     Gulilat
Health                                         Oumar     Meaza
Poverty                                        Fatouma   Meaza
Environment                                    Ayenika   Meaza
CPI – Monthly, CPI – Annual, PPI – Annual      Emmanuel  Elias
National Accounts final                        Emmanuel  Elias
Financial and monetary statistics              Emmanuel  Elias
Trade                                          Katalin   Tesfaye
Balance of Payments, debt and financial flows  Katalin   Tesfaye
Transport                                      Meriem    Meron/Yoseph
Tourism                                        Leandre   Haile
Agriculture                                    Issoufou  Haile
Employment                                     Issoufou  Gulilat
Mining                                         Andry     Thomas
Industry                                       Andry     Thomas
Science, Technology and Innovation             Molla     Yared
Energy                                         Khogali   Thomas
MDGs                                           Negussie  Gulilat


Annex 2: Country Assignments

Staff                   Countries
Andry Andriantseheno    Comoros, Madagascar, Mauritius, Seychelles
Andre Nonguierma        Burkina Faso, Cabo Verde, Côte d'Ivoire
Aster Denekew           Malawi, Namibia, South Africa
Ayenika Godheart        Sao Tome and Principe, Democratic Republic of Congo, Nigeria, Sierra Leone
Emmanuel Ngok           Republic of the Congo, Guinea-Bissau, Guinea, Equatorial Guinea
Fatouma Sissoko         Mali, The Gambia, Senegal, Togo
Issoufou Seidou         Benin, Niger, Liberia, Ghana
Katalin Bokor           Burundi, Rwanda, Djibouti, Kenya
Khogali Ali Khogali     Egypt, Morocco, Sudan, South Sudan
Léandre Ngogang Wandji  Gabon, Cameroon, Central African Republic, Chad
Molla Hunegnaw          Botswana, Lesotho, Mozambique, Zambia
Negussie Gorfe          Ethiopia, Somalia, Uganda, Zimbabwe
Oumar Sarr              Mauritania, Libya, Tunisia, United Republic of Tanzania
Meriem Ait Ouyahia      Algeria, Angola, Eritrea, Swaziland

ACS Country Focal Points are required to coordinate data collection with responsible SRDCs and Statistical Assistants, and maintain up-to-date information on:

- National statistical office/institute:
  o Name and contact details of DG of NSO, heads of main divisions/departments, data focal points;
  o Human resource and capacity situation;

- National statistical system:
  o Other government departments and agencies that produce thematic and sectoral statistics, regular statistical products and tentative release calendars;
  o Non-official sources of data that ECA may be interested in;
  o Programmes of surveys, censuses and activities that would produce data;

- Statistical organization:
  o Revision and implementation status of NSDS;
  o Structure of NSS: centralisation/decentralisation; independence and reporting;

- National mapping system:
  o Geodetic reference frame parameters;
  o Linkages between geospatial information management and statistical systems;
  o Contact details of national mapping agency;
  o Available geospatial datasets.


Annex 3: ECA Data Management Protocol


ECA Data Management Protocol

Introduction

The increased emphasis on grounding policy research and advocacy in clear and objective evidence requires commensurate emphasis on maintaining robust statistics to inform the research and knowledge generation. To ensure the consistency of the messages that ECA is putting out, indicators used in all policy and knowledge products should have the same values. This can best be achieved if the statistics used are derived from the same corporate databank maintained by the African Centre for Statistics. Also, the data used for all ECA publications, reports, briefs, and other knowledge products must be globally accessible to the general public to enable them to interrogate ECA’s analyses and findings. This wide dissemination of ECA’s data holdings is crucial to enhance ECA’s credibility as the think tank of reference on matters pertaining to African development. It will also increase the availability of reliable data to partners in African development.

As part of the refocusing of ECA, the sub-regional offices (SROs) now have core data centre functions to leverage their proximity to their constituent countries for data collection, field studies and backstopping. The SROs are also now required to produce country profiles as a major tool for disseminating ECA’s research and policy analysis. The data produced by the SROs and used for the country profiles should be subjected to the same processes and made available through the same corporate data servers and warehouses as other data products. However, it is not expected that any one SRO would have the staff complement and expertise necessary to undertake the complex data transformation activities, including data quality assurance and adherence to global standards and best practices.

ECA therefore needs data management protocols setting out the policies, standards, processes, technologies and responsibilities/roles to manage and ensure the availability, accessibility, quality, consistency, auditability and security of all ECA’s data resources.

Purpose and Audience

The ECA Data Management Protocol describes the guidelines, processes and recommended practices for the collection, management and distribution of data in ECA. It does not describe the entire ECA databank nor provide technical documentation for components of the databank architecture. The intended audience includes all ECA staff involved in the collection, management, and distribution of data and metadata in ECA’s databases and warehouses, as well as all users of data and information products disseminated through ECA data servers. This is a dynamic document that will be updated as the databank and data servers grow and the Statistical Information Management System evolves.

Key Principles

1. Maximum Use for Available Data. Every data set collected, compiled or otherwise produced to support an ECA activity must be made available to all of ECA. This follows from the fact that no matter how specific the original purpose for producing a data resource, somebody else somewhere will find it useful for another purpose, if they know of its existence.

2. Metadata Driven Data Management (MDM). To support Principle 1, ECA will maintain a searchable metadata collection that provides field-based description of data resources so that secondary users can discover existing data resources and evaluate them to determine the suitability of the data sets for the intended use.

3. One-Stop Corporate Databank. ECA shall maintain a corporate databank that will provide the data needed for all its work. All ECA data sets shall be deposited into this corporate databank, from which all users shall draw when they need data.


4. Enter Once, Use Many Times. Following from Principle 1, every data set should be entered into ECA’s corporate databank only once, usually at a point nearest to where it was originally collected, compiled or produced.

5. Harvest Data as and when Released. Working with national data providers and other international data producers and providers, ECA shall implement a workflow to receive data from the providers as and when released by the providers.

Data sources

ECA data holdings are generally drawn from four main sources: (i) official data from member states; (ii) data from regional and international organisations; (iii) data generated from flash surveys undertaken by ECA; and (iv) data generated from ECA research outputs.

1. Official sources. These are data sets released by governments as official statistics. The primary source of these official statistics in member states is the national statistical office (NSO), which is a coordinating body for data in the country. However, data are also collected directly from ministries, departments and agencies (MDAs) mandated to maintain sectoral/thematic data. These include central banks and ministries of finance, agriculture, education, health, science and technology, among others. Such data are usually submitted directly to ECA. They could also be harvested from the official dissemination websites and online distribution networks maintained by the producing agencies, or their designated distributing agents.

National statistical offices have assigned data focal points to respond to the joint requests from ECA (through ACS), the African Development Bank (AfDB) and the African Union Commission (AUC) for the joint African Statistical Yearbook (ASYB). The focal points compile data from line ministries and data producing agencies and fill in data request questionnaires for the ASYB. To adhere to Principle 5 above, ACS will work with SROs to constitute the national data providers of sectoral/thematic data into formal technical working groups under the auspices of the National Strategy for the Development of Statistics (NSDS) Coordination Committees. This will be done in collaboration with other partners, notably the Food and Agriculture Organization (FAO), which already has country-level Technical Working Groups (TWGs) to collect and validate agricultural data for its web-based statistical database for food and agriculture data (CountryStat). The TWGs will maintain and publish the data release calendar for different data themes so that ECA (and other partners) can plan and time their interactions with them to achieve the as-and-when-released principle. ECA will encourage national data providers to implement Statistical Data and Metadata Exchange (SDMX) capabilities so that data users, including ECA, can harvest the data they need automatically from the data dissemination systems. They will also organize data for regular data requests for users that cannot harvest automatically. ECA, through the SROs, will provide technical backstopping to the TWGs to validate and upload the datasets regularly.

2. Regional and International Sources. While national data sources are generally preferred, there are data sets that need to be harmonized or normalized to facilitate comparability of the indicators, e.g., school enrolment statistics. For such datasets, there are departments of the United Nations Secretariat, UN agencies and international organizations that have been explicitly assigned (or ceded) responsibility to prepare and analyse such indicators, especially in the context of MDG monitoring. ECA collects such harmonized data sets from the online databases of these international custodians.

Professional staff and statistical assistants of ACS have been assigned data topics that they are responsible for maintaining. ACS staff will continuously monitor the data sources assigned to them and ensure that the extracts in ECA’s corporate databank are up to date at all times. Increasingly, international custodians of harmonized datasets provide application programming interfaces and/or web services to enable users to harvest the data (semi-)automatically. ACS will implement appropriate software tools for such automatic update of ECA databases and data warehouses.

3. Flash Surveys. There are times when ECA researchers and analysts may require data that are not regularly maintained by the countries or by any international data providers. ECA therefore occasionally needs to conduct flash surveys to collect such data.

For datasets in this category, ECA divisions would be required to identify the data needs for planned outputs early in the process, ideally as part of the concept note or aide memoire for the activity. ACS will then assign appropriate focal point(s), depending on the required data topics, to work with the division and SROs responsible for the countries where the surveys will be undertaken.

4. Derived Data from Research Outputs. These are mainly secondary indicators and composite indices that are generated by statistical, econometric or mathematical models based on primary indicators. Such indicators that are used to describe, characterize or compare situations in/among African countries acquire the status of data and would be stored and managed like any other dataset. The metadata stored against them would necessarily include the models and methods used to produce them, so that users of the associated information products can verify their validity.

Roles and Responsibilities for Data Collection

- ACS shall continuously maintain a comprehensive register of indicators and data sets needed by users in ECA for programmed outputs. The register shall classify the indicators and include appropriate metadata for other potential users to determine their suitability for use.

- Users in ECA shall be required to include a data plan as part of every concept note, aide memoire or other planning document for outputs and activities. The data plan shall indicate what datasets and/or indicators would be required for the activities, and at what stage they would be needed. The data plan shall also indicate if new derived indicators will be produced by the activities.

- Based on the data plans received, ACS shall nominate a professional staff member and a statistics assistant as statistics focal points for the activities, to work with the task managers in the requesting division or office to ensure that datasets are compiled and made available by the time indicated in the data plan. Actions to be undertaken jointly would include clarifying the definitions of indicators, proposing proxy indicators if the proposed indicators are not measurable, identifying datasets that are not already available in the ECA databank, and preparing data acquisition plans.

- Where the activity produces derived indicators, ACS shall provide for them to be incorporated in ECA’s corporate databank and treated like any other data, with appropriate metadata.

- Where the dataset implies conducting a flash survey, the requesting division or office shall mobilize the necessary funds and process the financial and requisition documents for the data provision component of the activities.

- SPOQD, OES and any other office that would be involved in reviewing and approving programme and project plans shall insist that every concept note or aide memoire for outputs and activities includes a data plan.

- ACS and the SRO Data Centres shall jointly maintain a globally searchable database of members of the national technical working groups, as well as a database of the release calendars for various data themes, by country.

- ACS, in collaboration with substantive divisions that deal with specific topics, shall maintain an internally searchable database of release calendars for the data sourced from international sources.

- ACS shall develop and maintain appropriate software tools to harvest official statistics from national and international data sources that have implemented online dissemination facilities.

- Based on the data release calendars, SRO Data Centres shall liaise with NSOs and national TWGs to collect data from national sources that have not yet implemented online dissemination systems.
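The use of release calendars described above, with automatic harvesting where online dissemination exists and SRO liaison where it does not, could be sketched as follows. This is only an illustration under stated assumptions: the `ReleaseEntry` fields, the `due_for_collection` helper and the sample calendar entries (including all dates) are hypothetical and do not describe an existing ECA system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReleaseEntry:
    """One line of a (hypothetical) national data release calendar."""
    country: str
    theme: str
    release_date: date
    online: bool  # True if the source offers online dissemination (e.g. SDMX)


def due_for_collection(calendar, today, last_collected):
    """Split released-but-not-yet-collected entries into two work queues.

    `last_collected` maps (country, theme) to the date of the last collection.
    Returns (harvest, liaise): entries to harvest automatically, and entries
    that an SRO Data Centre would need to request from the NSO or TWG.
    """
    harvest, liaise = [], []
    for entry in calendar:
        if entry.release_date > today:
            continue  # not released yet
        previous = last_collected.get((entry.country, entry.theme))
        if previous is not None and previous >= entry.release_date:
            continue  # this release is already in the databank
        (harvest if entry.online else liaise).append(entry)
    return harvest, liaise


# Illustrative entries only; the dates are invented for the example.
calendar = [
    ReleaseEntry("Kenya", "Consumer prices", date(2014, 11, 30), online=True),
    ReleaseEntry("Mali", "Agriculture", date(2014, 11, 15), online=False),
    ReleaseEntry("Ghana", "National accounts", date(2014, 12, 20), online=True),
]
collected = {("Mali", "Agriculture"): date(2014, 6, 1)}
harvest, liaise = due_for_collection(calendar, date(2014, 12, 2), collected)
print([e.country for e in harvest], [e.country for e in liaise])
```

Keeping the two queues separate mirrors the division of labour in the list above: software tools handle sources with online dissemination, while SRO Data Centres follow up the remaining sources.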


Data Registration and Metadata Management

Data for ECA databases and analytical work by ECA divisions and SROs are obtained mainly as email attachments and on portable media (CD-ROMs and flash drives), and by direct extraction and download from online databases and other data dissemination portals. Few data sets are received as printed copies of completed questionnaires and publications, requiring them to pass through manual data entry processes; the questionnaires and publications are sometimes available in digital form for direct extraction.

Some of these datasets are currently stored on the computers of the individuals who received them. The result is that other potential users in ECA do not have easy access to the datasets, resulting in duplicate requests and efforts.

It has been asserted under Principle 1 that if users know of the existence of a data resource, they are likely to find it useful for purposes other than the one it was originally collected for. There is therefore a need to publish descriptions of existing datasets so that they can be discovered. This is one of the functions of metadata, which are indexed and searchable data that describe other datasets the way data describe objects, events and phenomena in the real world. The emphasis of the data discovery enquiry could include geographic questions, like what data we have for specific regions; temporal, like what data we have for specific periods; and administrative, like who can read particular datasets.

Having found the datasets, users need to understand them to use them properly, for the purpose for which they are intended. The descriptions needed to provide such understanding are usually discipline-specific, but there is generic information that would be expected to be included. This could include how the dataset was collected and processed and the data structure and/or file format. Metadata also provide the means to preserve the institutional knowledge about the datasets when staff leave a project or move to other roles/functions.

The ECA databank architecture will therefore be metadata driven. In that regard, every dataset received by anybody on behalf of ECA should be registered, assigned a unique reference number, and described using standard metadata fields. Depending on the particular dataset, the metadata record could include:

• Structural metadata: unique variable name(s) and acronym(s), allowing users to search for statistics corresponding to their needs. For example, users searching for some statistics related to “inflation” should be given some indication of what already exists in the database and where to go for a closer look;

• Conceptual metadata: description of the concepts used and their practical implementation, allowing users to understand what the statistics are measuring and, thus, their fitness for use;

• Methodological metadata: description of methods used for the generation of the data (e.g. sampling, collection methods, editing processes, transformations);

• Quality metadata: description of the different quality dimensions of the resulting statistics (e.g. timeliness, accuracy).
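The registration step and the four metadata groups listed above could be represented along the following lines. This is a minimal sketch, not a description of ECA's actual tools: the `MetadataRecord` fields, the `Registry` class, the `ECA-000001` reference format and the sample dataset are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class MetadataRecord:
    """Hypothetical core metadata fields, grouped as in the list above."""
    variable_name: str   # structural
    acronym: str         # structural
    concept: str         # conceptual
    methods: str         # methodological
    quality: dict = field(default_factory=dict)  # e.g. {"timeliness": "..."}


class Registry:
    """Toy in-memory register: every dataset gets a unique reference number."""

    def __init__(self):
        self._next = count(1)
        self._records = {}

    def register(self, dataset, metadata):
        # Refuse datasets that arrive without a metadata record.
        if not isinstance(metadata, MetadataRecord):
            raise ValueError("a metadata record is required for registration")
        ref = f"ECA-{next(self._next):06d}"  # reference format is an assumption
        self._records[ref] = (dataset, metadata)
        return ref

    def search(self, term):
        """Return reference numbers whose name, acronym or concept mention `term`."""
        term = term.lower()
        return [
            ref
            for ref, (_, md) in self._records.items()
            if term in f"{md.variable_name} {md.acronym} {md.concept}".lower()
        ]


registry = Registry()
ref = registry.register(
    dataset=[("2012", 100.0), ("2013", 104.2)],  # made-up values
    metadata=MetadataRecord(
        variable_name="Consumer price index",
        acronym="CPI",
        concept="Measures inflation in consumer prices",
        methods="Monthly price collection; Laspeyres-type index",
        quality={"timeliness": "monthly"},
    ),
)
print(ref, registry.search("inflation"))
```

A user searching for "inflation" finds the consumer price index through its conceptual description even though the word does not appear in the variable name, which is the kind of discovery the structural and conceptual metadata are meant to support.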

Roles and Responsibilities for Metadata Management

- ACS to propose a metadata profile for ECA, setting out the core list of metadata fields to be maintained for every dataset in ECA’s databank.

- ACS to prepare a template, and maintain software tools and procedures for registering data and for entering, editing and searching metadata records. The tools should also include features to enforce the compulsory attachment of metadata before any dataset would be accepted into the databank.

- ACS to conduct regular training sessions on metadata concepts and tools used in ECA for metadata management.

- All ECA staff compiling data from various sources to attach metadata records to every dataset entered into the ECA corporate databank.


Data Quality Assurance Programme

As ECA strives to establish the credibility of its research, knowledge and policy products, it is important that the underlying datasets that inform the products are trusted by the users and the general public. To earn such trust, ECA shall strive to maintain a transparent data quality programme for all its data and information products.

Data quality is commonly defined as the assessed or perceived fitness of data to serve the purpose for which the dataset has been collected. However, given that the maximum use principle implies that data should be used for purposes other than the original intended purpose, data quality has also been defined as the extent to which the data actually represent what they purport to represent. Data quality has several overlapping dimensions, which are context dependent. However, there are key dimensions that tend to be included in most contexts. These are relevance, timeliness, accuracy, coherence, interpretability and accessibility. The data quality dimension that immediately comes to mind in discussions is accuracy, which includes accurate access to, accurate transformation of, and accurate interpretation of the data. These views of accuracy underscore the importance of metadata in data management.

ECA’s data quality assurance programme shall include the normal three aspects of improvement, prevention and monitoring. On improvement, a system shall be put in place to receive feedback from users and correct any reported errors or problems with the data already in the database. Prevention calls for implementing streamlined data capture processes and interfaces, and building data checkers to trap potential errors and stop them from being entered into the database. Monitoring implies explicitly auditing the databases periodically to ensure that errors have not slipped through, while at the same time feeding into the improvement actions. As part of the monitoring, a feedback questionnaire shall be administered to data users who requested specific datasets to be compiled. The results of the questionnaire shall be translated into associated quality statements that shall be published for the datasets.

Roles and Responsibilities for Data Quality Assurance

- ACS shall develop a data quality framework based on the UN Statistical Commission’s Template for a Generic National Quality Assurance Framework (NQAF).

- ACS shall implement streamlined data validation procedures for new datasets, and feedback mechanisms for quality monitoring.

- ECA data users shall complete the data assessment questionnaires for data products derived from ECA databanks, to be aggregated into the data quality statements.

Data validation

Data validation is an essential process for ensuring the quality of the data collected from various sources. Any data compiled from any source by ECA shall pass through systematic validation before being ingested into ECA’s databank. This will help in identifying potential errors in the datasets, such as inconsistent or highly improbable values, based on predetermined rules, scientific theories, trends and accepted standards. Data validation criteria are normally established according to the statistical domain and type of data. Common criteria usually include:

- the sum of sub-components must equal the total;
- data values in a field must be numeric;
- an indicator used as a denominator in a model to compute a derived indicator cannot have a zero value;
- data must be associated with a country/geographic area;
- data must be associated with a year, a month or a specific period.
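The common criteria listed above translate directly into rule checks. The sketch below is illustrative only: the record layout (the keys `country`, `period`, `value`, `components` and `denominator`) and the function name `validate_record` are assumptions, and real criteria would be set per statistical domain.

```python
def validate_record(record, tolerance=1e-9):
    """Return a list of rule violations for one observation (empty if valid).

    `record` is a dict with hypothetical keys: "country", "period", "value",
    and optionally "components" (list of sub-totals) and "denominator".
    """
    errors = []
    value = record.get("value")
    # bool is a subclass of int in Python, so exclude it explicitly.
    is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
    if not is_number:
        errors.append("value must be numeric")
    if not record.get("country"):
        errors.append("missing country/geographic area")
    if not record.get("period"):
        errors.append("missing year, month or period")
    components = record.get("components")
    if components is not None and is_number:
        if abs(sum(components) - value) > tolerance:
            errors.append("sum of sub-components does not equal the total")
    if "denominator" in record and record["denominator"] == 0:
        errors.append("denominator cannot be zero")
    return errors


# Made-up records for illustration.
good = {"country": "Togo", "period": "2013", "value": 10.0, "components": [4.0, 6.0]}
bad = {"country": "", "period": "2013", "value": "n/a", "denominator": 0}
print(validate_record(good))
print(validate_record(bad))
```

Returning every violation rather than stopping at the first lets a data provider correct a record in a single pass.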


Roles and Responsibilities for Data Validation

- ACS shall liaise with subject specialists in other divisions to define and continuously review data validation criteria and implement software tools to check for completeness, consistency and arithmetic integrity.

- ACS shall ensure that data entry subsystems apply data validation rules consistently before data are accepted into the databank.

- ACS shall ensure that data entry workflows include documentation, verification and confirmation procedures before changes are made to data in the databank, with industry-standard provisions for version control so that changes can be rolled back if errors are later discovered.
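The requirement for documented, verified changes with the ability to roll back can be illustrated with a toy versioned store. This is a sketch under assumptions: the `VersionedSeries` class and its method names are hypothetical, and a production system would rely on the version control features of the underlying database and would normally keep the audit trail of rolled-back changes rather than discard it.

```python
class VersionedSeries:
    """Toy append-only store: each change creates a new version with a note."""

    def __init__(self, values):
        self._versions = [(dict(values), "initial load")]

    @property
    def current(self):
        return dict(self._versions[-1][0])

    def update(self, period, value, note, verified_by):
        # Documentation and verification are required before a change is accepted.
        if not note or not verified_by:
            raise ValueError("change needs a note and a verifier")
        new_values = self.current
        new_values[period] = value
        self._versions.append((new_values, f"{note} (verified by {verified_by})"))

    def rollback(self):
        """Discard the latest change; the initial load is never removed."""
        if len(self._versions) > 1:
            self._versions.pop()

    def history(self):
        return [note for _, note in self._versions]


# Made-up values: a revision is applied, then found to be a keying error.
series = VersionedSeries({"2012": 5.1, "2013": 5.4})
series.update("2013", 54.0, "revised figure from NSO", verified_by="focal point")
series.rollback()
print(series.current, series.history())
```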

Data Harmonization, Transformation and Analysis

It has already been established that the data in ECA’s corporate databank and data warehouses will be drawn from several sources, with priority given to national providers. The indicators from the various sources sometimes have different units, such as currencies. The definitions of the indicators may also vary, for example for primary/secondary school completion. There is therefore a need to harmonize the data before countries can be compared.

Rebasing and Rescaling

Two common data transformations used to harmonize data for such comparability are rebasing and rescaling. Indicator series like prices and growth rates normally need to be referenced to their equivalent values at a fixed base year, which allows values to be compared for different years. In a similar manner, to compare across countries, it is necessary to rebase the data to the same base year.
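For an index series, rebasing amounts to dividing every value by the value in the new base year and multiplying by the reference level (conventionally 100). The sketch below illustrates this; the function name `rebase` and the sample index values are made up for the example.

```python
def rebase(index_series, new_base_year, base_value=100.0):
    """Re-reference an index series so that `new_base_year` equals `base_value`.

    `index_series` maps year -> index value. Each value is divided by the
    value observed in the new base year and multiplied by `base_value`, so
    year-on-year growth rates are unchanged.
    """
    if new_base_year not in index_series:
        raise KeyError(f"no observation for base year {new_base_year}")
    base = index_series[new_base_year]
    if base == 0:
        raise ValueError("base-year value must be non-zero")
    return {year: value / base * base_value for year, value in index_series.items()}


# Made-up price index with 2005 = 100, rebased to 2010 = 100.
cpi_2005 = {2005: 100.0, 2010: 125.0, 2011: 150.0}
cpi_2010 = rebase(cpi_2005, 2010)
print(cpi_2010)
```

Because every value is scaled by the same factor, the ratio between any two years (here 150/125 = 1.2 between 2011 and 2010) is preserved after rebasing.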

Rescaling is sometimes necessary when there are methodological breaks in the series provided by a country. However, there could be an incompatibility between smoothing the series for methodological breaks and keeping the consistency between the total and the components. Specific techniques would be applied based on the nature of the series, the nature of the methodological break and the intended use of the series.
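One simple option among the possible techniques is ratio splicing, in which the older segment is scaled by the ratio between the new and old values in an overlap year. The sketch below shows this option only as an example; it is not stated in this protocol as ECA's chosen method, and the function name `ratio_splice` and the figures are hypothetical.

```python
def ratio_splice(old_series, new_series, overlap_year):
    """Link two segments of a series across a methodological break.

    Both series map year -> value and must both contain `overlap_year`.
    Years before the overlap are taken from `old_series` and scaled to the
    level of `new_series`; years from the overlap onward come from `new_series`.
    """
    if old_series[overlap_year] == 0:
        raise ValueError("old series is zero in the overlap year")
    factor = new_series[overlap_year] / old_series[overlap_year]
    spliced = {y: v * factor for y, v in old_series.items() if y < overlap_year}
    spliced.update({y: v for y, v in new_series.items() if y >= overlap_year})
    return dict(sorted(spliced.items()))


# Made-up figures: the new methodology lifts the 2010 level from 200 to 220.
old = {2008: 180.0, 2009: 190.0, 2010: 200.0}
new = {2010: 220.0, 2011: 231.0}
print(ratio_splice(old, new, 2010))
```

If a total and each of its components are spliced separately with their own factors, the rescaled components will generally no longer sum to the rescaled total, which is the incompatibility noted above.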

Estimation

Time series data sometimes have data gaps. To overcome such data gaps and obtain a full set of data, estimates are calculated using statistical, mathematical or econometric models. No single estimation model fits all cases; rather, models are selected according to the nature of the dataset and the theory of the issue being modelled. Some time series datasets include estimates when we receive them from national agencies or international organizations. In such cases, the corresponding metadata should clearly indicate these facts and document the estimation techniques to ensure accurate interpretation and proper use of the data.
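As one simple example of gap filling, interior gaps can be estimated by linear interpolation, with each estimate flagged so that the metadata can record it. Linear interpolation is used here only for illustration, since the appropriate model depends on the dataset; the function name `fill_gaps_linear` and the sample series are assumptions.

```python
def fill_gaps_linear(series):
    """Fill interior gaps (None) by linear interpolation between neighbours.

    `series` maps year -> value or None. Returns a dict mapping
    year -> (value, is_estimate). Leading or trailing gaps are left as None,
    since interpolation needs an observed value on both sides.
    """
    years = sorted(series)
    known = [y for y in years if series[y] is not None]
    result = {}
    for y in years:
        if series[y] is not None:
            result[y] = (series[y], False)
            continue
        before = [k for k in known if k < y]
        after = [k for k in known if k > y]
        if not before or not after:
            result[y] = (None, False)
            continue
        y0, y1 = before[-1], after[0]
        share = (y - y0) / (y1 - y0)
        estimate = series[y0] + share * (series[y1] - series[y0])
        result[y] = (estimate, True)  # flagged so metadata can record it
    return result


# Made-up series with a two-year interior gap and a trailing gap.
observed = {2009: 10.0, 2010: None, 2011: None, 2012: 16.0, 2013: None}
print(fill_gaps_linear(observed))
```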

Forecasting

Forecasting is about predicting the evolution of a series based on available information. The quality of the prediction depends on the amount of information available at the time of forecasting, a good knowledge of the series and its relations with other series, and an understanding of the underlying theoretical concepts. Predicted data values are normally not stored in the database, but generated at the time of the analysis. However, for forecasting models that require great effort to execute, the values may be pre-computed and stored, with appropriate metadata indicating that they are forecasts. Information about the models, including the methodology used, is also documented in the metadata.
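The idea of generating forecasts at analysis time, and labelling them as forecasts together with the method used, can be sketched with the simplest possible model, an ordinary least squares linear trend. This is an illustration only: actual forecasting models would be chosen by subject specialists, and the function name `linear_trend_forecast`, the output layout and the sample series are assumptions.

```python
def linear_trend_forecast(series, horizon):
    """Extrapolate an ordinary least squares linear trend `horizon` years ahead.

    `series` maps year -> value (at least two observations).
    Returns a dict year -> {"value": ..., "status": "forecast", "method": ...}.
    """
    years = sorted(series)
    n = len(years)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(years) / n
    mean_y = sum(series[y] for y in years) / n
    sxx = sum((y - mean_x) ** 2 for y in years)
    sxy = sum((y - mean_x) * (series[y] - mean_y) for y in years)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    last = years[-1]
    return {
        year: {
            "value": intercept + slope * year,
            "status": "forecast",          # marks the value as a forecast
            "method": "OLS linear trend",  # methodology recorded alongside
        }
        for year in range(last + 1, last + horizon + 1)
    }


# Made-up series growing by exactly 2 units a year.
history = {2010: 20.0, 2011: 22.0, 2012: 24.0, 2013: 26.0}
forecast = linear_trend_forecast(history, horizon=2)
print(forecast)
```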

Statistical Analysis and Data Modelling

Trend estimations, correlation studies, hypothesis testing and other statistical analysis of the data, including econometric modelling, would usually be undertaken as part of policy analysis. Such analyses are usually subject-matter specific and would be undertaken outside of the database on a needs basis by the divisions and offices that are conducting the study. However, following from the principle of one-stop data services, the data for these models would be drawn from the ECA corporate databank. Also, some of the models could be similar enough to be implemented in the same software systems, as common services, to reduce duplication of efforts, share experience and continuously augment available knowledge and expertise.

Roles and Responsibilities for Data Analysis

- ACS shall provide guidance on the most commonly used statistical techniques, including:
   o Trend analysis;
   o Stationarity and unit root tests;
   o Co-integration tests;
   o Detection of breakpoints;
   o Detection of outliers;
   o Various imputation techniques;
   o Panel data analysis;
   o Analysis of correlations;
   o Principal components analysis and factor analysis;
   o Logistic regression;
   o ANOVA, chi-square tests, etc.

- Divisions and offices shall describe the test, the needed data, the conditions for validity and the main statistics used to assess that the model fits the data.

- ACS shall maintain corporate licenses for the software needed for data transformation and analysis and develop appropriate customization and interface modules, after consultation with the discipline experts in the divisions and offices.

Data Dissemination

A databank or database system is only useful if users can derive data and information products from it. An important component of ECA’s databank is the data dissemination portal. The portal shall provide a powerful query wizard that will guide users through the steps for selecting a data source, data elements, and data layout or format that best meets their needs. The results of the queries can be visualized in tables, charts, or maps. A Bulk Export tool shall also be available for exporting data to various formats for further use in different statistical applications, including accepted interchange standards at the time of implementation.
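The query-then-export flow described above can be sketched as a filter over observations followed by serialisation to an interchange format. CSV is used here purely as an example format; the function names `query` and `export_csv`, the observation layout and the sample values are hypothetical and do not describe the actual portal.

```python
import csv
import io


def query(observations, indicator=None, countries=None, years=None):
    """Filter observations (dicts with country, indicator, year, value)."""
    return [
        obs
        for obs in observations
        if (indicator is None or obs["indicator"] == indicator)
        and (countries is None or obs["country"] in countries)
        and (years is None or obs["year"] in years)
    ]


def export_csv(rows, columns=("country", "indicator", "year", "value")):
    """Serialise query results as CSV text for use in other applications."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up observations for illustration.
data = [
    {"country": "Benin", "indicator": "Indicator A", "year": 2012, "value": 1.0},
    {"country": "Benin", "indicator": "Indicator A", "year": 2013, "value": 2.0},
    {"country": "Chad", "indicator": "Indicator A", "year": 2013, "value": 3.0},
]
selected = query(data, indicator="Indicator A", years={2013})
print(export_csv(selected))
```

Separating selection from export means the same query result could equally feed a table, chart or map view.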

The system shall include a set of pre-defined dashboards to provide snapshots of specific pre-selected indicators. Various types of dashboards will allow users to perform comprehensive analysis at both country and regional levels. End users can review data for a single topic, year or location. Users can also use dashboards to compare indicators across sectors/sub-sectors, sources, time, and location.

As already alluded to, the results of queries may be visualized as maps, which can be integrated together for cross-sectoral insights. Maps can be integrated into dashboards, mobile applications and published documents, or be shared as their own web services. Some of the sectoral areas to be supported require location-based analyses and cartographic-quality maps, provided by a geographic information system (GIS), e.g., climate change analyses, infrastructure planning and natural resource management. In keeping with the MDM approach, the attribute data for the GIS applications will be sourced from the same master database to prevent redundancy/inconsistency between geospatial databases and statistical databases.


An important use for the databank is to support the production of statistical annexes to flagship publications. The system shall therefore provide for users to design and generate tables for specific regions, topics or time periods.

Data Governance

Data governance is defined by the Data Governance Institute as a system of decision rights and accountabilities for information-related processes, executed according to agreed-upon models which describe who can take what actions with what information, and when, under what circumstances, using what methods. ECA shall adopt a simple data governance framework that includes an ECA Data Governance Committee and the designation of a Chief Statistician.

ECA Data Governance Committee

The Databank Architecture Task Force (No. 4) recommended the establishment of the ECA Data Committee as a transitional oversight body for the implementation of its recommendations, with the option of a longer mandate. It also recommended the establishment of a Data Quality Committee to oversee the overall functioning of the databank. A Data Governance Committee shall be established as a hybrid of the two recommended committees to provide standard data governance functions to ECA.

Chief Statistician of ECA

The Task Force recommended that the Director of ACS should chair the data committee and oversee the functioning of the databank. It is recommended, in the context of the data governance framework, to formally designate the Director of ACS as ECA’s Chief Statistician. Even though ACS, under the direction of its head, advises ECA on statistical matters, the ECA work programme contains mostly activities designed to benefit member States without explicit provision for the separate function of a Chief Statistician. With such designation, the work programme presented to ECA governing structures (the Statistical Commission for Africa and Conference of Ministers) will explicitly provide for the governance of statistical data in ECA, with appropriate resource allocation.