
pandaSDMX Documentation, Release 0.3.0

Dr. Leo

September 22, 2015

Contents

1 Main features
2 Example
3 pandaSDMX Links
4 Table of contents
    4.1 What’s new?
    4.2 FAQ
    4.3 Getting started
    4.4 A very short introduction to SDMX
    4.5 Basic usage
    4.6 Advanced topics
    4.7 pandasdmx
    4.8 Contributing
    4.9 License
5 Indices and tables
Python Module Index


pandaSDMX is an Apache 2.0-licensed Python (http://www.python.org) package aimed at becoming the most intuitive and versatile tool to retrieve and acquire statistical data and metadata disseminated in SDMX (http://www.sdmx.org) format. It works well with the SDMX services of the European statistics office (Eurostat) and the European Central Bank (ECB). While pandaSDMX is extensible to cater for any output format, it currently supports only pandas (http://pandas.pydata.org), the gold standard of data analysis in Python. From pandas, however, you can export your data to Excel and friends.

CHAPTER 1

Main features

• intuitive API inspired by requests (https://pypi.python.org/pypi/requests/)

• support for many SDMX features including

– generic datasets

– data structure definitions, code lists and concept schemes

– dataflow definitions and content-constraints

– categorisations and category schemes

• pythonic representation of the SDMX information model

• find dataflows by name or description in multiple languages if available

• when requesting datasets, validate column selections against code lists and content-constraints if available

• read and write SDMX messages to and from local files

• configurable HTTP connections

• support for requests-cache (https://readthedocs.org/projects/requests-cache/), which allows caching SDMX messages in memory, MongoDB, Redis or SQLite

• writer transforming SDMX generic datasets into multi-indexed pandas DataFrames or Series of observations and attributes

• extensible through custom readers and writers for alternative input and output formats of data and metadata

CHAPTER 2

Example

In [1]: from pandasdmx import Request

# Get recent annual unemployment data on Greece, Ireland and Spain from Eurostat
In [2]: une_resp = Request('ESTAT').get('data', 'une_rt_a', key={'GEO': 'EL+ES+IE'}, params={'startPeriod': '2006'})

# From the received dataset, select the time series on all age groups and write them to a pandas DataFrame
In [3]: une_df = une_resp.write(s for s in une_resp.msg.data.series if s.key.AGE == 'TOTAL')

# Explore the DataFrame. First, show dimension names
In [4]: une_df.columns.names
Out[4]: FrozenList(['AGE', 'SEX', 'S_ADJ', 'GEO', 'FREQ'])

# corresponding dimension values
In [5]: une_df.columns.levels
Out[5]: FrozenList([['TOTAL'], ['F', 'M', 'T'], ['NSA'], ['EL', 'ES', 'IE'], ['A']])

# Print aggregate unemployment rates across ages and sexes
In [6]: une_df.loc[:, ('TOTAL', 'T')]
Out[6]:
S_ADJ   NSA
GEO      EL    ES    IE
FREQ      A     A     A
2014   26.5  24.5  11.3
2013   27.5  26.1  13.1
2012   24.5  24.8  14.7
2011   17.9  21.4  14.7
2010   12.7  19.9  13.9
2009    9.6  17.9  12.0
2008    7.8  11.3   6.4
2007    8.4   8.2   4.7
2006    9.0   8.5   4.5


CHAPTER 3

pandaSDMX Links

• Download the latest stable version from the Python package index: https://pypi.python.org/pypi/pandaSDMX

• Mailing list: https://groups.google.com/forum/?hl=en#!forum/sdmx-python

• GitHub: https://github.com/dr-leo/pandaSDMX

• Official SDMX website: http://www.sdmx.org

CHAPTER 4

Table of contents

4.1 What’s new?

4.1.1 v0.3.0 (2015-09-22)

• support for requests-cache, which allows caching SDMX messages in memory, MongoDB, Redis or SQLite

• pythonic selection of series when requesting a dataset: Request.get allows the key keyword argument in a data request to be a dict mapping dimension names to values. In this case, the dataflow definition, the datastructure definition and the content-constraint are downloaded on the fly, cached in memory and used to validate the keys. The dotted key string needed to construct the URL is generated automatically.

• The Response.write method takes a parse_time keyword argument. Set it to False to avoid parsing of dates, times and time periods, as exotic formats may cause crashes.

• The Request.get method takes a memcache keyword argument. If set to a string, the received Response instance will be stored in the dict Request.cache for later use. This is useful when, e.g., a DSD is needed multiple times to validate keys.

• fixed base URL for Eurostat

• major refactorings to enhance code maintainability

4.1.2 v0.2.2

• Make HTTP connections configurable by exposing the requests.get API (http://www.python-requests.org/en/latest/) through the pandasdmx.api.Request constructor. Hence, proxy servers, authorisation information and other HTTP-related parameters consumed by requests.get can be specified for each Request instance and used in subsequent requests. The configuration is exposed as a dict through a new Request.client.config attribute.

• Responses have a new http_headers attribute containing the HTTP headers returned by the SDMX server

4.1.3 v0.2.1

• Request.get: allow fromfile to be a file-like object

• extract SDMX messages from zip archives if given. Important for large datasets from Eurostat

• automatically get a resource at a URL given in the footer of the received message. This makes it possible to fetch large datasets from Eurostat automatically once they have been made available at the given URL. The number of attempts and the time to wait before each request are configurable via the get_footer_url argument.

4.1.4 v0.2 (2015-04-13)

This version is a quantum leap. The whole project has been redesigned and rewritten from scratch to provide robust support for many SDMX features. The new architecture is centered around a pythonic representation of the SDMX information model. It is extensible through readers and writers for alternative input and output formats. Export to pandas has been dramatically improved. Sphinx documentation has been added.

4.1.5 v0.1 (2014-09)

Initial release

4.2 FAQ

4.2.1 Can pandaSDMX connect to SDMX providers other than ECB and Eurostat?

Any SDMX provider can be supported that generates SDMX 2.1-compliant messages. The only agencies I know of that deliver data in this format are ECB and Eurostat. Support for SDMX 2.0 messages could be added as a new reader module. Perhaps the model would have to be tweaked a bit as well.

4.2.2 Writing large datasets to pandas DataFrames is slow. What can I do?

The main performance hit comes from parsing the time or time period strings. In case of regular data such as monthly data (not trading day!), call the write method with fromfreq set to True so that only the first string will be parsed and the rest inferred from the frequency of the series. Caution: If the series is stored in the XML document in reverse chronological order, the reverse_obs argument must be set to True as well to prevent the resulting dataframe index from extending into a remote future.
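The idea behind fromfreq and reverse_obs can be illustrated without pandaSDMX. The following is a minimal sketch using only the standard library; it is not the library's actual implementation. The function name infer_monthly_periods and its parameters are hypothetical. It parses only the first period string and derives all others by counting months, forwards or backwards.

```python
from datetime import date


def infer_monthly_periods(first_period, n_obs, reverse_obs=False):
    """Return n_obs monthly periods as dates, parsing only first_period.

    first_period is a string such as '2014-03', taken from the first
    observation as stored in the XML document. If the document lists
    observations newest-first, pass reverse_obs=True so that the
    periods count backwards instead of running into the future.
    """
    year, month = (int(part) for part in first_period.split('-'))
    step = -1 if reverse_obs else 1
    periods = []
    for i in range(n_obs):
        # Months elapsed since year 0, shifted by the position in the series
        total = year * 12 + (month - 1) + step * i
        periods.append(date(total // 12, total % 12 + 1, 1))
    return periods


# Hypothetical series stored in reverse chronological order
periods = infer_monthly_periods('2014-02', 4, reverse_obs=True)
print([p.strftime('%Y-%m') for p in periods])
# ['2014-02', '2014-01', '2013-12', '2013-11']
```

Without reverse_obs=True, the same input would yield periods up to 2014-05, which is the "remote future" effect described above, only on a small scale.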

4.3 Getting started

4.3.1 Installation

Prerequisites

pandaSDMX is a pure Python package. As such it should run on any platform. It requires Python 2.7 or 3.4. Python 3.3 should work as well, but this is untested. Python 3.5 should work once the dependencies have been ported.

It is recommended to use one of the common Python distributions for scientific data analysis such as

• Anaconda (https://store.continuum.io/cshop/anaconda/), or

• Canopy (https://www.enthought.com/products/canopy/).

Along with a current Python interpreter, these Python distributions include the dependencies as well as lots of other useful packages for data analysis. For other Python distributions (not only scientific), see https://wiki.python.org/moin/PythonDistributions.

pandaSDMX has the following dependencies:

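• the data analysis library pandas (http://pandas.pydata.org/), which itself depends on a number of packages

• requests (https://pypi.python.org/pypi/requests/)

• LXML (http://www.lxml.de)

Optional dependencies

• requests-cache, which allows caching SDMX messages in memory, MongoDB, Redis or SQLite

• IPython (http://ipython.org/) is required to build the Sphinx documentation. To do this, check out the pandaSDMX repository on GitHub.

• nose (https://pypi.python.org/pypi/nose) to run the test suite.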

Download

You can download and install pandaSDMX like any other Python package, e.g.

• from the command line with pip install pandasdmx, or

• manually by downloading and unzipping the latest source distribution (https://pypi.python.org/pypi/pandaSDMX/). From the package directory you should then issue the command python setup.py install.

4.3.2 Running the test suite

From the package directory, issue the following command:

$ nosetests pandasdmx

4.3.3 Package overview

Modules

api: module containing the API to make queries to SDMX web services. See pandasdmx.api.Request, in particular its get method. pandasdmx.api.Request.get() returns pandasdmx.api.Response instances.

model: implements the SDMX information model.

remote: contains a wrapper class around requests for HTTP. Called by pandasdmx.api.Request.get() to make HTTP requests to SDMX services. Also reads SDMXML files instead of querying them over the web.

Subpackages

reader: reads SDMX files and instantiates the appropriate classes from pandasdmx.model. There is only one reader, for XML-based SDMXML v2.1. Future versions may add reader modules for other formats.

writer: contains writer classes transforming SDMX artefacts into other formats or writing them to arbitrary destinations such as databases. The only available writer for now writes generic datasets to pandas DataFrame or Series.

utils: utility functions and classes. Contains a wrapper around dict allowing attribute access to dict items.

tests: unit tests and sample files

4.3.4 What next?

The remaining chapters explain the key characteristics of SDMX, demonstrate the basic usage of pandaSDMX and provide additional information on some advanced topics. While users who are new to SDMX are likely to benefit a lot from reading the next chapter on SDMX, normal use of pandaSDMX should not strictly require this. The Basic usage chapter should enable you to retrieve datasets and write them to pandas DataFrames. But if you want to exploit the full richness of the information model, or simply feel more comfortable if you know what happens behind the scenes, the SDMX introduction is for you. It also contains links to authoritative SDMX resources.

4.4 A very short introduction to SDMX

4.4.1 Overall purpose

SDMX (short for: Statistical Data and Metadata eXchange) is a set of standards and guidelines aimed at facilitating the production, dissemination, retrieval and processing of statistical data and metadata. SDMX is sponsored by a wide range of public institutions including the UN, the IMF, the World Bank, BIS, ILO, FAO, the OECD, the ECB, Eurostat, and a number of national statistics offices. These and other institutions provide a vast array of current and historic datasets and metadatasets via free or fee-based REST and SOAP web services. pandaSDMX only supports SDMX v2.1, that is, the latest version of this standard. Some agencies such as the IMF continue to offer SDMX 2.0-compliant services. These cannot be accessed by pandaSDMX. While this may change in future versions, there is the expectation that SDMX providers will upgrade to the latest standards at some point.

4.4.2 Information model

At its core, SDMX defines an information model consisting of a set of classes, their logical relations, and semantics. There are classes defining things like datasets, metadatasets, data and metadata structures, processes, organisations and their specific roles, to name but a few. The information model is agnostic as to its implementation. Luckily, the SDMX standard provides an XML-based implementation (see below), and a JSON variant is in the works.

The following sections briefly introduce some key elements of the information model.

Datasets

A dataset can broadly be described as a container of ordered observations and attributes attached to them. Observations (e.g. the annual unemployment rate) are classified by dimensions such as country, age, sex, and time period. Attributes may further describe an individual observation or a set of observations. Typical uses for attributes are the level of confidentiality, or data quality. Observations may be clustered into series, in particular, time series. The dataset must explicitly specify the dimension at observation such as ‘time’, ‘time_period’ or anything else. If a dataset consists of series whose dimension at observation is neither time nor time period, the dataset is called cross-sectional. A dataset that is not grouped into series, i.e. where all dimension values including time, if available, are stated for each observation, is called a flat dataset. Flat datasets are not memory-efficient, but benefit from a very simple representation.

An attribute may be attached to a series to express the fact that it applies to all contained observations. This increases efficiency and adds meaning. Subsets of series within a dataset may be clustered into groups. A group is defined by specifying one or more dimension values, but not all: At least the dimension at observation and one other dimension must remain free (or wild-carded). Otherwise, the group would in fact be either a single observation or a series. The main purpose of a group is to serve as another attachment point for attributes. Hence, a given attribute may be attached to all series within the group at once. Attributes may finally be attached to the entire dataset, i.e. to all observations therein.
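The four attachment points (dataset, group, series, observation) can be sketched with a few plain Python classes. This is an illustration only and does not reproduce the classes in pandasdmx.model. All class, attribute and code names below are hypothetical, and the rule that the narrowest attachment point wins on conflict is an assumption made for the sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    dim_value: str            # value of the dimension at observation, e.g. '2014'
    value: float
    attrib: dict = field(default_factory=dict)


@dataclass
class Series:
    key: dict                 # values of all other dimensions
    obs: list = field(default_factory=list)
    attrib: dict = field(default_factory=dict)


@dataclass
class Group:
    key: dict                 # fixes some, but not all, series dimensions
    attrib: dict = field(default_factory=dict)

    def contains(self, series):
        # A series belongs to the group if it matches every fixed dimension
        return all(series.key.get(dim) == val for dim, val in self.key.items())


@dataclass
class DataSet:
    series: list = field(default_factory=list)
    groups: list = field(default_factory=list)
    attrib: dict = field(default_factory=dict)

    def effective_attrib(self, series, obs):
        """Merge attributes from the widest to the narrowest attachment point."""
        merged = dict(self.attrib)
        for group in self.groups:
            if group.contains(series):
                merged.update(group.attrib)
        merged.update(series.attrib)
        merged.update(obs.attrib)
        return merged


obs = Observation('2014', 26.5, {'OBS_STATUS': 'p'})
greece = Series({'GEO': 'EL', 'SEX': 'T'}, [obs], {'UNIT': 'PC'})
ds = DataSet(
    series=[greece],
    groups=[Group({'GEO': 'EL'}, {'SOURCE': 'survey'})],
    attrib={'CONF': 'free'},
)
print(ds.effective_attrib(greece, obs))
# {'CONF': 'free', 'SOURCE': 'survey', 'UNIT': 'PC', 'OBS_STATUS': 'p'}
```

The group fixes only GEO and leaves SEX and the dimension at observation free, as required by the definition above.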




Structural metadata: data structure definition, concept scheme and code list

In the above section on datasets, we have carelessly used structural terms such as dimension, dimension value and attachment of attributes. This is because it is almost impossible to talk about datasets without talking about their structure. The information model provides a number of classes to describe the structure of datasets without talking about data. The container class for this is called DataStructureDefinition (in short: DSD). It contains a list of dimensions and for each dimension a reference to exactly one concept describing its meaning. A concept describes the set of permissible dimension values. This can be done in various ways depending on the intended data type. Finite value sets (such as country codes, currencies, a data quality classification etc.) are described by reference to code lists. Infinite value sets are described by facets, which are simply a way to express that a dimension may have int, float or time-stamp values, to name but a few. A set of concepts referred to in the dimension descriptors of a data structure definition is called a concept scheme.

The set of allowed observation values such as the unemployment rate measured in per cent is defined by a special dimension: the MeasureDimension, thus enabling the validation of any observation value against its DSD.

Dataflow definition

A dataflow describes what a particular dataset is about, how often it is updated over time by its maintaining agency, under what conditions it will be provided etc. The terminology is a bit confusing: You cannot actually obtain a dataflow from an SDMX web service. Rather, you can request one or more dataflow definitions describing a flow of data over time. The dataflow definition and the artefacts to which it refers give you all the information you need to exploit the datasets you can request using the dataflow’s ID.

A DataFlowDefinition is a class that describes a dataflow. A DataFlowDefinition has a unique identifier, a human-readable name and potentially a more detailed description. Both may be multi-lingual. The dataflow’s ID is used to query the dataset it describes. The dataflow also features a reference to the DSD which structures the datasets available under this dataflow ID. For instance, in the frontpage example we used the dataflow ID ‘une_rt_a’.

Constraints

There are two types of constraints:

A content-constraint is a mechanism to express the fact that datasets of a given dataflow only comprise columns for a subset of values from the code lists representing dimension values. For example, the datastructure definition for a dataflow on exchange rates references the code list of all country codes in the world, whereas the datasets provided under this dataflow only cover the ten largest currencies. These can be enumerated by a content-constraint attached to the dataflow definition. Content-constraints can be used to validate dimension names and values (a.k.a. keys) when requesting datasets selecting columns of interest.

An attachment-constraint describes to which parts of a dataset (column/series, group of series, observation, the entire dataset) certain attributes may be attached. Attachment-constraints are not supported by pandaSDMX as this feature is needed only for dataset generation. However, pandaSDMX does support attributes in the information model and when exporting datasets to pandas.
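How a content-constraint narrows a code list when validating keys can be sketched as follows. This is a standalone illustration, not pandaSDMX code: the function validate_key, the plain dicts standing in for code lists and constraints, and the currency codes are all assumptions chosen for the example.

```python
def validate_key(key, codelists, constraint=None):
    """Check a {dimension: 'VAL1+VAL2'} selection.

    codelists maps each dimension name to the set of all codes the
    DSD permits; constraint optionally maps dimension names to the
    subset of codes for which the dataflow actually provides data.
    Raises ValueError on the first invalid name or value.
    """
    constraint = constraint or {}
    for dim, values in key.items():
        if dim not in codelists:
            raise ValueError('unknown dimension: %s' % dim)
        # A content-constraint, if present, narrows the code list
        allowed = constraint.get(dim, codelists[dim])
        for value in values.split('+'):
            if value not in codelists[dim]:
                raise ValueError('%s is not in the code list of %s' % (value, dim))
            if value not in allowed:
                raise ValueError(
                    '%s is excluded by the content-constraint on %s' % (value, dim))
    return True


# Hypothetical code lists and constraint
codelists = {'CURRENCY': {'USD', 'JPY', 'GBP', 'ISK'}, 'FREQ': {'A', 'M', 'D'}}
constraint = {'CURRENCY': {'USD', 'JPY', 'GBP'}}

print(validate_key({'CURRENCY': 'USD+JPY', 'FREQ': 'M'}, codelists, constraint))
try:
    validate_key({'CURRENCY': 'ISK'}, codelists, constraint)
except ValueError as err:
    print(err)
```

In this sketch ISK is a valid code according to the code list, but the constraint signals that no data exists for it, so the request can be rejected before any HTTP traffic occurs.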

Category schemes and categorisations

Categories serve to classify or categorise things like dataflows, e.g., by subject matter. Multiple categories may belong to a container called CategoryScheme.

A Categorisation links the thing to be categorised, e.g., a DataFlowDefinition, to a Category.



Class hierarchy

The SDMX information model defines a number of base classes from which concrete classes such as DataFlowDefinition or DataStructureDefinition inherit. E.g., DataFlowDefinition inherits from MaintainableArtefact attributes indicating the maintaining agency. MaintainableArtefact inherits from VersionableArtefact, which, in turn, inherits from IdentifiableArtefact which inherits from AnnotableArtefact and so forth. Hence, DataStructureDefinition may have a unique ID, a version, a natural language name in multiple languages, a description, and annotations. pandaSDMX takes advantage of this class hierarchy.
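The inheritance chain can be pictured with a handful of skeleton classes. The sketch below is a simplified illustration, not the code in pandasdmx.model. It includes NameableArtefact, which the SDMX information model places between IdentifiableArtefact and VersionableArtefact; the constructor signatures, the DSD ID and the name text are hypothetical.

```python
class AnnotableArtefact:
    def __init__(self, annotations=None, **kwargs):
        self.annotations = annotations or []


class IdentifiableArtefact(AnnotableArtefact):
    def __init__(self, id=None, **kwargs):
        super().__init__(**kwargs)
        self.id = id


class NameableArtefact(IdentifiableArtefact):
    def __init__(self, name=None, description=None, **kwargs):
        super().__init__(**kwargs)
        # Mappings from language code to text, e.g. {'en': '...'}
        self.name = name or {}
        self.description = description or {}


class VersionableArtefact(NameableArtefact):
    def __init__(self, version=None, **kwargs):
        super().__init__(**kwargs)
        self.version = version


class MaintainableArtefact(VersionableArtefact):
    def __init__(self, maintainer=None, **kwargs):
        super().__init__(**kwargs)
        self.maintainer = maintainer


class DataStructureDefinition(MaintainableArtefact):
    pass


# Every keyword is consumed by the class in the chain that owns it
dsd = DataStructureDefinition(
    id='DSD_une_rt_a', version='1.0', maintainer='ESTAT',
    name={'en': 'Unemployment rates'})
print(dsd.id, dsd.version, dsd.maintainer, dsd.name['en'])
```

Each level adds exactly one concern (annotations, identity, names, version, maintainer), which is why a concrete class such as DataStructureDefinition needs no constructor of its own in this sketch.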

4.4.3 XML-implementation of the information model

The SDMX standard defines an XML-based implementation of the information model called SDMXML. An SDMXML document contains exactly one SDMX Message. There are several types of Message such as GenericDataMessage to represent a DataSet in generic form, i.e. containing all the information required to interpret it. Hence, datasets in generic representation may be used without knowing the related DataStructureDefinition. The downside is that generic dataset messages are much larger than their sister format StructureSpecificDataSet. pandaSDMX as of v0.2 only supports generic dataset messages.

Another important SDMXML message type is StructureMessage which may contain artefacts such as DataStructureDefinitions, codelists, conceptschemes, categoryschemes and so forth.

SDMXML provides that each message contains a Header containing some metadata about the message. Finally, SDMXML messages may contain a Footer element. It provides information on any errors that have occurred on the server side, e.g., if the requested dataset exceeds the size limit, or the server needs some time to make it available under a given link.

The test suite comes with a number of small SDMXML demo files. View them in your favorite XML editor to get a deeper understanding of the structure and content of various message types.

SDMX services provide XML schemas to validate a particular SDMXML file. However, pandaSDMX does not yet support validation.
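To get a feel for the Header and Footer elements mentioned above, a message can be inspected with the standard library alone. The snippet below is a rough sketch: the XML is a hand-written toy message rather than real server output, the footer code and text are invented, and the namespace URIs are assumed to be those of SDMXML 2.1 and should be checked against a real message from your provider.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for SDMXML 2.1
NS = {
    'mes': 'http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message',
    'com': 'http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common',
    'footer': 'http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer',
}

# Hand-written toy message, not actual server output
SAMPLE = """<?xml version="1.0"?>
<mes:GenericData
    xmlns:mes="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message"
    xmlns:com="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/common"
    xmlns:footer="http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message/footer">
  <mes:Header>
    <mes:ID>demo-message</mes:ID>
    <mes:Prepared>2015-09-22T20:34:39</mes:Prepared>
  </mes:Header>
  <footer:Footer>
    <footer:Message code="413" severity="Information">
      <com:Text>Dataset too large, it will be made available later.</com:Text>
    </footer:Message>
  </footer:Footer>
</mes:GenericData>
"""

root = ET.fromstring(SAMPLE)

# Header metadata
message_id = root.findtext('mes:Header/mes:ID', namespaces=NS)

# Footer messages as (code, text) pairs
footer_texts = [
    (msg.get('code'), text.text)
    for msg in root.findall('footer:Footer/footer:Message', NS)
    for text in msg.findall('com:Text', NS)
]

print(message_id)
print(footer_texts)
```

A message without a Footer simply yields an empty list here, so the same code can serve as a basic error check.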

4.4.4 SDMX web services

The SDMX standard defines both a REST and a SOAP web service API. pandaSDMX only supports the REST API.

The URL specifies the type, providing agency, and ID of the requested SDMX resource (dataflow, categoryscheme, data etc.). The query part of the URL (after the ‘?’) may be used to give optional query parameters. For instance, when requesting data, the scope of the dataset may be narrowed down by specifying a key to select only matching columns (e.g. on a particular country). The dimension names and values used to select the columns can be validated by checking if they are contained in the relevant codelists referenced by the datastructure definition (see above), and any content-constraint attached to the dataflow definition for the queried dataset. Moreover, rows may be chosen by specifying a startPeriod and endPeriod for the time series. In addition, the query part may set a references parameter to instruct the SDMX server to return a number of other artefacts along with the resource actually requested. For example, a DataStructureDefinition contains references to codelists and conceptschemes (see above). If the ‘references’ parameter is set to ‘all’, these will be returned in the same StructureMessage. The next chapter contains some examples to demonstrate this mechanism. Further details can be found in the SDMX User Guide, and the Web Service Guidelines.
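The way a key and query parameters end up in a data URL can be sketched in a few lines. This is a simplified illustration and not the code pandaSDMX uses internally: the helper names make_key and data_url, the base URL and the dimension order are assumptions. In practice the dimension order comes from the datastructure definition.

```python
from urllib.parse import urlencode


def make_key(dim_order, selection):
    """Build a dotted key; dimensions left out of selection stay wild-carded."""
    unknown = set(selection) - set(dim_order)
    if unknown:
        raise ValueError('unknown dimensions: %s' % sorted(unknown))
    return '.'.join(selection.get(dim, '') for dim in dim_order)


def data_url(base_url, flow_id, key='', **params):
    """Assemble <base>/data/<flow>/<key>?<params> for a data query."""
    url = '/'.join([base_url.rstrip('/'), 'data', flow_id, key]).rstrip('/')
    if params:
        url += '?' + urlencode(sorted(params.items()))
    return url


# Hypothetical dimension order and placeholder service address
dim_order = ['FREQ', 'AGE', 'SEX', 'GEO']
key = make_key(dim_order, {'GEO': 'EL+ES+IE', 'AGE': 'TOTAL'})
print(key)
print(data_url('http://example.org/rest', 'une_rt_a', key, startPeriod='2006'))
```

Empty positions between the dots act as wildcards, and several values for one dimension are joined with ‘+’, which is the same notation as in the key of the frontpage example.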

4.4.5 Further reading

• The SDMX standards and guidelines (http://sdmx.org/?cat=5) are the authoritative resource. This page is a must for anyone eager to dive deeper into SDMX. Start with the User Guide and the Information Model (Part 2 of the standard). The Web Services Guidelines contain instructive examples for typical queries.

• Eurostat SDMX page (http://ec.europa.eu/eurostat/data/sdmx-data-metadata-exchange)

• European Central Bank SDMX page (https://sdw-wsrest.ecb.europa.eu/). It links to a range of study guides and helpful video tutorials.

• SDMXSource (http://www.sdmxsource.org/): Java, .NET and ActionScript implementations of SDMX software, in part open source

4.5 Basic usage

4.5.1 Introductory remarks

This chapter illustrates the main steps of a typical workflow, namely:

1. retrieving relevant dataflows by category or from a complete list of dataflows,

2. exploring the data structure definition of the selected dataflow,

3. selecting relevant series (columns) and a time-range (rows) from a dataset provided under the chosen dataflow and requesting the data via HTTP,

4. exploring the received data using the information model,

5. writing a dataset or selected series thereof to a pandas DataFrame or Series,

6. reading and writing SDMX files,

7. handling errors.

These steps share common tasks which flow from the architecture of pandaSDMX:

1. Call pandasdmx.api.Request.get() on a new or existing pandasdmx.api.Request instance to obtain an SDMX message from a web service or a file and load it into memory.

2. Explore the pandasdmx.api.Response instance returned by pandasdmx.api.Request.get():

• check for errors

• access the SDMX message’s content through its msg attribute

• write data to a pandas DataFrame or Series by calling pandasdmx.api.Response.write(). This works only for generic data messages.

4.5.2 Importing pandaSDMX

As explained in the preceding section, we will need pandasdmx.api.Request all the time. We can use the following shortcut to import it:

In [1]: from pandasdmx import Request

4.5.3 Connecting to an SDMX web service, caching

We instantiate pandasdmx.api.Request. The constructor accepts an optional agency ID as string. The list of supported agencies is shown in the error message if an invalid agency ID is passed:

In [2]: ecb = Request('ECB')

ecb is now configured so as to make requests to the European Central Bank. If you want to send requests to other agencies, simply instantiate dedicated Request objects.

Configuring the http connection

To pre-configure the HTTP connections to be established by a Request instance, you can pass all keyword arguments consumed by the underlying HTTP library requests (new in version 0.2.2). For a complete description of the options see the requests documentation (http://www.python-requests.org/). For example, a proxy server can be specified for subsequent requests like so:

In [3]: ecb_via_proxy = Request('ECB', proxies={'http': 'http://1.2.3.4:5678'})

HTTP request parameters are exposed through a dict. It may be modified between requests.

In [4]: ecb_via_proxy.client.config
Out[4]: {'proxies': {'http': 'http://1.2.3.4:5678'}, 'stream': True, 'timeout': 30.1}

The Request.client attribute acts a bit like a requests.Session in that it conveniently stores the configuration for subsequent HTTP requests.

Caching received files

Since version 0.3.0, requests-cache is supported. To use it, pass an optional cache keyword argument to the Request() constructor. If given, it must be a dict whose items will be passed to the requests_cache.install_cache function. Use it if you want to cache SDMX messages in databases such as MongoDB, Redis or SQLite. Read through the requests-cache docs (https://readthedocs.org/projects/requests-cache/) for further information.

Loading a file instead of requesting it via http

Any Request instance can load SDMX messages from local files. Issuing r = Request() without passing any agency ID instantiates a Request object not tied to any agency. It may only be used to load SDMX messages from files, unless a pre-fabricated URL is passed to pandasdmx.api.Request.get().

4.5.4 Finding dataflows

Note: Unlike the ECB, Eurostat and probably other data providers do not support categories to facilitate data retrieval. Yet, it is recommended to read the following section as it explains some key concepts of the information model.

Getting the categorisation scheme

We can search the list of dataflows by category. To do this, we request the category scheme from the ECB’s SDMX service and explore the response like so:

In [5]: cat_resp = ecb.get(resource_type='categoryscheme')

In [6]: type(cat_resp)
Out[6]: pandasdmx.api.Response

In [7]: cat_msg = cat_resp.msg

In [8]: type(cat_msg)
Out[8]: pandasdmx.model.StructureMessage

In [9]: cat_header = cat_msg.header
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-a9ddadaeaf54> in <module>()
----> 1 cat_header = cat_msg.header

/home/docs/checkouts/readthedocs.org/user_builds/pandasdmx/envs/latest/lib/python3.4/site-packages/pandaSDMX-0.3.0-py3.4.egg/pandasdmx/model.py in header(self)
     46     @property
     47     def header(self):
---> 48         return self._reader.read_instance(Header, self)
     49
     50

/home/docs/checkouts/readthedocs.org/user_builds/pandasdmx/envs/latest/lib/python3.4/site-packages/pandaSDMX-0.3.0-py3.4.egg/pandasdmx/reader/sdmxml.py in read_instance(self, cls, sdmxobj, offset, first_only)
    186         if result:
    187             if first_only:
--> 188                 return cls(self, result[0])
    189             else:
    190                 return [cls(self, i) for i in result]

/home/docs/checkouts/readthedocs.org/user_builds/pandasdmx/envs/latest/lib/python3.4/site-packages/pandaSDMX-0.3.0-py3.4.egg/pandasdmx/model.py in __init__(self, *args, **kwargs)
     89         # Set additional attributes present in DataSet messages
     90         for name in ['structured_by', 'dim_at_obs']:
---> 91             value = self._reader.read_as_str(name, self)
     92             if value:
     93                 setattr(self, name, value)

/home/docs/checkouts/readthedocs.org/user_builds/pandasdmx/envs/latest/lib/python3.4/site-packages/pandaSDMX-0.3.0-py3.4.egg/pandasdmx/reader/sdmxml.py in read_as_str(self, name, sdmxobj, first_only)
    212         result = self._str2path[name](sdmxobj._elem)
    213         if first_only:
--> 214             return result[0]
    215         else:
    216             return result

IndexError: list index out of range

In [10]: type(cat_header)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-c30209e950aa> in <module>()
----> 1 type(cat_header)

NameError: name 'cat_header' is not defined

In [11]: categorisations = cat_msg.categorisations

In [12]: type(categorisations)
Out[12]: pandasdmx.model.Categorisations

The content of the SDMX message, its header and its payload are exposed as attributes. Try dir(cat_msg) to find out that we have not only obtained the category scheme, but also the dataflows and categorisations. This is because the get method has set the references parameter to the appropriate default value. We can see this from the URL:

In [13]: cat_resp.url
Out[13]: 'http://sdw-wsrest.ecb.int/service/categoryscheme?references=all'

The HTTP headers returned by the SDMX server are available as well (new in version 0.2.2):

In [14]: cat_resp.http_headers
Out[14]: {'pragma': 'no-cache', 'vary': 'Accept, Accept-Encoding', 'content-length': '4965', 'connection': 'keep-alive', 'expires': 'Tue, 22 Sep 2015 20:34:39 GMT', 'content-encoding': 'gzip', 'date': 'Tue, 22 Sep 2015 20:34:39 GMT', 'cache-control': 'max-age=0, no-cache, no-store', 'server': 'Apache-Coyote/1.1', 'content-type': 'application/xml'}

Note that categorisations, categoryschemes, and many other artefacts from the SDMX information model are represented by subclasses of dict.

In [15]: categorisations.__class__.__mro__
Out[15]:
(pandasdmx.model.Categorisations,
 pandasdmx.model.SDMXObject,
 pandasdmx.utils.DictLike,
 pandasdmx.utils.aadict.aadict,
 dict,
 object)

If dict keys are valid attribute names, you can use attribute syntax. This is thanks to pandasdmx.utils.DictLike, a thin wrapper around dict that internally uses a patched third-party tool.
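The idea of attribute-style access to dict items can be illustrated with a minimal sketch. It is not the code of pandasdmx.utils.DictLike or aadict; the class name AttrDict and the sample values are assumptions made for this example only.

```python
class AttrDict(dict):
    """Hypothetical stand-in for a dict with attribute-style item access."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods
        # such as keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


schemes = AttrDict(MOBILE_NAVI="category scheme object")
same_object = schemes.MOBILE_NAVI is schemes["MOBILE_NAVI"]

# Keys that are not valid identifiers still need item syntax.
categories = AttrDict({"07": "Exchange rates"})
name_07 = categories["07"]
```

Because '07' is not a valid Python identifier, it can only be reached with square brackets, which matches the way category IDs are accessed in the examples of this section.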

Likewise, cat_msg.categoryschemes is an instance of DictLike. This is because by calling ecb.get without specifying a resource_id, we instructed the SDMX service to return all available category schemes. The DictLike container for the received category schemes uses the ID attribute of pandasdmx.model.CategoryScheme as keys. This level of generality is required to cater for situations in which more than one category scheme is returned. In our example, however, there is but one:

In [16]: cs = cat_msg.categoryschemes

In [17]: type(cs)
Out[17]: pandasdmx.utils.DictLike

In [18]: list(cs.keys())
Out[18]: ['MOBILE_NAVI']

pandasdmx.model.CategoryScheme inherits from pandasdmx.utils.DictLike as well. Its values are pandasdmx.model.Category instances, its keys are their id attributes. Note that pandasdmx.utils.DictLike has an aslist method. It returns its values as a new list sorted by id. The sorting criterion may be overridden in subclasses. We shall see this when dealing with dimensions in a pandasdmx.model.DataStructureDefinition where the dimensions are ordered by position.
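How a subclass might override the sorting criterion can be sketched as follows. The classes SortedValues and ByPosition, the Item record and the position numbers are hypothetical and only serve to illustrate the behaviour described above; they are not taken from the pandaSDMX source.

```python
from collections import namedtuple

# Illustrative record with the two attributes used for sorting.
Item = namedtuple("Item", ["id", "position"])


class SortedValues(dict):
    """Hypothetical container: values listed in a configurable order."""

    sort_attr = "id"  # subclasses may override the sorting criterion

    def aslist(self):
        # Return a new list; the dict itself is left untouched.
        return sorted(self.values(), key=lambda v: getattr(v, self.sort_attr))


class ByPosition(SortedValues):
    sort_attr = "position"


items = [Item("FREQ", 1), Item("CURRENCY", 2), Item("EXR_TYPE", 4)]
by_id = [i.id for i in SortedValues((i.id, i) for i in items).aslist()]
by_pos = [i.id for i in ByPosition((i.id, i) for i in items).aslist()]
```

Here by_id is ordered alphabetically by ID, whereas by_pos follows the position attribute.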

We can explore our category scheme like so:

In [19]: cs0 = cs.aslist()[0]

In [20]: type(cs0)
Out[20]: pandasdmx.model.CategoryScheme

# Print the number of categories
In [21]: len(cs0)
Out[21]: 11

# Print IDs of categories
In [22]: list(cs0.keys())
Out[22]: ['04', '01', '09', '10', '05', '11', '07', '08', '06', '03', '02']

# English name of category '07'
In [23]: cs0['07'].name.en
Out[23]: 'Exchange rates'

Extracting the dataflows in a particular category

As we saw from the attributes of cat_msg, the SDMX message, we already have the categorisations at hand. While in the SDMXML file categorisations are represented as a flat list, pandaSDMX groups them by category and exposes them as a pandasdmx.utils.DictLike mapping each category ID to a list of pandasdmx.model.Categorisation instances, each of which links its category to a pandasdmx.model.DataFlowDefinition instance. Technically, these links are represented by pandasdmx.model.Reference instances whose id attribute enables us to access the dataflow definitions in the selected category ‘07’. We can print the string representations of the dataflows in this category:

In [24]: cat07_l = cat_msg.categorisations['07']

In [25]: list(cat_msg.dataflows[i.artefact.id] for i in cat07_l)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-eab370a959af> in <module>()
----> 1 list(cat_msg.dataflows[i.artefact.id] for i in cat07_l)

<ipython-input-25-eab370a959af> in <genexpr>(.0)
----> 1 list(cat_msg.dataflows[i.artefact.id] for i in cat07_l)

AttributeError: 'StructureMessage' object has no attribute 'dataflows'

These are all dataflows offered by the ECB in the category on exchange rates.
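The grouping step described in this section, from a flat list of categorisations to a mapping keyed by category ID, can be sketched with the standard library. This is not pandaSDMX's code; the Link record and the sample IDs are made up for illustration.

```python
from collections import defaultdict, namedtuple

# Hypothetical flat records: each links one category to one dataflow.
Link = namedtuple("Link", ["category_id", "dataflow_id"])

flat = [Link("07", "EXR"), Link("01", "FLOW_A"), Link("07", "FLOW_B")]

# Group the flat list by category ID.
grouped = defaultdict(list)
for link in flat:
    grouped[link.category_id].append(link)

# All dataflow IDs linked to category '07', in document order.
dataflow_ids_07 = [link.dataflow_id for link in grouped["07"]]
```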

Finding dataflows without using categories

In the previous section we have used categories to find relevant dataflows. However, in many situations there are no categories to narrow down the result set. Here, pandasdmx.utils.DictLike.find() comes in handy:

In [26]: cat_msg.dataflows.find('rates')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-26-27a10d15ea82> in <module>()
----> 1 cat_msg.dataflows.find('rates')
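As a rough illustration of what such a search does, the following sketch performs a case-insensitive substring match on names that are either plain strings or dicts keyed by language code. The function find_by_name and the sample data are assumptions for this example; the actual find() method may differ in detail.

```python
def find_by_name(items, search_str, language="en"):
    """Return the entries whose name contains search_str, ignoring case.

    `items` maps IDs to names; a name is either a plain string or a dict
    mapping language codes to strings.
    """
    needle = search_str.lower()
    result = {}
    for key, name in items.items():
        text = name.get(language, "") if isinstance(name, dict) else name
        if needle in text.lower():
            result[key] = name
    return result


# Made-up sample data for illustration only.
flows = {
    "EXR": {"en": "Exchange Rates"},
    "FLOW_X": {"en": "Some other statistics"},
}
matches = find_by_name(flows, "rates")
```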


4.5.5 Extracting the data structure and data from a dataflow

In this section we will focus on a particular dataflow. We will use the ‘EXR’ dataflow from the European Central Bank. In the previous section we already obtained the dataflow definitions by requesting the category schemes with the appropriate references. But this works only if the SDMX service supports category schemes. If not (and many agencies don’t), we need to download the dataflow definitions explicitly by issuing:

>>> flows = ecb.get(resource_type = 'dataflow')

Dataflow definitions at a glance

A pandasdmx.model.DataFlowDefinition has an id, name, version and many other attributes inherited from various base classes. It is worthwhile to look at the method resolution order to see how it works. Many other classes from the model have similar base classes.

It is crucial to bear in mind two things:

• the id of a dataflow definition is also used to request data of this dataflow.

• the structure attribute of the dataflow definition is a reference to the data structure definition describing datasets of this dataflow.


Getting the data structure definition (DSD)

Next, we extract the DSD’s ID and download the DSD together with all artefacts that it refers to and that refer to it. We set the params keyword argument explicitly to show how it works. Then we will show some of the DSD’s attributes.

In [27]: dsd_id = cat_msg.dataflows.EXR.structure.id
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-27-1936b3fd084d> in <module>()
----> 1 dsd_id = cat_msg.dataflows.EXR.structure.id

In [28]: dsd_id
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-28-040219d71e12> in <module>()
----> 1 dsd_id

NameError: name 'dsd_id' is not defined

In [29]: refs = dict(references = 'all')

In [30]: dsd_resp = ecb.get(resource_type = 'datastructure', resource_id = dsd_id, params = refs)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-30-f702833b1327> in <module>()
----> 1 dsd_resp = ecb.get(resource_type = 'datastructure', resource_id = dsd_id, params = refs)

NameError: name 'dsd_id' is not defined

In [31]: dsd = dsd_resp.msg.datastructures[dsd_id]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-31-aee41cc35170> in <module>()
----> 1 dsd = dsd_resp.msg.datastructures[dsd_id]

NameError: name 'dsd_resp' is not defined

A DSD essentially defines two things:

• the dimensions of the datasets of this dataflow, i.e. the order and names of the dimensions and the permissible values or the data type for each dimension, and

• the attributes, i.e. their names, permissible values and where each may be attached. There are four possible attachment points:

– at the individual observation

– at series level

– at group level (i.e. a subset of series defined by dimension values)

– at dataset level.

Let’s look at the dimensions and for the ‘CURRENCY’ dimension also at the allowed values as enumerated in the referenced code list:



In [32]: list(d.id for d in dsd.dimensions.aslist())
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-32-f01703696cdb> in <module>()
----> 1 list(d.id for d in dsd.dimensions.aslist())

NameError: name 'dsd' is not defined

In [33]: currency_codelist = dsd.dimensions.CURRENCY.local_repr.enum
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-33-2d605e119e58> in <module>()
----> 1 currency_codelist = dsd.dimensions.CURRENCY.local_repr.enum

NameError: name 'dsd' is not defined

In [34]: len(currency_codelist)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-34-19bb8d405398> in <module>()
----> 1 len(currency_codelist)

NameError: name 'currency_codelist' is not defined

In [35]: currency_codelist.USD, currency_codelist.JPY
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-35-94e9ead82449> in <module>()
----> 1 currency_codelist.USD, currency_codelist.JPY

NameError: name 'currency_codelist' is not defined

So there are six dimensions. Because we can only filter out sets of columns, we disregard ‘TIME_PERIOD’ as this is the dimension at observation. The ‘CURRENCY’ dimension stands at position 2. Moreover, we are now sure that ‘USD’ and ‘JPY’ are valid dimension values. We need this information to construct a filter for our dataset query which should be limited to the currencies we are interested in.

Note that pandasdmx.model.Scheme.aslist() sorts the dimension objects by their position attribute. The order matters when constructing filters for dataset queries (see below).

Attribute names and allowed values can be obtained in a similar fashion.

Note: Groups are not yet implemented in the DSD. But this is not a major problem as they are implemented for generic datasets. Thus, datasets should be rendered properly including all attributes and their attachment levels.

4.5.6 Working with datasets

Selecting and requesting data from a dataflow

Requesting a dataset is as easy as requesting a dataflow definition or any other SDMX artefact: Just call the pandasdmx.api.Request.get() method and pass it ‘data’ as the resource_type and the dataflow ID as resource_id.

However, we only want to download those parts of the data we are interested in. Not only does this increase performance; some dataflows are so large that a full download would exceed the server or client limits. The REST API of SDMX offers two ways to narrow down a data request:


• specifying dimension values which the series to be returned must match (“horizontal filter”) or

• limiting the time range or number of observations per series (“vertical filter”)

From the ECB’s dataflow on exchange rates, we specify the CURRENCY dimension to be either ‘USD’ or ‘JPY’. This can be done by passing a key keyword argument to the get method. It may either be a string (low-level API) or a dict. The dict form introduced in v0.3.0 is more convenient and pythonic as it allows pandaSDMX to infer the string form from the dict. Its keys (= dimension names) and values (= dimension values) will be validated against the datastructure definition as well as the content-constraint if available.

As of v0.3.0, content-constraints are implemented only in their CubeRegion flavor. KeyValueSets are not yet supported. In this case, the provided dimension values will be validated only against the code-list. It is thus not always guaranteed that the dataset actually contains the desired data, e.g., because the country of interest does not deliver the data to the SDMX data provider.
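Validation against code lists can be pictured roughly as follows. This is a simplified sketch, not the library's implementation: the function validate_key is hypothetical, the code lists are made up and heavily shortened, and content constraints are ignored entirely.

```python
def validate_key(key, codelists):
    """Raise ValueError if a dimension name or value is unknown.

    `key` maps dimension names to '+'-separated values; `codelists` maps
    dimension names to the set of permitted codes.
    """
    for dim, values in key.items():
        if dim not in codelists:
            raise ValueError("unknown dimension: %s" % dim)
        for value in values.split("+"):
            if value not in codelists[dim]:
                raise ValueError("invalid value %r for %s" % (value, dim))


# Made-up, heavily shortened code lists.
codelists = {"FREQ": {"A", "D", "M"}, "CURRENCY": {"USD", "JPY", "GBP"}}

validate_key({"CURRENCY": "USD+JPY"}, codelists)  # passes silently

try:
    validate_key({"CURRENCY": "XXX"}, codelists)
    rejected = False
except ValueError:
    rejected = True
```

A check of this kind confirms only that a code exists. As noted above, it cannot guarantee that the dataset actually contains data for it.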

If we choose the string form of the key, it must consist of ‘.’-separated slots representing the dimensions. Values are optional. As we saw in the previous section, the ECB’s dataflow for exchange rates has five relevant dimensions, the ‘CURRENCY’ dimension being at position two. This yields the key ‘.USD+JPY...’. The ‘+’ can be read as an ‘OR’ operator. The dict form is shown below.
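The relation between the dict form and the string form can be made concrete with a small sketch. The helper key_to_string is hypothetical and is not the library's make_key method; the list of dimension IDs is an assumption taken from the series keys of the EXR dataflow shown elsewhere in this chapter.

```python
def key_to_string(key, dimension_ids):
    """Join dimension values into a '.'-separated key, leaving gaps empty.

    `dimension_ids` lists the dimensions in DSD order, excluding the
    dimension at observation.
    """
    return ".".join(key.get(dim, "") for dim in dimension_ids)


# Assumed dimension IDs of the EXR dataflow, in DSD order.
dims = ["FREQ", "CURRENCY", "CURRENCY_DENOM", "EXR_TYPE", "EXR_SUFFIX"]
key_str = key_to_string({"CURRENCY": "USD+JPY"}, dims)
```

With five dimensions there are four separating dots, so a value in the second slot gives one leading dot and three trailing ones.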

Further, we will set the start period for the time series to 2014 to exclude any prior data from the request.

In [36]: data_resp = ecb.get(resource_type = 'data', resource_id = 'EXR', key={'CURRENCY': 'USD+JPY'}, params = {'startPeriod': '2014'})

In [37]: type(data_resp.msg)
Out[37]: pandasdmx.model.GenericDataMessage

In [38]: data = data_resp.msg.data

In [39]: type(data)
Out[39]: pandasdmx.model.GenericDataSet
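For orientation, a request like the one above corresponds roughly to a REST URL of the following shape. The sketch assumes the common SDMX REST path layout of resource type, resource ID and key followed by the query string, and reuses the base URL visible in cat_resp.url. The URL that pandaSDMX actually constructs may contain further parts.

```python
from urllib.parse import urlencode

base = "http://sdw-wsrest.ecb.int/service"  # base URL as seen in cat_resp.url
resource_type, resource_id, key = "data", "EXR", ".USD+JPY..."
params = {"startPeriod": "2014"}

# Assumed layout: <base>/<resource_type>/<resource_id>/<key>?<query>
url = "/".join([base, resource_type, resource_id, key]) + "?" + urlencode(params)
```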

Generic datasets

At present, pandaSDMX can only process generic datasets, i.e. datasets that encompass sufficient structural information to be interpreted without consulting the related DSD. However, as we saw, we need the DSD anyway to understand the data structure, the meaning of dimension and attribute values, and to select series by specifying a valid key.

The pandasdmx.model.GenericDataSet has the following features:

dim_at_obs attribute showing which dimension is at observation level. For time series its value is either ‘TIME’ or ‘TIME_PERIOD’. If it is ‘AllDimensions’, the dataset is said to be flat. In this case there are no series, just a flat list of observations.

series property returning an iterator over pandasdmx.model.Series instances

obs method returning an iterator over the observations. Only for flat datasets.

attributes namedtuple of attributes, if any, that are attached at dataset level

The pandasdmx.model.Series has the following features:

key namedtuple mapping dimension names to dimension values

obs method returning an iterator over observations within the series

attributes namedtuple mapping any attribute names to values

groups list of pandasdmx.model.Group instances to which this series belongs. Note that groups are merely attachment points for attributes.



In [40]: data.dim_at_obs
Out[40]: 'TIME_PERIOD'

In [41]: series_l = list(data.series)

In [42]: len(series_l)
Out[42]: 16

In [43]: series_l[5].key
Out[43]: SeriesKey(FREQ='D', CURRENCY='USD', CURRENCY_DENOM='EUR', EXR_TYPE='SP00', EXR_SUFFIX='A')

In [44]: set(s.key.FREQ for s in data.series)
Out[44]: {'A', 'D', 'H', 'M', 'Q'}

We see that this dataset comprises 16 time series of several different period lengths.

Writing to pandas

Selecting columns using the model API

As we want to write data to a pandas DataFrame rather than an iterator of pandas Series, we must not mix up the time spans. Therefore, we single out the daily data first. The pandasdmx.api.Response.write() method accepts an optional iterable to select a subset of the series contained in the dataset. Thus we can now generate our pandas DataFrame from daily exchange rate data only:

In [45]: daily = (s for s in data.series if s.key.FREQ == 'D')

In [46]: cur_df = data_resp.write(daily)

In [47]: cur_df.shape
Out[47]: (440, 2)

In [48]: cur_df.tail()
Out[48]:
FREQ                  D
CURRENCY            JPY     USD
CURRENCY_DENOM      EUR     EUR
EXR_TYPE           SP00    SP00
EXR_SUFFIX            A       A
2015-09-16       135.45  1.1228
2015-09-17       136.76  1.1312
2015-09-18       136.31  1.1419
2015-09-21       135.50  1.1250
2015-09-22       133.75  1.1155

Controlling the output

The docstring of pandasdmx.writer.data2pandas.Writer.write() explains a number of optional arguments to control whether or not another dataframe should be generated for the attributes, which attributes it should contain, and, most importantly, if the resulting pandas Series should be concatenated to a single DataFrame at all (asframe = True is the default).


Controlling index generation

Also, the write method provides the following parameters to increase performance for large datasets with regular indexes (e.g. monthly data):

• fromfreq: if True, the index will be extrapolated from the first date or period and the frequency. This is only robust if the dataset has a uniform index, i.e. has no gaps such as those that occur in daily trading data.

• reverse_obs: if True, return observations in a series in reverse document order. This may be useful to establish chronological order, in particular in combination with fromfreq. Default is False.

• parse_time: if pandas raises parsing errors due to exotic date-time formats, set parse_time to False to obtain a string index rather than datetime index. Default is True.
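The idea behind fromfreq, deriving the whole index from the first period and the frequency instead of parsing every label, can be sketched for monthly data with the standard library. The helper monthly_periods is hypothetical and much simpler than what a pandas-based writer would do.

```python
def monthly_periods(first, count):
    """Extrapolate `count` monthly period labels starting at 'YYYY-MM'."""
    year, month = (int(part) for part in first.split("-"))
    labels = []
    for _ in range(count):
        labels.append("%04d-%02d" % (year, month))
        # Advance by one month, rolling over to January after December.
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return labels


index = monthly_periods("2014-11", 4)
```

The sketch also shows why gaps are a problem: if one observation were missing in the data, every later label produced this way would be shifted by one period.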

4.5.7 Working with files

The pandasdmx.api.Request.get method accepts two optional keyword arguments tofile and fromfile. If a file path or, in case of fromfile, a file-like object is given, any SDMX message received from the server will be written to a file, or a file will be read instead of making a request to a remote server.

The file to be read may be a zip file (new in version 0.2.1). In this case, the SDMX message must be the first file in the archive. The same works for zip files returned from an SDMX server. This happens, e.g., when Eurostat finds that the requested dataset has been too large. In this case the first request will yield a message with a footer containing a link to a zip file to be made available after some time. The link may be extracted by issuing something like:

>>> resp.msg.footer.text[1]

and passed as url argument when calling get a second time to get the zipped data message.
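The rule that the SDMX message must be the first file in the archive can be illustrated with the standard zipfile module. This is a sketch of the idea, not the reader code of pandaSDMX; the function name and the archive contents are made up.

```python
import io
import zipfile


def read_first_member(zip_bytes):
    """Return the content of the first file in a zip archive as bytes."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        first = archive.infolist()[0]
        return archive.read(first)


# Build a tiny in-memory archive so the sketch runs without a server.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("message.xml", "<placeholder/>")
    archive.writestr("readme.txt", "ignored")

payload = read_first_member(buffer.getvalue())
```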

Since version 0.2.1, this second request can be performed automatically through the get_footer_url parameter. It defaults to (30, 3) which means that three attempts will be made at 30-second intervals. This behavior is useful when requesting large datasets from Eurostat. Deactivate it by setting get_footer_url to None.
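The retry behaviour expressed by a (seconds, number_of_attempts) tuple can be sketched as a small loop. The function fetch_with_retries and its injectable sleep argument are assumptions made so that the example runs instantly; they are not part of the pandaSDMX API.

```python
import time


def fetch_with_retries(fetch, wait_seconds, attempts, sleep=time.sleep):
    """Call fetch() up to `attempts` times, sleeping before each attempt.

    Returns the first non-None result, or None if every attempt fails.
    """
    for _ in range(attempts):
        sleep(wait_seconds)
        result = fetch()
        if result is not None:
            return result
    return None


# Simulated download that succeeds on the second attempt; the sleep
# function is replaced so the example finishes immediately.
outcomes = iter([None, "zipped data message"])
waits = []
result = fetch_with_retries(lambda: next(outcomes), 30, 3, sleep=waits.append)
```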

4.5.8 Caching Response instances in memory

The get API provides a rudimentary cache for Response instances. It is a simple dict mapping user-provided names to the Response instances. If we want to cache a Response, we can provide a suitable name by passing the keyword argument memcache to the get method. Pre-existing items under the same key will be overwritten.

Note: Caching of http responses can also be achieved through requests-cache. Activate the cache by instantiating pandasdmx.api.Request passing a keyword argument cache. It must be a dict mapping config and other values.

4.5.9 Handling errors

The pandasdmx.api.Response instance generated after the response from the server has been received has a status_code attribute. The SDMX web services guidelines explain the meaning of these codes. In addition, if the SDMX server has encountered an error, it may return a message which includes a footer containing explanatory notes. pandaSDMX exposes the content of a footer via a text attribute which is a list of strings.

Note: pandaSDMX raises only http errors with status code between 400 and 499. Codes >= 500 do not raise an error as the SDMX web services guidelines assign special meanings to those codes. The caller must therefore raise an error if needed.
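A caller that wants codes of 500 and above to be treated as errors could wrap the check in a helper along these lines. The function check_response is hypothetical; it takes plain values so that the sketch runs without a server, whereas in practice the values would come from the status_code attribute and the footer text of a Response.

```python
def check_response(status_code, footer_text=()):
    """Raise RuntimeError for status codes of 500 and above.

    `footer_text` is a list of explanatory strings, if the server sent any.
    """
    if status_code >= 500:
        details = "; ".join(footer_text) or "no details provided"
        raise RuntimeError("SDMX service error %d: %s" % (status_code, details))
    return status_code


ok = check_response(200)

try:
    check_response(503, ["Service temporarily unavailable"])
    message = None
except RuntimeError as exc:
    message = str(exc)
```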



4.6 Advanced topics

4.6.1 Debugging the information model

The information model does not (yet) expose all attributes of SDMX messages. However, the underlying XML elements are accessible from almost everywhere. This is thanks to the base class pandasdmx.model.SDMXObject. It injects two attributes: _elem and _reader which grant access to the XML element represented by the model class instance as well as the reader instance.

4.6.2 Extending pandaSDMX

pandaSDMX is now extensible by readers and writers. While the API needs a few refinements, it should be straightforward to start from pandasdmx.writer.data2pandas to develop writers for alternative output formats such as spreadsheet, database, or web applications.

Similarly, a reader for the upcoming JSON-based SDMX format would be useful.

Interested developers should contact the author at [email protected].

4.7 pandasdmx

4.7.1 pandasdmx package

Subpackages

pandasdmx.reader package

Submodules

pandasdmx.reader.sdmxml module

This module contains a reader for SDMXML v2.1.

class pandasdmx.reader.sdmxml.SDMXMLReader(request, **kwargs)

Bases: pandasdmx.reader.BaseReader

Read SDMX-ML 2.1 and expose it as instances from pandasdmx.model

assignment_status(sdmxobj)

attr_relationship(sdmxobj)

categorisation_items(sdmxobj)

concept_id(sdmxobj)

d = {‘agencyID’: @agencyID, ‘ref_structure’: str:Structure, ‘id’: @id, ‘ref_version’: @version, ‘series_key_values_path’: gen:SeriesKey/gen:Value/@value, ‘structured_by’: mes:Structure/@structureID, ‘group_key_id_path’: gen:GroupKey/gen:Value/@id, ‘obs_key_id_path’: gen:ObsKey/gen:Value/@id, ‘ref_package’: @package, ‘uri’: @uri, ‘ref_class’: @class, ‘attr_id_path’: gen:Attributes/gen:Value/@id, ‘dim_at_obs’: //mes:Header/mes:Structure/@dimensionAtObservation, ‘group_key_values_path’: gen:GroupKey/gen:Value/@value, ‘constraint_attachment’: str:ConstraintAttachment, ‘annotationtype’: com:AnnotationType/text(), ‘attr_values_path’: gen:Attributes/gen:Value/@value, ‘urn’: @urn, ‘headerID’: mes:ID/text(), ‘value’: com:Value/text(), ‘obs_value_path’: gen:ObsValue/@value, ‘series_key_id_path’: gen:SeriesKey/gen:Value/@id, ‘obs_key_values_path’: gen:ObsKey/gen:Value/@value, ‘ref_target’: str:Target, ‘url’: @url, ‘generic_series_dim_path’: gen:ObsDimension/@value, ‘include’: @include, ‘generic_obs_path’: gen:Obs, ‘ref_source’: str:Source}

footer_code(sdmxobj)

footer_severity(sdmxobj)

footer_text(sdmxobj)
return list of xml:lang attributes. If node has no attributes, assume that language is ‘en’.

generic_groups(sdmxobj)

generic_series(sdmxobj)


get_dataset(elem)

group_key(sdmxobj)

header_error(sdmxobj)

header_prepared(sdmxobj)

header_sender(sdmxobj)

initialize(source)

international_str(name, sdmxobj)
return DictLike of xml:lang attributes. If node has no attributes, assume that language is ‘en’.

isfinal(sdmxobj)

iter_generic_obs(sdmxobj, with_value, with_attributes)

iter_generic_series_obs(sdmxobj, with_value, with_attributes, reverse_obs=False)

k = ‘datastructures’

key = ‘ref_source’

localrepr(sdmxobj)

path = ‘str:Source’

position(sdmxobj)

read_as_str(name, sdmxobj, first_only=True)

read_identifiables(name, sdmxobj)
If sdmxobj inherits from dict: update it with modelized elements. These must be instances of model.IdentifiableArtefact, i.e. have an ‘id’ attribute. This will be used as dict keys. If sdmxobj does not inherit from dict: return a new DictLike.

read_instance(cls, sdmxobj, offset=None, first_only=True)
If cls in _cls2path and matches, return an instance of cls with the first XML element, or, if first_only is False, a list of cls instances for all elements found. If no matches were found, return None.

read_one(name, sdmxobj)
return model class instance of the first element in the result set of the xpath expression as defined in _model_map. If no elements are found, return None.

read_subclass_instance(target_cls, sdmxobj, offset=None, first_only=True)
Iterate over model classes in _cls2path which are subclasses of ‘target_cls’ and instantiate the classes whose xpath expression returns a non-empty result. Return a list of subclass instances.

series_attrib(sdmxobj)

series_key(sdmxobj)

v = (‘mes:Structures/str:DataStructures/str:DataStructure’, <class ‘pandasdmx.model.DataStructureDefinition’>)

Module contents

This module contains the base class for readers.

class pandasdmx.reader.BaseReader(request, **kwargs)

Bases: object

initialize(source)



pandasdmx.utils package

Submodules

pandasdmx.utils.aadict module

class pandasdmx.utils.aadict.aadict

Bases: dict

A dict subclass that allows attribute access to be synonymous with item access, e.g. mydict.attribute == mydict[’attribute’]. It also provides several other useful helper methods, such as pick() and omit().

static d2a(subject)

static d2ar(subject)

omit(*args)

pick(*args)

update(*args, **kw)

Module contents

module pandasdmx.utils - helper classes and functions

class pandasdmx.utils.DictLike

Bases: pandasdmx.utils.aadict.aadict

Thin wrapper around dict type

It allows attribute-like item access, has a find() method and inherits other useful features from aadict.

aslist()
property returning values() as unordered list

find(search_str, by=’name’, language=’en’)
Select values by attribute

Parameters

• search_str (str) – the string to search for

• by (str) – the name of the attribute to search by, defaults to ‘name’. The specified attribute must be either a string or a dict mapping language codes to strings. Such attributes occur, e.g. in pandasdmx.model.NameableArtefact which is a base class for pandasdmx.model.DataFlowDefinition and many others.

• language (str) – language code specifying the language of the text to be searched, defaults to ‘en’

Returns items where value.<by> contains the search_str. International strings stored as dict with language codes as keys are searched. Capitalization is ignored.

Return type DictLike

class pandasdmx.utils.NamedTupleFactory

Bases: object

Wrap namedtuple function from the collections stdlib module to return a singleton if a namedtuple with the same field names has already been created.

__call__(name, fields)
return a tuple subclass, as namedtuple does


cache = {(‘dim’, ‘value’, ‘attrib’): <class ‘pandasdmx.utils.SeriesObservation’>, (‘key’, ‘value’, ‘attrib’): <class ‘pandasdmx.utils.GenericObservation’>}
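The caching behaviour described for NamedTupleFactory can be sketched in a few lines. The class below is a hypothetical re-creation for illustration and should not be read as the actual source; in particular, keeping the cache per instance is an assumption.

```python
from collections import namedtuple


class CachedNamedTupleFactory:
    """Hypothetical factory reusing one class per tuple of field names."""

    def __init__(self):
        self.cache = {}

    def __call__(self, name, fields):
        fields = tuple(fields)
        # Create the namedtuple class only once per set of field names.
        if fields not in self.cache:
            self.cache[fields] = namedtuple(name, fields)
        return self.cache[fields]


factory = CachedNamedTupleFactory()
KeyA = factory("SeriesKey", ["FREQ", "CURRENCY"])
KeyB = factory("SeriesKey", ["FREQ", "CURRENCY"])
key = KeyA(FREQ="D", CURRENCY="USD")
```

Reusing the class avoids calling namedtuple, which is comparatively expensive, once per series or observation.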

pandasdmx.writer package

Submodules

pandasdmx.writer.data2pandas module

This module contains a writer class that writes a generic data message to pandas dataframes or series.

class pandasdmx.writer.data2pandas.Writer(msg, **kwargs)

Bases: pandasdmx.writer.BaseWriter

iter_pd_series(iter_series, dim_at_obs, dtype, attributes, reverse_obs, fromfreq, parse_time)

write(source=None, asframe=True, dtype=<class ‘numpy.float64’>, attributes=’‘, reverse_obs=False, fromfreq=False, parse_time=True)

Transform a pandasdmx.model.DataMessage instance to a pandas DataFrame or iterator over pandas Series.

Parameters

• source (pandasdmx.model.DataMessage) – a pandasdmx.model.DataSet or iterator of pandasdmx.model.Series

• asframe (bool) – if True, merge the series of values and/or attributes into one or two multi-indexed pandas.DataFrame(s), otherwise return an iterator of pandas.Series. (default: True)

• dtype (str, NP.dtype, None) – datatype for values. Defaults to NP.float64. If None, do not return the values of a series. In this case, attributes must not be an empty string so that some attribute is returned.

• attributes (str, None) – string determining which attributes, if any, should be returned in separate series or a separate DataFrame. Allowed values: ‘’, ‘o’, ‘s’, ‘g’, ‘d’ or any combination thereof such as ‘os’, ‘go’. Defaults to ‘osgd’. Where ‘o’, ‘s’, ‘g’, and ‘d’ mean that attributes at observation, series, group and dataset level will be returned as members of per-observation dict-likes with attribute-like access.

• reverse_obs (bool) – if True, return observations in reverse order. Default: False

• fromfreq (bool) – if True, extrapolate time periods from the first item and FREQ dimension. Default: False

• parse_time (bool) – if True (default), try to generate datetime index, provided that dim_at_obs is ‘TIME’ or ‘TIME_PERIOD’. Otherwise, parse_time is ignored. If False, always generate index of strings. Set it to False to increase performance and avoid parsing errors for exotic date-time formats unsupported by pandas.

Module contents

This module contains the base class for writers.

class pandasdmx.writer.BaseWriter(msg, **kwargs)

Bases: object



Submodules

pandasdmx.api module

This module defines two classes: pandasdmx.api.Request and pandasdmx.api.Response. Together, these form the high-level API of pandasdmx. Requesting data and metadata from an SDMX server requires a good understanding of this API and a basic understanding of the SDMX web service guidelines. Only the chapters on REST services are relevant as pandasdmx does not support the SOAP interface.

class pandasdmx.api.Request(agency=’‘, writer=’pandasdmx.writer.data2pandas’, cache=None, **http_cfg)

Bases: object

Get SDMX data and metadata from remote servers or local files.

agency

clear_cache()

get(resource_type=’‘, resource_id=’‘, agency=’‘, key=’‘, params={}, fromfile=None, tofile=None, url=None, get_footer_url=(30, 3), memcache=None)
get SDMX data or metadata and return it as a pandasdmx.api.Response instance.

While ‘get’ can load any SDMX file (also as zip-file) specified by ‘fromfile’, it can only construct URLs for the SDMX service set for this instance. Hence, you have to instantiate a pandasdmx.api.Request instance for each data provider you want to access, or pass a pre-fabricated URL through the url parameter.

Parameters

• resource_type (str) – the type of resource to be requested. Values must be one of the items in Request._resources such as ‘data’, ‘dataflow’, ‘categoryscheme’ etc. It is used for URL construction, not to read the received SDMX file. Hence, if fromfile is given, resource_type may be ‘’. Defaults to ‘’.

• resource_id (str) – the id of the resource to be requested. It is used for URL construction. Defaults to ‘’.

• agency (str) – ID of the agency providing the data or metadata. Used for URL construction only. It tells the SDMX web service which agency the requested information originates from. Note that an SDMX service may provide information from multiple data providers. May be ‘’ if fromfile is given. Not to be confused with the agency ID passed to __init__() which specifies the SDMX web service to be accessed.

• key (str, dict) – select columns from a dataset by specifying dimension values. If type is str, it must conform to the SDMX REST API, i.e. dot-separated dimension values. If ‘key’ is of type ‘dict’, it must map dimension names to allowed dimension values. Two or more values can be separated by ‘+’ as in the str form. The DSD will be downloaded and the items are validated against it before downloading the dataset.

• params (dict) – defines the query part of the URL. The SDMX web service guidelines (www.sdmx.org) explain the meaning of permissible parameters. It can be used to restrict the time range of the data to be delivered (startPeriod, endPeriod), whether parents, siblings or descendants of the specified resource should be returned as well (e.g. references=’parentsandsiblings’). Sensible defaults are set automatically depending on the values of other args such as resource_type. Defaults to {}.

• fromfile (str) – path to the file to be loaded instead of accessing an SDMX web service. Defaults to None. If fromfile is given, args relating to URL construction will be ignored.

4.7. pandasdmx 29


• tofile (str) – file path to write the received SDMX file on the fly. This is useful if youwant to load data offline using fromfile or if you want to open an SDMX file in an XMLeditor.

• url (str) – URL of the resource to download. If given, any other arguments such asresource_type or resource_id are ignored. Default is None.

• get_footer_url ((int, int)) – tuple of the form (seconds, number_of_attempts). De-termines the behavior in case the received SDMX message has a footer where one of itslines is a valid URL. get_footer_url defines how many attempts should be madeto request the resource at that URL after waiting so many seconds before each attempt.This behavior is useful when requesting large datasets from Eurostat. Other agencies donot seem to send such footers. Once an attempt to get the resource has been successful,the original message containing the footer is dismissed and the dataset is returned. Thetofile argument is propagated. Note that the written file may be a zip archive. pandaS-DMX handles zip archives since version 0.2.1. Defaults to (30, 3).

• memcache (str) – If given, return Response instance if already in self.cache(dict),

• download resource and cache Response instance. (otherwise) –

Returns instance containing the requested SDMX Message.

Return type pandasdmx.api.Response
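The waiting-and-retrying behaviour described for get_footer_url can be sketched as follows. This is a minimal illustration and not pandaSDMX's implementation: fetch_with_retries, fake_fetch and the example URL are hypothetical names, and the fetch callable merely stands in for the real HTTP request.

```python
import time


def fetch_with_retries(fetch, url, get_footer_url=(30, 3), sleep=time.sleep):
    """Request `url` up to `attempts` times, waiting `seconds` before each try.

    `fetch` is a hypothetical callable returning the resource, or None
    if the resource is not available yet.
    """
    seconds, attempts = get_footer_url
    for _ in range(attempts):
        sleep(seconds)            # wait before each attempt
        result = fetch(url)
        if result is not None:
            return result         # success: stop retrying
    return None                   # every attempt failed


# Fake fetcher (assumption for the demo) that succeeds on the second call.
calls = []


def fake_fetch(url):
    calls.append(url)
    return b"<zip bytes>" if len(calls) == 2 else None


data = fetch_with_retries(fake_fetch, "http://example.org/file.zip",
                          get_footer_url=(0, 3), sleep=lambda s: None)
```

Passing the sleep function as an argument keeps the sketch testable without actually waiting.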

get_reader()

Get a Reader instance. Called by get().

make_key(flow_id, key)

Download the dataflow definition and DSD and validate key (dict) against it.

Return: key(str)
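The conversion from a dict key to the dot-separated str form might look roughly like the sketch below. It assumes the dimension order and the allowed codes are already known (make_key obtains them from the downloaded DSD); build_key and the toy dimensions are hypothetical and not taken from a real data structure definition.

```python
def build_key(dimensions, codes, key):
    """Turn a dict key into the dot-separated string form.

    dimensions: dimension IDs in DSD order (assumed to be known already)
    codes: dict mapping each dimension ID to the set of allowed values
    key: dict mapping dimension IDs to one value or '+'-joined values
    """
    for dim, value in key.items():
        if dim not in codes:
            raise ValueError("unknown dimension: %s" % dim)
        for v in value.split('+'):
            if v not in codes[dim]:
                raise ValueError("invalid value %r for dimension %s" % (v, dim))
    # Dimensions missing from the dict become empty strings (wildcards).
    return '.'.join(key.get(dim, '') for dim in dimensions)


# Toy example data, for illustration only.
dims = ['FREQ', 'CURRENCY', 'CURRENCY_DENOM']
allowed = {'FREQ': {'D', 'M'},
           'CURRENCY': {'USD', 'JPY'},
           'CURRENCY_DENOM': {'EUR'}}
result = build_key(dims, allowed, {'CURRENCY': 'USD+JPY', 'FREQ': 'M'})
print(result)  # M.USD+JPY.
```

Validating before joining means a misspelled dimension or code is caught locally instead of producing a failed web request.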

pandasdmx.model module

This module is part of the pandaSDMX package

SDMX 2.1 information model

(c) 2014 Dr. Leo

class pandasdmx.model.AnnotableArtefact(reader, elem, **kwargs)

Bases: pandasdmx.model.SDMXObject

annotations

class pandasdmx.model.Annotation(reader, elem, **kwargs)

Bases: pandasdmx.model.SDMXObject

annotationtype

id

text

title

url

class pandasdmx.model.AttributeDescriptor(*args, **kwargs)

Bases: pandasdmx.model.ComponentList

class pandasdmx.model.Categorisation(*args, **kwargs)

Bases: pandasdmx.model.MaintainableArtefact

class pandasdmx.model.Categorisations(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject, pandasdmx.utils.DictLike

class pandasdmx.model.Category(*args, **kwargs)

Bases: pandasdmx.model.Item

class pandasdmx.model.CategoryScheme(*args, **kwargs)

Bases: pandasdmx.model.ItemScheme

class pandasdmx.model.Code(*args, **kwargs)

Bases: pandasdmx.model.Item

class pandasdmx.model.Codelist(*args, **kwargs)

Bases: pandasdmx.model.ItemScheme

class pandasdmx.model.Component(*args, **kwargs)

Bases: pandasdmx.model.IdentifiableArtefact

concept

local_repr

class pandasdmx.model.ComponentList(*args, **kwargs)

Bases: pandasdmx.model.IdentifiableArtefact, pandasdmx.model.Scheme

class pandasdmx.model.Concept(*args, **kwargs)

Bases: pandasdmx.model.Item

class pandasdmx.model.ConceptScheme(*args, **kwargs)

Bases: pandasdmx.model.ItemScheme

class pandasdmx.model.Constrainable

Bases: object

class pandasdmx.model.Constraint(*args, **kwargs)

Bases: pandasdmx.model.MaintainableArtefact

class pandasdmx.model.ContentConstraint(*args, **kwargs)

Bases: pandasdmx.model.Constraint

class pandasdmx.model.CubeRegion(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.DataAttribute(*args, **kwargs)

Bases: pandasdmx.model.Component

related_to

usage_status

class pandasdmx.model.DataMessage(*args, **kwargs)

Bases: pandasdmx.model.Message

class pandasdmx.model.DataSet(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

dim_at_obs

iter_groups

obs(with_values=True, with_attributes=True)

Return an iterator over observations in a flat dataset. An observation is represented as a namedtuple with 3 fields (‘key’, ‘value’, ‘attrib’).

obs.key is a namedtuple of dimensions. Its field names represent dimension names, its values the dimension values.

obs.value is a string that can in most cases be interpreted as float64.

obs.attrib is a namedtuple of attribute names and values.

with_values and with_attributes: If one or both of these flags is False, the respective value will be None. Use these flags to increase performance. The flags default to True.
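The shape of an observation can be illustrated with plain namedtuples. The sketch below only mimics the structure described above; the raw data, the dimension and attribute names, and iter_obs are made up for illustration and do not come from pandaSDMX.

```python
from collections import namedtuple

# Structure of one observation: three fields, as described above.
Obs = namedtuple('Obs', ['key', 'value', 'attrib'])
# Hypothetical dimension and attribute names for this toy dataset.
Key = namedtuple('Key', ['FREQ', 'TIME_PERIOD'])
Attrib = namedtuple('Attrib', ['OBS_STATUS'])

RAW = [(('M', '2014-01'), '1.36', ('A',)),
       (('M', '2014-02'), '1.37', ('A',))]


def iter_obs(raw, with_values=True, with_attributes=True):
    """Yield Obs tuples; skipped parts are None, saving the work of building them."""
    for key, value, attrib in raw:
        yield Obs(Key(*key),
                  value if with_values else None,
                  Attrib(*attrib) if with_attributes else None)


# Values arrive as strings and are converted explicitly.
values = [float(o.value) for o in iter_obs(RAW, with_attributes=False)]
first = next(iter_obs(RAW, with_values=False))
```

Here first.key.TIME_PERIOD gives the period of the first observation, while first.value is None because with_values was switched off.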

series

Return an iterator over Series instances in this DataSet. Note that DataSets in flat format, i.e. header.dim_at_obs = “AllDimensions”, have no series. Use DataSet.obs() instead.

class pandasdmx.model.DataStructureDefinition(*args, **kwargs)

Bases: pandasdmx.model.Structure

class pandasdmx.model.DataflowDefinition(*args, **kwargs)

Bases: pandasdmx.model.StructureUsage, pandasdmx.model.Constrainable

class pandasdmx.model.Dimension(*args, **kwargs)

Bases: pandasdmx.model.Component

class pandasdmx.model.DimensionDescriptor(*args, **kwargs)

Bases: pandasdmx.model.ComponentList

class pandasdmx.model.Facet(facet_type=None, facet_value_type=‘’, itemscheme_facet=‘’, *args, **kwargs)

Bases: object

facet_type = {}

facet_value_type = (‘String’, ‘Big Integer’, ‘Integer’, ‘Long’, ‘Short’, ‘Double’, ‘Boolean’, ‘URI’, ‘DateTime’, ‘Time’, ‘GregorianYear’, ‘GregorianMonth’, ‘GregorianDate’, ‘Day’, ‘MonthDay’, ‘Duration’)

itemscheme_facet = ‘’

class pandasdmx.model.Footer(reader, elem, **kwargs)

Bases: pandasdmx.model.SDMXObject

code

severity

text

class pandasdmx.model.GenericDataMessage(*args, **kwargs)

Bases: pandasdmx.model.DataMessage

class pandasdmx.model.GenericDataSet(*args, **kwargs)

Bases: pandasdmx.model.DataSet

class pandasdmx.model.Group(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.Header(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

error

id

prepared

sender

structure



class pandasdmx.model.IdentifiableArtefact(*args, **kwargs)

Bases: pandasdmx.model.AnnotableArtefact

uri

urn

class pandasdmx.model.Item(*args, **kwargs)

Bases: pandasdmx.model.NameableArtefact

children

parent

class pandasdmx.model.ItemScheme(*args, **kwargs)

Bases: pandasdmx.model.MaintainableArtefact, pandasdmx.model.Scheme

is_partial

class pandasdmx.model.KeyValue(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.MaintainableArtefact(*args, **kwargs)

Bases: pandasdmx.model.VersionableArtefact

is_external_ref

is_final

maintainer

service_url

structure_url

class pandasdmx.model.MeasureDescriptor(*args, **kwargs)

Bases: pandasdmx.model.ComponentList

class pandasdmx.model.MeasureDimension(*args, **kwargs)

Bases: pandasdmx.model.Dimension

class pandasdmx.model.Message(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

header

class pandasdmx.model.NameableArtefact(*args, **kwargs)

Bases: pandasdmx.model.IdentifiableArtefact

description

name

class pandasdmx.model.PrimaryMeasure(*args, **kwargs)

Bases: pandasdmx.model.Component

class pandasdmx.model.Ref(reader, elem, **kwargs)

Bases: pandasdmx.model.SDMXObject

agency_id

id

package

ref_class

resolve()

version

class pandasdmx.model.ReportingYearStartDay(*args, **kwargs)

Bases: pandasdmx.model.DataAttribute

class pandasdmx.model.Representation(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

class pandasdmx.model.SDMXObject(reader, elem, **kwargs)

Bases: object

class pandasdmx.model.Scheme(*args, **kwargs)

Bases: pandasdmx.utils.DictLike

aslist()

class pandasdmx.model.Series(*args, **kwargs)

Bases: pandasdmx.model.SDMXObject

group_attrib

Return a namedtuple containing all attributes attached to the groups of which the given series is a member.

obs(with_values=True, with_attributes=True, reverse_obs=False)

Return an iterator over observations in a series. An observation is represented as a namedtuple with 3 fields (‘key’, ‘value’, ‘attrib’). obs.key is a namedtuple of dimensions, obs.value is a string value and obs.attrib is a namedtuple of attributes. If with_values or with_attributes is False, the respective value is None. Use these flags to increase performance. The flags default to True.

class pandasdmx.model.Structure(*args, **kwargs)

Bases: pandasdmx.model.MaintainableArtefact

class pandasdmx.model.StructureMessage(*args, **kwargs)

Bases: pandasdmx.model.Message

class pandasdmx.model.StructureSpecificDataMessage(*args, **kwargs)

Bases: pandasdmx.model.DataMessage

class pandasdmx.model.StructureSpecificDataSet(*args, **kwargs)

Bases: pandasdmx.model.DataSet

class pandasdmx.model.StructureUsage(*args, **kwargs)

Bases: pandasdmx.model.MaintainableArtefact

structure

class pandasdmx.model.TimeDimension(*args, **kwargs)

Bases: pandasdmx.model.Dimension

class pandasdmx.model.VersionableArtefact(*args, **kwargs)

Bases: pandasdmx.model.NameableArtefact

valid_from

valid_to

version

pandasdmx.remote module

This module is part of pandaSDMX. It contains a class for HTTP access.

class pandasdmx.remote.REST(cache, http_cfg)

Bases: object

Query SDMX resources via REST or from a file

The constructor accepts arbitrary keyword arguments that will be passed to the requests.get function on each call. This makes the REST class somewhat similar to a requests.Session. E.g., proxies or authorisation data need only be provided once. The keyword arguments are stored in self.config. Modify this dict to issue the next ‘get’ request with changed arguments.
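The idea of storing keyword arguments once and reusing them on every request can be sketched without any network access. ConfiguredClient and echo_transport below are hypothetical stand-ins, not the REST class itself; in particular, letting per-call arguments override the stored ones is an assumption of this sketch.

```python
class ConfiguredClient:
    """Hypothetical client that reuses keyword arguments on each call."""

    def __init__(self, transport, **http_cfg):
        self.transport = transport   # callable(url, **kwargs), e.g. requests.get
        self.config = http_cfg       # stored once, reused on every call

    def get(self, url, **kwargs):
        merged = dict(self.config)   # copy so a call cannot alter the defaults
        merged.update(kwargs)        # assumption: per-call arguments win
        return self.transport(url, **merged)


def echo_transport(url, **kwargs):
    """Fake transport that returns what it was asked to send."""
    return url, kwargs


client = ConfiguredClient(echo_transport, timeout=10)
# Modifying the stored dict changes all subsequent requests.
client.config['proxies'] = {'http': 'http://proxy.example:3128'}
url, sent = client.get('http://example.org/data')
```

After the config dict is modified, the next call carries both the original timeout and the newly added proxies entry.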

get(url, fromfile=None, params={})

Get SDMX message from REST service or local file

Parameters

• url (str) – URL of the REST service without the query part. If None, fromfile must be set. Default is None.

• params (dict) – will be appended as query part to the URL after a ‘?’

• fromfile (str) – path to SDMX file containing an SDMX message. It will be passed on to the reader for parsing.

Returns

three objects:

0. file-like object containing the SDMX message

1. the complete URL, if any, including the query part constructed from params

2. the status code

Return type tuple

Raises HTTPError – if the SDMX service responded with status code 401. Otherwise, the status code is returned.

max_size = 16777216

Upper bound for the in-memory temp file. Larger files will be spooled to disc.
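One way to get this kind of behaviour from the standard library is tempfile.SpooledTemporaryFile, which keeps data in memory until max_size is exceeded and then moves it to a file on disc. Whether pandaSDMX uses it in exactly this way is an assumption; buffer_message is a hypothetical helper.

```python
from tempfile import SpooledTemporaryFile

MAX_SIZE = 2 ** 24  # 16777216 bytes, i.e. 16 MiB


def buffer_message(chunks, max_size=MAX_SIZE):
    """Collect byte chunks in a file-like object that stays in memory
    while its size does not exceed max_size."""
    buf = SpooledTemporaryFile(max_size=max_size)
    for chunk in chunks:
        buf.write(chunk)
    buf.seek(0)  # rewind so the caller can read from the start
    return buf


with buffer_message([b'<message>', b'</message>']) as f:
    payload = f.read()
```

The caller reads from the returned object in the same way regardless of whether the data ended up in memory or on disc.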

request(url, params={})

Retrieve SDMX messages. If needed, override in subclasses to support other data providers.

Parameters url (str) – The URL of the message.

Returns the xml data as file-like object

Module contents

pandaSDMX - a Python package for SDMX - Statistical Data and Metadata eXchange

4.8 Contributing

Contributions such as bug reports or pull requests and any other user feedback are much appreciated. Development takes place on GitHub (https://github.com/dr-leo/pandaSDMX). There is also a low-traffic mailing list (https://groups.google.com/forum/?hl=en#!forum/sdmx-python).


4.9 License

Notwithstanding other licenses applicable to any third-party software included in this package, pandaSDMX is licensed under the Apache 2.0 license, a copy of which is included in the source distribution.

Copyright 2014, 2015 Dr. Leo <fhaxbox66qgmail.com>, All Rights Reserved.


http://www.apache.org/licenses/

CHAPTER 5

Indices and tables

• genindex

• modindex

• search


Python Module Index

• pandasdmx

• pandasdmx.api

• pandasdmx.model

• pandasdmx.reader

• pandasdmx.reader.sdmxml

• pandasdmx.remote

• pandasdmx.utils

• pandasdmx.utils.aadict

• pandasdmx.writer

• pandasdmx.writer.data2pandas


