pandaSDMX Documentation
Release 0.3.0
Dr. Leo
September 22, 2015
Contents
1 Main features
2 Example
3 pandaSDMX Links
4 Table of contents
4.1 What’s new?
4.2 FAQ
4.3 Getting started
4.4 A very short introduction to SDMX
4.5 Basic usage
4.6 Advanced topics
4.7 pandasdmx
4.8 Contributing
4.9 License
5 Indices and tables
Python Module Index
pandaSDMX is an Apache 2.0-licensed Python package aimed at becoming the most intuitive and versatile tool to retrieve and acquire statistical data and metadata disseminated in SDMX format. It works well with the SDMX services of the European statistics office (Eurostat) and the European Central Bank (ECB). While pandaSDMX is extensible to cater for any output format, it currently supports only pandas, the gold standard of data analysis in Python. From pandas, however, you can export your data to Excel and other formats.
CHAPTER 1
Main features
• intuitive API inspired by requests
• support for many SDMX features including
– generic datasets
– data structure definitions, code lists and concept schemes
– dataflow definitions and content-constraints
– categorisations and category schemes
• pythonic representation of the SDMX information model
• find dataflows by name or description in multiple languages if available
• When requesting datasets, validate column selections against code lists and content-constraints if available
• read and write SDMX messages to and from local files
• configurable HTTP connections
• support for requests-cache, which allows SDMX messages to be cached in memory, MongoDB, Redis or SQLite
• writer transforming SDMX generic datasets into multi-indexed pandas DataFrames or Series of observations and attributes
• extensible through custom readers and writers for alternative input and output formats of data and metadata
CHAPTER 2
Example
In [1]: from pandasdmx import Request
# Get recent annual unemployment data on Greece, Ireland and Spain from Eurostat
In [2]: une_resp = Request('ESTAT').get('data', 'une_rt_a', key={'GEO': 'EL+ES+IE'}, params={'startPeriod': '2006'})

# From the received dataset, select the time series on all age groups and write them to a pandas DataFrame
In [3]: une_df = une_resp.write(s for s in une_resp.msg.data.series if s.key.AGE == 'TOTAL')

# Explore the DataFrame. First, show dimension names
In [4]: une_df.columns.names
Out[4]: FrozenList(['AGE', 'SEX', 'S_ADJ', 'GEO', 'FREQ'])

# corresponding dimension values
In [5]: une_df.columns.levels
Out[5]: FrozenList([['TOTAL'], ['F', 'M', 'T'], ['NSA'], ['EL', 'ES', 'IE'], ['A']])

# Print aggregate unemployment rates across ages and sexes
In [6]: une_df.loc[:, ('TOTAL', 'T')]
Out[6]:
S_ADJ   NSA
GEO      EL    ES    IE
FREQ      A     A     A
2014   26.5  24.5  11.3
2013   27.5  26.1  13.1
2012   24.5  24.8  14.7
2011   17.9  21.4  14.7
2010   12.7  19.9  13.9
2009    9.6  17.9  12.0
2008    7.8  11.3   6.4
2007    8.4   8.2   4.7
2006    9.0   8.5   4.5
CHAPTER 3
pandaSDMX Links
• Download the latest stable version from the Python package index
• Mailing list
• github
• Official SDMX website
CHAPTER 4
Table of contents
4.1 What’s new?
4.1.1 v0.3.0 (2015-09-22)
• support for requests-cache, which allows SDMX messages to be cached in memory, MongoDB, Redis or SQLite
• pythonic selection of series when requesting a dataset: Request.get allows the key keyword argument in a data request to be a dict mapping dimension names to values. In this case, the dataflow definition, the datastructure definition and the content-constraint are downloaded on the fly, cached in memory and used to validate the keys. The dotted key string needed to construct the URL will be generated automatically.
• The Response.write method takes a parse_time keyword argument. Set it to False to avoid parsing of dates, times and time periods, as exotic formats may cause crashes.
• The Request.get method takes a memcache keyword argument. If set to a string, the received Response instance will be stored in the dict Request.cache for later use. This is useful when, e.g., a DSD is needed multiple times to validate keys.
• fixed base URL for Eurostat
• major refactorings to enhance code maintainability
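The translation from a dict of dimension values to a dotted key string, mentioned in the notes above, can be pictured with a minimal sketch. It assumes the dimension order has already been read from the data structure definition; make_dotted_key and the dimension order used below are hypothetical and chosen for illustration only, they are not part of the pandaSDMX API.

```python
def make_dotted_key(dim_order, key):
    """Build an SDMX-style dotted key from a dict of dimension values.

    dim_order: dimension IDs in the order defined by the DSD (assumed known).
    key: dict mapping dimension IDs to a value or a '+'-joined set of values.
    Dimensions missing from `key` are left empty, i.e. wild-carded.
    """
    unknown = set(key) - set(dim_order)
    if unknown:
        raise ValueError('unknown dimensions: %s' % sorted(unknown))
    return '.'.join(key.get(dim, '') for dim in dim_order)


# The dimension order is a made-up example, not taken from a real DSD.
dims = ['FREQ', 'AGE', 'SEX', 'GEO']
dotted = make_dotted_key(dims, {'GEO': 'EL+ES+IE', 'FREQ': 'A'})
print(dotted)  # A...EL+ES+IE
```

Positions of dimensions that are not constrained stay empty, which is why the order of dimensions has to come from the structural metadata rather than from the dict.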
4.1.2 v0.2.2
• Make HTTP connections configurable by exposing the requests.get API through the pandasdmx.api.Request constructor. Hence, proxy servers, authorisation information and other HTTP-related parameters consumed by requests.get can be specified for each Request instance and used in subsequent requests. The configuration is exposed as a dict through a new Request.client.config attribute.
• Responses have a new http_headers attribute containing the HTTP headers returned by the SDMX server
4.1.3 v0.2.1
• Request.get: allow fromfile to be a file-like object
• extract SDMX messages from zip archives if given. Important for large datasets from Eurostat
• automatically get a resource at a URL given in the footer of the received message. This makes it possible to automatically retrieve large datasets from Eurostat that have been made available at the given URL. The number of attempts and the time to wait before each request are configurable via the get_footer_url argument.
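The idea of trying a footer URL several times, with a pause before each attempt, might look roughly like the sketch below. It is not the pandaSDMX implementation: fetch_with_retries, its parameters and the fake fetcher are hypothetical, and the example URL is a placeholder.

```python
import time


def fetch_with_retries(fetch, url, attempts=3, wait=0.0, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, waiting `wait` seconds before each try.

    `fetch` is a caller-supplied callable that returns the payload or raises
    IOError while the resource is not yet available.
    """
    if attempts < 1:
        raise ValueError('attempts must be at least 1')
    last_error = None
    for _ in range(attempts):
        sleep(wait)
        try:
            return fetch(url)
        except IOError as exc:
            last_error = exc
    raise last_error


# Demo with a fake fetcher that only succeeds on the third call.
calls = []


def fake_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise IOError('not ready yet')
    return b'zip archive bytes'


payload = fetch_with_retries(fake_fetch, 'http://example.org/dataset.zip',
                             attempts=5, wait=0.0)
```

Passing the sleep function as an argument keeps the sketch testable without real delays.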
4.1.4 v0.2 (2015-04-13)
This version is a quantum leap. The whole project has been redesigned and rewritten from scratch to provide robust support for many SDMX features. The new architecture is centered around a pythonic representation of the SDMX information model. It is extensible through readers and writers for alternative input and output formats. Export to pandas has been dramatically improved. Sphinx documentation has been added.
4.1.5 v0.1 (2014-09)
Initial release
4.2 FAQ
4.2.1 Can pandaSDMX connect to SDMX providers other than ECB and Eurostat?
Any SDMX provider that generates SDMX 2.1-compliant messages can be supported. The only agencies I know of that deliver data in this format are ECB and Eurostat. Support for SDMX 2.0 messages could be added as a new reader module. Perhaps the model would have to be tweaked a bit as well.
4.2.2 Writing large datasets to pandas DataFrames is slow. What can I do?
The main performance hit comes from parsing the time or time period strings. For regular data such as monthly series (not trading-day data!), call the write method with fromfreq set to True so that only the first string will be parsed and the rest inferred from the frequency of the series. Caution: If the series is stored in the XML document in reverse chronological order, the reverse_obs argument must be set to True as well to prevent the resulting DataFrame index from extending into a remote future.
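To see why the direction matters, here is a small stand-alone sketch of the idea of parsing only the first period and inferring the rest. It is not how pandaSDMX does it internally; monthly_periods is a hypothetical helper, and its reverse_obs parameter merely mirrors the name of the argument described above.

```python
def monthly_periods(first, n, reverse_obs=False):
    """Return n monthly period labels, parsing only the first label.

    first: label of the first observation as stored in the file, e.g. '2014-03'.
    reverse_obs: True if the file lists observations newest-first.
    """
    year, month = (int(part) for part in first.split('-'))
    step = -1 if reverse_obs else 1
    labels = []
    for i in range(n):
        # Count months since year 0 so that year boundaries are handled.
        total = year * 12 + (month - 1) + step * i
        labels.append('%04d-%02d' % (total // 12, total % 12 + 1))
    return labels


# File stored newest-first: the first observation is February 2014.
wrong = monthly_periods('2014-02', 3)                    # runs into the future
right = monthly_periods('2014-02', 3, reverse_obs=True)  # runs into the past
```

With the default direction, a newest-first file would be labelled February, March, April 2014, although the observations actually belong to February 2014, January 2014 and December 2013.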
4.3 Getting started
4.3.1 Installation
Prerequisites
pandaSDMX is a pure Python package. As such it should run on any platform. It requires Python 2.7 or 3.4. Python 3.3 should work as well, but this is untested. Python 3.5 should work once the dependencies have been ported.
It is recommended to use one of the common Python distributions for scientific data analysis such as
• Anaconda, or
• Canopy.
Along with a current Python interpreter these Python distributions include the dependencies as well as lots of other useful packages for data analysis. For other Python distributions (not only scientific) see here.
pandaSDMX has the following dependencies:
• the data analysis library pandas which itself depends on a number of packages
• requests
• LXML
Optional dependencies
• requests-cache, which allows SDMX messages to be cached in memory
• IPython is required to build the Sphinx documentation. To do this, check out the pandaSDMX repository on github.
• nose to run the test suite.
Download
You can download and install pandaSDMX like any other Python package, e.g.
• from the command line with pip install pandasdmx, or
• manually by downloading and unzipping the latest source distribution. From the package directory you should then issue the command python setup.py install.
4.3.2 Running the test suite
From the package directory, issue the following command:
$ nosetests pandasdmx
4.3.3 Package overview
Modules
api module containing the API to make queries to SDMX web services. See pandasdmx.api.Request, in particular its get method. pandasdmx.api.Request.get() returns pandasdmx.api.Response instances.
model implements the SDMX information model.
remote contains a wrapper class around requests for HTTP. Called by pandasdmx.api.Request.get() to make HTTP requests to SDMX services. Also reads SDMXML files instead of querying them over the web.
Subpackages
reader reads SDMX files and instantiates the appropriate classes from pandasdmx.model. There is only one reader for XML-based SDMXML v2.1. Future versions may add reader modules for other formats.
writer contains writer classes transforming SDMX artefacts into other formats or writing them to arbitrary destinations such as databases. The only available writer for now writes generic datasets to pandas DataFrame or Series.
utils utility functions and classes. Contains a wrapper around dict allowing attribute access to dict items.
tests unit tests and sample files
4.3.4 What next?
The remaining chapters explain the key characteristics of SDMX, demonstrate the basic usage of pandaSDMX and provide additional information on some advanced topics. While users who are new to SDMX are likely to benefit a lot from reading the next chapter on SDMX, normal use of pandaSDMX should not strictly require this. The Basic usage chapter should enable you to retrieve datasets and write them to pandas DataFrames. But if you want to exploit the full richness of the information model, or simply feel more comfortable if you know what happens behind the scenes, the SDMX introduction is for you. It also contains links to authoritative SDMX resources.
4.4 A very short introduction to SDMX
4.4.1 Overall purpose
SDMX (short for: Statistical Data and Metadata eXchange) is a set of standards and guidelines aimed at facilitating the production, dissemination, retrieval and processing of statistical data and metadata. SDMX is sponsored by a wide range of public institutions including the UN, the IMF, the World Bank, BIS, ILO, FAO, the OECD, the ECB, Eurostat, and a number of national statistics offices. These and other institutions provide a vast array of current and historic datasets and metadatasets via free or fee-based REST and SOAP web services. pandaSDMX only supports SDMX v2.1, that is, the latest version of this standard. Some agencies such as the IMF continue to offer SDMX 2.0-compliant services. These cannot be accessed by pandaSDMX. While this may change in future versions, there is the expectation that SDMX providers will upgrade to the latest standards at some point.
4.4.2 Information model
At its core, SDMX defines an information model consisting of a set of classes, their logical relations, and semantics. There are classes defining things like datasets, metadatasets, data and metadata structures, processes, organisations and their specific roles, to name but a few. The information model is agnostic as to its implementation. Luckily, the SDMX standard provides an XML-based implementation (see below), and a JSON variant is in the works.
The following sections briefly introduce some key elements of the information model.
Datasets
A dataset can broadly be described as a container of ordered observations and attributes attached to them. Observations (e.g. the annual unemployment rate) are classified by dimensions such as country, age, sex, and time period. Attributes may further describe an individual observation or a set of observations. Typical uses for attributes are the level of confidentiality, or data quality. Observations may be clustered into series, in particular, time series. The dataset must explicitly specify the dimension at observation such as ‘time’, ‘time_period’ or anything else. If a dataset consists of series whose dimension at observation is neither time nor time period, the dataset is called cross-sectional. A dataset that is not grouped into series, i.e. where all dimension values including time, if available, are stated for each observation, is called a flat dataset. Flat datasets are not memory-efficient, but benefit from a very simple representation.
An attribute may be attached to a series to express the fact that it applies to all contained observations. This increases efficiency and adds meaning. Subsets of series within a dataset may be clustered into groups. A group is defined by specifying one or more dimension values, but not all: At least the dimension at observation and one other dimension must remain free (or wild-carded). Otherwise, the group would in fact be either a single observation or a series. The main purpose of a group is to serve as another attachment point for attributes. Hence, a given attribute may be attached to all series within the group at once. Attributes may finally be attached to the entire dataset, i.e. to all observations therein.
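The four attachment points (observation, series, group, dataset) can be illustrated with a deliberately simplified model. The classes below are a sketch written for this purpose, not the classes of pandasdmx.model, and the attribute IDs in the demo are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Obs:
    dim: str                 # value of the dimension at observation, e.g. a year
    value: float
    attrib: dict = field(default_factory=dict)


@dataclass
class Series:
    key: dict                # dimension ID -> value (without the dimension at observation)
    obs: list = field(default_factory=list)
    attrib: dict = field(default_factory=dict)


@dataclass
class Group:
    key: dict                # partial key: at least one series dimension stays free
    attrib: dict = field(default_factory=dict)

    def matches(self, series):
        return all(series.key.get(d) == v for d, v in self.key.items())


@dataclass
class DataSet:
    series: list = field(default_factory=list)
    groups: list = field(default_factory=list)
    attrib: dict = field(default_factory=dict)

    def effective_attrib(self, series, obs):
        """Merge attributes from the widest to the narrowest attachment point."""
        merged = dict(self.attrib)
        for group in self.groups:
            if group.matches(series):
                merged.update(group.attrib)
        merged.update(series.attrib)
        merged.update(obs.attrib)
        return merged


# Invented attribute IDs, for illustration only.
s = Series(key={'GEO': 'EL', 'SEX': 'T'},
           obs=[Obs('2014', 26.5, {'OBS_STATUS': 'p'})],
           attrib={'UNIT': 'PC'})
ds = DataSet(series=[s],
             groups=[Group({'GEO': 'EL'}, {'SOURCE': 'survey'})],
             attrib={'CONF': 'free'})
attrs = ds.effective_attrib(s, s.obs[0])
```

The group key fixes only GEO and leaves SEX and the dimension at observation free, which is what distinguishes it from a series key.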
Structural metadata: data structure definition, concept scheme and code list
In the above section on datasets, we have carelessly used structural terms such as dimension, dimension value and attachment of attributes. This is because it is almost impossible to talk about datasets without talking about their structure. The information model provides a number of classes to describe the structure of datasets without talking about data. The container class for this is called DataStructureDefinition (in short: DSD). It contains a list of dimensions and for each dimension a reference to exactly one concept describing its meaning. A concept describes the set of permissible dimension values. This can be done in various ways depending on the intended data type. Finite value sets (such as country codes, currencies, a data quality classification etc.) are described by reference to code lists. Infinite value sets are described by facets, which are simply a way to express that a dimension may have int, float or time-stamp values, to name but a few. A set of concepts referred to in the dimension descriptors of a data structure definition is called a concept scheme.
The set of allowed observation values such as the unemployment rate measured in per cent is defined by a special dimension: the MeasureDimension, thus enabling the validation of any observation value against its DSD.
Dataflow definition
A dataflow describes what a particular dataset is about, how often it is updated over time by its maintaining agency, under what conditions it will be provided etc. The terminology is a bit confusing: You cannot actually obtain a dataflow from an SDMX web service. Rather, you can request one or more dataflow definitions describing a flow of data over time. The dataflow definition and the artefacts to which it refers give you all the information you need to exploit the datasets you can request using the dataflow’s ID.
A DataFlowDefinition is a class that describes a dataflow. A DataFlowDefinition has a unique identifier, a human-readable name and potentially a more detailed description. Both may be multi-lingual. The dataflow’s ID is used to query the dataset it describes. The dataflow also features a reference to the DSD which structures the datasets available under this dataflow ID. For instance, in the frontpage example we used the dataflow ID ‘une_rt_a’.
Constraints
There are two types of constraints:
A content-constraint is a mechanism to express the fact that datasets of a given dataflow only comprise columns for a subset of values from the code lists representing dimension values. For example, the datastructure definition for a dataflow on exchange rates references the code list of all country codes in the world, whereas the datasets provided under this dataflow only cover the ten largest currencies. These can be enumerated by a content-constraint attached to the dataflow definition. Content-constraints can be used to validate dimension names and values (a.k.a. keys) when requesting datasets selecting columns of interest.
An attachment-constraint describes to which parts of a dataset (column/series, group of series, observation, the entire dataset) certain attributes may be attached. Attachment-constraints are not supported by pandaSDMX as this feature is needed only for dataset generation. However, pandaSDMX does support attributes in the information model and when exporting datasets to pandas.
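How a code list and a content-constraint can work together to validate a key may be sketched as follows. The function and the data are assumptions made for illustration (validate_key is hypothetical, and the currency codes are just sample values), not the validation code used by pandaSDMX.

```python
def validate_key(key, codelists, constraint=None):
    """Check a {dimension: 'A+B'} key against code lists and an optional constraint.

    codelists: dimension ID -> set of all codes allowed by the DSD.
    constraint: dimension ID -> subset of codes actually available (optional).
    Returns a list of problems; an empty list means the key is valid.
    """
    problems = []
    for dim, values in key.items():
        if dim not in codelists:
            problems.append('unknown dimension %s' % dim)
            continue
        for code in values.split('+'):
            if code not in codelists[dim]:
                problems.append('%s: %s not in code list' % (dim, code))
            elif constraint and dim in constraint and code not in constraint[dim]:
                problems.append('%s: %s excluded by content-constraint' % (dim, code))
    return problems


# Sample data: the code list is wider than what the dataflow actually offers.
codelists = {'CURRENCY': {'USD', 'JPY', 'GBP', 'ISK'}}
constraint = {'CURRENCY': {'USD', 'JPY', 'GBP'}}

ok = validate_key({'CURRENCY': 'USD+JPY'}, codelists, constraint)
constrained = validate_key({'CURRENCY': 'ISK'}, codelists, constraint)
unknown = validate_key({'CURRENCY': 'XXX'}, codelists, constraint)
```

The second call shows the added value of the constraint: the code is legal according to the code list, yet no data would be returned for it.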
Category schemes and categorisations
Categories serve to classify or categorise things like dataflows, e.g., by subject matter. Multiple categories may belong to a container called CategoryScheme.
A Categorisation links the thing to be categorised, e.g., a DataFlowDefinition, to a Category.
Class hierarchy
The SDMX information model defines a number of base classes from which concrete classes such as DataFlowDefinition or DataStructureDefinition inherit. E.g., DataFlowDefinition inherits from MaintainableArtefact attributes indicating the maintaining agency. MaintainableArtefact inherits from VersionableArtefact, which, in turn, inherits from IdentifiableArtefact which inherits from AnnotableArtefact and so forth. Hence, DataStructureDefinition may have a unique ID, a version, a natural language name in multiple languages, a description, and annotations. pandaSDMX takes advantage of this class hierarchy.
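The inheritance chain described above can be mirrored in a few lines of Python. This is a simplified sketch that follows the chain exactly as named in the paragraph; the constructor signatures, the default version and the placement of name and description on IdentifiableArtefact are assumptions made to keep it short, and the real pandasdmx.model classes differ.

```python
class AnnotableArtefact:
    def __init__(self, annotations=None):
        self.annotations = list(annotations or [])


class IdentifiableArtefact(AnnotableArtefact):
    # name and description live here only to keep the sketch short
    def __init__(self, id, name=None, description=None, **kwargs):
        super().__init__(**kwargs)
        self.id = id
        self.name = name or {}                # language code -> text
        self.description = description or {}


class VersionableArtefact(IdentifiableArtefact):
    def __init__(self, id, version='1.0', **kwargs):
        super().__init__(id, **kwargs)
        self.version = version


class MaintainableArtefact(VersionableArtefact):
    def __init__(self, id, maintainer=None, **kwargs):
        super().__init__(id, **kwargs)
        self.maintainer = maintainer          # ID of the maintaining agency


class DataStructureDefinition(MaintainableArtefact):
    pass


# 'DSD_EXAMPLE' is a made-up identifier.
dsd = DataStructureDefinition('DSD_EXAMPLE', maintainer='ESTAT', version='1.2',
                              name={'en': 'Example structure'})
```

Each level contributes its own attributes, so the concrete class at the bottom ends up with an ID, a version, a maintainer, names and annotations without declaring any of them itself.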
4.4.3 XML-implementation of the information model
The SDMX standard defines an XML-based implementation of the information model called SDMXML. An SDMXML document contains exactly one SDMX Message. There are several types of Message such as GenericDataMessage to represent a DataSet in generic form, i.e. containing all the information required to interpret it. Hence, datasets in generic representation may be used without knowing the related DataStructureDefinition. The downside is that generic dataset messages are much larger than their sister format StructureSpecificDataSet. pandaSDMX as of v0.2 only supports generic dataset messages.
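The self-describing nature of the generic format can be seen by reading a small fragment with nothing but the Python standard library. The fragment below is hand-written and heavily abridged (a real message also carries a Header, among other things), and the namespace URIs are given as assumed for SDMXML 2.1; check them against an actual message. read_series is a hypothetical helper, unrelated to the pandaSDMX reader.

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for SDMXML 2.1; verify against a real message.
NS = {
    'mes': 'http://www.sdmx.org/resources/sdmxml/schemas/v2_1/message',
    'gen': 'http://www.sdmx.org/resources/sdmxml/schemas/v2_1/data/generic',
}

# Hand-written, abridged fragment for illustration.
SAMPLE = """
<mes:GenericData xmlns:mes="{mes}" xmlns:gen="{gen}">
  <mes:DataSet>
    <gen:Series>
      <gen:SeriesKey>
        <gen:Value id="GEO" value="EL"/>
        <gen:Value id="SEX" value="T"/>
      </gen:SeriesKey>
      <gen:Obs>
        <gen:ObsDimension value="2014"/>
        <gen:ObsValue value="26.5"/>
      </gen:Obs>
      <gen:Obs>
        <gen:ObsDimension value="2013"/>
        <gen:ObsValue value="27.5"/>
      </gen:Obs>
    </gen:Series>
  </mes:DataSet>
</mes:GenericData>
""".format(**NS).strip()


def read_series(xml_text):
    """Map each series key to a dict of {period: observation value}."""
    root = ET.fromstring(xml_text)
    result = {}
    for series in root.iterfind('.//gen:Series', NS):
        key = tuple((v.get('id'), v.get('value'))
                    for v in series.iterfind('gen:SeriesKey/gen:Value', NS))
        obs = {}
        for o in series.iterfind('gen:Obs', NS):
            period = o.find('gen:ObsDimension', NS).get('value')
            obs[period] = float(o.find('gen:ObsValue', NS).get('value'))
        result[key] = obs
    return result


data = read_series(SAMPLE)
```

Every dimension ID and value is spelled out inside the message, so no data structure definition is needed to make sense of it; this verbosity is also what makes generic messages large.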
Another important SDMXML message type is StructureMessage which may contain artefacts such as DataStructureDefinitions, codelists, conceptschemes, categoryschemes and so forth.
SDMXML provides that each message contains a Header containing some metadata about the message. Finally, SDMXML messages may contain a Footer element. It provides information on any errors that have occurred on the server side, e.g., if the requested dataset exceeds the size limit, or the server needs some time to make it available under a given link.
The test suite comes with a number of small SDMXML demo files. View them in your favorite XML editor to get a deeper understanding of the structure and content of various message types.
SDMX services provide XML schemas to validate a particular SDMXML file. However, pandaSDMX does not yet support validation.
4.4.4 SDMX web services
The SDMX standard defines both a REST and a SOAP web service API. pandaSDMX only supports the REST API.
The URL specifies the type, providing agency, and ID of the requested SDMX resource (dataflow, categoryscheme, data etc.). The query part of the URL (after the ‘?’) may be used to give optional query parameters. For instance, when requesting data, the scope of the dataset may be narrowed down by specifying a key to select only matching columns (e.g. on a particular country). The dimension names and values used to select the columns can be validated by checking if they are contained in the relevant codelists referenced by the datastructure definition (see above), and any content-constraint attached to the dataflow definition for the queried dataset. Moreover, rows may be chosen by specifying a startPeriod and endPeriod for the time series. In addition, the query part may set a references parameter to instruct the SDMX server to return a number of other artefacts along with the resource actually requested. For example, a DataStructureDefinition contains references to codelists and conceptschemes (see above). If the ‘references’ parameter is set to ‘all’, these will be returned in the same StructureMessage. The next chapter contains some examples to demonstrate this mechanism. Further details can be found in the SDMX User Guide, and the Web Service Guidelines.
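As a rough illustration of how such a URL is put together, consider the sketch below. build_url is a hypothetical helper and 'FLOW_ID' and the key 'A.B' are placeholders; the path layout follows the description above and real services may expect further path segments. Only the base URL is taken from this documentation, where it appears in the ECB examples of the Basic usage chapter.

```python
from urllib.parse import urlencode


def build_url(base, resource_type, resource_id=None, key=None, **params):
    """Join the path segments and append optional query parameters."""
    parts = [base.rstrip('/'), resource_type]
    if resource_id:
        parts.append(resource_id)
    if key:
        parts.append(key)
    url = '/'.join(parts)
    if params:
        # Sort the parameters so that the result is reproducible.
        url += '?' + urlencode(sorted(params.items()))
    return url


base = 'http://sdw-wsrest.ecb.int/service'
structure_url = build_url(base, 'categoryscheme', references='all')
data_url = build_url(base, 'data', 'FLOW_ID', 'A.B',
                     startPeriod='2006', endPeriod='2014')
```

The key selects columns and travels in the path, whereas the time range selects rows and travels in the query string.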
4.4.5 Further reading
• The SDMX standards and guidelines are the authoritative resource. This page is a must for anyone eager to dive deeper into SDMX. Start with the User Guide and the Information Model (Part 2 of the standard). The Web Services Guidelines contain instructive examples for typical queries.
• Eurostat SDMX page
• European Central Bank SDMX page. It links to a range of study guides and helpful video tutorials.
• SDMXSource: Java, .NET and ActionScript implementations of SDMX software, in part open source
4.5 Basic usage
4.5.1 Introductory remarks
This chapter illustrates the main steps of a typical workflow, namely:
1. retrieving relevant dataflows by category or from a complete list of dataflows,
2. exploring the data structure definition of the selected dataflow
3. selecting relevant series (columns) and a time-range (rows) from a dataset provided under the chosen dataflow and requesting the data via HTTP
4. exploring the received data using the information model
5. writing a dataset or selected series thereof to a pandas DataFrame or Series
6. reading and writing SDMX files
7. handling errors
These steps share common tasks which flow from the architecture of pandaSDMX:
1. Call pandasdmx.api.Request.get() on a new or existing pandasdmx.api.Request instance to obtain an SDMX message from a web service or a file and load it into memory.
2. Explore the pandasdmx.api.Response instance returned by pandasdmx.api.Request.get():
• check for errors
• access the SDMX message’s content through its msg attribute
• write data to a pandas DataFrame or Series by calling pandasdmx.api.Response.write(). This works only for generic data messages.
4.5.2 Importing pandaSDMX
As explained in the preceding section, we will need pandasdmx.api.Request all the time. Yet, we can use the following shortcut to import it:
In [1]: from pandasdmx import Request
4.5.3 Connecting to an SDMX web service, caching
We instantiate pandasdmx.api.Request. The constructor accepts an optional agency ID as string. The list ofsupported agencies is shown in the error message if an invalid agency ID is passed:
In [2]: ecb = Request('ECB')
ecb is now configured so as to make requests to the European Central Bank. If you want to send requests to other agencies, simply instantiate dedicated Request objects.
Configuring the http connection
To pre-configure the HTTP connections to be established by a Request instance, you can pass all keyword arguments consumed by the underlying HTTP library requests (new in version 0.2.2). For a complete description of the options see the requests documentation. For example, a proxy server can be specified for subsequent requests like so:
In [3]: ecb_via_proxy = Request('ECB', proxies={'http': 'http://1.2.3.4:5678'})
HTTP request parameters are exposed through a dict. It may be modified between requests.
In [4]: ecb_via_proxy.client.config
Out[4]: {'proxies': {'http': 'http://1.2.3.4:5678'}, 'stream': True, 'timeout': 30.1}
The Request.client attribute acts a bit like a requests.Session in that it conveniently stores the configuration for subsequent HTTP requests.
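The notion of a client that remembers its HTTP settings between calls can be sketched without any network access. ConfiguredClient and the echo function are hypothetical stand-ins written for this illustration; they are neither the pandaSDMX client nor part of requests.

```python
class ConfiguredClient:
    """Hypothetical stand-in: remembers keyword arguments for later HTTP calls."""

    def __init__(self, send, **config):
        self.send = send            # callable(url, **kwargs), e.g. requests.get
        self.config = dict(config)  # stored defaults, may be edited between calls

    def get(self, url, **overrides):
        kwargs = dict(self.config)
        kwargs.update(overrides)    # per-call arguments win over stored defaults
        return self.send(url, **kwargs)


def echo(url, **kwargs):
    """Fake transport that simply reports what it was asked to do."""
    return url, kwargs


client = ConfiguredClient(echo, timeout=30.1, stream=True)
client.config['proxies'] = {'http': 'http://1.2.3.4:5678'}  # modified between requests
url, used = client.get('http://example.org/data', timeout=5)
```

Because the configuration is a plain dict, changing a proxy or timeout between two requests is just an item assignment.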
Caching received files
Since version 0.3.0, requests-cache is supported. To use it, pass an optional cache keyword argument to the Request() constructor. If given, it must be a dict whose items will be passed to the requests_cache.install_cache function. Use it if you want to cache SDMX messages in databases such as MongoDB, Redis or SQLite. Read through the requests-cache docs for further information.
Loading a file instead of requesting it via http
Any Request instance can load SDMX messages from local files. Issuing r = Request() without passing any agency ID instantiates a Request object not tied to any agency. It may only be used to load SDMX messages from files, unless a pre-fabricated URL is passed to pandasdmx.api.Request.get().
4.5.4 Finding dataflows
Note: Unlike the ECB, Eurostat and probably other data providers do not support categories to facilitate data retrieval. Yet, it is recommended to read the following section as it explains some key concepts of the information model.
Getting the categorisation scheme
We can search the list of dataflows by category. To do this, we request the category scheme from the ECB’s SDMX service and explore the response like so:
In [5]: cat_resp = ecb.get(resource_type = 'categoryscheme')
In [6]: type(cat_resp)
Out[6]: pandasdmx.api.Response
In [7]: cat_msg = cat_resp.msg
In [8]: type(cat_msg)
Out[8]: pandasdmx.model.StructureMessage
In [9]: cat_header = cat_msg.header
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-9-a9ddadaeaf54> in <module>()
----> 1 cat_header = cat_msg.header

/home/docs/checkouts/readthedocs.org/user_builds/pandasdmx/envs/latest/lib/python3.4/site-packages/pandaSDMX-0.3.0-py3.4.egg/pandasdmx/model.py in header(self)
     46     @property
     47     def header(self):
---> 48         return self._reader.read_instance(Header, self)
     49
     50

/home/docs/checkouts/readthedocs.org/user_builds/pandasdmx/envs/latest/lib/python3.4/site-packages/pandaSDMX-0.3.0-py3.4.egg/pandasdmx/reader/sdmxml.py in read_instance(self, cls, sdmxobj, offset, first_only)
    186         if result:
    187             if first_only:
--> 188                 return cls(self, result[0])
    189             else:
    190                 return [cls(self, i) for i in result]

/home/docs/checkouts/readthedocs.org/user_builds/pandasdmx/envs/latest/lib/python3.4/site-packages/pandaSDMX-0.3.0-py3.4.egg/pandasdmx/model.py in __init__(self, *args, **kwargs)
     89         # Set additional attributes present in DataSet messages
     90         for name in ['structured_by', 'dim_at_obs']:
---> 91             value = self._reader.read_as_str(name, self)
     92             if value:
     93                 setattr(self, name, value)

/home/docs/checkouts/readthedocs.org/user_builds/pandasdmx/envs/latest/lib/python3.4/site-packages/pandaSDMX-0.3.0-py3.4.egg/pandasdmx/reader/sdmxml.py in read_as_str(self, name, sdmxobj, first_only)
    212         result = self._str2path[name](sdmxobj._elem)
    213         if first_only:
--> 214             return result[0]
    215         else:
    216             return result

IndexError: list index out of range

In [10]: type(cat_header)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-c30209e950aa> in <module>()
----> 1 type(cat_header)

NameError: name 'cat_header' is not defined
In [11]: categorisations = cat_msg.categorisations
In [12]: type(categorisations)
Out[12]: pandasdmx.model.Categorisations
The content of the SDMX message, its header and its payload are exposed as attributes. Try dir(cat_msg) to find out that we have not only obtained the category scheme, but also the dataflows and categorisations. This is because the get method has set the references parameter to the appropriate default value. We can see this from the URL:
In [13]: cat_resp.url
Out[13]: 'http://sdw-wsrest.ecb.int/service/categoryscheme?references=all'
The HTTP headers returned by the SDMX server are available as well (new in version 0.2.2):
In [14]: cat_resp.http_headers
Out[14]:
{'pragma': 'no-cache',
 'vary': 'Accept, Accept-Encoding',
 'content-length': '4965',
 'connection': 'keep-alive',
 'expires': 'Tue, 22 Sep 2015 20:34:39 GMT',
 'content-encoding': 'gzip',
 'date': 'Tue, 22 Sep 2015 20:34:39 GMT',
 'cache-control': 'max-age=0, no-cache, no-store',
 'server': 'Apache-Coyote/1.1',
 'content-type': 'application/xml'}
Note that categorisations, categoryschemes, and many other artefacts from the SDMX information model are represented by subclasses of dict.
In [15]: categorisations.__class__.__mro__
Out[15]:
(pandasdmx.model.Categorisations,
 pandasdmx.model.SDMXObject,
 pandasdmx.utils.DictLike,
 pandasdmx.utils.aadict.aadict,
 dict,
 object)
If dict keys are valid attribute names, you can use attribute syntax. This is thanks to pandasdmx.utils.DictLike, a thin wrapper around dict that internally uses a patched third-party tool.
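The general technique of exposing dict items as attributes can be sketched in a few lines. AttrDict is a hypothetical class written for this illustration; it is neither pandasdmx.utils.DictLike nor the third-party tool it builds on.

```python
class AttrDict(dict):
    """dict whose items can also be read and set as attributes (sketch)."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        self[name] = value


name = AttrDict(en='Exchange rates')
categories = AttrDict({'07': name})

english = categories['07'].en   # '07' is not a valid identifier: use item syntax
name.de = 'Wechselkurse'        # stored as an ordinary dict item
```

Keys such as '07' are not valid Python identifiers, so they remain reachable through item syntax only.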
Likewise, cat_msg.categoryschemes is an instance of DictLike. This is because by calling ecb.get without specifying a resource_id, we instructed the SDMX service to return all available categorisation schemes. The DictLike container for the received category schemes uses the ID attribute of pandasdmx.model.CategoryScheme as keys. This level of generality is required to cater for situations in which more than one category scheme is returned. In our example, however, there is but one:
In [16]: cs = cat_msg.categoryschemes
In [17]: type(cs)
Out[17]: pandasdmx.utils.DictLike
In [18]: list(cs.keys())
Out[18]: ['MOBILE_NAVI']
pandasdmx.model.CategoryScheme inherits from pandasdmx.utils.DictLike as well. Its values are pandasdmx.model.Category instances, its keys are their id attributes. Note that pandasdmx.utils.DictLike has an aslist method. It returns its values as a new list sorted by id. The sorting criterion may be overridden in subclasses. We shall see this when dealing with dimensions in a pandasdmx.model.DataStructureDefinition where the dimensions are ordered by position.
We can explore our category scheme like so:
In [19]: cs0 = cs.aslist()[0]
In [20]: type(cs0)
Out[20]: pandasdmx.model.CategoryScheme
# Print the number of categories
In [21]: len(cs0)
Out[21]: 11
# Print IDs of categories
In [22]: list(cs0.keys())
Out[22]: ['04', '01', '09', '10', '05', '11', '07', '08', '06', '03', '02']
# English name of category '07'
In [23]: cs0['07'].name.en
Out[23]: 'Exchange rates'
Extracting the dataflows in a particular category
As we saw from the attributes of cat_msg, the SDMX message, we already have the categorisations at hand. While in the SDMXML file categories are represented as a flat list, pandaSDMX groups them by category and exposes them as a pandasdmx.utils.DictLike mapping each category ID to a list of pandasdmx.model.Categorisation instances, each of which links its category to a pandasdmx.model.DataFlowDefinition instance. Technically, these links are represented by pandasdmx.model.Reference instances whose id attribute enables us to access the dataflow definitions in the selected category ‘07’. We can print the string representations of the dataflows in this category:
In [24]: cat07_l = cat_msg.categorisations['07']
In [25]: list(cat_msg.dataflows[i.artefact.id] for i in cat07_l)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-25-eab370a959af> in <module>()
----> 1 list(cat_msg.dataflows[i.artefact.id] for i in cat07_l)

<ipython-input-25-eab370a959af> in <genexpr>(.0)
----> 1 list(cat_msg.dataflows[i.artefact.id] for i in cat07_l)

AttributeError: 'StructureMessage' object has no attribute 'dataflows'
These are all dataflows offered by the ECB in the category on exchange rates.
Finding dataflows without using categories
In the previous section we have used categories to find relevant dataflows. However, in many situations there are no categories to narrow down the result set. Here, pandasdmx.utils.DictLike.find() comes in handy:
In [26]: cat_msg.dataflows.find('rates')
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-26-27a10d15ea82> in <module>()
----> 1 cat_msg.dataflows.find('rates')

AttributeError: 'StructureMessage' object has no attribute 'dataflows'
4.5.5 Extracting the data structure and data from a dataflow
In this section we will focus on a particular dataflow. We will use the ‘EXR’ dataflow from the European Central Bank. In the previous section we already obtained the dataflow definitions by requesting the categoryschemes with the appropriate references. But this works only if the SDMX service supports category schemes. If not (and many agencies don’t), we need to download the dataflow definitions explicitly by issuing:
>>> flows = ecb.get(resource_type = 'dataflow')
Dataflow definitions at a glance
A pandasdmx.model.DataFlowDefinition has an id, name, version and many other attributes inherited from various base classes. It is worthwhile to look at the method resolution order to see how it works. Many other classes from the model have similar base classes.
It is crucial to bear in mind two things:
• the id of a dataflow definition is also used to request data of this dataflow.
• the structure attribute of the dataflow definition is a reference to the data structure definition describing datasets of this dataflow.
Getting the data structure definition (DSD)
Next, we extract the DSD’s ID and download the DSD together with all artefacts that it refers to and that refer to it. We set the params keyword argument explicitly to show how it works. Then we will show some of the DSD’s attributes.
In [27]: dsd_id = cat_msg.dataflows.EXR.structure.id
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-27-1936b3fd084d> in <module>()
----> 1 dsd_id = cat_msg.dataflows.EXR.structure.id

AttributeError: 'StructureMessage' object has no attribute 'dataflows'

In [28]: dsd_id
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-28-040219d71e12> in <module>()
----> 1 dsd_id

NameError: name 'dsd_id' is not defined

In [29]: refs = dict(references = 'all')

In [30]: dsd_resp = ecb.get(resource_type = 'datastructure', resource_id = dsd_id, params = refs)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-30-f702833b1327> in <module>()
----> 1 dsd_resp = ecb.get(resource_type = 'datastructure', resource_id = dsd_id, params = refs)

NameError: name 'dsd_id' is not defined

In [31]: dsd = dsd_resp.msg.datastructures[dsd_id]
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-31-aee41cc35170> in <module>()
----> 1 dsd = dsd_resp.msg.datastructures[dsd_id]

NameError: name 'dsd_resp' is not defined
A DSD essentially defines two things:
• the dimensions of the datasets of this dataflow, i.e. the order and names of the dimensions and the permissible values or the data type for each dimension, and
• the attributes, i.e. their names, permissible values and where each may be attached. There are four possible attachment points:
– at the individual observation
– at series level
– at group level (i.e. a subset of series defined by dimension values)
– at dataset level.
Let’s look at the dimensions and for the ‘CURRENCY’ dimension also at the allowed values as enumerated in the referenced code list:
In [32]: list(d.id for d in dsd.dimensions.aslist())
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-32-f01703696cdb> in <module>()
----> 1 list(d.id for d in dsd.dimensions.aslist())

NameError: name 'dsd' is not defined

In [33]: currency_codelist = dsd.dimensions.CURRENCY.local_repr.enum
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-33-2d605e119e58> in <module>()
----> 1 currency_codelist = dsd.dimensions.CURRENCY.local_repr.enum

NameError: name 'dsd' is not defined

In [34]: len(currency_codelist)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-34-19bb8d405398> in <module>()
----> 1 len(currency_codelist)

NameError: name 'currency_codelist' is not defined

In [35]: currency_codelist.USD, currency_codelist.JPY
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-35-94e9ead82449> in <module>()
----> 1 currency_codelist.USD, currency_codelist.JPY

NameError: name 'currency_codelist' is not defined
So there are six dimensions. Because we can only filter out sets of columns, we disregard ‘TIME_PERIOD’ as this is the dimension at observation. The ‘CURRENCY’ dimension stands at position 2. Moreover, we are now sure that ‘USD’ and ‘JPY’ are valid dimension values. We need this information to construct a filter for our dataset query which should be limited to the currencies we are interested in.
Note that pandasdmx.model.Scheme.aslist() sorts the dimension objects by their position attribute. The order matters when constructing filters for dataset queries (see below).
Attribute names and allowed values can be obtained in a similar fashion.
Note: Groups are not yet implemented in the DSD. But this is not a major problem as they are implemented for generic datasets. Thus, datasets should be rendered properly including all attributes and their attachment levels.
4.5.6 Working with datasets
Selecting and requesting data from a dataflow
Requesting a dataset is as easy as requesting a dataflow definition or any other SDMX artefact: Just call the pandasdmx.api.Request.get() method and pass it ‘data’ as the resource_type and the dataflow ID as resource_id.
However, we only want to download those parts of the data we are interested in. This not only improves performance; some dataflows are so large that a full download would exceed server or client limits. The REST API of SDMX offers two ways to narrow down a data request:
• specifying dimension values which the series to be returned must match (“horizontal filter”) or
• limiting the time range or number of observations per series (“vertical filter”)
From the ECB’s dataflow on exchange rates, we specify the CURRENCY dimension to be either ‘USD’ or ‘JPY’. This can be done by passing a key keyword argument to the get method. It may either be a string (low-level API) or a dict. The dict form introduced in v0.3.0 is more convenient and pythonic as it allows pandaSDMX to infer the string form from the dict. Its keys (= dimension names) and values (= dimension values) will be validated against the datastructure definition as well as the content-constraint if available.
As of v0.3.0, content-constraints are implemented only in their CubeRegion flavor. KeyValueSets are not yet supported. In this case, the provided dimension values will be validated only against the code-list. It is thus not always guaranteed that the dataset actually contains the desired data, e.g., because the country of interest does not deliver the data to the SDMX data provider.
If we choose the string form of the key, it must consist of ‘.’-separated slots representing the dimensions. Values are optional. As we saw in the previous section, the ECB’s dataflow for exchange rates has five relevant dimensions, the ‘CURRENCY’ dimension being at position two. This yields the key ‘.USD+JPY...’. The ‘+’ can be read as an ‘OR’ operator. The dict form is shown below.
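To make the relation between the dict form and the string form concrete, here is a minimal sketch of how such a key string could be assembled. It is not the code pandaSDMX uses internally; the function name build_key is hypothetical, and the dimension order is taken from the series keys of the ‘EXR’ dataflow shown later in this chapter.

```python
def build_key(dimension_ids, selection):
    """Build a '.'-separated SDMX key string from a dict (illustrative helper).

    dimension_ids: dimension IDs in DSD position order, without the
                   dimension at observation (e.g. TIME_PERIOD).
    selection:     dict mapping a dimension ID to one value or to several
                   values joined by '+'.
    """
    unknown = set(selection) - set(dimension_ids)
    if unknown:
        raise ValueError("unknown dimensions: %s" % sorted(unknown))
    # Dimensions without a selection become empty slots, i.e. wildcards.
    return ".".join(selection.get(dim, "") for dim in dimension_ids)


exr_dims = ["FREQ", "CURRENCY", "CURRENCY_DENOM", "EXR_TYPE", "EXR_SUFFIX"]
key = build_key(exr_dims, {"CURRENCY": "USD+JPY"})
print(key)  # .USD+JPY...
```

Because the slots are positional, the order of the dimensions matters; this is why the position attribute of each dimension is needed when writing the string form by hand.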
Further, we will set the start period for the time series to 2014 to exclude any prior data from the request.
In [36]: data_resp = ecb.get(resource_type = 'data', resource_id = 'EXR', key={'CURRENCY': 'USD+JPY'}, params = {'startPeriod': '2014'})
In [37]: type(data_resp.msg)
Out[37]: pandasdmx.model.GenericDataMessage
In [38]: data = data_resp.msg.data
In [39]: type(data)
Out[39]: pandasdmx.model.GenericDataSet
Generic datasets
At present, pandaSDMX can only process generic datasets, i.e. datasets that encompass sufficient structural information to be interpreted without consulting the related DSD. However, as we saw, we need the DSD anyway to understand the data structure, the meaning of dimension and attribute values, and to select series by specifying a valid key.
The pandasdmx.model.GenericDataSet has the following features:
dim_at_obs attribute showing which dimension is at observation level. For time series its value is either ‘TIME’ or ‘TIME_PERIOD’. If it is ‘AllDimensions’, the dataset is said to be flat. In this case there are no series, just a flat list of observations.
series property returning an iterator over pandasdmx.model.Series instances
obs method returning an iterator over the observations. Only for flat datasets.
attributes namedtuple of attributes, if any, that are attached at dataset level
The pandasdmx.model.Series has the following features:
key namedtuple mapping dimension names to dimension values
obs method returning an iterator over observations within the series
attributes namedtuple mapping any attribute names to values
groups list of pandasdmx.model.Group instances to which this series belongs. Note that groups are merely attachment points for attributes.
In [40]: data.dim_at_obs
Out[40]: 'TIME_PERIOD'

In [41]: series_l = list(data.series)

In [42]: len(series_l)
Out[42]: 16

In [43]: series_l[5].key
Out[43]: SeriesKey(FREQ='D', CURRENCY='USD', CURRENCY_DENOM='EUR', EXR_TYPE='SP00', EXR_SUFFIX='A')

In [44]: set(s.key.FREQ for s in data.series)
Out[44]: {'A', 'D', 'H', 'M', 'Q'}
We see that this dataset comprises 16 time series of several different period lengths.
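Since series keys are namedtuples, counting or selecting series by a dimension value needs nothing beyond attribute access. The sketch below uses made-up stand-ins (SeriesKey with only two fields, a Series with only a key) instead of real pandasdmx.model.Series objects, so it runs without a server connection.

```python
from collections import Counter, namedtuple

# Hypothetical stand-ins for pandasdmx.model.Series and its key.
SeriesKey = namedtuple("SeriesKey", ["FREQ", "CURRENCY"])
Series = namedtuple("Series", ["key"])

sample_series = [
    Series(SeriesKey("D", "USD")),
    Series(SeriesKey("D", "JPY")),
    Series(SeriesKey("M", "USD")),
    Series(SeriesKey("A", "JPY")),
]

# Count the series per frequency, using the same s.key.FREQ access as above.
per_freq = Counter(s.key.FREQ for s in sample_series)

# Keep only the daily series, e.g. before building a single DataFrame.
daily = [s for s in sample_series if s.key.FREQ == "D"]

print(dict(per_freq))
print([s.key.CURRENCY for s in daily])
```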
Writing to pandas
Selecting columns using the model API
As we want to write data to a pandas DataFrame rather than an iterator of pandas Series, we must not mix up the time spans. Therefore, we single out the daily data first. The pandasdmx.api.Response.write() method accepts an optional iterable to select a subset of the series contained in the dataset. Thus we can now generate our pandas DataFrame from daily exchange rate data only:
In [45]: daily = (s for s in data.series if s.key.FREQ == 'D')
In [46]: cur_df = data_resp.write(daily)
In [47]: cur_df.shape
Out[47]: (440, 2)

In [48]: cur_df.tail()
Out[48]:
FREQ                  D
CURRENCY            JPY     USD
CURRENCY_DENOM      EUR     EUR
EXR_TYPE           SP00    SP00
EXR_SUFFIX            A       A
2015-09-16       135.45  1.1228
2015-09-17       136.76  1.1312
2015-09-18       136.31  1.1419
2015-09-21       135.50  1.1250
2015-09-22       133.75  1.1155
Controlling the output
The docstring of pandasdmx.writer.data2pandas.Writer.write() explains a number of optional arguments to control whether or not another dataframe should be generated for the attributes, which attributes it should contain, and, most importantly, if the resulting pandas Series should be concatenated to a single DataFrame at all (asframe = True is the default).
Controlling index generation
Also, the write method provides the following parameters to increase performance for large datasets with regular indexes (e.g. monthly data):
• fromfreq: if True, the index will be extrapolated from the first date or period and the frequency. This is only robust if the dataset has a uniform index, i.e. has no gaps such as those found in daily trading data.
• reverse_obs: if True, return observations in a series in reverse document order. This may be useful to establish chronological order, in particular in combination with fromfreq. Default is False.
• If pandas raises parsing errors due to exotic date-time formats, set parse_time to False to obtain a stringindex rather than datetime index. Default is True.
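The idea behind fromfreq can be illustrated without pandas: instead of parsing every period label, the index is generated from the first period and the number of observations. The helper below is a hypothetical sketch for monthly data only and does not reproduce how pandaSDMX builds its index.

```python
def monthly_periods(first, count):
    """Return `count` consecutive monthly labels starting at `first` ('YYYY-MM')."""
    year, month = (int(part) for part in first.split("-"))
    labels = []
    for _ in range(count):
        labels.append("%04d-%02d" % (year, month))
        month += 1
        if month > 12:
            # Roll over into January of the next year.
            year, month = year + 1, 1
    return labels


print(monthly_periods("2014-11", 4))
```

If one observation were missing in the data, every later label generated this way would be shifted by one month, which is why the approach is only safe for gap-free series.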
4.5.7 Working with files
The pandasdmx.api.Request.get method accepts two optional keyword arguments tofile and fromfile. If a file path or, in case of fromfile, a file-like object is given, any SDMX message received from the server will be written to a file, or a file will be read instead of making a request to a remote server.
The file to be read may be a zip file (new in version 0.2.1). In this case, the SDMX message must be the first file in the archive. The same works for zip files returned from an SDMX server. This happens, e.g., when Eurostat finds that the requested dataset has been too large. In this case the first request will yield a message with a footer containing a link to a zip file to be made available after some time. The link may be extracted by issuing something like:
>>> resp.msg.footer.text[1]
and passed as url argument when calling get a second time to get the zipped data message.
Since version 0.2.1, this second request can be performed automatically through the get_footer_url parameter. It defaults to (30, 3) which means that three attempts will be made at 30-second intervals. This behavior is useful when requesting large datasets from Eurostat. Deactivate it by setting get_footer_url to None.
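The (seconds, number_of_attempts) semantics can be sketched as a plain retry loop. This is an illustration under assumptions, not pandaSDMX code: fetch_with_retries, the fetch callable and the example URL are hypothetical, and sleep is passed in so the example runs without actually waiting.

```python
import time


def fetch_with_retries(fetch, url, wait_seconds=30, attempts=3, sleep=time.sleep):
    """Call fetch(url) up to `attempts` times, waiting before each attempt."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        sleep(wait_seconds)          # give the server time to prepare the file
        try:
            return fetch(url)
        except IOError as exc:       # resource not available yet
            last_error = exc
    raise last_error


# Demonstration with a fake download that succeeds on the third attempt.
calls = []

def flaky_fetch(url):
    calls.append(url)
    if len(calls) < 3:
        raise IOError("not ready yet")
    return b"zipped SDMX message"

waits = []
result = fetch_with_retries(flaky_fetch, "http://example.org/data.zip", sleep=waits.append)
print(len(calls), waits)
```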
4.5.8 Caching Response instances in memory
The ‘get’ API provides a rudimentary cache for Response instances. It is a simple dict mapping user-provided names to the Response instances. If we want to cache a Response, we can provide a suitable name by passing the keyword argument memcache to the get method. Pre-existing items under the same key will be overwritten.
Note: Caching of HTTP responses can also be achieved through ‘requests-cache’. Activate the cache by instantiating pandasdmx.api.Request passing a keyword argument cache. It must be a dict mapping config and other values.
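The in-memory cache can be pictured as a dict consulted before any download. The sketch below follows the description of the memcache parameter in the API reference (return the cached instance if present, otherwise download and store it); CachingClient and fake_fetch are hypothetical names and not part of pandaSDMX.

```python
class CachingClient:
    """Illustrative client with a name-based response cache."""

    def __init__(self, fetch):
        self._fetch = fetch      # callable performing the actual download
        self.cache = {}          # maps user-provided names to responses

    def get(self, url, memcache=None):
        if memcache is not None and memcache in self.cache:
            return self.cache[memcache]
        response = self._fetch(url)
        if memcache is not None:
            self.cache[memcache] = response
        return response

    def clear_cache(self):
        self.cache.clear()


downloads = []

def fake_fetch(url):
    downloads.append(url)
    return "response for " + url

client = CachingClient(fake_fetch)
first = client.get("http://example.org/dataflow", memcache="flows")
second = client.get("http://example.org/dataflow", memcache="flows")
print(first is second, len(downloads))
```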
4.5.9 Handling errors
The pandasdmx.api.Response instance generated after the response from the server has been received has a status_code attribute. The SDMX web services guidelines explain the meaning of these codes. In addition, if the SDMX server has encountered an error, it may return a message which includes a footer containing explanatory notes. pandaSDMX exposes the content of a footer via a text attribute which is a list of strings.
Note: pandaSDMX raises only HTTP errors with status codes between 400 and 499. Codes >= 500 do not raise an error as the SDMX web services guidelines assign special meanings to those codes. The caller must therefore raise an error if needed.
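A caller who wants server-side errors to surface as exceptions could wrap the check in a small helper. The following is a sketch under assumptions: SDMXServerError and raise_for_server_error are hypothetical names, and the response objects are simple stand-ins that mimic only the status_code attribute and the msg.footer.text list described above.

```python
from types import SimpleNamespace


class SDMXServerError(Exception):
    """Hypothetical exception type; not part of pandaSDMX."""


def raise_for_server_error(response):
    """Raise if the status code is 500 or above, else return the response."""
    if response.status_code >= 500:
        footer = getattr(getattr(response, "msg", None), "footer", None)
        notes = getattr(footer, "text", None) or []
        raise SDMXServerError(
            "status %d: %s" % (response.status_code, "; ".join(notes)))
    return response


# Stand-in responses for demonstration purposes only.
ok = SimpleNamespace(status_code=200, msg=None)
failed = SimpleNamespace(
    status_code=503,
    msg=SimpleNamespace(footer=SimpleNamespace(text=["Service unavailable"])),
)

raise_for_server_error(ok)
try:
    raise_for_server_error(failed)
except SDMXServerError as exc:
    print(exc)
```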
4.6 Advanced topics
4.6.1 Debugging the information model
The information model does not (yet) expose all attributes of SDMX messages. However, the underlying XML elements are accessible from almost everywhere. This is thanks to the base class pandasdmx.model.SDMXObject. It injects two attributes: _elem and _reader which grant access to the XML element represented by the model class instance as well as the reader instance.
4.6.2 Extending pandaSDMX
pandaSDMX is now extensible by readers and writers. While the API needs a few refinements, it should be straightforward to depart from pandasdmx.writer.data2pandas to develop writers for alternative output formats such as spreadsheet, database, or web applications.
Similarly, a reader for the upcoming JSON-based SDMX format would be useful.
Interested developers should contact the author at [email protected].
4.7 pandasdmx
4.7.1 pandasdmx package
Subpackages
pandasdmx.reader package
Submodules
pandasdmx.reader.sdmxml module
This module contains a reader for SDMXML v2.1.
class pandasdmx.reader.sdmxml.SDMXMLReader(request, **kwargs)
Bases: pandasdmx.reader.BaseReader
Read SDMX-ML 2.1 and expose it as instances from pandasdmx.model
assignment_status(sdmxobj)
attr_relationship(sdmxobj)
categorisation_items(sdmxobj)
concept_id(sdmxobj)
d = {‘agencyID’: @agencyID, ‘ref_structure’: str:Structure, ‘id’: @id, ‘ref_version’: @version, ‘series_key_values_path’: gen:SeriesKey/gen:Value/@value, ‘structured_by’: mes:Structure/@structureID, ‘group_key_id_path’: gen:GroupKey/gen:Value/@id, ‘obs_key_id_path’: gen:ObsKey/gen:Value/@id, ‘ref_package’: @package, ‘uri’: @uri, ‘ref_class’: @class, ‘attr_id_path’: gen:Attributes/gen:Value/@id, ‘dim_at_obs’: //mes:Header/mes:Structure/@dimensionAtObservation, ‘group_key_values_path’: gen:GroupKey/gen:Value/@value, ‘constraint_attachment’: str:ConstraintAttachment, ‘annotationtype’: com:AnnotationType/text(), ‘attr_values_path’: gen:Attributes/gen:Value/@value, ‘urn’: @urn, ‘headerID’: mes:ID/text(), ‘value’: com:Value/text(), ‘obs_value_path’: gen:ObsValue/@value, ‘series_key_id_path’: gen:SeriesKey/gen:Value/@id, ‘obs_key_values_path’: gen:ObsKey/gen:Value/@value, ‘ref_target’: str:Target, ‘url’: @url, ‘generic_series_dim_path’: gen:ObsDimension/@value, ‘include’: @include, ‘generic_obs_path’: gen:Obs, ‘ref_source’: str:Source}
footer_code(sdmxobj)
footer_severity(sdmxobj)
footer_text(sdmxobj)
    return list of xml:lang attributes. If node has no attributes, assume that language is ‘en’.
generic_groups(sdmxobj)
generic_series(sdmxobj)
get_dataset(elem)
group_key(sdmxobj)
header_error(sdmxobj)
header_prepared(sdmxobj)
header_sender(sdmxobj)
initialize(source)
international_str(name, sdmxobj)
    return DictLike of xml:lang attributes. If node has no attributes, assume that language is ‘en’.
isfinal(sdmxobj)
iter_generic_obs(sdmxobj, with_value, with_attributes)
iter_generic_series_obs(sdmxobj, with_value, with_attributes, reverse_obs=False)
k = ‘datastructures’
key = ‘ref_source’
localrepr(sdmxobj)
path = ‘str:Source’
position(sdmxobj)
read_as_str(name, sdmxobj, first_only=True)
read_identifiables(name, sdmxobj)
    If sdmxobj inherits from dict: update it with modelized elements. These must be instances of model.IdentifiableArtefact, i.e. have an ‘id’ attribute. This will be used as dict keys. If sdmxobj does not inherit from dict: return a new DictLike.
read_instance(cls, sdmxobj, offset=None, first_only=True)
    If cls in _cls2path and matches, return an instance of cls with the first XML element, or, if first_only is False, a list of cls instances for all elements found. If no matches were found, return None.
read_one(name, sdmxobj)
    return model class instance of the first element in the result set of the xpath expression as defined in _model_map. If no elements are found, return None.
read_subclass_instance(target_cls, sdmxobj, offset=None, first_only=True)
    Iterate over model classes in _cls2path which are subclasses of ‘target_cls’ and instantiate the classes whose xpath expression returns a non-empty result. Return a list of subclass instances.
series_attrib(sdmxobj)
series_key(sdmxobj)
v = (‘mes:Structures/str:DataStructures/str:DataStructure’, <class ‘pandasdmx.model.DataStructureDefinition’>)
Module contents
This module contains the base class for readers.
class pandasdmx.reader.BaseReader(request, **kwargs)
Bases: object
initialize(source)
pandasdmx.utils package
Submodules
pandasdmx.utils.aadict module
class pandasdmx.utils.aadict.aadict
Bases: dict
A dict subclass that allows attribute access to be synonymous with item access, e.g. mydict.attribute == mydict['attribute']. It also provides several other useful helper methods, such as pick() and omit().
static d2a(subject)
static d2ar(subject)
omit(*args)
pick(*args)
update(*args, **kw)
Module contents
module pandasdmx.utils - helper classes and functions
class pandasdmx.utils.DictLike
Bases: pandasdmx.utils.aadict.aadict
Thin wrapper around dict type
It allows attribute-like item access, has a find() method and inherits other useful features from aadict.
aslist()
    property returning values() as unordered list
find(search_str, by='name', language='en')
    Select values by attribute
Parameters
• search_str (str) – the string to search for
• by (str) – the name of the attribute to search by, defaults to ‘name’. The specified attribute must be either a string or a dict mapping language codes to strings. Such attributes occur, e.g. in pandasdmx.model.NameableArtefact which is a base class for pandasdmx.model.DataFlowDefinition and many others.
• language (str) – language code specifying the language of the text to be searched, defaults to ‘en’
Returns items where value.<by> contains the search_str. International strings stored as dict with language codes as keys are searched. Capitalization is ignored.
Return type DictLike
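The search behaviour described for find() can be approximated in a few lines. This is a sketch based solely on the parameter description above, not the library's implementation; it is written as a free function over a plain dict, and the sample dataflow names are invented.

```python
from types import SimpleNamespace


def find(mapping, search_str, by="name", language="en"):
    """Return the items whose attribute `by` contains search_str (case-insensitive)."""
    needle = search_str.lower()
    result = {}
    for key, value in mapping.items():
        attr = getattr(value, by)
        # International strings are dicts keyed by language code.
        text = attr.get(language, "") if isinstance(attr, dict) else attr
        if needle in text.lower():
            result[key] = value
    return result


# Invented sample objects with a `name` attribute.
flows = {
    "EXR": SimpleNamespace(name={"en": "Exchange Rates"}),
    "ICP": SimpleNamespace(name={"en": "Indices of Consumer Prices"}),
    "IRS": SimpleNamespace(name="Interest rate statistics"),
}

print(sorted(find(flows, "rate")))
```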
class pandasdmx.utils.NamedTupleFactory
Bases: object
Wrap namedtuple function from the collections stdlib module to return a singleton if a namedtuple with the same field names has already been created.
__call__(name, fields)
    return a subclass of tuple instance as does namedtuple
cache = {(‘dim’, ‘value’, ‘attrib’): <class ‘pandasdmx.utils.SeriesObservation’>, (‘key’, ‘value’, ‘attrib’): <class ‘pandasdmx.utils.GenericObservation’>}
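The caching behaviour of such a factory can be sketched as follows. The class below mirrors the description (one namedtuple class per tuple of field names), but it is an independent illustration and may differ from the actual pandasdmx.utils.NamedTupleFactory in its details.

```python
from collections import namedtuple


class NamedTupleFactory:
    """Return one shared namedtuple class per distinct tuple of field names."""

    # Shared on purpose: all factory instances use the same cache.
    cache = {}

    def __call__(self, name, fields):
        fields = tuple(fields)
        if fields not in self.cache:
            self.cache[fields] = namedtuple(name, fields)
        return self.cache[fields]


factory = NamedTupleFactory()
Obs = factory("SeriesObservation", ("dim", "value", "attrib"))
Again = factory("AnotherName", ["dim", "value", "attrib"])

print(Obs is Again)                      # same field names, same class
print(Obs("2015-09-22", 1.1155, None).value)
```

Reusing the class avoids creating a new type for every series or observation, which matters when a dataset contains many thousands of them.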
pandasdmx.writer package
Submodules
pandasdmx.writer.data2pandas module
This module contains a writer class that writes a generic data message to pandas dataframes or series.
class pandasdmx.writer.data2pandas.Writer(msg, **kwargs)
Bases: pandasdmx.writer.BaseWriter
iter_pd_series(iter_series, dim_at_obs, dtype, attributes, reverse_obs, fromfreq, parse_time)
write(source=None, asframe=True, dtype=<class 'numpy.float64'>, attributes='', reverse_obs=False, fromfreq=False, parse_time=True)
Transform a pandasdmx.model.DataMessage instance to a pandas DataFrame or iterator over pandas Series.
Parameters
• source (pandasdmx.model.DataMessage) – a pandasdmx.model.DataSet or iterator of pandasdmx.model.Series
• asframe (bool) – if True, merge the series of values and/or attributes into one or two multi-indexed pandas.DataFrame(s), otherwise return an iterator of pandas.Series. (default: True)
• dtype (str, NP.dtype, None) – datatype for values. Defaults to NP.float64. If None, do not return the values of a series. In this case, attributes must not be an empty string so that some attribute is returned.
• attributes (str, None) – string determining which attributes, if any, should be returned in separate series or a separate DataFrame. Allowed values: ‘’, ‘o’, ‘s’, ‘g’, ‘d’ or any combination thereof such as ‘os’, ‘go’. Defaults to ‘osgd’. Where ‘o’, ‘s’, ‘g’, and ‘d’ mean that attributes at observation, series, group and dataset level will be returned as members of per-observation dict-likes with attribute-like access.
• reverse_obs (bool) – if True, return observations in reverse order. Default: False
• fromfreq (bool) – if True, extrapolate time periods from the first item and FREQ dimension. Default: False
• parse_time (bool) – if True (default), try to generate datetime index, provided that dim_at_obs is ‘TIME’ or ‘TIME_PERIOD’. Otherwise, parse_time is ignored. If False, always generate index of strings. Set it to False to increase performance and avoid parsing errors for exotic date-time formats unsupported by pandas.
Module contents
This module contains the base class for writers.
class pandasdmx.writer.BaseWriter(msg, **kwargs)
Bases: object
Submodules
pandasdmx.api module
This module defines two classes: pandasdmx.api.Request and pandasdmx.api.Response. Together, these form the high-level API of pandasdmx. Requesting data and metadata from an SDMX server requires a good understanding of this API and a basic understanding of the SDMX web service guidelines. Only the chapters on REST services are relevant, as pandasdmx does not support the SOAP interface.
class pandasdmx.api.Request(agency='', writer='pandasdmx.writer.data2pandas', cache=None, **http_cfg)
Bases: object
Get SDMX data and metadata from remote servers or local files.
agency
clear_cache()
get(resource_type='', resource_id='', agency='', key='', params={}, fromfile=None, tofile=None, url=None, get_footer_url=(30, 3), memcache=None)
get SDMX data or metadata and return it as a pandasdmx.api.Response instance.
While ‘get’ can load any SDMX file (also as zip-file) specified by ‘fromfile’, it can only construct URLs for the SDMX service set for this instance. Hence, you have to instantiate a pandasdmx.api.Request instance for each data provider you want to access, or pass a pre-fabricated URL through the url parameter.
Parameters
• resource_type (str) – the type of resource to be requested. Values must be one of the items in Request._resources such as ‘data’, ‘dataflow’, ‘categoryscheme’ etc. It is used for URL construction, not to read the received SDMX file. Hence, if fromfile is given, resource_type may be ‘’. Defaults to ‘’.
• resource_id (str) – the id of the resource to be requested. It is used for URL construction. Defaults to ‘’.
• agency (str) – ID of the agency providing the data or metadata. Used for URL construction only. It tells the SDMX web service which agency the requested information originates from. Note that an SDMX service may provide information from multiple data providers. May be ‘’ if fromfile is given. Not to be confused with the agency ID passed to __init__() which specifies the SDMX web service to be accessed.
• key (str, dict) – select columns from a dataset by specifying dimension values. If type is str, it must conform to the SDMX REST API, i.e. dot-separated dimension values. If ‘key’ is of type ‘dict’, it must map dimension names to allowed dimension values. Two or more values can be separated by ‘+’ as in the str form. The DSD will be downloaded and the items are validated against it before downloading the dataset.
• params (dict) – defines the query part of the URL. The SDMX web service guidelines (www.sdmx.org) explain the meaning of permissible parameters. It can be used to restrict the time range of the data to be delivered (startPeriod, endPeriod), whether parents, siblings or descendants of the specified resource should be returned as well (e.g. references=’parentsandsiblings’). Sensible defaults are set automatically depending on the values of other args such as resource_type. Defaults to {}.
• fromfile (str) – path to the file to be loaded instead of accessing an SDMX web service. Defaults to None. If fromfile is given, args relating to URL construction will be ignored.
• tofile (str) – file path to write the received SDMX file on the fly. This is useful if you want to load data offline using fromfile or if you want to open an SDMX file in an XML editor.
• url (str) – URL of the resource to download. If given, any other arguments such as resource_type or resource_id are ignored. Default is None.
• get_footer_url ((int, int)) – tuple of the form (seconds, number_of_attempts). Determines the behavior in case the received SDMX message has a footer where one of its lines is a valid URL. get_footer_url defines how many attempts should be made to request the resource at that URL, waiting the given number of seconds before each attempt. This behavior is useful when requesting large datasets from Eurostat. Other agencies do not seem to send such footers. Once an attempt to get the resource has been successful, the original message containing the footer is dismissed and the dataset is returned. The tofile argument is propagated. Note that the written file may be a zip archive. pandaSDMX handles zip archives since version 0.2.1. Defaults to (30, 3).
• memcache (str) – If given, return the Response instance if it is already in self.cache (dict); otherwise download the resource and cache the Response instance.
Returns instance containing the requested SDMX Message.
Return type pandasdmx.api.Response
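The get_footer_url behaviour described above can be pictured as a simple retry loop. The following is a minimal sketch, not pandaSDMX's actual implementation: fetch_with_retries and the fetch callable are hypothetical names introduced for illustration, and fetch is assumed to return None while the resource is not yet available.

```python
import time


def fetch_with_retries(fetch, url, get_footer_url=(30, 3), sleep=time.sleep):
    """Hypothetical helper: poll `url` as described for get_footer_url.

    `fetch` is an assumed callable returning the resource, or None if it
    is not available yet. It is not part of pandaSDMX.
    """
    seconds, attempts = get_footer_url
    for _ in range(attempts):
        # Wait before each attempt, as the parameter description states.
        sleep(seconds)
        result = fetch(url)
        if result is not None:
            return result
    # All attempts were used up without success.
    return None
```

The sleep argument exists only so that the waiting can be replaced in tests; with the default (30, 3) the loop waits at most 90 seconds in total.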
get_reader()
Get a Reader instance. Called by get().
make_key(flow_id, key)
Download the dataflow definition and DSD and validate key (dict) against it.
Return: key (str)
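To illustrate the two forms of key, the sketch below converts a dict into the dot-separated str form after checking it against a set of dimensions. It is a simplified stand-in for what make_key is described as doing, not the library's code: build_key, the dimension names and the allowed values are made up for the example, and it assumes that an unspecified dimension is left empty in the resulting string.

```python
def build_key(key, dimensions):
    """Hypothetical helper: turn a dict key into the dot-separated str form.

    `dimensions` maps dimension names, in DSD order, to the set of allowed
    values. In pandaSDMX this information would come from the downloaded
    DSD; here it is supplied by hand.
    """
    unknown = set(key) - set(dimensions)
    if unknown:
        raise ValueError('unknown dimensions: %s' % sorted(unknown))
    parts = []
    for name, allowed in dimensions.items():
        value = key.get(name, '')  # empty string = no restriction (assumed)
        # Two or more values may be joined by '+'; check each one.
        for code in filter(None, value.split('+')):
            if code not in allowed:
                raise ValueError('%r is not allowed for %s' % (code, name))
        parts.append(value)
    return '.'.join(parts)


# Made-up dimensions loosely modelled on an exchange-rate dataflow.
dims = {
    'FREQ': {'D', 'M', 'A'},
    'CURRENCY': {'USD', 'JPY', 'GBP'},
    'CURRENCY_DENOM': {'EUR'},
}
print(build_key({'FREQ': 'M', 'CURRENCY': 'USD+JPY'}, dims))
```

The dict form spares the caller from remembering the order of dimensions, which the str form depends on.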
pandasdmx.model module
This module is part of the pandaSDMX package
SDMX 2.1 information model
(C) 2014 Dr. Leo ([email protected])
class pandasdmx.model.AnnotableArtefact(reader, elem, **kwargs)
Bases: pandasdmx.model.SDMXObject
annotations
class pandasdmx.model.Annotation(reader, elem, **kwargs)
Bases: pandasdmx.model.SDMXObject
annotationtype
id
text
title
url
class pandasdmx.model.AttributeDescriptor(*args, **kwargs)
Bases: pandasdmx.model.ComponentList
class pandasdmx.model.Categorisation(*args, **kwargs)
Bases: pandasdmx.model.MaintainableArtefact
class pandasdmx.model.Categorisations(*args, **kwargs)
Bases: pandasdmx.model.SDMXObject, pandasdmx.utils.DictLike
class pandasdmx.model.Category(*args, **kwargs)
Bases: pandasdmx.model.Item
class pandasdmx.model.CategoryScheme(*args, **kwargs)
Bases: pandasdmx.model.ItemScheme
class pandasdmx.model.Code(*args, **kwargs)
Bases: pandasdmx.model.Item
class pandasdmx.model.Codelist(*args, **kwargs)
Bases: pandasdmx.model.ItemScheme
class pandasdmx.model.Component(*args, **kwargs)
Bases: pandasdmx.model.IdentifiableArtefact
concept
local_repr
class pandasdmx.model.ComponentList(*args, **kwargs)
Bases: pandasdmx.model.IdentifiableArtefact, pandasdmx.model.Scheme
class pandasdmx.model.Concept(*args, **kwargs)
Bases: pandasdmx.model.Item
class pandasdmx.model.ConceptScheme(*args, **kwargs)
Bases: pandasdmx.model.ItemScheme
class pandasdmx.model.Constrainable
Bases: object
class pandasdmx.model.Constraint(*args, **kwargs)
Bases: pandasdmx.model.MaintainableArtefact
class pandasdmx.model.ContentConstraint(*args, **kwargs)
Bases: pandasdmx.model.Constraint
class pandasdmx.model.CubeRegion(*args, **kwargs)
Bases: pandasdmx.model.SDMXObject
class pandasdmx.model.DataAttribute(*args, **kwargs)
Bases: pandasdmx.model.Component
related_to
usage_status
class pandasdmx.model.DataMessage(*args, **kwargs)
Bases: pandasdmx.model.Message
class pandasdmx.model.DataSet(*args, **kwargs)
Bases: pandasdmx.model.SDMXObject
dim_at_obs
iter_groups
obs(with_values=True, with_attributes=True)
Return an iterator over observations in a flat dataset. An observation is represented as a namedtuple with 3 fields (‘key’, ‘value’, ‘attrib’).
obs.key is a namedtuple of dimensions. Its field names represent dimension names, its values the dimension values.
obs.value is a string that can in most cases be interpreted as float64.
obs.attrib is a namedtuple of attribute names and values.
with_values and with_attributes: If one or both of these flags is False, the respective value will be None. Use these flags to increase performance. The flags default to True.
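The shape of an observation can be illustrated with plain namedtuples. The sketch below does not use pandaSDMX; iter_obs is a hypothetical stand-in for an obs() iterator, and the dimension names, attribute names and values are made up for the example.

```python
from collections import namedtuple

# Field layout described above: ('key', 'value', 'attrib').
Obs = namedtuple('Obs', ['key', 'value', 'attrib'])
# Dimension and attribute names below are made up for the example.
Key = namedtuple('Key', ['FREQ', 'CURRENCY', 'TIME_PERIOD'])
Attrib = namedtuple('Attrib', ['OBS_STATUS'])

raw = [
    (('M', 'USD', '2014-01'), '1.3610', ('A',)),
    (('M', 'USD', '2014-02'), '1.3659', ('A',)),
]


def iter_obs(rows, with_values=True, with_attributes=True):
    """Hypothetical stand-in for an obs() iterator over a flat dataset."""
    for key, value, attrib in rows:
        yield Obs(
            Key(*key),
            value if with_values else None,
            Attrib(*attrib) if with_attributes else None,
        )


for obs in iter_obs(raw, with_attributes=False):
    # obs.value is a string; convert it explicitly when a number is needed.
    print(obs.key.CURRENCY, obs.key.TIME_PERIOD, float(obs.value), obs.attrib)
```

With with_attributes=False the attrib field is None for every observation, which is the trade-off the flags offer: less work per observation in exchange for less information.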
series
Return an iterator over Series instances in this DataSet. Note that DataSets in flat format, i.e. header.dim_at_obs = “AllDimensions”, have no series. Use DataSet.obs() instead.
class pandasdmx.model.DataStructureDefinition(*args, **kwargs)
Bases: pandasdmx.model.Structure
class pandasdmx.model.DataflowDefinition(*args, **kwargs)
Bases: pandasdmx.model.StructureUsage, pandasdmx.model.Constrainable
class pandasdmx.model.Dimension(*args, **kwargs)
Bases: pandasdmx.model.Component
class pandasdmx.model.DimensionDescriptor(*args, **kwargs)
Bases: pandasdmx.model.ComponentList
class pandasdmx.model.Facet(facet_type=None, facet_value_type=‘’, itemscheme_facet=‘’, *args, **kwargs)
Bases: object
facet_type = {}
facet_value_type = (‘String’, ‘Big Integer’, ‘Integer’, ‘Long’, ‘Short’, ‘Double’, ‘Boolean’, ‘URI’, ‘DateTime’, ‘Time’, ‘GregorianYear’, ‘GregorianMonth’, ‘GregorianDate’, ‘Day’, ‘MonthDay’, ‘Duration’)
itemscheme_facet = ‘’
class pandasdmx.model.Footer(reader, elem, **kwargs)
Bases: pandasdmx.model.SDMXObject
code
severity
text
class pandasdmx.model.GenericDataMessage(*args, **kwargs)
Bases: pandasdmx.model.DataMessage
class pandasdmx.model.GenericDataSet(*args, **kwargs)
Bases: pandasdmx.model.DataSet
class pandasdmx.model.Group(*args, **kwargs)
Bases: pandasdmx.model.SDMXObject
class pandasdmx.model.Header(*args, **kwargs)
Bases: pandasdmx.model.SDMXObject
error
id
prepared
sender
structure
class pandasdmx.model.IdentifiableArtefact(*args, **kwargs)
Bases: pandasdmx.model.AnnotableArtefact
uri
urn
class pandasdmx.model.Item(*args, **kwargs)
Bases: pandasdmx.model.NameableArtefact
children
parent
class pandasdmx.model.ItemScheme(*args, **kwargs)
Bases: pandasdmx.model.MaintainableArtefact, pandasdmx.model.Scheme
is_partial
class pandasdmx.model.KeyValue(*args, **kwargs)
Bases: pandasdmx.model.SDMXObject
class pandasdmx.model.MaintainableArtefact(*args, **kwargs)
Bases: pandasdmx.model.VersionableArtefact
is_external_ref
is_final
maintainer
service_url
structure_url
class pandasdmx.model.MeasureDescriptor(*args, **kwargs)
Bases: pandasdmx.model.ComponentList
class pandasdmx.model.MeasureDimension(*args, **kwargs)
Bases: pandasdmx.model.Dimension
class pandasdmx.model.Message(*args, **kwargs)
Bases: pandasdmx.model.SDMXObject
header
class pandasdmx.model.NameableArtefact(*args, **kwargs)
Bases: pandasdmx.model.IdentifiableArtefact
description
name
class pandasdmx.model.PrimaryMeasure(*args, **kwargs)
Bases: pandasdmx.model.Component
class pandasdmx.model.Ref(reader, elem, **kwargs)
Bases: pandasdmx.model.SDMXObject
agency_id
id
package
ref_class
resolve()
version
class pandasdmx.model.ReportingYearStartDay(*args, **kwargs)
Bases: pandasdmx.model.DataAttribute
class pandasdmx.model.Representation(*args, **kwargs)
Bases: pandasdmx.model.SDMXObject
class pandasdmx.model.SDMXObject(reader, elem, **kwargs)
Bases: object
class pandasdmx.model.Scheme(*args, **kwargs)
Bases: pandasdmx.utils.DictLike
aslist()
class pandasdmx.model.Series(*args, **kwargs)
Bases: pandasdmx.model.SDMXObject
group_attrib
Return a namedtuple containing all attributes attached to the groups of which the given series is a member.
obs(with_values=True, with_attributes=True, reverse_obs=False)
Return an iterator over observations in a series. An observation is represented as a namedtuple with 3 fields (‘key’, ‘value’, ‘attrib’). obs.key is a namedtuple of dimensions, obs.value is a string value and obs.attrib is a namedtuple of attributes. If with_values or with_attributes is False, the respective value is None. Use these flags to increase performance. The flags default to True.
class pandasdmx.model.Structure(*args, **kwargs)
Bases: pandasdmx.model.MaintainableArtefact
class pandasdmx.model.StructureMessage(*args, **kwargs)
Bases: pandasdmx.model.Message
class pandasdmx.model.StructureSpecificDataMessage(*args, **kwargs)
Bases: pandasdmx.model.DataMessage
class pandasdmx.model.StructureSpecificDataSet(*args, **kwargs)
Bases: pandasdmx.model.DataSet
class pandasdmx.model.StructureUsage(*args, **kwargs)
Bases: pandasdmx.model.MaintainableArtefact
structure
class pandasdmx.model.TimeDimension(*args, **kwargs)
Bases: pandasdmx.model.Dimension
class pandasdmx.model.VersionableArtefact(*args, **kwargs)
Bases: pandasdmx.model.NameableArtefact
valid_from
valid_to
version
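Many of the classes in this module differ mainly in their position in the inheritance hierarchy. The stub classes below are not the pandaSDMX classes; they merely copy the documented Bases entries for the artefact chain so that the resulting method resolution order can be printed. The lineage helper is a name introduced for this example.

```python
# Stubs mirroring the documented "Bases:" relationships. They carry no
# behaviour and are not the real pandasdmx.model classes.
class SDMXObject(object):
    pass


class AnnotableArtefact(SDMXObject):
    pass


class IdentifiableArtefact(AnnotableArtefact):
    pass


class NameableArtefact(IdentifiableArtefact):
    pass


class VersionableArtefact(NameableArtefact):
    pass


class MaintainableArtefact(VersionableArtefact):
    pass


def lineage(cls):
    """Return the class names along the method resolution order."""
    return [c.__name__ for c in cls.__mro__ if c is not object]


print(' -> '.join(lineage(MaintainableArtefact)))
```

Reading the chain from right to left shows where each documented attribute is introduced: annotations on AnnotableArtefact, uri and urn on IdentifiableArtefact, name and description on NameableArtefact, and so on.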
pandasdmx.remote module
This module is part of pandaSDMX. It contains classes for HTTP access.
class pandasdmx.remote.REST(cache, http_cfg)
Bases: object
Query SDMX resources via REST or from a file
The constructor accepts arbitrary keyword arguments that will be passed to the requests.get function on each call. This makes the REST class somewhat similar to a requests.Session. E.g., proxies or authorisation data need only be provided once. The keyword arguments are stored in self.config. Modify this dict to issue the next ‘get’ request with changed arguments.
get(url, fromfile=None, params={})
Get SDMX message from REST service or local file
Parameters
• url (str) – URL of the REST service without the query part. If None, fromfile must be set. Default is None.
• params (dict) – will be appended as query part to the URL after a ‘?’
• fromfile (str) – path to SDMX file containing an SDMX message. It will be passed on to the reader for parsing.
Returns
three objects:
0. file-like object containing the SDMX message
1. the complete URL, if any, including the query part constructed from params
2. the status code
Return type tuple
Raises HTTPError – if the SDMX service responded with status code 401. Otherwise, the status code is returned.
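The return triple and the handling of params can be sketched without any network access. The following is a minimal illustration under stated assumptions, not the code of REST.get: get_message and the fetch callable are hypothetical, fetch is assumed to return the response body and status code, and a built-in exception stands in for HTTPError.

```python
from io import BytesIO
from urllib.parse import urlencode


def get_message(url, params=None, fetch=None):
    """Hypothetical sketch of the documented return triple.

    `fetch` is an assumed callable taking the full URL and returning
    (body_bytes, status_code); it stands in for the HTTP layer.
    """
    # Append the query part after a '?' if any parameters are given.
    full_url = (url + '?' + urlencode(params)) if params else url
    body, status = fetch(full_url)
    if status == 401:
        # The real class is documented to raise HTTPError here; a
        # built-in exception keeps this sketch dependency-free.
        raise PermissionError('401 returned for ' + full_url)
    return BytesIO(body), full_url, status
```

A caller would unpack the result as source, url, status = get_message(...) and hand the file-like object to a reader.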
max_size = 16777216
Upper bound for in-memory temp file. Larger files will be spooled from disc.
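An upper bound of this kind matches the behaviour of tempfile.SpooledTemporaryFile in the standard library, which keeps data in memory until max_size is exceeded and then moves it to a file on disk. Whether REST uses exactly this class is an assumption; the sketch only demonstrates the mechanism, and buffer_message is a name made up for the example.

```python
import tempfile


def buffer_message(chunks, max_size=16777216):
    """Write byte chunks to a spooled temp file and rewind it for reading."""
    spool = tempfile.SpooledTemporaryFile(max_size=max_size)
    for chunk in chunks:
        spool.write(chunk)
    spool.seek(0)
    return spool


# A tiny limit forces the data onto disk; the content reads back the same.
with buffer_message([b'<message>', b'</message>'], max_size=8) as f:
    print(f.read())
```

For the caller nothing changes when the limit is exceeded: the object remains file-like either way, only the storage behind it differs.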
request(url, params={})
Retrieve SDMX messages. If needed, override in subclasses to support other data providers.
Parameters url (str) – The URL of the message.
Returns the xml data as file-like object
Module contents
pandaSDMX - a Python package for SDMX - Statistical Data and Metadata eXchange
4.8 Contributing
Contributions such as bug reports or pull requests and any other user feedback are much appreciated. Development takes place on GitHub. There is also a low-traffic mailing list.
4.9 License
Notwithstanding other licenses applicable to any third-party software included in this package, pandaSDMX is licensed under the Apache 2.0 license, a copy of which is included in the source distribution.
Copyright 2014, 2015 Dr. Leo <fhaxbox66qgmail.com>, All Rights Reserved.
CHAPTER 5
Indices and tables
• genindex
• modindex
• search