Slides for my talk at the Publication Office of the European Union - 13.6.2013
Linked Open Data for Public Contracts
Martin Nečaský
Faculty of Mathematics and Physics, Charles University in Prague
Faculty of Informatics and Statistics, University of Economics in Prague
13.6.2013 – Publications Office of the European Union, Luxembourg
Outline
• Introduction to Linked Data
• What benefits does Linked Data bring for TED and public procurement in the EU?
• What does it mean for TED and others to publish their data as Linked Data?
• What have we already done in the LOD2 project?
Linked Data - Introduction
Web Applications Eco-system
Linked Data helps to create an eco-system of web applications which publish, enrich and consume data about things in one shared global data space
[Diagram: applications App 1 to App 5 around a Shared Global Data Space on the Web (Web of Data)]
Architecture of Web of Documents
Shared global space of documents
Built on top of several simple principles:
1. HTML as a format for publishing documents
2. URLs as unique global identifiers of documents
3. HTTP for locating and accessing documents by their URLs
4. hyperlinks between documents
There are two kinds of applications working in this space of documents:
• web browsers (locating and browsing documents through hyperlinks)
• search engines (indexing and full-text searching of documents)
[Diagram: a web browser and a search engine accessing HTML documents over HTTP]
Web of Documents
The current Web (of Documents) provides a lot of data about Prague:
• http://monitor.statnipokladna.cz: Prague budget
• http://registry.czso.cz: Basic info about Prague
• http://www.praha.eu: Prague public contracts
• http://www.czso.cz: Demography of Prague
• http://www.risy.cz: EU funded projects in Prague
Problems
• Data about Prague is encoded in documents distributed across the Web.
• Documents are intended for humans, not computers.
• Documents about Prague or related things are not linked.
• Therefore, computers are not able to process data about Prague published on the Web.
Web of Documents
Try to search for this information on the current Web:
• Top 100 suppliers of Prague with headquarters outside of the Prague region.
• Money spent in Prague on new children's playgrounds in the last 5 years, per child.
• Organizations in Prague funded by EU structural funds and their top 100 suppliers.
Linked Data
Data published on the Web according to four simple principles (introduced by Sir Tim Berners-Lee):
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).
4. Include links to other URIs so that they can discover more things.
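Principles 2 and 3 can be illustrated with a small Python sketch of what "looking up a URI" might look like for a client: an HTTP GET that asks for an RDF serialization via the Accept header. The helper name `rdf_lookup_request` is hypothetical, and whether the example URI actually serves Turtle is an assumption; the request is only built here, not sent.

```python
from urllib.request import Request


def rdf_lookup_request(uri, media_type="text/turtle"):
    """Build (but do not send) an HTTP GET request asking for RDF.

    The Accept header tells the server which representation of the
    thing the client would like to receive (content negotiation).
    """
    return Request(uri, headers={"Accept": media_type}, method="GET")


# Example URI taken from the slides; that it answers with Turtle is an assumption.
request = rdf_lookup_request("http://praha.eu/contract/7302")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual lookup; a browser would send the same URI with an Accept header asking for HTML instead.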
Things as first-class citizens
• Project CZ.2.16/2.1.00/22189
• Prague City
• Prague Council
• Prague Demography
• Prague Budget
• Contract DIL/23/07/007302/2010
HTTP URIs for Things
praha.eu (Prague)
• http://praha.eu/contract/7302
• http://praha.eu/council
• http://praha.eu/city
mfcr.cz (Ministry of Finance)
• http://mfcr.cz/prague/budget
• http://mfcr.cz/prague
risy.cz (Regional Information Service)
• http://risy.cz/location/prague
• http://risy.cz/contract/22189-01
• http://risy.cz/project/22189
czso.cz (Czech Statistical Office)
• http://registry.czso.cz/prague
• http://czso.cz/prague
• http://czso.cz/prague/demogstat
[The diagram also carries the label: Project CZ.2.16/2.1.00/22189]
Data about Things in RDF
[Diagram: a client looks up http://praha.eu/contract/7302. The human-readable view shows: Playground Revitalization; Authority: Prague; Delivery date: 31.8.2011; Price: 28 444 000 CZK; ... The same data is drawn as an RDF graph: http://praha.eu/contract/7302 has dcterms:title "Playground Revitalization", pc:estimatedEndDate 31.8.2011, pc:contractingAuthority http://praha.eu/council and pc:agreedPrice http://praha.eu/contract/7302/price, which in turn has gr:hasCurrency CZK and gr:hasCurrencyValue 28444000.]
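@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix gr: <http://purl.org/goodrelations/v1#> .
@prefix pc: <http://purl.org/procurement/public-contracts#> .

<http://www.praha.eu/contract/7302>
    dcterms:title "Playground Revitalization" ;
    pc:estimatedEndDate "31.8.2011" ;
    pc:agreedPrice <http://www.praha.eu/contract/7302/price> ;
    pc:contractingAuthority <http://www.praha.eu/council> .

<http://www.praha.eu/contract/7302/price>
    gr:hasCurrency "CZK" ;
    gr:hasCurrencyValue "28444000" .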
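To make the graph structure concrete, here is a minimal Python sketch that holds the same statements as plain (subject, predicate, object) tuples and follows the pc:agreedPrice link to read the price. It is an illustration only: a real application would more likely use an RDF library, the tuple representation does not distinguish URIs from literals, and the helper names `objects` and `agreed_price` are hypothetical.

```python
# Namespace URIs of the vocabularies used on the slide.
DCTERMS = "http://purl.org/dc/terms/"
PC = "http://purl.org/procurement/public-contracts#"
GR = "http://purl.org/goodrelations/v1#"

CONTRACT = "http://www.praha.eu/contract/7302"
PRICE = CONTRACT + "/price"
COUNCIL = "http://www.praha.eu/council"

# The statements from the slide as (subject, predicate, object) triples.
triples = [
    (CONTRACT, DCTERMS + "title", "Playground Revitalization"),
    (CONTRACT, PC + "estimatedEndDate", "31.8.2011"),
    (CONTRACT, PC + "agreedPrice", PRICE),
    (CONTRACT, PC + "contractingAuthority", COUNCIL),
    (PRICE, GR + "hasCurrency", "CZK"),
    (PRICE, GR + "hasCurrencyValue", "28444000"),
]


def objects(graph, subject, predicate):
    """Return all objects of triples matching the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


def agreed_price(graph, contract):
    """Follow pc:agreedPrice to the price node, then read amount and currency."""
    (price_node,) = objects(graph, contract, PC + "agreedPrice")
    (value,) = objects(graph, price_node, GR + "hasCurrencyValue")
    (currency,) = objects(graph, price_node, GR + "hasCurrency")
    return int(value), currency


print(agreed_price(triples, CONTRACT))
```

The price is a thing with its own URI rather than a plain value, which is why reading it takes two hops through the graph.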
Vocabularies
• Published RDF data would be hard to interpret if each publisher used proprietary predicates.
• Therefore, standardized (or at least widely used) predicates should take priority over proprietary ones,
  e.g. Dublin Core, GoodRelations, FOAF, schema.org, ... or more specific ones for public procurement
  • e.g., Public Contracts Ontology (http://purl.org/procurement/public-contracts)
• Predicates are defined in so-called vocabularies (or ontologies).
  Note: an ontology is a special case of a vocabulary; it contains more detailed reasoning rules, which are out of scope of this lecture.
  Note: not only predicates but also classes (= types of things) are defined in vocabularies/ontologies.
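Why shared predicates matter can be shown with a small Python sketch: a consumer that only knows dcterms:title can read data from any publisher that uses it, and silently misses data described with a proprietary predicate. The second publisher, its URIs and its predicate are hypothetical and exist only for illustration.

```python
DCTERMS_TITLE = "http://purl.org/dc/terms/title"

# Publisher A uses the widely adopted Dublin Core predicate.
publisher_a = [
    ("http://praha.eu/contract/7302", DCTERMS_TITLE, "Playground Revitalization"),
]

# Publisher B (hypothetical) invents its own predicate for the same idea.
publisher_b = [
    ("http://example.org/contract/1", "http://example.org/vocab#nazev", "Road repair"),
]


def titles(graph):
    """Map each subject to its title, looking only at dcterms:title."""
    return {s: o for s, p, o in graph if p == DCTERMS_TITLE}


found = titles(publisher_a + publisher_b)
print(found)
```

Only the contract from publisher A is found; publisher B's data would need a custom mapping before the same application could use it.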
Linking URIs of Related Things
[Diagram: the URIs from the previous slide, grouped by publisher (praha.eu, mfcr.cz, risy.cz, czso.cz), now connected by links labelled c:hasBeneficiary, a:fundedBy, b:hasBudget and d:hasDemography (twice). The linked URIs include http://praha.eu/contract/7302, http://praha.eu/council, http://praha.eu/city, http://mfcr.cz/prague/budget, http://mfcr.cz/prague, http://risy.cz/location/prague, http://risy.cz/contract/22189-01, http://risy.cz/project/22189, http://registry.czso.cz/prague, http://czso.cz/prague and http://czso.cz/prague/demogstat.]
Linking URIs of Related Things
[Diagram: the same publishers and URIs, with links labelled c:hasBeneficiary, a:fundedBy and b:hasBudget, plus two owl:sameAs links among the URIs http://praha.eu/city, http://risy.cz/location/prague, http://registry.czso.cz/prague, http://czso.cz/prague and http://praha.eu/council. The remaining URIs shown are http://praha.eu/contract/7302, http://mfcr.cz/prague/budget, http://mfcr.cz/prague, http://risy.cz/contract/22189-01, http://risy.cz/project/22189 and http://czso.cz/prague/demogstat.]
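A consumer can use owl:sameAs links to treat URIs from different publishers as names for one thing. The Python sketch below groups URIs into such equivalence sets with a small union-find structure. Which URIs the two owl:sameAs links on the slide actually connect is not recoverable from the transcript, so the pairs used here are an assumption, and `same_as_groups` is a hypothetical helper.

```python
def same_as_groups(links):
    """Group URIs connected (directly or transitively) by owl:sameAs links."""
    parent = {}

    def find(uri):
        # Walk up to the representative, shortening the path on the way.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


# Assumed owl:sameAs pairs between URIs that all name the city of Prague.
links = [
    ("http://praha.eu/city", "http://risy.cz/location/prague"),
    ("http://risy.cz/location/prague", "http://czso.cz/prague"),
]
groups = same_as_groups(links)
print(groups)
```

With the groups in hand, statements made about any URI in a set (budget, demography, funded projects) can be merged into one view of the thing.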
Linked Data for TED – What are the benefits?
Benefits of Publishing TED as LD
Problem: It is hard to get a unified view of a chosen thing (e.g. contracting authority, supplier, contract, contract notice, tender, ...) from TED.
• The data about the thing is distributed across several contract notices.
LD solution: Each thing has a unique TED HTTP URI which can be used by third-party applications to get all TED data for this thing.
• Data is represented as an RDF graph respecting openly defined vocabularies shared across developers and communities.
• Data includes links to URIs of other things on TED.
• TED can flexibly and continuously extend the data provided for the thing.
Benefits of Publishing TED as LD
[Diagram: a user opens a web application with ?detail=http://ted.eu/contract/CZ/54782145; the web application requests http://ted.eu/contract/CZ/54782145 from the TED LD Service; the graph shown includes http://praha.eu/contract/7302, http://praha.eu/contract/7302/price and http://praha.eu/council.]
TED easily assembles data related to the requested contract and returns it as an interconnected graph to the requesting web application.
Benefits of Publishing TED as LD
[Diagram: the user clicks through to ?detail=http://ted.eu/org/CZ/00064581; the web application requests http://ted.eu/org/CZ/00064581 from the TED LD Service; the graph shown includes http://praha.eu/contract/7302, http://praha.eu/contract/7302/price and http://praha.eu/council.]
TED easily assembles data related to the requested authority and returns it as an interconnected graph to the requesting web application.
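How a service might assemble "data related to the requested thing" can be sketched in Python as a bounded walk over the graph: start at the requested URI and follow links for a limited number of hops. This is only one possible strategy, not a description of how TED works. The function `describe`, the owl:sameAs link between the TED and praha.eu contract URIs, and the unrelated example.org triple are assumptions made for illustration.

```python
from collections import deque


def describe(graph, start, max_hops=2):
    """Collect triples reachable from `start` by following links.

    Nodes at distance 0 .. max_hops - 1 from `start` are expanded, so
    max_hops=1 returns only the triples whose subject is `start`.
    """
    seen = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        node, hops = queue.popleft()
        if hops >= max_hops:
            continue
        for s, p, o in graph:
            if s == node:
                result.append((s, p, o))
                if o not in seen:
                    seen.add(o)
                    queue.append((o, hops + 1))
    return result


TED_CONTRACT = "http://ted.eu/contract/CZ/54782145"
CONTRACT = "http://praha.eu/contract/7302"
PRICE = "http://praha.eu/contract/7302/price"
COUNCIL = "http://praha.eu/council"

graph = [
    (TED_CONTRACT, "owl:sameAs", CONTRACT),  # assumed link, not given on the slide
    (CONTRACT, "pc:agreedPrice", PRICE),
    (CONTRACT, "pc:contractingAuthority", COUNCIL),
    (PRICE, "gr:hasCurrencyValue", "28444000"),
    ("http://example.org/contract/1", "pc:agreedPrice", "http://example.org/contract/1/price"),
]

for triple in describe(graph, TED_CONTRACT, max_hops=3):
    print(triple)
```

The hop limit is the design choice that matters: too small and the client has to issue many follow-up requests, too large and a single lookup drags in most of the dataset.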
Problems with HTTP URIs
• Today, public procurement data is collected from contracting authorities in the form of contract notices (calls for tender, contract award notices, etc.).
• Notices usually do not contain explicit identifiers of contracting authorities and suppliers.
• These organizations are usually identified in the notices only by names and addresses, which are often misspelled or incorrect.
• Therefore, if we create an HTTP URI for an organization from one notice, it is often very hard to recognize whether an organization from another notice is the same one or not.
• This raises serious questions: what should the HTTP URI of an organization (contracting authority/supplier) look like? How should an organization be identified in a notice so that we can recognize it unambiguously?
Problems with HTTP URIs
There are two possible solutions, both very simple from the technical point of view but very complex from the political point of view (enforcement in all EU countries).
1st solution:
• Some countries define unique mandatory identifiers for organizations (both private companies and public institutions).
• These identifiers should be present in the notices to identify contracting authorities and suppliers.
• We can then use them to recognize organizations and associate them with the corresponding HTTP URIs.
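The first solution can be sketched in a few lines of Python: if every notice carries a national identifier, the organization URI can be derived from the country code and that identifier, so differently spelled names still lead to the same URI. The URI pattern follows the example http://ted.eu/org/CZ/00064581 used earlier in the slides; whether TED would adopt this pattern is an assumption, and the notice records and name spellings below are made up for illustration.

```python
def org_uri(country, org_id):
    """Derive an organization URI from a country code and a national identifier."""
    return f"http://ted.eu/org/{country.strip().upper()}/{org_id.strip()}"


# Two hypothetical notices naming the same authority with different spellings.
notices = [
    {"name": "Hlavni mesto Praha", "country": "CZ", "org_id": "00064581"},
    {"name": "Hl. m. Praha", "country": "cz", "org_id": " 00064581"},
]

uris = {org_uri(n["country"], n["org_id"]) for n in notices}
names = {n["name"] for n in notices}
print(uris)
print(names)
```

Matching on names alone would see two organizations here; matching on the identifier yields a single URI.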
Problems with HTTP URIs
2nd solution:
• Each organization involved in public procurement should have its own public profile on the Web with its own HTTP URI.
• The public profile can be a simple HTML web page which also contains a few data items encoded in RDF (technically, this is very simple).
• The public profile can be a part of the official web site of the organization, e.g. http://praha.eu/public-profile
• Or, the organization can use services which manage public web profiles of organizations. Such services already exist, e.g. http://opencorporates.org
  • This service already contains profiles of many organizations; it associates them with HTTP URIs and provides basic RDF data about them (title, address, etc.).
• The HTTP URI of the profile should become a part of the notice.
• This solution also saves some time and money because details about the organization do not have to be repeated in each notice: each notice is linked to the HTTP URI where the information is present.
  • If you object that the profile holds only the current information, which can differ from the information that was valid for some earlier notices, then you are right. But this can be solved technically (e.g. TED and other authorities responsible for collecting public procurement data can back up that information).
Problems with HTTP URIs 2nd solution:
[Diagram: the notices http://ted.europa.eu/notice/574832 and http://ted.europa.eu/notice/575833 link via pc:contractingAuthority and pc:supplier to public profile URIs: http://praha.eu/public-profile (praha.eu, Prague), http://company-a.cz/public-profile (company-a.cz, Company A), and http://opencorporates.org/company-b/public-profile, http://opencorporates.org/company-c/public-profile, ... (opencorporates.org).]
Benefits of Publishing TED as LD
Problem: It is hard to find information related to public contracts, contracting authorities and suppliers which is published outside of TED, somewhere else on the Web, e.g.:
• data from the post-award phase of public contracts not published on TED
• profiles of contracting authorities and suppliers
LD solution: TED publishes the basic data infrastructure of HTTP URIs of public contracts, contracting authorities, suppliers, etc.
• Others can enrich this basic infrastructure with their own data.
• The enriched TED datasets can be consumed by third-party applications and even by TED itself.
Benefits of Publishing TED as LD
[Diagram: in the Shared Global Data Space (Web), the TED Linked Data Basic Infrastructure sits alongside a publisher of profiles of CZ suppliers and a publisher of post-award data of GE contracts; an application asks: Suitable suppliers for a contract?]
Benefits of Publishing TED as LD
[Diagram labels:]
• Public spending per inhabitant in 2010
• Contracts similar to a contract
• PC Filing Application
• Public spending in Czech Republic "HeatMap" Application
Benefits of Publishing TED as LD
Problem: Other authorities must copy TED data to their databases if they want to use it (which also includes republishing TED data).
• The repeated work of building such databases and maintaining them is paid from public budgets (!)
LD solution: Other public authorities link their primary data (represented as Linked Data, not necessarily published) to TED without the need to copy, integrate and maintain TED data in their own databases.
• Anyone who works with the data of such a public authority can get the data directly from TED if necessary.
Benefits of Publishing TED as LD
Our planned experiment in the Czech Republic, in cooperation with the Czech Ministry of Finance (MoF), using data about public contracts:
• Datasets: CZ Public Budgets (MoF), NUTS & LAU CZ regions, CZ Public Contracts, Demography (Czech Stat. Office)
• Example question: Public contracts in Prague with Prague budget and demography statistics?
• Goal: to show that institutions can share data by linking the data instead of copying them
Benefits for Stakeholders: Contracting Authorities and Suppliers
Unified global data space covering various aspects of public procurement across all EU countries.
Contracting authorities
• They can find contracts similar to their own.
• They can group their calls with other authorities to achieve better offers from suppliers.
• They can verify their requirements against the requirements of other buyers to increase the quality and completeness of their requirements and ask for better prices.
• They can search for suitable suppliers who have successfully completed similar contracts in the past.
Suppliers
• They can get the necessary information about open calls for tenders.
• They can better inform potential customers about their offers.
• They can analyze previous contracts in their market to better target their tenders and improve the quality of the services they offer.
• They can group with other suppliers with complementary offers for joint tendering.
Benefits for Stakeholders: EU and Citizens
EU saves money
• Only basic infrastructure is built and primary data is published.
  • Related data is published and linked by third parties.
• There is no need to build and pay for complex applications and services.
  • These will be built by third parties, not only for citizens but also for contracting authorities and suppliers, solely on the basis of their demand.
• There is no need to duplicate data in different public administration services and applications.
  • Data is linked instead of copied.
EU supports building a common market and interoperability (ISA)
EU supports transparency
• Citizens can more easily monitor what public administrations buy in their city/country, from whom and for how much.
• They can also more easily compare the purchases of their city/country with other cities/countries.
Linked Data for TED – What needs to be done to adopt LD principles?
LOD lifecycle
[Diagram: lifecycle stages]
• Interlinking, fusing
• Quality analysis
• Evolution, repair
• Search, browsing, exploration
• Extraction
• Storage, querying
• Manual revision, authoring
LOD lifecycle supported by LOD2 Stack
http://stack.lod2.eu
Public Procurement and LOD2 Project
• Vocabulary for publishing public contracts as Linked Data
  • a combination of existing, broadly adopted vocabularies and their extension for public procurement (GoodRelations, Payments Ontology, schema.org, Dublin Core, SKOS)
• Public Contracts filing application
  • a web application for contracting authorities and suppliers
  • It enables them to publish data about public contracts as Linked Data.
  • Contracting authorities can search for similar contracts and suitable suppliers.
• Experimental Linked Data from Czech Republic, Great Britain and TED
Experimental Linked Data from Czech Republic, Great Britain and TED created as part of LOD2 project
[Diagram: linked datasets]
• CZ Public Contracts
• Common Procurement Vocabulary
• CZ Business Entities
• CZ Demography Stats
• CZ Public Budgets
• DBpedia
• TED Public Contracts and Organizations
• SDMX
• CZ LAU Regions
• NUTS Regions (RAMON)
• GB Public Contracts and Organizations
• Products Ontology
Thank You for Your Attention