THE ROAD TO OPEN DATA ENLIGHTENMENT IS PAVED WITH NICE EXCUSES

Toon Vanagt, CEO data.be, @Toon

3rd Dec 2014


Official Belgian company info sources

Some data.be features

Autocomplete

Mashing up gov sources

Data enrichment

Financial ratios

OCR in PDFs

Entity recognition

Alerts

Belgian Finance department and Police Force are top users of data.be

On the internet you must always remember: If something of value is free, you’re the product!

Definitions

‘Open knowledge’ is any content, information or data that people are free to use, re-use and redistribute, without any legal, technological or social restriction. (okfn.org)

‘Open data’ and ‘open content’ mean anyone can freely access, use, modify, and share for any purpose (subject, at most, to requirements that preserve provenance and openness). (opendefinition.org)

Open Data Enlightenment vs Buzz

The Age of Enlightenment is the era from the 1650s to the 1780s in which cultural and intellectual forces emphasized reason, analysis and individualism rather than traditional lines of authority…

The current open data philosophy redefines ‘authority’ too and appeals to the analytical power of citizens, hackers, journalists and entrepreneurs to put data to good use.

Open data:

fosters “bottom up”-approach

stimulates users to get more out of the data sets

delivers unexpected results & insights

Beware of fancy alchemy headlines:

Open Data Is The New Oil

Unlocking The Gold Mine

Turning Government Data Into Gold

€40 Billion boost to the EU's economy each year…

Excuse 1: But how will we make money?

Does your government (department) really have to make money with open data?

Open data quickly evolved into primary state infrastructure & service.

Open data benefits society as a whole, so why tax usage separately?

If you still want or have to charge users, limit the cost in PSI-spirit to your marginal data delivery expense (extra bandwidth).

Who pays for open data gov cost?

1. Government subsidizes underlying open data department costs as a primary service. Government covers the open data related cost as part of its general expenses.

2. Government agencies charge each other for cost of data usage between federal, regional and city level departments

3. 11 open data revenue models for government agencies as authentic sources

3 options at input side

8 options at output side

Charging the INPUT side

Government makes the user pay for (legally required!) data mutations:

1. Creation of data sets (company creation, alarm system registration, publication of annual accounts,…)

2. Change of data: (address move, new stakeholder in company, name changes, corrections…)

3. Deletion of a dataset (inactive company, bankruptcy,…)

Downsides of INPUT based revenue model

Introduces financial hurdles

Removes incentives to keep data up to date

Results in lower data quality

Requires higher ‘enforcement’ cost

Requires cost to clean up outdated data sets

Charging the OUTPUT side

1. User pays for individual consultation

2. Basic data are free, but user has to pay to consult extended data or meta data

3. User pays for use of structured data sets (csv, xml, batch, API,..)

4. User pays for real-time data sets, which reflect current state in authentic data source (daily update versus monthly update)

5. User pays for removed data (from archive) or for change log (historic overview)

6. User pays for a Service Level Agreement (e.g. guaranteed bandwidth or access outside business hours)

7. User pays for monitoring keywords (or events) in (or about) certain data sets to receive alerts (push notifications, e-mails, SMS,…)

8. User pays for custom benchmarking, segmentations, ratios or advanced filtering options

Downsides of OUTPUT based revenue model

Financial hurdle for ‘newcomers’

Reduces innovation and consolidates ‘status-quo’

Inequality (more for those who can pay, higher service through faster access, better informed)

Results in limited usage and applications

Requires costs for billing & payment system with back office operations

Belgian example 1: Official State Gazette / Belgisch Staatsblad / Moniteur

Input based:

1. Creation of data sets (company creation, publication of annual accounts,…)

2. Change of data: (address move, name changes, capital changes, new stakeholders…)

Belgian example 2: National Bank balance sheets

Input

Pay for publication of annual accounts (274 EUR for BVBA/SPRL = limited liability company)

Output

User pays for use of structured data sets via a webservice (roughly between 1.850 EUR and 15.000 EUR per year).

User pays for old archived data sets which are no longer shown on the National Bank’s website

User pays for custom industry benchmarking and ratios of competitors, customers or prospects (but benchmarking of one’s own company remains free)

Belgian example 3: Crossroads Bank for Enterprises

Input

Creation of data sets

Change of data, such as address move or registering extra business entity,…

Output

User pays for use of structured data sets (copy of the public part of the database, with names of company stakeholders and self-employed persons, at 75.000 EUR/year)

User pays for real-time data sets, which reflect current state in authentic data source (daily update versus monthly update) via API (2.000 API requests for 50 EUR in prepaid balance)

User pays for removed data or for change log (historic overview)

User pays for a Service Level Agreement (e.g. guaranteed bandwidth or access outside business hours)
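The two output prices quoted for the Crossroads Bank can be put side by side with a little arithmetic. The sketch below assumes the slide's figures use a dot as thousands separator (2,000 requests for 50 EUR; 75,000 EUR per year for the bulk copy) and ignores that prepaid balance is probably bought in whole bundles. The function names are illustrative only.

```python
from decimal import Decimal

# Figures quoted on the slide (dot read as thousands separator: an assumption).
PREPAID_BUNDLE_EUR = Decimal("50")          # price of one prepaid bundle
REQUESTS_PER_BUNDLE = 2000                  # API requests covered by that bundle
BULK_COPY_EUR_PER_YEAR = Decimal("75000")   # yearly copy of the public database


def price_per_request() -> Decimal:
    """Effective price of a single API request, in EUR."""
    return PREPAID_BUNDLE_EUR / REQUESTS_PER_BUNDLE


def api_cost(requests_per_year: int) -> Decimal:
    """Yearly API cost in EUR, ignoring whole-bundle rounding."""
    return price_per_request() * requests_per_year


def break_even_requests() -> int:
    """Requests per year at which the API costs as much as the bulk copy."""
    return int(BULK_COPY_EUR_PER_YEAR / price_per_request())
```

Under these assumptions a request costs 0.025 EUR, and the API only becomes more expensive than the bulk copy above 3 million requests per year. Either way, a newcomer faces a bill before building anything.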

Excuse 1: Avoid conflict of interest for gov agencies

Battle for budget: creates competition between government agencies

Inequality in support services and quality between paying and non-paying customers or agencies

Battle to secure authentic source as single gatekeeper and extend reach

Creates competition with the private sector, because government agencies act as commercial data brokers selling personal contact details wholesale to intermediaries

Excuse 2: Our data quality is too low to release

Open Data is not your real challenge, you have much bigger data quality issues…

Accuracy: Is the data correctly representing the real-world entity or event?

Completeness: Does the data include all data items representing the entity or event?

Conformance: Is the data following accepted standards?

Consistency: Is the data not containing contradictions?

Credibility: Is the data based on trustworthy sources?

Processability: Is the data machine-readable?

Relevance: Does the data include an appropriate amount of data?

Timeliness: Is the data representing the actual situation and is it published soon enough?
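Some of these dimensions can be measured before release rather than used as a reason to wait. The sketch below checks three of them (completeness, consistency, timeliness) on a single company record. The field names, the "dissolved but still active" rule and the one-year freshness threshold are all assumptions made for illustration, not part of any official schema.

```python
from datetime import date

# Hypothetical required fields for a company record (illustrative only).
REQUIRED_FIELDS = ("company_id", "name", "postal_code", "status", "last_updated")


def completeness(record: dict) -> float:
    """Share of required fields that are present and non-empty."""
    filled = sum(1 for f in REQUIRED_FIELDS if record.get(f) not in (None, ""))
    return filled / len(REQUIRED_FIELDS)


def is_consistent(record: dict) -> bool:
    """Assumed rule: a dissolved company must not also be flagged as active."""
    return not (record.get("status") == "dissolved" and record.get("active") is True)


def is_timely(record: dict, today: date, max_age_days: int = 365) -> bool:
    """True if the record was updated within max_age_days before today."""
    updated = record.get("last_updated")
    return updated is not None and (today - updated).days <= max_age_days
```

Publishing such scores next to the data set is one way to be open about quality instead of hiding behind it.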

Excuse 3: Yes, the data are open but the process and partner chain is not…

Document data process partners

Describe steps in the information chain upstream of your authentic source (data.be had to reverse engineer processes)

Excuse 4: We think we might have some privacy sensitive data elements…

Keep the lawyers out of your open data project if you want to make a fast start

It’s complicated

It’s Personal

Privacy concept evolves over time and is culturally defined

Many grey zones

Don’t forget to try to anonymise your unstructured data too… accidents will happen

We can technologically do much more than we are permitted to culturally, morally or legally…

Beware that very few data points are needed to identify a person in this big data era. Eloquently phrased by Jonathan Mayer: “The idea of personally identifiable information not being identifiable is completely laughable in computer-science circles”.
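How few data points are enough can be checked on your own data before publishing. The sketch below groups records by a set of quasi-identifiers and reports the smallest group size (often called k in k-anonymity) and the number of records that are unique. The sample records are invented and the choice of postal code, birth year and gender as quasi-identifiers is an assumption for illustration.

```python
from collections import Counter

# Invented sample records, for illustration only.
SAMPLE = [
    {"postal_code": "1000", "birth_year": 1975, "gender": "F"},
    {"postal_code": "1000", "birth_year": 1975, "gender": "M"},
    {"postal_code": "1000", "birth_year": 1980, "gender": "F"},
    {"postal_code": "2000", "birth_year": 1975, "gender": "F"},
    {"postal_code": "2000", "birth_year": 1975, "gender": "F"},
]


def group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest group; k == 1 means someone is uniquely identifiable."""
    return min(group_sizes(records, quasi_identifiers).values())


def unique_records(records, quasi_identifiers):
    """Number of records whose quasi-identifier combination occurs only once."""
    sizes = group_sizes(records, quasi_identifiers)
    return sum(1 for n in sizes.values() if n == 1)
```

On the sample, postal code alone leaves every person hidden in a group of at least two. Adding birth year and gender makes three of the five records unique.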

Excuse 5: On second thought, we’re not that open…

Availability: Can the data be accessed now and over time?

Be consistent and offer long term commitments and stable data set formats (integration mapping)

Data.be received a ‘Cease & Desist’ after a government hackathon: “Our government website is the only authentic source for air quality measurement. Stop using our data immediately or …”

Excuse 6: We opened the data in a layer on our WMS…

Web Map Service (WMS) is a standard protocol for serving geo-referenced map images over the Internet that are generated by a map server using data from a GIS database.

It is very hard to share the layer data… in other applications
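Why the layer is hard to reuse becomes clear from what a WMS request looks like. The sketch below builds a standard WMS 1.3.0 GetMap URL; the server address and layer name are hypothetical, and the bounding box is an arbitrary area given in latitude/longitude order, as WMS 1.3.0 expects for EPSG:4326.

```python
from urllib.parse import urlencode


def wms_getmap_url(base_url, layer, bbox, width=800, height=600):
    """Build a WMS 1.3.0 GetMap request URL for a single layer.

    bbox is (min_lat, min_lon, max_lat, max_lon) for EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",                      # default style
        "CRS": "EPSG:4326",
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": "image/png",             # the answer is a picture, not data
    }
    return f"{base_url}?{urlencode(params)}"


# Hypothetical server and layer name, for illustration only.
url = wms_getmap_url("https://example.org/wms", "air_quality", (50.7, 4.2, 50.9, 4.5))
```

The response to such a request is a rendered image of the requested width and height. The measurements behind the pixels stay on the server, so another application cannot filter, join or recompute them.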

Next frontiers for Open Data

Linked & graph data

Metadata

Unstructured data

Structured feedback loops

Barriers to open data reuse

© 2013 European Commission training manual

Gatekeepers to the rescue

Don’t just ‘input’ the data which are presented

Inform general public on long term use of their ‘public’ data.

Once online, always online…

Evangelise the use of open data inside and outside your organisation

Open up your organisation

Invite a data scientist to work. Share insights internally, learn, optimize quality of data sets

Be open about quality and refresh rates

Specify the license under which the data may be re-used.

Provide a feedback loop (now data.be often is feedback for outdated company data…)

Maintenance of metadata and data is critical!
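The points above can be captured in a small machine-readable descriptor published next to each data set. The sketch below is one possible shape, not an official metadata standard: the class name, field names and the "empty string means missing" rule are all assumptions for illustration.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class DatasetDescriptor:
    """Hypothetical minimal metadata record for one published data set."""
    title: str
    license: str               # licence under which the data may be re-used
    refresh_rate: str          # e.g. "daily" or "monthly"
    known_quality_issues: list # be open about quality
    feedback_contact: str      # where re-users can report errors


def missing_fields(descriptor: DatasetDescriptor) -> list:
    """Names of text fields that were left empty."""
    return [name for name, value in asdict(descriptor).items()
            if isinstance(value, str) and value == ""]


def to_json(descriptor: DatasetDescriptor) -> str:
    """Serialise the descriptor so it can be published alongside the data."""
    return json.dumps(asdict(descriptor), indent=2)
```

A check like `missing_fields` can run before every release, so a data set never goes out without a licence or a feedback address.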

Toon Vanagt, CEO data.be

@Toon

THANK YOU

3rd Dec 2014 #OUP14, Opening up conference in Brussels

Picture copyright & attribution

The brick laying machine pictures can be found at Tiger Stone: http://www.tiger-stone.nl/index.php?option=com_content&view=article&id=47&Itemid=55

Keep calm cup: http://www.keepcalm-o-matic.co.uk/product/mug/keep-calm-and-open-up-67/

Storify with pictures of opening-up.eu event: https://storify.com/openingup_eu/opening-up-final-conference-1