Hamish James Statistics New Zealand Open data and data curation

Outline

1. Setting the scene

2. Open data

3. How open data and data curation are related

[Diagram: information, classified as structured or unstructured and as digital or analogue]

Quick definitions

• data

• open data

• data curation

Defining data

Data consists of sets of structured values that can be organised, analysed and manipulated by a software application or some other means of calculation. This includes data collected directly through surveys and administrative systems, as well as data created or compiled by aggregating or reanalysing other sources. A defining characteristic of data is that it is machine-readable.
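As a minimal sketch of what "machine-readable" means in practice, the snippet below parses a small table of structured values and aggregates it without any manual interpretation. The column names and figures are hypothetical, invented purely for illustration; they are not taken from any Statistics New Zealand dataset.

```python
import csv
import io

# Hypothetical sample data: regions and counts are invented for illustration.
RAW = """region,year,population
North,2006,1200
South,2006,800
"""


def total_population(text: str) -> int:
    """Sum the 'population' column of a CSV document held in a string."""
    reader = csv.DictReader(io.StringIO(text))
    return sum(int(row["population"]) for row in reader)


total = total_population(RAW)
```

Because the values are structured, software can organise and reanalyse them directly; the same figures embedded in a scanned report would first need to be transcribed by a person.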

Open data, data curation

Open data is a philosophy based on the idea that data is more valuable if more people can use it, and that technology has made the cost of sharing data negligible.

Data curation is a field of research and work focusing on the long-term management of data, built on the argument that the opportunity cost of losing data is high.

Open data highlights benefits

Data curation worries about costs

data → knowledge → value

Focus of open data activities

• Data collected and held by governments

• Data collected or generated through publicly funded research

• http://wiki.opengovdata.org/index.php?title=OpenDataPrinciples

Reasons to make data open

• The underlying purposes of making publicly funded data more accessible are to:

• inform decision making by government, businesses and communities

• increase transparency and accountability in government decision making

• assist informed participation by the public in government decision making

• promote economic development through the innovative application of data collected for one purpose to other tasks

• gain greater value from research data

Barriers to reuse of government data

• Agency culture (reluctance or hostility to data sharing)

• Funding constraints

• Ensuring data confidentiality

• Shared ownership

• Poor dissemination practices

Open Government Data Principles

• Government data shall be considered open if it is made public in a way that complies with the principles below:

1. Complete. All public data is made available. Public data is data that is not subject to valid privacy, security or privilege limitations.

2. Primary. Data is as collected at the source, with the highest possible level of granularity, not in aggregate or modified forms.

3. Timely. Data is made available as quickly as necessary to preserve the value of the data.

4. Accessible. Data is available to the widest range of users for the widest range of purposes.

5. Machine processable. Data is reasonably structured to allow automated processing.

6. Non-discriminatory. Data is available to anyone, with no requirement of registration.

7. Non-proprietary. Data is available in a format over which no entity has exclusive control.

8. License-free. Data is not subject to any copyright, patent, trademark or trade secret regulation. Reasonable privacy, security and privilege restrictions may be allowed.

Characteristics of open data

Open data:

• Free and open access to the data

• Freedom to redistribute the data

• Freedom to reuse the data

• No restriction of the above based on who someone is (e.g. their nationality) or their field of endeavour (e.g. commercial or non-commercial)

cf. http://www.okfn.org/about/

Creative Commons licence conditions

• Attribution

• Share-alike

• No derivative works

• Non-commercial

Linked data

• Linked data uses semantic web approaches (especially RDF) to describe data and make it accessible to machines – a web of linked data

• RDF ‘triples’ are used to describe things

• Subject – predicate – object

• Hamish – is a – presenter
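The subject, predicate, object pattern can be sketched in a few lines of plain Python. This is a simplification, not real RDF: actual RDF identifies subjects and predicates with URIs and is normally handled with a dedicated library. The `Triple` type, the `objects_of` helper and the second triple are hypothetical, added only to show how a machine can query statements of this shape.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One statement in subject, predicate, object form (illustrative only)."""
    subject: str
    predicate: str
    obj: str


triples = [
    Triple("Hamish", "is a", "presenter"),  # the example from the slide
    # Hypothetical extra statement, based on the affiliation on the title slide.
    Triple("Hamish", "works for", "Statistics New Zealand"),
]


def objects_of(graph: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return every object linked to `subject` by `predicate`."""
    return [t.obj for t in graph if t.subject == subject and t.predicate == predicate]


roles = objects_of(triples, "Hamish", "is a")
```

Because every statement has the same three-part shape, statements from different sources can be merged into one list and queried together, which is the sense in which linked data forms a web.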

Linking Open Data dataset cloud

What is missing?

46

Data needs context

46: age, in years, of a person, as at 7 March 2006 (Census 2006)
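One way to picture the difference is to store the bare value together with the context from the slide. The sketch below is only an illustration; the `Observation` class and its field names are assumptions, not a Statistics New Zealand or DDI schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """A value plus the context needed to interpret it (hypothetical fields)."""
    value: int
    variable: str
    unit: str
    unit_of_observation: str
    source: str
    reference_date: date

    def describe(self) -> str:
        """Render the value and its context as one readable sentence."""
        return (
            f"{self.variable} of a {self.unit_of_observation}: "
            f"{self.value} {self.unit} "
            f"({self.source}, as at {self.reference_date.isoformat()})"
        )


obs = Observation(
    value=46,
    variable="Age",
    unit="years",
    unit_of_observation="person",
    source="Census 2006",
    reference_date=date(2006, 3, 7),
)
summary = obs.describe()
```

On its own, `46` could be a count, a code or a temperature; every other field here is description that someone has to supply and maintain.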

Examples

“Which town or city in the UK has the highest proportion of students?”

“Which town or city in the UK is home to one or more university campuses whose registered full or part time (non-distance) students divided by the local population gives the largest percentage?”

http://digitalcuration.blogspot.com/2010/03/linked-data-and-reality.html

Technology to render data:

• Hardware

• Formats

• Software

Documentation to explain data:

• Standards

• Meaning

• Interpretation

render → explain → re/use

data → knowledge → value

What is missing? Context

• Data is not self-describing

• Who provides the description?

• What does it cost to provide the description?

• How much of the description is held as tacit knowledge?

• Expert’s personal knowledge

• Rules and meaning encoded into the data and software

Data curation

• Data curation involves:

• Data management

• Adding value to data

• Data sharing for re-use

• Data preservation for later re-use

http://www.dcc.ac.uk/news/what-makes-data-curation

Digital Curation Centre

DDI Alliance

Open data brings benefits and risks

open data → more users:

• highlights data curation failures

• justifies data curation costs

• pressure for more user support

• expands expert community

• increases risk of poor analysis

Complementary ideas

• Actively curated data will:

• Remain technologically accessible

• Be easier to understand (and therefore use)

• Data curation will benefit from data being made more open:

• Data that is in active use tends to remain usable

• Widely used data is better understood than isolated data

Thank you

Hamish James

Manager, Information Management

[email protected]

04 931 4237