The Value of a High Quality Data Digital Library
Ross Wilkinson, Australian National Data Service
Seoul, December 2015
Outline
- The Value of Data
- The Value of a Data Library
- Trends
- The research data assets of Australia
- The challenges for Data Libraries
- The opportunities for Data Libraries
- Conclusions
A growing Seoul: What data is needed to research the best forms of growth for Seoul?
- Data will come from government, environmental monitors, public transport data, research into urban design…
- …just as in most cities in the world
- The data needs integration, protection, reliability
- The data will need to be accessed through a single location
- …an urban digital library
What data environment is needed for:
- Understanding where and how to build in bushfire prone areas
- Understanding the largest living thing in Australia – the Great Barrier Reef
- The effective use of Australia’s soil
- ?????
Professor Peter Rathjen, VC of University of Tasmania:
“Why should Universities care about research data?”
- Reputation is very important to research institutions.
- Libraries can make a substantial contribution to that reputation.
- Libraries are known for their collections, so creating world class data collections can help a library build an institution’s reputation.
What’s going on?
- Data is no longer a by-product of research
- Data is valuable
- Data practice is changing in many research disciplines
- Funders and Government want more from their research investments
- So does research institution leadership
Data Value
- Stronger research
- More efficient research
- Stronger partnerships
- More industry engagement – data as a trust builder
The Value of Open Data Report
- The analysis in the report suggests that the value of data in Australia’s public research is at least $1.9 billion per annum and possibly up to $6 billion per annum, at 2012-13 levels of expenditure and activity.
- It is more valuable if it is available through appropriate research data infrastructure.
- e.g. users of the British Atmospheric Data Centre report an average of 56% of their time working with data – that data is open and comes with appropriate tools.
What if we could transform research effort…
…by dramatically reducing the cost of gathering and publishing?
Some Trends:
- Reproducible Science
- Open Science
- Open Data
- Data Citation
- Data Citation Bibliometrics
- Data Journals
- Data Repositories
- Trusted Data Repositories
- FAIR Data
- Funded FAIR Data
Australian Research Data Activity
- Data Policy
- Capturing data valuable over long periods in Marine, Astronomy, Earth Sciences, Ecosystems
- …for a wide range of research purposes
- Supporting the storage of data
- Supporting the management of data
- Supporting the enhancement of data
- Building Institutional Research Data Capacity
Research Data Policy
- ARC and NHMRC: Treat data as an asset
- Department of Environment: Requirement that data is open, discoverable, and available
- Department of Education: The Australian Research Data Infrastructure Strategy provides recommendations for a coherent approach to research data and research data infrastructure
Integrated Marine Observing System
- IMOS is designed to be a fully-integrated, national system, observing at ocean-basin and regional scales, and covering physical, chemical and biological variables.
- The IMOS Ocean Portal allows scientists to discover and explore data streams coming from the Facilities - some in near-real time, and all as delayed-mode, quality-controlled data. These data streams, long time-series that are 'under construction', represent the actual research infrastructure being created and developed by IMOS.
Data is Transformative
- Governments are not investing in research data to make life easier for researchers
- Investments in research data are made to enable societal problems to be addressed
- This requires data to be in a form that allows a wide variety of uses
AURIN – Urban data infrastructure
- How can I increase the value of my suburban property development?
- How do I make it more “liveable” to attract more buyers?
- Integrate data from developers, local government, state government, federal government, mapping data, roads data, public transport maps…
- Apply University of Melbourne developed “walkability” index
How do you develop suburbs that work for residents, developers and local government?
- Along the Maribyrnong River, 10 km from Melbourne’s CBD, 128 ha of government land is ripe for redevelopment
- It could accommodate 3000 dwellings and offices for 3000 people
- Planning a sustainable, liveable community integrated into its urban surrounds demands information on transport, health services, environment, housing prices, recreation facilities and more
- This comes from Federal and State government agencies, local councils, utilities and private companies
- For Maribyrnong, data and 80 tools to manage it are being made available through the Australian Urban Research Infrastructure Network (AURIN) and the Australian National Data Service (ANDS)
- New tools, such as employment opportunities and walkability, are being added
- Similar projects can facilitate development across Australia’s cities and towns
Australian National Data Service:
To make Australia’s research data assets more valuable for its researchers, research institutions and the nation
So we need to transform:
Data that are:
- Unmanaged
- Disconnected
- Invisible
- Single use
To Structured Collections that are:
- Managed
- Connected
- Findable
- Reusable
so that researchers can easily publish, discover, access and use research data.
Value
Research Data Australia
What worked well:
- Getting going
- Establish a “voice for data”
- Coherence of research data infrastructure
- Coordination of policy and infrastructure
- Establishing research institutions at the centre of the research data system
- Establishing a national system of infrastructure complementing institutional and thematic infrastructure
- Establishing international cooperation
Major Open Data Program
- Connecting mining data, to research techniques, to industry exploration
- Connecting Twitter data to Jakarta map to analytics for managing flooding
- Collecting tropical data to institutional strategy
- Collecting ancient DNA for forming international partnerships for new results
Achievements to Date:
- Australian Research Data Commons established
- 100,000 data collections are described and discoverable
- ANDS has formed partnerships with most Australian universities and publicly funded research organisations
- Research Institutions have substantially greater research data management capacity than 5 years ago
- Research data is on the agenda of DVCs-R
- Jointly, Australia has world-leading research data infrastructure
- Australia has a leading role in world research data infrastructure through the Research Data Alliance
Data Opportunities – and threats
- Data sharing is great for trust development
- Data openness challenges traditional business models
- Data partners can be anywhere – EU is investing €1.4B in open data to drive jobs and innovation
From G. Boulton
Royal Society publishes “Science as an open enterprise” – written by Geoffrey Boulton
Influential in EU/UK
FAIR Data – (FORCE 11)
To be Findable:
- (meta)data are assigned a globally unique and eternally persistent identifier.
- data are described with rich metadata.
- (meta)data are registered or indexed in a searchable resource.
- metadata specify the data identifier.
To be Accessible:
- (meta)data are retrievable by their identifier using a standardized communications protocol.
- the protocol is open, free, and universally implementable.
- the protocol allows for an authentication and authorization procedure, where necessary.
- metadata are accessible, even when the data are no longer available.
To be Interoperable:
- (meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
- (meta)data use vocabularies that follow FAIR principles.
- (meta)data include qualified references to other (meta)data.
To be Re-usable:
- (meta)data have a plurality of accurate and relevant attributes.
- (meta)data are released with a clear and accessible data usage license.
- (meta)data are associated with their provenance.
- (meta)data meet domain-relevant community standards.
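As a rough illustration of how a repository might screen records against the four FAIR headings, here is a minimal Python sketch. The `Record` class, its field names and the `fair_gaps` helper are hypothetical and not part of FORCE 11 or any ANDS service; the check only tests whether a field is present, which is far weaker than a real FAIR assessment.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical metadata record; field names are illustrative only."""
    identifier: str = ""   # persistent identifier such as a DOI (Findable)
    title: str = ""        # descriptive metadata (Findable)
    description: str = ""  # descriptive metadata (Findable)
    access_url: str = ""   # where the (meta)data can be retrieved (Accessible)
    vocabulary: str = ""   # shared vocabulary or schema used (Interoperable)
    license: str = ""      # data usage license (Re-usable)
    provenance: str = ""   # how the data was produced (Re-usable)


def fair_gaps(record):
    """Return the FAIR headings for which at least one field is empty."""
    checks = {
        "Findable": [record.identifier, record.title, record.description],
        "Accessible": [record.access_url],
        "Interoperable": [record.vocabulary],
        "Re-usable": [record.license, record.provenance],
    }
    return [heading for heading, values in checks.items() if not all(values)]
```

A record with no vocabulary and no provenance would be reported as having gaps under "Interoperable" and "Re-usable", which points a data librarian to the fields that need attention.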
EU Open Data “Pilot”
- 1.4B Euros as part of H2020
- 80% take up
Data citation
- Data that is used should be cited – just as other work is cited
- Provides appropriate credit
- Enables reproduction
DataCite provides reliability
- Agreed basic information: Creator (Publication year), Title, Publisher, Identifier
- Suitably formatted DOI
Data citation works with…
- ORCID – for people
- Crossref – for papers
- Fundref – for funders
- IGSN – for specimens
- …
Can we measure the value? Bibliometricians arise!
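The agreed basic information can be assembled into a citation string mechanically. The sketch below is one possible rendering in Python: the function name, the punctuation between elements and the example DOI are assumptions made for illustration, and the DOI is expressed as a resolvable https://doi.org/ link.

```python
def format_citation(creators, year, title, publisher, doi):
    """Join the basic citation elements in the order Creator (Year), Title, Publisher, Identifier."""
    names = "; ".join(creators)  # assumed separator between multiple creators
    return f"{names} ({year}): {title}. {publisher}. https://doi.org/{doi}"


# Made-up values, for illustration only.
example = format_citation(
    ["Smith, J.", "Lee, K."],
    2015,
    "Reef survey",
    "Example Data Centre",
    "10.1234/example",
)
```

Because every element is a separate field, the same record can also feed machine-readable links to people, papers, funders and specimens.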
Connection is key
- And the connections should be machine operable
- Research is more valuable if it is more connected
Data Journals
- Geoscience Data Journal (Wiley)
- Scientific Data (Nature)
- Journal of Open Archaeology Data (Ubiquity)
- Biodiversity Data Journal (Pensoft)
- A means of describing the data – its formation, properties, usage
- Enables recognition of a contribution
- Enhances usage of the data
- Enables “traditional” bibliometrics
So data is more valuable if:
- It supports Reproducible Science
- It supports Open Science
- Is Open
- Is Citable
- Is published
- Is reliably available
- Is available from a reliable digital library
- Is FAIR
- It reliably uses the data services that are discussed at ADLC 2015
Advertisement: Research Data Alliance
- You may agree that data preservation is important
- You may agree that international agreements are important
- Using the Research Data Alliance working groups is a good way of getting wider agreement for issues that are important to you
Data Libraries (repositories):
Provide:
- Data storage
- Metadata storage
- Data access methods
- Data management software
- Data analysis services?
- Data processing services?
But also:
- Integrated approach to content and metadata
- Policies, processes, services, and people
- Overall commitment to the stewardship of digital materials
Trusted data repositories (libraries)
- Need for reliable data
- Trusted repositories:
  - Trustworthy Repositories Audit & Certification (TRAC), ISO 16363
  - Data Seal of Approval, e.g. Pacific and Regional Archive for Digital Sources in Endangered Cultures (PARADISEC)
- Often required by publishers
- May be increasingly required (and funded) by research funders
The Opportunity
- Fully integrated publication of all outputs of a scholarly endeavour with rich connection
- FAIR data in a trusted repository
- Fully explorable scholarly journals
- Researchers get much better exposure of their research
- The outcomes are defensible
- New research and partners become available
So that’s good…
- But a full function digital library has more to offer
- Where is the biggest saving in research?
- Where do the breakthroughs come from?
From a bioinformatician – Matjaz Hren
The biggest wastes of time in research are:
- Meetings – need ELN integration
- Data entry – need automated data and metadata capture tools
- Data search – need rich data catalogues
Dan Steinberg, Salford Systems
- In the community of data miners and statistical modelers
- Most work at major corporations supporting extensive analytical projects
- They spend 80% of their effort manipulating the data so that they can analyze it
Ashley Buckle, Protein Crystallographer
- Required to prepare rich descriptions of data for associated publication
- It took him and a librarian a week of effort
- A tool was built that automated the capture of data from the synchrotron, migrated it, added metadata, added project information, added DOI
- It now takes 15 minutes to prepare data
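To make the idea of automated capture concrete, here is a minimal Python sketch that builds a basic metadata record for a single data file. It is not the tool described above: the function name, the record fields and the `project` parameter are assumptions for illustration, and real capture from an instrument would add far richer, domain-specific metadata and an identifier such as a DOI.

```python
import hashlib
import os
from datetime import datetime, timezone


def capture_metadata(path, project):
    """Build a basic, automatically derived metadata record for one data file."""
    sha256 = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large instrument files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            sha256.update(chunk)
    stat = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "size_bytes": stat.st_size,
        "modified_utc": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "sha256": sha256.hexdigest(),  # fixity value for later integrity checks
        "project": project,            # hypothetical project label supplied by the caller
    }
```

Everything in the record except the project label is derived from the file itself, which is the kind of effort that can move from the researcher to the information system.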
Long Term Ecological Research Network
From the report at http://knowledgeinfrastructures.org:
"Our call for methodological and collaborative innovation is best explained via an analogy in the natural sciences. Twenty years ago, the average ecologist worked on a patch of land no larger than a hectare, typically for a few months or a year, gathered data over a thirty-year career, published results, and then gradually lost the data. With the creation of the Long Term Ecological Research Network (LTER), the National Science Foundation began to change the nature of research. Today, at a number of sites nationally and in consonance with international projects, ecologists are able to look beyond the scale of a field and timeframe of a career: they now have the prospect of studying ecology and climate locally, nationally, globally, and over spans of time that more closely match those of ecological change."
So research is changing
- More, and more complex data
- It’s getting harder to wade through it
- Yet insight is often connecting the pieces, seeing patterns, using new techniques…
- …not being a poor information professional with home grown data and tools
A key role:
- A data library AND a data librarian can play a key role in reducing the cost of data capture, gathering and preparation, as well as data publication
- Thus effort is transferred from researchers to information systems and information professionals
- …to where it should be, because it saves money and adds reliability to research
What’s needed of a digital repository?
- You can find the data you’ve generated or need
- You can open the data you’ve generated or need
- You understand what the data is and what it’s about
- You can use or work with the data in the way you need
- You trust the data is what it says it is
Managing Digital Continuity, UK National Archives, 2011
So we really can change the picture:
By dramatically reducing the cost of gathering and publishing, through reliable data libraries and librarians
Big data: Data size, complexity, reliability
Conclusions
- Research data is valuable
- It should be expected that the data underpinning findings are available for scrutiny
- Far greater value is available, especially if it is findable, accessible, interoperable and reusable
- This is helped if data is collected, used and published with reliable data libraries
This work is licensed under a Creative Commons Attribution 3.0 Australia License
ANDS is supported by the Australian Government through the National Collaborative Research Infrastructure Strategy (NCRIS).
Thank you!