View
578
Download
3
Category
Preview:
Citation preview
From Open Data to Open Science
Geoffrey BoultonUniversity of Edinburgh & CODATA
“Learn” WorkshopUniversity College, LondonJanuary 2016
:
Knowledge and understanding - the engines of material progress
depend on technologies that enable their accumulation and communication
1454 2002
Openness – the bedrock of science in the modern era
Henry Oldenburg
Scientific self correction
/var/folders/ls/nv6g47p94ks4d11f1p72h2ch0000gn/T/com.apple.Preview/com.apple.Preview.PasteboardItems/rutford_avo_afi_ed_july2010 (dragged).pdf
The Challenge: the “Data Storm” is undermining “self correction”
THEN AND NOW
A crisis of reproducibility and credibility?
Why such low levels of reproducibility?• Misconduct/fraud• Invalid reasoning• Absent or inadequate data and/or metadata
19
Exab
ytes
280
Exab
ytes
Based on: http://www.martinhilbert.net/WorldOnfoCapacity.html 1 Exabyte=1018 bytes
The digital revolutionGlobal information storage capacityIn optimally compressed bytes
DigitalStorage
Analogue Storage
Explosion of the Digital revolution
1986 1993
2000
2007
2014
- 4
000
Exa
byte
s
http://www.wired.co.uk/news/archive/2014-01/15/1000-dollar-genome/viewgallery/331679
Data acquistion: Cost down – Flux up
Information: how much is crystallised into knowledge?
Reinventing reproducibility for the digital age
How do we retain an essential principle?
The data providing the evidence for a published concept MUST be concurrently published, together
with necessary metadata and computer code.To do otherwise is scientific MALPRACTICE
Ozone Levels
Four key drivers of change for science
• Big data• Semantically-linked data• Open data• Cost reduction
Micro-satellite
Looking at clouds
Pillars of the Digital Revolution
Big Data
Volume
Velocity
Variety
Linked Open DataMany databases
SemanticRelations
Deeper meaning
Foundations : Openness
Machine analysis & learning Text and data mining
The opportunity: data from “simple” to complex systems from uncoupled to highly coupled behaviour
Uncoupledsystems
Simulating behaviour of highly coupled systems
Simulating system dynamics Mapping a complex state
Image of brain cells in a rat
Emergent behaviour of a specific 6-component coupled system
• patterns not hitherto seen• unsuspected relationship• complex systems
e.g. complexity: dynamic evolution and system state
Scientific opportunities
Satellite observation Surface monitoring
The opportunity: data-modelling: iterative integration
Initial conditions
Model forecast
Model-data iteration - forecast correction
Linear regression
Cluster analysis
Dynamic/complex behaviour
Complex systemsNo mathematical pipeline
Simple relationshipsClassical statistics
System characterisations: from simple to complex
Glucose in type II diabetes
Topological analysis
A barrier to openness? - Analytic overload.E.g. - Global Earth Observation System of Systems
• What is the human role? • Can we analyse & scrutinise what is in the
black box? - &who owns the box?• What does it mean to be a researcher in a
data intensive age?
A disconnect between machine analysis & human cognition?
Mathematics related discussions
Tim Gowers- crowd-sourced mathematics
An unsolved problem posed on his blog.
32 days – 27 people – 800 substantive contributions
Emerging contributions rapidly developed or discarded
Problem solved!
“Its like driving a car whilst normal research is like pushing it”
What inhibits such processes?- The criteria for credit and
promotion – ALTMETRICS THE ANSWER?
New modes of technology-enabled creativity:
e.g Crowd-sourcing
The Open Data Iceberg
The Technical Challenge
The Consent Challenge
The Ecosystem Challenge
The Funding Challenge
The Support Challenge
The Skills Challenge
The Incentives Challenge
The Mindset Challenge
Processes &Organisation
People
motivation and ethos.
Developed from: Deetjen, U., E. T. Meyer and R. Schroeder (2015). OECD Digital Economy Papers, No. 246, OECD Publishing.
A National Infrastructure
Technology
The “Science International” Accord:principles of open data
(www.icsu.org/science-international)
Responsibilities1-2. Scientists3. Research institutions & universities4. Publishers5. Funding agencies6. Scholarly societies and academies7. Libraries & repositories
8. Boundaries of openness
Enabling practices9. Citation and provenance10. Interoperability11. Non-restrictive re-use12. Linkability
Responsibilities
Scientistsi. Publicly funded scientists have a responsibility to contribute to the
public good through the creation and communication of new knowledge, of which associated data are intrinsic parts. They should make such data openly available to others as soon as possible after their production in ways that permit them to be re-used and re-purposed.
ii. The data that provide evidence for published scientific claims should be made concurrently and publicly available in an intelligently open form. This should permit the logic of the
link between data and claim to be rigorously scrutinised and the validity of the data to be tested by replication of experiments or observations. To the extent possible, data should be deposited in well-managed and trusted repositories with low access barriers.
African Open Data/Open Science Platform
Platform ForumCoordination
GovernmentPriority setting
FundersFunding
Incentives
Capacity Building
Training and Skills
Infrastructure Roadmaps
Flagship Co-Designed
Data Intensive Projects
InternationalStandards
Programmes
Shared infrastructure investment; shared good practice; capacity building; system development
Disciplinary communities can lead the way
e.g. Elixir programme in life sciences/bio-informatics
Regional Platforms for Open Science
AfricanPlatform?
Asian Platform?
European
Platform?
AustralianPlatform
Shared investment in infrastructure; harvesting and circulating good ideas; spreading and supporting good practice; capacity building; promoting applications; linking to international programmes and standards.
S.AmericanPlatform?
Inputs Outputs
Open access
Administrative data (held by
public authorities e.g.
prescription data)
Public Sector Research data
(e.g. Met Office weather
data)
Research Data (e.g.
CERN, generated in universities)
Research publications
(i.e. papers in journals)
Open data
Open science “science as a public enterprise”
Collecting the data
Doing research
Doing science openly
Researchers - Govt & Public sector - Businesses - Citizens - Citizen scientists
(communication/dialogue – joint production of knowledge)
Stakeholders
• Communication/dialogue must be audience-sensitive• Is it – with all stakeholder groups?
Open ScienceData / Publications
Rese
arch
ers
Mon
o/M
ulti I
nter
Tr
ansd
iscip
linar
y
Sta
keho
lder
s
Rigo
ur
Inno
vatio
n P
olic
y So
lutio
ns
Open Knowledge
A national data-intensive system
International Research Data Collaboration
CODATA Policies & practice Frontiers of data
science Capacity Building
WDS• Data stewardship• Data standards
RDA• Interoperability
1. Maintaining “self-correction”
2. Open knowledge is creative & productive
“If you have an apple and I have an apple and we
exchange these apples, then you and I will still each have
one apple. But if you have an idea and I have an idea and
we exchange these ideas, then each of us will have two
ideas.”
3. Open data enables semantic linking
George Bernard Shaw
Why openness & sharing?
• Openly collected science is already helping policy makers.• AshTag app allows users to submit photos and locations of sightings to a team who will refer them on to the Forestry Commission, which is leading efforts to stop the disease's spread with the Department for Environment, Food and Rural Affairs (Defra).
Chalara spread: 1992-2012
Citizen Science
Recommended