Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Page 1: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006

Communicating Scientific Thoughts and Data

Bryan Lawrence, Oxford

March 2006

Page 2: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Outline

• Intro to BADC
• Drivers for Change:
  – Open Access
  – Data as Evidence
• Handling Data – what is this metadata stuff?
• Improving our ability to find and utilise data:
  – The NERC DataGrid
  – Importance of Climate Forecast Conventions
  – NumSim
• Are we changing our methods of communicating?
  – Communication Timescales, blogging and preprints
  – CLADDIER

Page 3: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


BADC Role

The BADC role is to assist UK researchers to locate, access and interpret atmospheric data and to ensure the long-term integrity of atmospheric data produced by NERC projects.

– Facilitation and Curation/Preservation!

Page 4: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


BADC Data Holdings

• A BADC dataset is an aggregation of data files, documents and metadata sharing common administrative policies. These policies could be file validation, access control or retention schemes.

• Datasets vary from TBs in millions of files to a few MBs in a single file.

• There are presently over 100 datasets.

Page 5: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


User examples

• Atmospheric chemistry models.

• Pollution chemistry measurement campaigns.

Page 6: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


User examples

• Bird feeding habits.

Page 7: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


User examples

• Radio communication modelling.

• Wind power research.

• A & E influenza cases.

Page 8: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


User examples

• Castle mortar decay.

• Discomfort indices.

Page 9: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006

Drivers for Change

Page 10: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Climate in 20010 – A graphic Illustration

Figures from Gary Strand, NCAR, ESG website

March 2006, 2.5 PB

Typically, two-thirds of this data will never see the light of the day: why?

No one can remember what it was, or, if they can remember that, where it is!

Page 11: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


http://www.realclimate.org/index.php?p=121

Data as Evidence

http://www.uoguelph.ca/~rmckitri/research/trcback.html

What McIntyre got right:

Page 12: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


RCUK Position Statement on Access to Research Outputs

http://www.rcuk.ac.uk/access/statement.pdf

Research Data

8. RCUK also notes that one of the benefits of digitisation and publication in digital formats is the ability to provide access to primary research data alongside the traditional article; and it shares the Select Committee’s and the Government’s view that the data underpinning the published results of publicly-funded research should be made available as widely and rapidly as possible. For a number of years, Research Councils including the AHRB, ESRC and NERC have funded data centres and services which are responsible for preserving, managing and providing access to research data; and these Councils have well-established policies and procedures for preservation and access. CCLRC is currently leading cross-Council consideration of how policy and practice need to be developed with regard to the curation of the data created through the research projects they support. Further work is needed to develop a common framework of policies and procedures for determining what sets of data are collected, whether in university or in Council-run repositories or elsewhere; and how and on what terms they are made accessible to the research community and others

New methods of publication

9. The development of web and associated Internet technologies providing access to a range of distributed information resources has enabled new possibilities for the delivery of research publications. This has also led to a change in expectations as to how and when research publications are accessed. E-print repositories (see paragraphs 10-15 below) and open access journals (see paragraphs 25-27 below) have both developed as part of this change in technology and expectation. Indeed, the economic model for open access journals depends on the web to provide a low-cost delivery mechanism. RCUK considers that both e-print repositories and open access journals can help improve access to the results of publicly funded research.

Page 13: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Data Retention Policies

University of Cambridge Research Division: Data generated in the course of research should be kept securely in paper or electronic format, as appropriate. Back-up records should always be kept for data stored on a computer. The [AMRC] considers a minimum of ten years to be an appropriate period. However, research based on clinical samples or relating to public health may require longer storage to allow for long-term follow-up to occur. [AMRC: Association of Medical Research Charities]

University of Oxford Research Services Office: A successful laboratory notebook allows for ready verification of quality and integrity of research data and enables another investigator to reproduce the procedure which has been documented and get the same result. …

Natural Environment Research Council: … Scientists will frequently process the data they have collected selectively, or with specific application packages, in order to prepare material for publication in the scientific literature. But the full value of the data collected may only be realised if the entire dataset is subjected to generic processing (eg to ensure calibration and adequate quality control) and is sufficiently documented to allow others to re-use it at a later date. The original collector may be the only person in a position to undertake such work, and so to unlock the full potential of the data. Those holding data collected under NERC funding will be expected to cooperate in validating and publishing them in their entirety - when this can be justified in terms of their scientific value - rather than merely creaming off a subset for immediate publication in the literature. …

Page 14: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006

What is this stuff called metadata?

Page 15: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Preserving data is not just about backups!

One could argue that the writers of these documents did a brilliant job of preserving the bits-and-bytes of their time …

And yes they’ve both been translated … many times, it’s a shame the meanings are different …

Phaistos Disk, 1700BC

Page 16: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Metadata Origins

[Diagram: nested levels of data users. Labels: Research Group (three of them), Satellite, SuperComputer, DB, Shared Resources, Wider Internet.]

Consider a hierarchy of data users beginning with an individual scientist, who may herself be part of a research group, itself part of a community sharing resources, lying in the wider internet …

To be well integrated the metadata should have a role at each level! (The data portal client and server interface may be different at each level.)

At each level “extra” metadata will be required, probably produced by dedicated staff at the research group, or data centre.

Page 17: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


NDG Metadata Taxonomy

… not one schema, not one solution!

• CSML
• NCML+CF
• MOLES
• THREDDS
• DIF -> ISO19115
• CLADDIER

Page 18: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006

The NERC DataGrid

Page 19: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


http://ndg.nerc.ac.uk

British Atmospheric Data Centre

British Oceanographic Data Centre

Complexity + Volume + Remote Access = Grid Challenge

NCAR

Page 20: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


[Diagram: NDG architecture. Labelled components: NERC Grid; BADC, BODC and Group NDG wrappers, with online data and XML databases; tape robot; research group data sources (satellite, supercomputer); software agent; grid user; internet user; internet links; ESG (& other) applications; NDG web portal with an XML database; wider internet.]

Page 21: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


NetCDF + WMS

• e.g.: ERA40 re-analysis surface air temperature, 2001-04-27
  – deegree open-source WMS modified with netCDF connector
• Overlaid with rainfall from globe.digitalearth.gov WMS server

NB: Now using Mapserver for Interoperability experiments

Page 22: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Climate Science Modelling Language

• CSML feature types
  – defined on basis of geometric and topologic structure

CSML feature type: description (examples)

• TrajectoryFeature: Discrete path in time and space of a platform or instrument (ship’s cruise track, aircraft’s flight path)
• PointFeature: Single point measurement (raingauge measurement)
• ProfileFeature: Single ‘profile’ of some parameter along a directed line in space (wind sounding, XBT, CTD, radiosonde)
• GridFeature: Single time-snapshot of a gridded field (gridded analysis field)
• PointSeriesFeature: Series of single datum measurements (tidegauge, rainfall timeseries)
• ProfileSeriesFeature: Series of profile-type measurements (vertical or scanning radar, shipborne ADCP, thermistor chain timeseries)
• GridSeriesFeature: Timeseries of gridded parameter fields (numerical weather prediction model, ocean general circulation model)

Page 23: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Climate Science Modelling Language

• CSML feature types– examples...

ProfileSeriesFeature

ProfileFeature

GridFeature

Page 24: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Climate Science Modelling Language

• Application schema
  – logical structure and semantic content of NDG ‘Dataset’
  – Based on Geography Markup Language 3.1

[UML diagram. Types shown: «Type» GML::AbstractGMLType, «Type» Dataset, «Type» UnitDefinitions, «Type» ReferenceSystemDefinitions, «Type» PhenomenonDefinitions, «Type» AbstractArrayDescriptor, «Type» GML::FeatureCollection, linked by associations of multiplicity *.]

Page 25: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Climate Science Modelling Language

• Numerical array descriptors
  – provides ‘wrapper’ architecture for legacy data files
  – ‘Connected’ to data model numerical content through ‘xlink:href’
• Three subtypes:
  – InlineArray
  – ArrayGenerator
  – FileExtract (NASAAmes, NetCDF, GRIB)
• Composite design pattern for aggregation

[UML class diagram. Types and their attributes:]

• «Type» GML::AbstractGMLType: +id, +metaDataProperty, +description, +name
• «Type» AbstractArrayDescriptor: +arraySize[1], +uom[0..1], +numericType[0..1], +numericTransform[0..1], +regExpTransform[0..1]
• «Type» AggregatedArray: +aggType[1], +aggIndex[1], +component (1 to *)
• «Type» InlineArray: +values[*]
• «Type» ArrayGenerator: +expression[1]
• «Type» AbstractFileExtract: +fileName[1]
• «Type» NASAAmesExtract: +variableName[1], +index[0..1]
• «Type» NetCDFExtract: +variableName[1]
• «Type» GRIBExtract: +parameterCode[1], +recordNumber[0..1], +fileOffset[0..1]
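The composite design pattern mentioned on this slide can be sketched in a few lines of Python. This is an illustration only, not the NDG implementation: the class names mirror the slide, but the constructor arguments and the `values()` method are hypothetical, and aggregation is assumed here to be simple end-to-end concatenation (the slide's `aggType` and `aggIndex` attributes are not modelled).

```python
from abc import ABC, abstractmethod


class ArrayDescriptor(ABC):
    """Common interface: every descriptor can yield its numbers."""

    @abstractmethod
    def values(self):
        """Return the described numbers as a flat list."""


class InlineArray(ArrayDescriptor):
    """Leaf: the numbers are stored in the descriptor itself."""

    def __init__(self, values):
        self._values = list(values)

    def values(self):
        return list(self._values)


class ArrayGenerator(ArrayDescriptor):
    """Leaf: the numbers are computed from a rule (hypothetical start/step/count form)."""

    def __init__(self, start, step, count):
        self.start, self.step, self.count = start, step, count

    def values(self):
        return [self.start + i * self.step for i in range(self.count)]


class AggregatedArray(ArrayDescriptor):
    """Composite: holds other descriptors, including other aggregates."""

    def __init__(self, components):
        self.components = list(components)

    def values(self):
        # Assumption: aggregation is plain concatenation of the components.
        out = []
        for component in self.components:
            out.extend(component.values())
        return out


combined = AggregatedArray([
    InlineArray([1.0, 2.0]),
    AggregatedArray([ArrayGenerator(10.0, 5.0, 3)]),
])
print(combined.values())  # [1.0, 2.0, 10.0, 15.0, 20.0]
```

The point of the pattern is that client code calls `values()` without caring whether it holds a leaf or an arbitrarily nested aggregate.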

Page 26: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Climate Science Modelling Language

• Inline array

  <NDGInlineArray>
    <arraySize>5 2</arraySize>
    <uom>udunits.xml#degreeC</uom>
    <numericType>float</numericType>
    <regExpTransform>s/10/9/ge</regExpTransform>
    <numericTransform>+5</numericTransform>
    <values>1 2 3 4 5 6 7 8 9 10</values>
  </NDGInlineArray>

• Array generator

  <NDGArrayGenerator>
    <arraySize>10001</arraySize>
    <uom>udunits.xml#minute</uom>
    <numericType>float</numericType>
    <expression>0:5:50000</expression>
  </NDGArrayGenerator>
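As an illustration of how a client might expand the array generator above, here is a minimal Python sketch. It is not the NDG tooling. It assumes that `expression` is a `start:step:stop` range with an inclusive end point, which is consistent with the declared `arraySize` of 10001 but is not stated on the slide; the function name `expand_generator` is hypothetical.

```python
import xml.etree.ElementTree as ET

GENERATOR_XML = """
<NDGArrayGenerator>
  <arraySize>10001</arraySize>
  <uom>udunits.xml#minute</uom>
  <numericType>float</numericType>
  <expression>0:5:50000</expression>
</NDGArrayGenerator>
"""


def expand_generator(xml_text):
    """Expand an NDGArrayGenerator descriptor into a list of floats.

    Assumption: the expression is 'start:step:stop' with an inclusive stop.
    """
    node = ET.fromstring(xml_text)
    start, step, stop = (float(p) for p in node.findtext("expression").split(":"))
    declared = int(node.findtext("arraySize"))
    count = int(round((stop - start) / step)) + 1
    # Cross-check the expression against the declared size before trusting it.
    if count != declared:
        raise ValueError(f"expression yields {count} values, arraySize says {declared}")
    return [start + i * step for i in range(count)]


values = expand_generator(GENERATOR_XML)
print(len(values), values[0], values[-1])
```

Checking the generated length against `arraySize` is a cheap way to catch a misread expression syntax.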

Page 27: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Climate Science Modelling Language

• File extract

  <NDGNASAAmesExtract>
    <arraySize>526</arraySize>
    <numericType>double</numericType>
    <fileName>/data/BADC/macehead/mh960606.cf1</fileName>
    <variableName>CFC-12</variableName>
  </NDGNASAAmesExtract>

  <NDGNetCDFExtract gml:id="feat04azimuth">
    <arraySize>10000</arraySize>
    <fileName>radar_data.nc</fileName>
    <variableName>az</variableName>
  </NDGNetCDFExtract>

  <NDGGRIBExtract>
    <arraySize>320 160</arraySize>
    <numericType>double</numericType>
    <fileName>/e40/ggas1992010100rsn.grb</fileName>
    <parameterCode>203</parameterCode>
    <recordNumber>5</recordNumber>
    <fileOffset>289412</fileOffset>
  </NDGGRIBExtract>
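A file extract is a pointer into a legacy file rather than the data itself. The sketch below shows one way a reader could turn such a descriptor into a small record saying which file to open and what to select in it. It is an assumption-laden illustration: the `FileExtract` record and `parse_extract` function are hypothetical names, only the element names come from the slides, and no file is actually opened.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class FileExtract:
    kind: str        # "NASAAmes", "NetCDF" or "GRIB"
    file_name: str
    shape: tuple     # parsed from arraySize
    selector: dict   # format-specific fields locating the array in the file


# Format-specific selector fields, taken from the class diagram on the earlier slide.
SELECTOR_FIELDS = {
    "NDGNASAAmesExtract": ("NASAAmes", ("variableName", "index")),
    "NDGNetCDFExtract": ("NetCDF", ("variableName",)),
    "NDGGRIBExtract": ("GRIB", ("parameterCode", "recordNumber", "fileOffset")),
}


def parse_extract(xml_text):
    """Read one file-extract descriptor; optional fields are skipped if absent."""
    node = ET.fromstring(xml_text)
    kind, fields = SELECTOR_FIELDS[node.tag]
    selector = {f: node.findtext(f) for f in fields if node.findtext(f) is not None}
    shape = tuple(int(n) for n in node.findtext("arraySize").split())
    return FileExtract(kind, node.findtext("fileName"), shape, selector)


grib = parse_extract("""
<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
""")
print(grib)
```

A real reader would then dispatch on `kind` to a NASA Ames, NetCDF or GRIB library; that step is left out because it depends on which libraries are available.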

Page 28: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


MarineXML Testbed

[Diagram: source data (XML with XSD) is translated by XSLT into MarineGML (NDG) Feature Types and on into an ECDIS client. Labelled components: XML Parser, SeeMyDENC, Data Dictionary, S52 Portrayal Library, SENC, MarineGML (NDG) Feature Types, S-57v3 GML.]

• Data from different parts of the marine community conforming to a variety of schema (XSD): Biological Species, Chl-a from Satellite, Modelled Hydrodynamics, Measured Hydrodynamics.
• For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FT) defined by CSML. The FTs and XSLT are maintained in a ‘MarineXML registry’.
• The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features.
• The FTs can then be translated to equivalent FTs for display in the ECDIS system.
• Features in the source XSD must be present in the data dictionary.
• Phenomena in the XSD must have an associated portrayal.
• Features described using the S-57v3.1 Application Schema can be imported and are equivalent to the same features in CSML.
• ECDIS acts as an example client for the data.

Slide adapted from Kieran Millard (AUKEGGS, 2005)

Page 29: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Biological sampling station with attributes for the species sampled at each

Grid of Chl-a from the MERIS instrument on ENVISAT

Predicted and measured wave climate timeseries (height, direction and period)

Vectors of currents from instruments

MarineXML Testbed

Slide adapted from Kieran Millard (AUKEGGS, 2005)

Page 30: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Re-using Features

Here structured XML is converted to plain ascii text in the form required for a numerical model

HTML warning service pages are generated ‘on the fly’.

XML can also be converted to SVG to display data graphically.

Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts.

All this requires agreement on standards

Slide adapted from Kieran Millard (AUKEGGS, 2005)

Page 31: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Climate Science Modelling Language

• Status:
  – Initial feature types defined
  – First draft application schema complete
  – Trial software tooling being coded (parser, netCDF instantiation)
  – Initial deployment trial across BODC, BADC datasets
• Future:
  – Separate out wrapper implementation (array descriptors)
  – Disallow ‘internal’ dictionaries
  – More strongly-typed features?
    • Complex features
    • Implicit Ensemble Support
    • Swathes
  – Follow (and pursue!) GML evolution, enhance compliance
  – Expand tooling

Page 32: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


CSML Round Tripping

Managing semantics

[Diagram. Labels: conceptual model (UML: Class1, Class2 with association ends -End1 1 and -End2 *, «datatype» DataType1); UGAS; GML app schema (XML); GML dataset instance, “conforms to”; Application “produces” New Dataset (101010); parser (under development).]

GML dataset fragment shown on the slide (cut off after the opening of the tuple list):

  <gml:featureMember>
    <NDGPointFeature gml:id="ICES_100">
      <NDGPointDomain>
        <domainReference>
          <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
            <location>55.25 6.5</location>
          </NDGPosition>
        </domainReference>
      </NDGPointDomain>
      <gml:rangeSet>
        <gml:DataBlock>
          <gml:rangeParameters>
            <gml:CompositeValue>
              <gml:valueComponents>
                <gml:measure uom="#tn"/>
                <gml:measure uom="#amount"/>
                <gml:measure uom="#gsm"/>
              </gml:valueComponents>
            </gml:CompositeValue>
          </gml:rangeParameters>
          <gml:tupleList>

Page 33: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


CSML Round Tripping

Managing data - 1

[Diagram. Labels: Application “produces” CF Dataset (101010); scanner (under development); GML dataset instance; GML app schema (XML); parser (under development); CF. The GML dataset shown is the same NDGPointFeature "ICES_100" fragment as under “Managing semantics”.]

Page 34: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Climate and Forecast (CF) Conventions

1. Data should be self-describing. No external tables are needed to interpret the file. For instance, CF encodings do not depend upon numeric codes (by contrast with GRIB).

2. The convention should be easy to use, both for data-writers and users of data.

3. The metadata, and the semantic meaning encoded through the metadata, should be readable by humans as well as easily utilized by programs.

4. Redundancy should be minimised as much as possible (because it reduces the chance of errors of inconsistency when writing data).
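The first principle, self-description, can be made concrete with a small Python sketch. It is a simplification under stated assumptions: `standard_name` and `units` are real CF attribute names, but treating these two as the minimum is this sketch's choice (CF has more detailed rules), the function `is_self_describing` is hypothetical, and the parameter code is borrowed from the GRIB example earlier in the talk purely for illustration.

```python
# Attributes this sketch treats as the minimum for a variable to explain itself.
# (Simplification: the CF conventions have more detailed rules.)
DESCRIPTIVE_ATTRS = ("standard_name", "units")


def is_self_describing(attrs):
    """True if the variable's own attributes say what it is and in which units."""
    return all(isinstance(attrs.get(k), str) and attrs[k] for k in DESCRIPTIVE_ATTRS)


# CF style: the meaning travels with the data.
cf_variable = {"standard_name": "air_temperature", "units": "K"}

# Code-table style: a bare number that needs an external table to interpret.
coded_variable = {"parameterCode": 203}

print(is_self_describing(cf_variable))     # True
print(is_self_describing(coded_variable))  # False
```

A program (or a person) reading `cf_variable` needs nothing else to know what the numbers mean; reading `coded_variable` requires knowing which table the code came from.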

Page 35: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


CF

CF consists of:
• Vocabulary management
• Semantic concepts (axes, cells etc.)
• Format-specific conventions (NetCDF now)

CF is at the heart of:
• IPCC data comparison
• Academic earth system science data exploitation (and archival).

Page 36: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


CF

CF:
• Exploits 100s of man-years of effort on NetCDF evolution and tools
• Is one of the means by which we can take NetCDF data and make meaningful feature types.
• Helps future-proof your data!

Page 37: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Managing Data 2

[Diagram. Labels: CF Dataset (101010), shown twice; scanner; GML dataset; XSLT; ISO19115 XML; PUBLISH; DECISION PROCESSES; Define Dataset; Add Information. The GML dataset shown is the same NDGPointFeature "ICES_100" fragment as under “Managing semantics”.]

Page 38: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


http://ndg.nerc.ac.uk/discovery

Page 39: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Choose to return either data or “B-”Metadata

Look at DIFs in either HTML or XML

Can order responses by Title, Data Centre or Temporal coverage (default random)


Page 48: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Background activity being parallelised with GODIVA/CCLRC e-science collaboration (spectral -> gridpoint + CDMS + visualisation tools)

Download either plot or the data that went into the plot.

Page 49: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


ERA40:

• All driven from one CDML file: 9 TB of online spherical harmonics, looking like 40 TB of “virtual” gridded data!

Page 50: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


NumSim.xsd

http://proj.badc.rl.ac.uk/ndg/wiki/NumSim

See also: http://www.cgam.nerc.ac.uk/pmwiki/NMM/index.php/8

Page 51: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006

Changing Communications

Blogging, Trackback and CLADDIER

Page 52: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Blogging

Wednesday 15th of March:
• Google search on “climate blogs” yields 33,900,000 hits.
• www.technorati.com is following 30 million blogs
  – 269,404 have climate posts
  – 1,953 climate posts in “environmental” blogs
  – 131 posts about potential vorticity (mainly in weather/hurricane blogs)
• Very few “professional” standard blogs in our field, but gazillions in others!
  – Notwithstanding: http://www.realclimate.org and others

Page 53: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Traditional Scientific Publishing

Pluses:
• “Peer-Review”; the gold standard
• Copy-editing
• Reliable indexing (Web of Science etc)
• Paper is nice to read.

Minuses:
• “Peer-Review”; “support your mates”
• Often (very) slow to “print”
• Proprietary indexing (Roll on Google Scholar)
• Libraries can’t afford to buy copies! Limited Readership.

SELF PUBLISHING

Pluses:
• No Peer Review: say what you think, citation and annotation measure quality!
• Feedback: comments and trackback. Hyperlinks to publications AND data.
• Immediacy
• Reliable Accessible Indexing
• You can print things out to read …
• You can still publish in the traditional media (while it lasts).

Minuses:
• No Peer Review: plenty of garbage.
• Spam.

Conclusion: how can we do peer review without traditional journals? Because the days of traditional journals (apart from as formal records) are numbered!

Page 54: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


What is trackback?

• Hyperlinks forward in time!
• If a web resource (paper, page, data) is configured correctly, software is able to accept trackback “pings” and update that web resource with annotations. One such annotation type is effectively a citation:
  – “I (<blog_name>) have cited <yourURL> with something with this <title> found at this <url>”
  – Real-time citation of a resource so that it shows what people have said *after* it has been published!
  – (Some blogging providers do it automatically, using search engines to find all links and enter them appropriately)
• BNL just joined a working group to “standardise” trackback, and I’ll be working to make sure the format includes Academic Citation.
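For readers who have not seen a trackback ping, the following Python sketch builds one without sending it. It follows the widely used TrackBack convention of an HTTP POST with form-encoded `url`, `title`, `blog_name` and `excerpt` fields and a small XML reply whose `<error>` element is `0` on success. The endpoint and post URLs are hypothetical placeholders, and the function names `build_ping` and `ping_accepted` are invented for this illustration.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET


def build_ping(ping_url, url, title, blog_name, excerpt=""):
    """Build (but do not send) a trackback ping as a form-encoded POST request."""
    fields = {"url": url, "title": title, "blog_name": blog_name, "excerpt": excerpt}
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        ping_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


def ping_accepted(response_xml):
    """The reply is a small XML document; <error>0</error> means success."""
    return ET.fromstring(response_xml).findtext("error") == "0"


request = build_ping(
    "http://example.org/trackback/42",       # hypothetical ping endpoint of the cited resource
    url="http://example.org/blog/my-post",   # hypothetical citing post
    title="Re-analysis of the cited dataset",
    blog_name="Example climate blog",
)
# urllib.request.urlopen(request) would send the ping; it is deliberately not sent here.

ok = ping_accepted(b'<?xml version="1.0" encoding="utf-8"?><response><error>0</error></response>')
print(request.get_method(), ok)
```

The four fields map directly onto the citation sentence on the slide: who is citing (`blog_name`), what they wrote (`title`, `excerpt`) and where it lives (`url`).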

Page 55: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Trackback Example

Page 56: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Institutional Repositories

• E-print repositories (from the RCUK document cited earlier)
  – For the purpose of this document, e-print repositories are always understood to be open access. RCUK believes that such institutional and subject-based repositories, where researchers deposit copies of the articles they publish (ie post-print), provide an opportunity significantly to enhance access to research publications … Importantly, there is a small but growing body of evidence demonstrating the increased impact and visibility of material made available in open access through e-print repositories.

(ignoring issues for the publishers for the moment)

• RCUK further recommends:
  – Where research is funded by the Research Councils and undertaken by researchers with access to an open access e-print repository (institutional or subject-based), Councils will make it a condition for all grants awarded from 1 October 2005 that a copy of all resultant published journal articles or conference proceedings (but not necessarily the underlying data) should be deposited in and/or accessible through that repository, subject to copyright or licensing arrangements … Such deposit requires relatively little effort and, for each published paper, should not take more than 15-20 minutes of an author’s or repository manager’s time. There is no reason why this should be seen as an infringement of researchers’ freedom …

Page 57: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


IR Examples

Page 58: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


CLADDIER

Page 59: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


CLADDIER Use Case

Sequence:
1. Joanna reads paper
2. Joanna acquires data
3. Joanna analyses data
4. Joanna deposits data
   • Data Centre generates trackbacks to cited data and papers (in the metadata)
5. Joanna creates paper
6. Joanna deposits paper
   • Institutional Repository generates trackbacks to cited data and papers
7. Fred reads Joanna’s new paper
8. Fred directly acquires EXACTLY the same data she used for his own project
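The use case can be played through with a toy in-memory model. This is only a sketch of the idea, not CLADDIER software: the `Resource` and `Repository` classes, the shared `registry` dictionary and all identifiers such as `data:joanna` are hypothetical, and a "trackback" is reduced to appending the citing identifier to the cited resource's `cited_by` list.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    uri: str
    cites: list = field(default_factory=list)     # identifiers this resource cites
    cited_by: list = field(default_factory=list)  # filled in by trackbacks


class Repository:
    """Stands in for either a data centre or an institutional repository."""

    def __init__(self, registry):
        self.registry = registry  # shared lookup: uri -> Resource

    def deposit(self, resource):
        self.registry[resource.uri] = resource
        # On deposit, notify every cited resource that is known to the registry.
        for target in resource.cites:
            if target in self.registry:
                self.registry[target].cited_by.append(resource.uri)


registry = {}
data_centre = Repository(registry)
institutional_repo = Repository(registry)

# The paper and data Joanna starts from (steps 1 and 2).
institutional_repo.deposit(Resource("paper:original"))
data_centre.deposit(Resource("data:original"))

# Step 4: Joanna deposits her derived data; the data centre generates trackbacks.
data_centre.deposit(Resource("data:joanna", cites=["paper:original", "data:original"]))

# Step 6: Joanna deposits her paper; the institutional repository generates trackbacks.
institutional_repo.deposit(Resource("paper:joanna", cites=["data:joanna", "paper:original"]))

# Steps 7 and 8: Fred follows the data citation in Joanna's paper.
freds_data = [u for u in registry["paper:joanna"].cites if u.startswith("data:")]
print(freds_data)                             # ['data:joanna']
print(registry["paper:original"].cited_by)    # ['data:joanna', 'paper:joanna']
```

Because the links are recorded on the cited resources as well, a reader of the original paper can also discover Joanna's later data and paper, which is the "hyperlinks forward in time" idea from the trackback slide.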

Page 60: Communicating Scientific Thoughts and Data Bryan Lawrence Oxford March 2006


Summary

A bit of pot-pourri:
• Data Reuse depends on metadata, and eventual reuse depends on the originator doing it right!
  – Use CF, get involved in NumSim etc …
  – NDG will hopefully make it easier to exploit data!
• Timeliness of information is important and may become more relevant than quality (alone)!
• Boundary between “papers” and “data” is blurring!
  – The next but one RAE (if it happens) may reflect this!
• Automated linking of resources will proliferate
  – Use your IR and your data centre (BADC!)