Repositories and Linked Open Data: the view from myExperiment David De Roure

Page 1: Repositories and Linked Open Data: the view from myExperiment

Repositories and Linked Open Data: the view from myExperiment

David De Roure

Page 2: Repositories and Linked Open Data: the view from myExperiment

Overview

• Motivation: the primacy of method

• myExperiment and Other Animals

• Design and implementation

• The future of research

http://www.myexperiment.org/packs/131

Page 3: Repositories and Linked Open Data: the view from myExperiment

scientists

LocalWeb

Repositories

Graduate Students

Undergraduate Students

Virtual Learning Environment

Technical Reports

Reprints

Peer-Reviewed Journal & Conference Papers

Preprints & Metadata

Certified Experimental Results & Analyses

experimentation

Data, Metadata, Provenance, Scripts, Workflows, Services, Ontologies, Blogs, ...

Digital Libraries

The social process of Science 1.0 → 2.0

Next Generation Researchers

Page 4: Repositories and Linked Open Data: the view from myExperiment

• Workflows are the new rock and roll

• Machinery for coordinating the execution of (scientific) services and linking together (scientific) resources

• The era of Service Oriented Applications

• Repetitive and mundane boring stuff made easier

E. Science laboris

Carole Goble

Page 5: Repositories and Linked Open Data: the view from myExperiment

• Access to distributed and local resources

• Automation of data flow
• Iteration over data sets
• Interactive
• Agile software development
• Experimental protocols
• Declarative mashups
• But...

• Can be hard to build
• Can “decay” as services change

Taverna Workflows

Page 6: Repositories and Linked Open Data: the view from myExperiment

• Paul writes workflows for identifying biological pathways implicated in resistance to Trypanosomiasis in cattle

• Paul meets Jo. Jo is investigating Whipworm in mouse.

• Jo reuses one of Paul’s workflows without change.

• Jo identifies the biological pathways involved in sex dependence in the mouse model, believed to be involved in the ability of mice to expel the parasite.

• Previously, a manual two-year study by Jo had failed to do this.

Reuse, Recycling, Repurposing

Page 7: Repositories and Linked Open Data: the view from myExperiment

Kepler

Triana

BPEL

Ptolemy II

Taverna

Trident

Meandre

Page 8: Repositories and Linked Open Data: the view from myExperiment

Sharing pieces of process

Page 9: Repositories and Linked Open Data: the view from myExperiment

data

method

Page 10: Repositories and Linked Open Data: the view from myExperiment

“There are these great collaboration tools that 12-year-olds are using. It’s all back to front.”

Robert Stevens

Carole Goble “e-Science is me-Science: What do Scientists want?”, EGEE 2006

Page 11: Repositories and Linked Open Data: the view from myExperiment

“A biologist would rather share their toothbrush than their gene name”

Mike Ashburner and others

Professor in Dept of Genetics, University of Cambridge, UK

Page 12: Repositories and Linked Open Data: the view from myExperiment

“Data mining: my data’s mine and your data’s mine”

Page 13: Repositories and Linked Open Data: the view from myExperiment

Overview

• Motivation: the primacy of method

• myExperiment and Other Animals

• Design and implementation

• The future of research

Page 14: Repositories and Linked Open Data: the view from myExperiment

mySpace for scientists!
Facebook for scientists!
Not Facebook for scientists!

Page 15: Repositories and Linked Open Data: the view from myExperiment

Web 2

Open Repositories

Researchers

Social Network

The experiment that is

Developers

Social Scientists

Page 16: Repositories and Linked Open Data: the view from myExperiment

“Facebook for Scientists” ...but different to Facebook!

A repository of research methods

A community social network of people and things

A Social Virtual Research Environment

Open source (BSD) Ruby on Rails app

REST and SPARQL interfaces, Linked Data compliant

Basis or inspiration for multiple projects including BioCatalogue, MethodBox and SysmoDB

myExperiment currently has 4060 members, 231 groups, 1175 workflows, 326 files and 119 packs

Page 17: Repositories and Linked Open Data: the view from myExperiment
Page 18: Repositories and Linked Open Data: the view from myExperiment
Page 19: Repositories and Linked Open Data: the view from myExperiment

• User Profiles
• Groups
• Friends
• Sharing
• Tags
• Workflows
• Developer interface
• Credits and Attributions
• Fine control over privacy
• Packs
• Multiple instances
• Enactment

myExperiment Features

Distinctives

Page 20: Repositories and Linked Open Data: the view from myExperiment

[Figure: a pack containing Workflow 16, Workflow 13, Results (two sets), Logs, Metadata, Paper, Slides, QTL and Common pathways]

A Pack

Page 21: Repositories and Linked Open Data: the view from myExperiment

Taverna Plugins

Bringing myExperiment to the Taverna user

Page 22: Repositories and Linked Open Data: the view from myExperiment

Google Gadgets

Bringing myExperiment to the iGoogle user

Page 23: Repositories and Linked Open Data: the view from myExperiment

Facebook

Page 24: Repositories and Linked Open Data: the view from myExperiment

Windows 7

Page 25: Repositories and Linked Open Data: the view from myExperiment

http://www.openarchives.org/ore/terms/aggregates

http://eprints.ecs.soton.ac.uk/id/eprint/20817
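
The two URIs on this slide suggest a pack being described as an OAI-ORE aggregation that includes an EPrints record. A minimal sketch of what such a statement could look like as N-Triples follows. The pairing of pack 131 (the URL from the Overview slide) with this eprint is an assumption made purely for illustration, and `aggregation_triples` is a hypothetical helper, not part of myExperiment.

```python
# Predicate URI taken from the slide.
ORE_AGGREGATES = "http://www.openarchives.org/ore/terms/aggregates"


def aggregation_triples(pack_uri, item_uris):
    """Return one N-Triples line per item the pack aggregates."""
    return [f"<{pack_uri}> <{ORE_AGGREGATES}> <{item}> ." for item in item_uris]


# Assumption: the slide does not say which pack aggregates this eprint.
pack = "http://www.myexperiment.org/packs/131"
items = ["http://eprints.ecs.soton.ac.uk/id/eprint/20817"]

for line in aggregation_triples(pack, items):
    print(line)
```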

Page 26: Repositories and Linked Open Data: the view from myExperiment

EPrints

Page 27: Repositories and Linked Open Data: the view from myExperiment

ECS id

Page 28: Repositories and Linked Open Data: the view from myExperiment
Page 29: Repositories and Linked Open Data: the view from myExperiment

Overview

• Motivation: the primacy of method

• myExperiment and Other Animals

• Design and implementation

• The future of research

Page 30: Repositories and Linked Open Data: the view from myExperiment

• The Long Tail
• Data is the Next “Intel Inside”
• Users add value
• Network effects by default
• Some Rights Reserved
• The Perpetual Beta
• Cooperate, don’t Control
• Software above the level of the single device

Web 2.0 patterns

http://oreilly.com/web2/archive/what-is-web-20.html

Page 31: Repositories and Linked Open Data: the view from myExperiment
Page 32: Repositories and Linked Open Data: the view from myExperiment

1. Fit in, Don’t Force Change
2. Jam today and more jam tomorrow
3. Just in Time and Just Enough
4. Act Local, think Global
5. Enable Users to Add Value
6. Design for Network Effects

Six Principles of Software Design to Empower Scientists

1. Keep your Friends Close
2. Embed
3. Keep Sight of the Bigger Picture
4. Favours will be in your Favour
5. Know your users
6. Expect and Anticipate Change

De Roure, D. and Goble, C. "Software Design for Empowering Scientists," IEEE Software, vol. 26, no. 1, pp. 88-95, January/February 2009
http://eprints.ecs.soton.ac.uk/15032/

Page 33: Repositories and Linked Open Data: the view from myExperiment

[Architecture diagram. Components: Search Engine, Enactor, mySQL, RDF Store, SPARQL endpoint, Managed REST API, API config. Output formats: HTML, XML. Clients: facebook, iGoogle, android. Data: reviews, ratings, groups, friendships, tags, files, workflows, profiles, packs, credits.]

For Developers

Page 34: Repositories and Linked Open Data: the view from myExperiment

[Diagram labels: mySQL (reviews, ratings, groups, friendships, tags, files, workflows, profiles, packs, credits), RDF Store, SPARQL endpoint.]

myExperiment data model (evolving!)

Modularised myExperiment Ontology

DC, FOAF, SIOC (Semantically-Interlinked Online Communities)

rdf.myexperiment.org

SPARQL endpoint

Page 35: Repositories and Linked Open Data: the view from myExperiment

David Newman

Page 36: Repositories and Linked Open Data: the view from myExperiment

myExperiment modularised ontology

David Newman
http://eprints.ecs.soton.ac.uk/17787/

Page 37: Repositories and Linked Open Data: the view from myExperiment

Exporting packs

Page 38: Repositories and Linked Open Data: the view from myExperiment

Linked Open Data

Page 39: Repositories and Linked Open Data: the view from myExperiment

Levels of (social) compliance?

• 303s
• 303s + RDF
• 303s + RDF + SPARQL
• Being on the diagram!

http://www.w3.org/DesignIssues/LinkedData.html
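
The "303s + RDF" level above can be sketched as a small content-negotiation function: a request for a thing (a non-information resource) is answered with 303 See Other, pointing at a document chosen from the Accept header. This is only an illustration under stated assumptions. The `.rdf` suffix and the function name `resolve` are hypothetical, the Accept parsing ignores q-values, and none of this is taken from the myExperiment code base.

```python
def resolve(resource_path, accept_header):
    """Return (status, location) for a request to a non-information resource.

    A 303 See Other redirects the client to a document about the resource.
    Simplification: we only look for the RDF/XML media type and ignore
    q-values, which a real server would need to honour.
    """
    accept = accept_header.lower()
    if "application/rdf+xml" in accept:
        # Assumed suffix for the RDF description of the resource.
        return 303, resource_path + ".rdf"
    # Browsers and unknown clients get the human-readable page.
    return 303, resource_path + ".html"


print(resolve("/packs/112", "text/html,application/xhtml+xml"))
print(resolve("/packs/112", "application/rdf+xml"))
```

Every lookup of the bare URI costs one extra round trip, since the client must follow the redirect before it receives any content.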

Page 40: Repositories and Linked Open Data: the view from myExperiment

Hugh Glaser

Page 41: Repositories and Linked Open Data: the view from myExperiment
Page 42: Repositories and Linked Open Data: the view from myExperiment
Page 43: Repositories and Linked Open Data: the view from myExperiment
Page 44: Repositories and Linked Open Data: the view from myExperiment

David Newman

Page 45: Repositories and Linked Open Data: the view from myExperiment

David Newman

Page 46: Repositories and Linked Open Data: the view from myExperiment

The hidden costs of linked data

• Usability
  – We had a perfectly good scheme before and now we change it for something more complicated!
• Performance
  – All those 303s!
  – Rumoured that on some sites developers append .xml to save round trips

Page 47: Repositories and Linked Open Data: the view from myExperiment

Used to share this...
www.myexperiment.org/packs/112

Still do, but browser now shows this...
www.myexperiment.org/packs/112.html *

* actually this works, sssh!

Non Information Resource Usability Hacks

Page 48: Repositories and Linked Open Data: the view from myExperiment

BioCatalogue

Jiten Bhagat

Page 49: Repositories and Linked Open Data: the view from myExperiment

NIR

myExperiment

Page 50: Repositories and Linked Open Data: the view from myExperiment

Overview

• Motivation: the primacy of method

• myExperiment and Other Animals

• Design and implementation

• The future of research

Page 51: Repositories and Linked Open Data: the view from myExperiment

Packs in Practice

Page 52: Repositories and Linked Open Data: the view from myExperiment

• Results
• Log Book
• Provenance
• Publications and Presentations
• Training material
• Related Workflows
• Version history
• Metadata
• Reviews
• Data & Configuration

Knowledge Packages – More than Methods

Carole Goble

Page 53: Repositories and Linked Open Data: the view from myExperiment

[Figure: Paul’s Pack shown as a Research Object. Items: Workflow 16, Workflow 13, Results (two sets), Logs, Metadata, Paper, Slides, QTL, Common pathways. Relationship labels between items: feeds into, produces, included in, published in.]

Paul’s Pack

Paul’s Research Object

Paul Fisher

Page 54: Repositories and Linked Open Data: the view from myExperiment

Example Investigation. Contains multiple Studies, Assays and Assets (SOPs, Models, Datafiles)

Stuart Owen

SysmoDB

Page 55: Repositories and Linked Open Data: the view from myExperiment

Basic ISA structure – Investigation → Study → Assay
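
The Investigation → Study → Assay nesting can be sketched as three small data classes. This is an illustrative model only. Attaching assets (SOPs, models, data files) at the assay level is an assumption, and the class and field names are hypothetical rather than SysmoDB's schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Assay:
    title: str
    # Assumption: assets such as SOPs, models and data files hang off assays.
    assets: List[str] = field(default_factory=list)


@dataclass
class Study:
    title: str
    assays: List[Assay] = field(default_factory=list)


@dataclass
class Investigation:
    title: str
    studies: List[Study] = field(default_factory=list)

    def all_assets(self) -> List[str]:
        """Flatten every asset reachable through the studies and their assays."""
        return [
            asset
            for study in self.studies
            for assay in study.assays
            for asset in assay.assets
        ]


inv = Investigation(
    "Example Investigation",
    [Study("Study 1", [Assay("Assay 1", ["sop.pdf", "model.xml"])])],
)
print(inv.all_assets())
```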

Page 56: Repositories and Linked Open Data: the view from myExperiment

Research Objects enable data-intensive research to be:

• Replayable – go back and see what happened
• Repeatable – run the experiment again
• Reproducible – independent expt to reproduce
• Reusable – use as part of new experiments
• Repurposeable – reuse the pieces in new expt
• Reliable – robust under automation
• Referenceable – citable and traceable

The Six Rs of Research Object Behaviours

http://blog.openwetware.org/deroure/?p=56

Page 57: Repositories and Linked Open Data: the view from myExperiment

Stereotypes

• Publication Object
  – Record of Activity
  – Credit/attribution
• Live Object
  – RO as work in progress
  – Up to date references to appropriate resource
• Archived Object
  – RO as a record of what happened
  – Curated, “fossilised”, immutable aggregation
• View Object
  – Named Graphs for LD
• Exposing Object
  – Standardised wrapper around data sources
• Method Object
  – RO as protocol

Graceful Degradation

Research Object services are able to consume Research Objects without necessarily understanding or processing all of their content

Sean Bechhofer
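
Graceful degradation, as described on this slide, can be illustrated with a consumer that handles the parts of a Research Object it recognises and skips the rest instead of failing. This is a sketch under assumptions. The dictionary layout, the `type` and `uri` keys and the `summarise` function are all hypothetical and are not a Research Object specification.

```python
def summarise(research_object, handlers):
    """Process the parts we have handlers for and skip the others.

    Returns (results, skipped) so the caller can see what was not understood.
    """
    results, skipped = [], []
    for part in research_object["aggregates"]:
        handler = handlers.get(part["type"])
        if handler is None:
            # Unknown content type: note it and carry on rather than raise.
            skipped.append(part["uri"])
        else:
            results.append(handler(part))
    return results, skipped


# Hypothetical Research Object with one known and one unknown part type.
ro = {
    "aggregates": [
        {"type": "workflow", "uri": "urn:example:wf16"},
        {"type": "unknown-format", "uri": "urn:example:x1"},
    ]
}
handlers = {"workflow": lambda part: "workflow " + part["uri"]}

results, skipped = summarise(ro, handlers)
print(results, skipped)
```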

Page 58: Repositories and Linked Open Data: the view from myExperiment

SALAMI

Generating musicological research resources using Internet Archive + Music Info Retrieval Algorithms + Supercomputer + Crowdsourced ground truth

http://www.diggingintodata.org/

Page 59: Repositories and Linked Open Data: the view from myExperiment

Stephen Downie

Page 60: Repositories and Linked Open Data: the view from myExperiment

“Signal”

SALAMI

Digital Audio

“Ground Truth”

Structural Analysis

Community

It’s web-like!

Q. If and when should community-generated content be assimilated into managed repositories?

Page 61: Repositories and Linked Open Data: the view from myExperiment

How Country is my Country?

A researcher explaining their “workflow”...

1) Use SPARQL to generate a collection of signal
2) Publish that collection
3) Our local signal repository has copies of the actual signal, and publishes sub-graphs of linked data asserting what those signals are of (using the URI for that track/record etc.)
4) The workflow performing the feature extraction combines (2) and (3) when fetching the signal for feature extraction and classification, and persists the URI for the signal artefact (track/record etc.)
5) The results are published (e.g. of genre classification) and reference that URI

Kevin Page
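
Steps 2 to 5 above can be sketched as a join between a published collection of URIs and a local signal store, with results keyed by the same URI. Everything here is a toy stand-in. The function name, the `urn:example:` identifiers, the feature (a mean) and the threshold classifier are assumptions for illustration, not the researcher's actual workflow.

```python
def classify_collection(collection_uris, local_signal, extract, classify):
    """Join a published collection with local signal and key results by URI."""
    results = {}
    for uri in collection_uris:
        audio = local_signal.get(uri)
        if audio is None:
            continue  # no local copy of this signal, so nothing to analyse
        # Persist the URI alongside the result so it can be referenced later.
        results[uri] = classify(extract(audio))
    return results


# Step 2: a published collection (hypothetical URIs).
collection = ["urn:example:track/1", "urn:example:track/2"]
# Step 3: the local repository maps URIs to the signal it holds.
local_signal = {"urn:example:track/1": [0.1, 0.9, 0.5]}


def mean_feature(samples):
    """Toy feature extraction: the mean of the samples."""
    return sum(samples) / len(samples)


def toy_genre(feature):
    """Toy classifier: threshold on the single feature."""
    return "country" if feature > 0.4 else "other"


# Steps 4 and 5: results reference the URI of the signal artefact.
published = classify_collection(collection, local_signal, mean_feature, toy_genre)
print(published)
```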

Page 62: Repositories and Linked Open Data: the view from myExperiment

Find all artists and show their countries

PREFIX geo: <http://www.geonames.org/ontology#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?country
WHERE {
  ?artist a mo:MusicArtist ;
          foaf:based_near ?place ;
          foaf:name ?name .
  ?place geo:inCountry ?country
}
ORDER BY ?name

Find all records by artists from France

PREFIX geo: <http://www.geonames.org/ontology#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?record
WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name ?name ;
          foaf:based_near ?place .
  ?place geo:inCountry <http://www.geonames.org/countries/#FR> .
  ?record a mo:Record ;
          foaf:maker ?artist
}
ORDER BY ?record

Find all tracks from records by artists from France

PREFIX geo: <http://www.geonames.org/ontology#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?track
WHERE {
  ?artist a mo:MusicArtist ;
          foaf:name ?name ;
          foaf:based_near ?place .
  ?place geo:inCountry <http://www.geonames.org/countries/#FR> .
  ?record a mo:Record ;
          foaf:maker ?artist ;
          mo:track ?track
}
ORDER BY ?track

Kevin Page
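
Queries like these are typically sent to an endpoint over the SPARQL protocol, where a GET request carries the query text in a `query` parameter. The sketch below only builds such a URL and does not contact any server. The endpoint `http://example.org/sparql` is a placeholder, since the slides do not name the endpoint used for the music data, and `sparql_get_url` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint: the slides do not say where the music data is served.
ENDPOINT = "http://example.org/sparql"

QUERY = """PREFIX geo: <http://www.geonames.org/ontology#>
PREFIX mo: <http://purl.org/ontology/mo/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?country
WHERE {
  ?artist a mo:MusicArtist ;
          foaf:based_near ?place ;
          foaf:name ?name .
  ?place geo:inCountry ?country
}
ORDER BY ?name"""


def sparql_get_url(endpoint, query):
    """Build a SPARQL-protocol GET URL with the query percent-encoded."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
# Decoding the query string again should give back the original query text.
round_tripped = parse_qs(urlsplit(url).query)["query"][0]
print(round_tripped == QUERY)
```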

Page 63: Repositories and Linked Open Data: the view from myExperiment

Francois Belleau

Page 64: Repositories and Linked Open Data: the view from myExperiment

Evolution of our research environment

1st Generation
Current practices of early adopters of tools. Characterised by researchers using tools within their particular problem area, with some re-use of tools, data and methods within the discipline. Traditional publishing is supplemented by publication of some digital artefacts like workflows and links to data. Provenance is recorded but not shared and re-used. Science is accelerated and practice is beginning to shift to emphasise in silico work.

2nd Generation
Projects delivering now. Some institutional embedding. Key characteristic is re-use of the increasing pool of tools, data and methods across areas/disciplines. Contain some freestanding, recombinant, reproducible research objects. Provenance analytics plays a role. New scientific practices are established and opportunities arise for completely new scientific investigations. Some expert curation.

3rd Generation
The solutions we'll be delivering in 5 years. Characterised by global reuse of tools, data and methods across any discipline, and surfacing the right levels of complexity for the researcher. Routine use. Key characteristic is radical sharing. Research is significantly data driven, plundering the backlog of data, results and methods. Increasing automation and decision-support for the researcher: the VRE becomes assistive. Provenance assists design. Curation is autonomic and social.

Page 65: Repositories and Linked Open Data: the view from myExperiment

Deluge of data => Deluge of methods to process it?

Recording, re-using and sharing methods:
• Supports reproducible science
• Enables interpretation & trust of results
• Supports re-use and re-purposing
• Shares know-how
• Builds capability to understand data

Methods should be first class citizens!

Though this be madness, yet there is method in it*

* Polonius in Hamlet

Page 66: Repositories and Linked Open Data: the view from myExperiment

• How we share
  – We are co-evolving a social infrastructure for sharing
• What we share
  – In the future we’ll be saying “Could I have a copy of your Research Object please?” (if we didn’t pick it up from the tweet...)
• Current work
  – Community curation, expert curation, assisted curation
  – Emerging practice in automation over linked data
  – Boundaries and guarantees: “the Web – particle duality”

Linked Open Methods*

* Sean Bechhofer

Page 67: Repositories and Linked Open Data: the view from myExperiment

• Linked Data community has guidelines and tooling for production

• Production practice will improve as consumption increases
  – e.g. Discovery
  – e.g. Versioning

• Issues of authority, licence, governance and curation are perhaps best addressed by the open repository community

• Balancing freshness with persistence

Repositories & Linked Data

Page 68: Repositories and Linked Open Data: the view from myExperiment

Contact

David De Roure [email protected]

Carole Goble [email protected]

Visit wiki.myexperiment.org

Page 69: Repositories and Linked Open Data: the view from myExperiment

The Team

Sergejs Aleksejevs Mark Borkum Sean Bechhofer Jiten Bhagat Simon Coles Don Cruickshank Cat De Roure Paul Fisher Jeremy Frey Matt Gamble Duncan Hull Kumar Kollara Peter Li Ravi Madduri Danius Michaelides Paolo Missier David Newman Cameron Neylon Stuart Owen Kevin Page Rob Procter Marco Roos Stian Soiland Shoaib Sufi Mannie Tagarira Andrea Wiggins Alan Williams Katy Wolstencroft Tom Eveleigh June Finch Antoon Goderis Andrew Harrison Matt Lee Yuwei Lin Kurt Mueller Savas Parastatidis Meik Poschen Marcus Ramsden Ian Taylor Alexander Voss David Withers Ed Zaluska

Page 70: Repositories and Linked Open Data: the view from myExperiment

Funders

JISC Virtual Research Environments and Repositories programmes

EPSRC myGrid and e-Research South platform awards

Microsoft Research Technical Computing Initiative

Andrew W. Mellon Foundation

Page 71: Repositories and Linked Open Data: the view from myExperiment

Publications

De Roure, D., Goble, C. and Stevens, R. (2009) “The Design and Realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows,” Future Generation Computer Systems 25, pp. 561-567.

Goble, C.A., Bhagat, J., Aleksejevs, S., Cruickshank, D., Michaelides, D., Newman, D., Borkum, M., Bechhofer, S., Roos, M., Li, P., and De Roure, D.: myExperiment: a repository and social network for the sharing of bioinformatics workflows, Nucl. Acids Res., 2010. doi:10.1093/nar/gkq429

De Roure, D. and Goble, C. (2009) "Software Design for Empowering Scientists," IEEE Software, vol. 26, no. 1, pp. 88-95, January/February 2009.

Newman, D.R., Bechhofer, S. and De Roure, D. (2009) “myExperiment: An ontology for e-Research,” Workshop on Semantic Web Applications in Scientific Discourse at 8th International Semantic Web Conference (ISWC 2009), Washington DC, October 2009.

Bechhofer, S., De Roure, D., Gamble, M., Goble, C. and Buchan, I. (2010) Research Objects: Towards Exchange and Reuse of Digital Knowledge. In: The Future of the Web for Collaborative Science (FWCS 2010), April 2010, Raleigh, NC, USA.

http://wiki.myexperiment.org/index.php/Papers

Page 72: Repositories and Linked Open Data: the view from myExperiment