June 2008. Andy Powell, Eduserv Foundation, [email protected], www.eduserv.org.uk/foundation. Web 2.0 and repositories… have we got our repository architecture right?

Web 2.0 and repositories - have we got our repository architecture right?

DESCRIPTION

A presentation given at the Talis Xiphos Research Day, 10 June 2008.

Page 1: Web 2.0 and repositories - have we got our repository architecture right?

June 2008

Andy Powell, Eduserv [email protected]

www.eduserv.org.uk/foundation

Web 2.0 and repositories…

…have we got our repository architecture right?

June 2008, Talis "Project Xiphos" Research Day, Birmingham

Outline

• where are we now?

• what’s wrong with where we are now?

• what can we do about it?

• do we need a new vision?

Where are we now?

What is a repository?

a university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution. … An institutional repository is not simply a fixed set of software and hardware

(Cliff Lynch, 2003)

Repository “doing” words

• manage

• deposit

• disclose

• make openly available

• curate

• preserve

Repository content

• all sorts… but most “academic” focus currently on

– scholarly publications

– learning objects

– research data

• this talk focuses on the first of these, but with the intention that most of what I say will be generic

Repository architecture

• largely institutional focus, though with some exceptions – arXiv, RePEc, JORUM, etc.

• interoperability through centralised aggregators (national and global)

– search services (OAIster, Intute, …)

– registries (DOAR, ROAR, …)

• harvesting metadata about content using OAI-PMH (metadata = simple Dublin Core)

• content = PDF

• SWORD as deposit API
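As a rough illustration of the harvesting model on this slide, the sketch below builds an OAI-PMH `ListRecords` request for `oai_dc` metadata and reads simple Dublin Core fields out of a response. The verb, the `metadataPrefix` parameter and the XML namespaces come from the OAI-PMH and Dublin Core specifications; the base URL, the sample record and the helper names are hypothetical, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url):
    """Build a ListRecords request asking for simple Dublin Core."""
    query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})
    return base_url + "?" + query


# Hypothetical, heavily trimmed response from an institutional repository.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:repository.example.org:123</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An example eprint</dc:title>
          <dc:creator>Smith, J.</dc:creator>
          <dc:identifier>http://repository.example.org/123/paper.pdf</dc:identifier>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Return one dict per harvested record with a few simple DC fields."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "oai_id": record.findtext("oai:header/oai:identifier", namespaces=NS),
            "title": record.findtext(".//dc:title", namespaces=NS),
            "creators": [c.text for c in record.findall(".//dc:creator", NS)],
            "links": [i.text for i in record.findall(".//dc:identifier", NS)],
        })
    return records
```

Note that the aggregator only learns about the PDF through a `dc:identifier` string; nothing in the record says whether that URI is the document, a file, or a landing page.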

What’s “wrong” with where we are now?

#1 We talk about “repositories”…

…rather than “the Web”

a focus on ‘making content available on the Web’ would be more intuitive to researchers

Whatever happened to the CMS?

• a focus on ‘content management’ would change our emphasis

• OAI-PMH out…

• search engine optimisation, usability, accessibility, Web design, tagging, information architecture, cool URIs in…

#2 We don’t emphasise…

• Google indexing

• RSS feeds

• widget technology – embedding functionality into other sites

#3 Our focus is on sharing metadata…

• …even though we have full-text to share

• worse… the full-text we share tends to be PDF rather than native Web format

– the Web equivalent of a cul de sac

• and the metadata we share tends to be “simple Dublin Core”

– little consistency in approaches to describing ‘files’ vs. ‘documents’

– little consistency in naming authors and subjects

– ultimately, it is both too simple and too complex!

pbo31 @ flickr

#4 We ignore the Web Architecture

• we have tended to adopt service oriented approaches

• in line with a long tradition from Z39.50 to SOAP/WSDL

– e.g. JISC eFramework

• focus is on building “services on content” rather than on the “content”

REST is good

• we don’t tend to adopt a resource oriented approach

• we don’t adopt REST – an architectural style with a focus on resources, their identifiers (e.g. URIs), and a simple uniform set of operations that each resource supports (e.g. GET, PUT, POST, DELETE)

• we don’t encourage a Web-style “follow your nose” approach
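A minimal sketch of the resource-oriented style described above, assuming a toy in-memory store rather than a real HTTP server: every resource is addressed by a URI, supports the same small set of operations, and carries links that a client can follow. The class names, the URIs and the link relation names are all hypothetical, and POST is left out for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A representation plus links to related resources."""
    body: str
    links: dict = field(default_factory=dict)  # relation name -> URI


class ResourceStore:
    """Toy stand-in for a server: one uniform interface for every URI."""

    def __init__(self):
        self._resources = {}

    def put(self, uri, resource):
        # Create or replace the resource at this URI.
        created = uri not in self._resources
        self._resources[uri] = resource
        return 201 if created else 200

    def get(self, uri):
        resource = self._resources.get(uri)
        return (200, resource) if resource is not None else (404, None)

    def delete(self, uri):
        return 204 if self._resources.pop(uri, None) is not None else 404


def follow(store, start_uri, relations):
    """'Follow your nose': navigate using only links found in responses."""
    uri = start_uri
    for rel in relations:
        status, resource = store.get(uri)
        if status != 200 or rel not in resource.links:
            return None
        uri = resource.links[rel]
    return uri
```

The client in `follow` needs to know only a starting URI and the names of link relations; it never constructs URIs from out-of-band knowledge of the service.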

#5 We are antisocial…

• … at least, we tend to treat “content” in isolation from the “social networks” that need to grow around that content

• successful “repositories” (Flickr, YouTube, Slideshare, etc.) promote the social activity that takes place around content as well as the content management and disclosure activity

– friends, groups, social tagging, comments, embedding, re-purposing, etc.

But not just about functionality…

• the institutional approach has a fundamental mismatch with the real-life social networks adopted by researchers

– subject-based

– cross-institutional

– global

• while the institutional approach is good from the perspective of institutional management, preservation, etc.

• globally “concentrated” repositories might better reflect the social networks that need to arise

The net effect…

• …is that there is no net effect

• repositories remain uncompelling places to disclose scholarly publications from POV of the researcher

• perceived cost of deposit remains higher than perceived benefits

• we resort to institutional or funder mandates, “thou shalt deposit”, to fill what would otherwise remain empty

Wait just a minute…

• didn’t we use to have globally “concentrated” repository services?

• arXiv – the first Web 2.0 service?

• invented before the Web

• unfortunately, also invented before Amazon S3

• i.e. before we knew how to scale things

Wait just another minute…

• …doesn’t the blogosphere successfully layer a set of globally concentrated services over a distributed network of content?

– e.g. Technorati

• yes… but…

• the content is under the control of ‘individuals’ rather than ‘institutions’, and…

• the interoperability “glue” (RSS and tagging) is very lightweight and RESTful
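To make the lightweight "glue" concrete, here is a hedged sketch of a repository exposing recent deposits as an Atom feed, with social tags carried as `category` elements. The element names follow the Atom syndication format; the function name, the item dictionary keys and the example URIs are assumptions, and a fully valid feed would also need author information.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"


def atom_feed(feed_id, title, updated, items):
    """Serialise deposits as a minimal Atom feed (author elements omitted)."""
    # Emit Atom as the default namespace rather than an ns0: prefix.
    ET.register_namespace("", ATOM)
    feed = ET.Element(f"{{{ATOM}}}feed")
    ET.SubElement(feed, f"{{{ATOM}}}id").text = feed_id
    ET.SubElement(feed, f"{{{ATOM}}}title").text = title
    ET.SubElement(feed, f"{{{ATOM}}}updated").text = updated
    for item in items:
        entry = ET.SubElement(feed, f"{{{ATOM}}}entry")
        ET.SubElement(entry, f"{{{ATOM}}}id").text = item["uri"]
        ET.SubElement(entry, f"{{{ATOM}}}title").text = item["title"]
        ET.SubElement(entry, f"{{{ATOM}}}updated").text = item["updated"]
        # The link points straight at the content, not at a metadata record.
        ET.SubElement(entry, f"{{{ATOM}}}link", rel="alternate", href=item["uri"])
        for tag in item.get("tags", []):
            ET.SubElement(entry, f"{{{ATOM}}}category", term=tag)
    return ET.tostring(feed, encoding="unicode")
```

An aggregator in the Technorati mould needs nothing more than a feed URI per source to layer global services over this kind of output.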

Having the conversation is hard

• highly political space

• strong “open access” voices who, understandably, don’t want their agenda de-railed by discussion about

– preservation

– search engine optimisation

– Web 2.0

– social networks

– semantic Web

– the future of peer review

• it can be hard to get the conversation started

What can we do about it?

Things can go two ways…

I think that things can go two ways…

The Web 2.0 Way

or

The Semantic Web Way

…possibly both

Things can go two ways…

what would a Web 2.0 repository look like?

Like this?

A Web 2.0 repository?

• high-quality browser-based document viewer (not Acrobat!)

• tagging, commentary, more-like-this, favorites, …

• persistent (cool) URIs to content

• ability to form simple social groups

• ability to embed documents in other Web sites

• high visibility to Google

• offer RSS as primary API

• use of Amazon S3 to cope with scalability
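One way the "more-like-this" feature in the list above could be driven by social tagging alone is to rank other documents by tag overlap. The sketch below uses Jaccard similarity as an assumed scoring method; the slides do not say how any real service does this, and the function names and URIs are hypothetical.

```python
def jaccard(a, b):
    """Share of tags two documents have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def more_like_this(target_uri, tags_by_uri, limit=3):
    """Return URIs of the documents whose tags best overlap the target's."""
    target_tags = tags_by_uri[target_uri]
    scored = [
        (jaccard(target_tags, tags), uri)
        for uri, tags in tags_by_uri.items()
        if uri != target_uri
    ]
    # Highest overlap first; ties broken by URI so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [uri for score, uri in scored[:limit] if score > 0]
```

The approach only works when there are enough users tagging enough documents, which is one reason global concentration matters for this style of service.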

In short… we go “simple”

• we develop simple(ish) repositories

• and complex aggregators and search engines

• RSS/Atom as primary “glue”

• social tagging as “description”

• full-text indexing

• microformats

• Google Sitemaps to guide harvesters to content

• complex functional requirements (e.g. author disambiguation) either ignored or met thru complexity in aggregators
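As an illustration of how a simple(ish) repository might guide harvesters to its content, the sketch below writes a sitemap listing each document URI and its last-modified date. The `urlset`, `url`, `loc` and `lastmod` elements and the namespace come from the Sitemaps protocol; the function name and example URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

SITEMAP = "http://www.sitemaps.org/schemas/sitemap/0.9"


def sitemap(entries):
    """Build a sitemap from (uri, last_modified_date) pairs."""
    # Emit the sitemap namespace as the default namespace.
    ET.register_namespace("", SITEMAP)
    urlset = ET.Element(f"{{{SITEMAP}}}urlset")
    for uri, lastmod in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP}}}url")
        ET.SubElement(url, f"{{{SITEMAP}}}loc").text = uri
        ET.SubElement(url, f"{{{SITEMAP}}}lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")
```

All of the complexity stays on the harvester side: the repository only has to list where its content lives and when it last changed.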

Alternatively… we go “complex”

• …we look to the Semantic Web

• we create and share much richer metadata about scholarly publications than we do currently

• we explicitly model complexity (a la FRBR)

• and aggregations

• we expose resulting metadata thru the SW “graph”

We go “complex”...

SWAP and ORE

We go “complex”…

• SWAP – Scholarly Works Application Profile

• an application of the Dublin Core Abstract Model and Application Profiles

• capturing relationships between works, expressions, manifestations, items and agents

• ORE – OAI Object Re-use and Exchange

• capturing relationships between aggregations and aggregated resources

• note that ORE is not tied to a specific entity in FRBR

• note that ORE is implemented as a profile of Atom
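The FRBR-style chain that SWAP captures (work, expression, manifestation, copy) can be sketched as nested data classes. This is a simplified illustration, not the SWAP schema itself: the relationship names in the comments come from the model, but the attribute names (`version`, `media_type`, `creators` as plain strings) and the helper function are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Copy:
    uri: str  # where this particular copy can be retrieved


@dataclass
class Manifestation:
    media_type: str  # e.g. "application/pdf"
    copies: List[Copy] = field(default_factory=list)  # isAvailableAs


@dataclass
class Expression:
    version: str  # e.g. "accepted manuscript"
    manifestations: List[Manifestation] = field(default_factory=list)  # isManifestedAs


@dataclass
class ScholarlyWork:
    title: str
    creators: List[str] = field(default_factory=list)  # isCreatedBy (agents reduced to names)
    expressions: List[Expression] = field(default_factory=list)  # isExpressedAs


def copy_uris(work, media_type):
    """Find every retrievable copy of a work in a given format."""
    return [
        copy.uri
        for expression in work.expressions
        for manifestation in expression.manifestations
        if manifestation.media_type == media_type
        for copy in manifestation.copies
    ]
```

With this structure an aggregator can tell that two PDFs are different versions of the same paper, which simple Dublin Core records give it no reliable way to express.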

SWAP application profile model

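[Figure: SWAP application profile model. ScholarlyWork isExpressedAs Expression; Expression isManifestedAs Manifestation; Manifestation isAvailableAs Copy. Relationships to Agent: isCreatedBy, isPublishedBy, isEditedBy, isFundedBy, isSupervisedBy. AffiliatedInstitution also appears. Cardinalities are shown as 0..∞.]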

OAI ORE

Summary

• what can we learn from Web 2.0?

– user interface design matters

– global ‘concentration’ is an enabler of social interaction

• simple DC is both too simple and too complex

• richer DC application profiles such as SWAP and/or RDF applications like ORE may be a way forward

• but need to ensure that their use does not over-complicate user interfaces and workflows

A new vision?

Flickr and digital cameras…

• didn’t just take the practice of photography and put it on the Web

• they fundamentally changed what photography was about

What’s our vision?

• the standards we adopt in the scholarly communication space…

• OAI-PMH, OpenURL, DOI, PDF, …

• are primarily about replicating in a Web world what we have always done on paper

• this is not surprising given the necessary inertia of the scholarly communication life-cycle

• but… do we need to re-envision scholarly communication as a true Web process?

• if so, what would a repository look like?

thank you