21
1 CS 502: Computing Methods for Digital Libraries Lecture 22 Repositories

1 CS 502: Computing Methods for Digital Libraries Lecture 22 Repositories

  • View
    219

  • Download
    0

Embed Size (px)

Citation preview

1

CS 502: Computing Methods for Digital Libraries

Lecture 22

Repositories

2

Administration

Final examination

May 19, 20001:00 - 2:30pm Phillips 219

5 or 6 questions on whole courseDo you want an open laptop examination?

There will be a make up examination near the beginning of the examination period. Please send me email if you might wish to take it.

3

Administration

Discussion class, Wednesday April 19

One class only, from 7:30 to 8:30 p.m.

Online survey

http://create.hci.cornell.edu/cssurvey.cfm

4

Repositories

Definitions

A repository is any computer system whose primary function is to store digital material for use in a library.

An archive is a repository that is organized to emphasize the long-term preservation of information.

5

Requirements 1

Information hiding

Internal organization should be hidden from client computers.

6

Repository layers and interfaces

Interface

Persistent Store

Object Management Layer

External Interface

Shell API

Store API

Clients

7

Requirements 2

Object models

• Support for a flexible range of object models.

• Few restrictions on data, metadata, external links, and internal relationships.

• New categories of information do not require fundamental changes to other aspects of the digital library.

8

Multiple disseminations

Client can access a choice of forms of digital object:

• Format -- PDF or HTML

• Performance -- 8 bit/pixel or 24 bit/pixel color

• Content -- thumbnail, medium-resolution, high-resolution

Repository might store alternative disseminations or derive them when requested.

9

Dynamic content

Dissemination is produced by executing code at time client makes request

• Real-time sensor, e.g., traffic camera, satellite picture

• User characteristics, e.g., location, user profile

• Dissemination is intrinsically dynamic, e.g.,

simulation

virtual reality

computer program

Java applet

10

Metadata

Metadata can be linked to digital object:

• external catalog or index

• embedded in the digital object

• generated at run time

Granularity of metadata

• collection of digital objects

• digital object

• element of digital object

11

Requirements 3

Open protocols and formats • Clients use well-defined protocols, data types, and formats. • Architecture must allow incremental changes of protocols. Access management • Allow a broad set of policies• All levels of granularity• Prepared for future developments.

Reliability and performance • Very large volumes of data• Absolutely reliable in retention of data• Good performance

12

Repository systems

Core Repository

13

Repository systems

Core Repository

LoadServices

14

Repository systems

Core Repository

LoadServices

PresentationServices

15

Common repository systems

Web server

• File-based object model plus hyperlinks

• Good tools for access

• Weak on long-term preservation

Relational database

• Table-based object model -- schema and data dictionary

• Good tools for data management

• Used for long-term preservation in data processing

16

Dumb and smart objects

Smart repositories objects

• behaviors provided by the repository• e.g., relational database

Smart clients

• behaviors provided by the client• e.g., web server

Smart objects

• repository is very simple• digital objects provide their own behaviors• compare with object-oriented programming (data + code)

17

Example: CNRI repository

Dumb repository for access to digital objects

• All information stored as typed data in digital objects.

• A single digital object has both data and metadata.

• Identification of digital objects is by location independent, persistent URNs.

• Access controls built into methods for accessing digital object.

18

Repository Access Protocol (RAP)

RAP is a simple protocol with two main groups of commands:

• Deposit digital object• Verify digital object• Delete digital object• Edit digital object

• Access digital object• Access metadata

19

Repository layers and interfaces

RAP Interface

Persistent Store

Object Management Layer

RAP Interface

Shell API

Store API

RAP Command

20

Client and repository architectures

RAP Interface

Object Management

Digital Object Processing

RAP Interface

Object Management

Object Persistence

ORB

StoreEnd Client

RAP Requests RAP Replies

Client Repository

21

Components

Hardware

• Repository: Sun Sparc with Solaris or IBM RS/6000 with AIX.

Software

• Communications: CORBA/IIOP distributed object system.

• Repository shell and object management layer: CORBA and Python.

• Persistent store: Unix file system, Oracle, Shore.

• Client: CGI scripts, Java applets.