Digital Libraries, Electronic Theses and Dissertations (ETDs), and NDLTD

Symposium: Open Access to Information. Panel 2: Open Access & Institutional Repositories, 24 August 2006, Brasilia. Digital Libraries, Electronic Theses and Dissertations (ETDs), and NDLTD. http://fox.cs.vt.edu/talks/2006/20060824IBICTp2 Edward A. Fox, fox@vt.edu, Executive Director, NDLTD

1

Symposium: Open Access to Information

Panel 2: Open Access & Institutional Repositories

24 August 2006, Brasilia

Digital Libraries, Electronic Theses and Dissertations (ETDs), and NDLTD

http://fox.cs.vt.edu/talks/2006/20060824IBICTp2

Edward A. Fox, fox@vt.edu

Executive Director, NDLTD

Chair, IEEE-CS Tech. Committee on Digital Libraries

Professor, Department of Computer Science

Director, Digital Library Research Laboratory

Virginia Tech, Blacksburg, VA 24061 USA

2

Outline

• Key Ideas
• Acknowledgements
• Digital Libraries
• DLs & Scholarly Communication
• Institutional Repositories
• NDLTD
• Summary
• DL Futures

3

Key Ideas - Overview

• Theorem 1: Supporters of Open Access should support NDLTD.

• Theorem 2: 5S can guide us to better support of Open Access.

4

Acknowledgements

• Students

• Faculty, Staff

• Collaborators

• Support

• Mentors

5

Acknowledgements: Students

• Pavel Calado, Yuxin Chen, Fernando Das Neves, Shahrooz Feizabadi, Robert France, Marcos Gonçalves, Nithiwat Kampanya, S.H. Kim, Aaron Krowne, Bing Liu, Ming Luo, Paul Mather, Unni Ravindranathan, Ryan Richardson, Rao Shen, Ohm Sornil, Hussein Suleman, Ricardo Torres, Wensi Xi, Baoping Zhang, Qinwei Zhu, …

6

Acknowledgements: Faculty, Staff

• Lillian Cassel, Debra Dudley, Roger Ehrich, Joanne Eustis, Weiguo Fan, James Flanagan, C. Lee Giles, Eberhard Hilf, John Impagliazzo, Filip Jagodzinski, Rohit Kelapure, Neill Kipp, Douglas Knight, Deborah Knox, Aaron Krowne, Alberto Laender, Gail McMillan, Claudia Medeiros, Manuel Perez, Naren Ramakrishnan, Layne Watson, …

7

Other Collaborators (Selected)

• Brazil: FUA, IBICT, UFMG, UNICAMP, USP
• Case Western Reserve University
• Emory, Notre Dame, Oregon State
• Germany: Humboldt U., U. Oldenburg
• Mexico: UDLA (Puebla), Monterrey
• College of NJ, Hofstra, Penn State, Villanova
• University of Arizona
• University of Florida, Univ. of Illinois
• University of Virginia
• VTLS (slides on digital repositories, NDLTD)

Acknowledgements: Support

• Course: UNESCO, CETREDE, IFLA-LAC, AUGM, CLEI, UFC

• Sponsors: ACM, Adobe, AOL, CAPES, CNI, CONACyT, DFG, IBM, Microsoft, NASA, NDLTD, NLM, NSF (IIS-9986089, 0086227, 0080748, 0325579, 0535057; ITR-0325579; DUE-0121679, 0136690, 0121741, 0333601), OCLC, SOLINET, SUN, SURA, UNESCO, US Dept. Ed. (FIPSE), VTLS

9

Acknowledgements - Mentors

• J.C.R. Licklider – undergrad advisor (1969-71)
– Author in 1965 of “Libraries of the Future”
– Before, at ARPA, funded start of Internet

• Michael Kessler – BS thesis advisor
– Project TIP (technical information project)
– Defined bibliographic coupling

• Gerard Salton – graduate advisor (1978-83)
– “Father of Information Retrieval”

10

Digital Libraries

• Definitions

• DL Manifesto – Reference Model

• Book in process (Fox & Gonçalves), 5S

• DL Curriculum Project

11

DL Definitions - 1

• “A digital library is an organized and focused collection of digital objects, including text, images, video, and audio, along with methods of access and retrieval, and for selection, creation, organization, maintenance, and sharing of the collection.”

• Witten & Bainbridge – “How to Build a Digital Library” – Morgan Kaufmann 2003

12

DL Definitions - 2

• “Digital libraries are organizations that provide the resources, including the specialized staff, to select, structure, offer intellectual access to, interpret, distribute, preserve the integrity of, and ensure the persistence over time of collections of digital works so that they are readily and economically available for use by a defined community or set of communities”

• Waters, D.J. CLIR Issues, July/August 1998
• www.clir.org/pubs/issues/issues04.html

13

DL Definitions - 3

• Issues and Spectra

– Collection vs. Institution

– Content vs. System

– Access vs. Preservation

– “Free” vs. Quality

– Managed vs. Comprehensive

– Centralized vs. Distributed

14

DL Definitions - 4

• NOT a “digitized library”

• NOT a “deconstruction” of existing systems and institutions, moving them to an electronic box in a Library

• IS a new way to deal with knowledge
– Authoring, Self-archiving, Collecting,
– Organizing, Preserving,
– Accessing, Propagating, Re-using

15

Digital Library Content

[Figure: Content Types]
• Text Documents: Articles, Reports, Books
• Video, Audio: Speech, Music
• Geographic Information: (Aerial) Photos
• Software, Programs: Models, Simulations
• Bio Information: Genome (Human, animal, plant)
• Images and Graphics: 2D, 3D, VR, CAT

16

DL Manifesto - 1

• DL Reference Model
• In support of the future European Digital Library
• Developed by team connected with DELOS (Candela, Castelli, Ioannidis, Koutrika, Meghini, Pagano, Ross, Schek, Schuldt)

• Draft 2.2 presented in Frascati, near Rome, June 2006 – 79 pages

• Could be integrated with work of DLF, JISC, etc.

17

DL Manifesto – 2: 3 Tiers

18

DL Manifesto – 3: Main Concepts

19

DL Manifesto – 4: Actor Roles

20

Fox & Gonçalves DL Book Parts

• Ch. 1. Introduction (Motivation, Synopsis)

• Part 1 – The “Ss”

• Part 2 – Higher DL Constructs

• Part 3 – Advanced Topics

• Appendix

21

Book Parts and Chapters - 1

• Ch. 1. Introduction (Motivation, Synopsis)

• Part 1 – The “Ss”

– Ch. 2: Streams

– Ch. 3: Structures

– Ch. 4: Spaces

– Ch. 5: Scenarios

– Ch. 6: Societies

22

Informal 5S & DL Definitions

DLs are complex systems that

• help satisfy info needs of users (societies)

• provide info services (scenarios)

• organize info in usable ways (structures)

• present info in usable ways (spaces)

• communicate info with users (streams)

23

A Minimal DL in the 5S Framework

[Figure: concept map built on the five Ss (Streams, Structures, Spaces, Scenarios, Societies). Intermediate concepts: Structured Stream, Structural Metadata Specification, Descriptive Metadata Specification, hypertext, and the services indexing, browsing, and searching. Higher-level concepts: Digital Object, Metadata Catalog, Collection, Repository, Minimal DL.]

24

Book Parts and Chapters - 2

• Part 2 – Higher DL Constructs

– Ch. 7: Collections

– Ch. 8: Catalogs

– Ch. 9: Repositories and Archives

– Ch. 10: Services

– Ch. 11: Systems

– Ch. 12: Case Studies

25

Book Parts and Chapters - 3

• Part 3 – Advanced Topics
– Ch. 13: Quality
– Ch. 14: Integration
– Ch. 15: How to build a digital library
– Ch. 16: Research Challenges, Future Perspectives

• Appendix
– A: Mathematical preliminaries
– B: Formal Definitions: Ss
– C: Formal Definitions: DL terms, Minimal DL
– D: Formal Definitions: Archeological DL
– E: Glossary of terms, mappings

26

DL Curriculum Framework

[Figure: curriculum framework, three bands]

• Course structure
– Semester 1: DL collections: development/creation
– Semester 2: DL services and sustainability

• Core DL topics
– Digitization, Storage, Interchange
– Digital objects, Composites, Packages
– Metadata, Cataloging, Author submission
– Naming, Repositories, Archives
– Spaces (conceptual, geographic, 2/3D, VR)
– Architectures (agents, buses, wrappers/mediators), Interoperability
– Services (searching, linking, browsing, etc.)
– Intellectual property rights mgmt., Privacy, Protection (watermarking)
– Archiving and preservation, Integrity

• Related topics
– Documents, E-publishing, Markup
– Info. Needs, Relevance, Evaluation, Effectiveness
– Thesauri, Ontologies, Classification, Categorization
– Bibliographic information, Bibliometrics, Citations
– Routing, Filtering, Community filtering
– Search & search strategy, Info seeking behavior, User modeling, Feedback
– Info summarization, Visualization
– Multimedia streams/structures, Capture/representation, Compression/coding, Content-based analysis, Multimedia indexing, Multimedia presentation, rendering

27

Project Teams/NSF Grant

• Project Team at VT (IIS-0535057):
– PI: Dr. Edward A. Fox (fox@vt.edu)
– GRA: Seungwon Yang (seungwon@vt.edu)

• Project Team at UNC-CH (IIS-0535060):
– Co-PI: Dr. Barbara Wildemuth (wildem@ils.unc.edu)
– Co-PI: Dr. Jeffrey Pomerantz (pomerantz@unc.edu)
– GRA: Sanghee Oh (shoh@email.unc.edu)

28

DLs & Scholarly Communication

• Asynch

• Information Life Cycle

• Flattening

• Author skills, toward Semantic Web

• Crossing the Chasm

• OAI

29

Asynchronous, Digital Library Mediated Scholarly Communication

Different time and/or place

30

Information Life Cycle

[Figure: life-cycle stages]
• Authoring, Modifying
• Organizing, Indexing
• Storing, Retrieving
• Distributing, Networking
• Retention / Mining
• Accessing, Filtering
• Using, Creating

31

Digital Libraries Shorten the Chain from

Editor

Publisher

A&I

Consolidator

Library

Reviewer

32

DLs Shorten the Chain to

Author

Reader

Digital Library

Editor

Reviewer

Teacher

Learner

Librarian

33

Important skills for authors

• Authoring (Word Processing ->e-pub)

• Rendering, presenting

• Tagging, Markup (XML, SGML)

• “Semi-structured information”

• Dual-publishing, eBooks

• Styles (XSL, XSLT)

• Structured queries

38

OAI – Repository Perspective

Required: Protocol

[Figure: boxes labeled DO and MDO]

39

OAI – Black Box Perspective

[Figure: seven boxes labeled OA 1 through OA 7]

40

The World According to OAI

[Figure: Service Providers (Discovery, Current Awareness, Preservation) linked to Data Providers by metadata harvesting]
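The harvesting step in this picture can be illustrated with OAI-PMH, the Open Archives Initiative protocol. The following is a minimal sketch, not part of the original talk: the base URL, the set name `etd`, and the sample response are made up for illustration, and the helper names `list_records_url` and `extract_records` are hypothetical. It shows how a service provider might build a `ListRecords` request and read Dublin Core titles out of a data provider's reply.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces in ElementTree's "{uri}tag" notation.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build the URL a service provider requests to harvest records."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_records(response_xml):
    """Return (identifier, title) pairs from a ListRecords response.

    Assumes every record carries a header identifier and a dc:title.
    """
    root = ET.fromstring(response_xml)
    pairs = []
    for record in root.iter(OAI + "record"):
        identifier = record.find(OAI + "header/" + OAI + "identifier").text
        title = record.find(".//" + DC + "title").text
        pairs.append((identifier, title))
    return pairs


# Made-up response from a hypothetical data provider (no network access needed).
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:etd-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://repository.example.org/oai", set_spec="etd")
records = extract_records(SAMPLE_RESPONSE)
print(url)
print(records)
```

A real harvester would also follow resumption tokens and handle error responses; those steps are left out here to keep the sketch short.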

41

Institutional Repositories

• Definitions, Goals

• Eprints

• DSpace

• Fedora, VITAL

• Comparisons

• ODL + 5S Suite (not shown)

42

Institutional Repositories - 1

• “Institutional repositories are digital collections that capture and preserve the intellectual output of a single university or a multiple institution community of colleges and universities.”

• Crow, R. “Institutional repository checklist and resource guide”, SPARC, Washington, D.C., USA

• www.arl.org/sparc/IR/IR_Guide_v1.pdf

43

Institutional Repositories - 2

• “A university-based institutional repository is a set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members. It is most essentially an organizational commitment to the stewardship of these digital materials, including long-term preservation where appropriate, as well as organization and access or distribution.”

• Lynch, C.A. In ARL Bimonthly Report 226, pp. 1-7, Feb. 2003, www.arl.org/newsltr/226/ir.html

44

What is a Digital Object Repository?

Also called: digital rep., digital asset rep., institutional repository

• Stores and maintains digital objects (assets)
• Provides external interface for Digital Objects
– Creation, Modification, Access
• Enforces access policies
• Provides for content type disseminations

Adapted from Slide by V. Chachra, VTLS

45

Goals of Institutional Repositories (by Stevan Harnad, U. Southampton)

• Self Archiving of Institutional Research
– Theses and Dissertations (VTLS NDLTD Project)
– Article preprints and post prints
– Internal documents and maps
• Management of digital collections
• Preservation of materials – decentralized approach
• Housing of teaching materials
• Electronic Publishing of journals, books, posters, maps, audio, video and other multimedia objects

Adapted from Slide by V. Chachra, VTLS

54

What is Fedora™?

• Slides courtesy Vinod Chachra of VTLS

Flexible Extensible Digital Object Repository Architecture

55

History of Fedora™

• 1997-Present
– DARPA and NSF-funded research project at Cornell (Conceptual framework developed by Sandra Payette and Carl Lagoze)
– Reference implementation developed at Cornell

• 1999-2001
– University of Virginia digital library prototype (Thornton Staples and Ross Wayland)

• 2002-Present
– Andrew W. Mellon Foundation granted Virginia and Cornell $1 million to develop a production-quality Fedora system
– Fedora 1.0 released in May 2003 as Open Source under the Mozilla public license.

56

Fedora™ Terms

• Metadata
• Digital Objects (data)
• Complex Objects (objects consisting of many objects in a complex/hierarchical relationship)
• Content (Data and Metadata together)
• Data-streams (are content for dissemination)
• Disseminators (are services) – A dissemination is defined as a stream of data that manifests a view of the digital object's content.
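These terms can be modeled with a small data structure. The sketch below is only an illustration of the vocabulary on this slide and is not Fedora's actual API: the class names, the `disseminate` method, and the sample PID `demo:1` are all hypothetical. It shows a digital object holding several datastreams, with disseminators acting as named services that each return one view of the object's content.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Datastream:
    """One piece of content (data or metadata) held by a digital object."""
    ds_id: str
    mime_type: str
    content: bytes


@dataclass
class DigitalObject:
    """A digital object: a persistent ID plus datastreams and disseminators."""
    pid: str
    datastreams: Dict[str, Datastream] = field(default_factory=dict)
    # Each disseminator is a service that produces a view of this object.
    disseminators: Dict[str, Callable[["DigitalObject"], bytes]] = field(
        default_factory=dict
    )

    def add_datastream(self, ds: Datastream) -> None:
        self.datastreams[ds.ds_id] = ds

    def disseminate(self, method: str) -> bytes:
        """Run the named disseminator and return the resulting stream of data."""
        return self.disseminators[method](self)


def get_dc_record(obj: DigitalObject) -> bytes:
    """Hypothetical disseminator: return the Dublin Core datastream as-is."""
    return obj.datastreams["DC"].content


obj = DigitalObject(pid="demo:1")
obj.add_datastream(Datastream("DC", "text/xml", b"<title>Example thesis</title>"))
obj.add_datastream(Datastream("PDF", "application/pdf", b"%PDF-placeholder"))
obj.disseminators["getDCRecord"] = get_dc_record

print(obj.disseminate("getDCRecord"))
```

Treating metadata and content alike as datastreams, as this sketch does, mirrors the point made later in the DSpace and Fedora comparison.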

57

Digital Object w. multiple datastreams

[Figure: a Digital Object whose datastreams include DC, EAD, and Admin Metadata]

58

Example Disseminators

[Figure: a digital object with Persistent ID (PID), System Metadata, Datastreams, and Disseminators]
• Default disseminator: Get Profile, List Items, Get Item, List Methods, Get DC Record
• Simple Image disseminator: Get Thumbnail, Get Medium, Get High, Get VeryHigh

59

Fedora™ Repository

[Figure: Fedora repository architecture. Labels recoverable from the diagram:]
• Clients: Client Application, Batch Program, Server Application, Web Browser (connecting over HTTP and SOAP)
• Web Service Exposure Layer: Manage, Access, Search, OAI Provider
• Management Subsystem: Component Mgmt, Object Mgmt, Object Validation, PID Generation, Policy Enforcement, Policy Mgmt
• Access Subsystem: Object Dissemination, Object Reflection
• Security Subsystem: Session Management, User Authentication, Policies, Users/Groups
• Storage Subsystem: Digital Objects, Datastreams, XML Files, Relational DB
• External Content Retriever, reaching External Content Sources over HTTP and FTP
• Remote Service, Local Service, Content

Adapted from Slide by V. Chachra, VTLS

60

Fedora Advantage

• Extensible digital object model
• Repository exposed by Web services APIs
– Management (Creation, Deletion, Maintenance, Validation)
– Access (Search, Disseminations)
• Scalable, persistent storage for content and metadata
• Content can be local and/or remote
• Content versioning
• Open source solution

61

Comparison of DSpace and Fedora

• DSpace is a standalone product in a box, whereas Fedora can be standalone or integrated with an ILS.

• In Fedora the metadata and the content are treated the same way, as data-streams; in DSpace the metadata and content get separate treatments.

• Fedora can define complex objects more easily.

• DSpace is not as extensible as Fedora, as it deals with both the repositories and workflows. Fedora focuses only on the data model.

• Fedora uses the Mozilla licensing model and DSpace uses GNU license. It makes it easier for software companies to provide extensions to the model.

62

VITAL / Fedora Relationship

63

Prospero: Summary of features of the three software packages compared

Packages compared: DSpace, E-prints, Fedora

What you get
– DSpace: A package with front-end web interface directly linked to a database
– E-prints: A package with front-end web interface directly linked to a database
– Fedora: A repository database, with internal database.

Server requirements
– DSpace: Unix environment, Java, Apache Ant, Apache Tomcat, PostgreSQL or Oracle
– E-prints: Unix environment, Perl, Apache+mod-perl, MySQL
– Fedora: Unix or Windows, Java. (optional: MySQL or Oracle)

Subject classification
– DSpace: Yes
– E-prints: Yes
– Fedora: Yes

Community groups
– DSpace: Yes
– E-prints: No
– Fedora: Possible but … (see below)

Where from?
– DSpace: MIT and Hewlett-Packard.
– E-prints: Southampton University, outcome of a JISC project.
– Fedora: Cornell University and the University of Virginia Library.

68

NDLTD

• DL case study

• Goals

• How, Workflow

• Union Catalog

• Services atop the Union Catalog

• Sustainability and Impact

• UK related report (Aug. 2006)

A Digital Library Case Study

• Domain: graduate education, research

• Genre: ETDs = electronic theses & dissertations

• Submission: http://etd.vt.edu

• Collection: http://www.theses.org

Project: Networked Digital Library of Theses & Dissertations (NDLTD) http://www.ndltd.org

70

NDLTD Goals

• For Students:
– Gain knowledge and skills for the Information Age, especially about Digital Libraries
– Richer communication (digital information, multimedia, …)

• For Universities:
– Easy way to enter the digital library field and benefit thereby

• For the World:
– Global digital library – large, useful, many services

NDLTD: How can a university get involved?

• Select planning/implementation team
– Graduate School
– Library
– Computing / Information Technology
– Institutional Research / Educ. Tech.

• Join online, give us contact names
– www.ndltd.org/join

• Adapt Virginia Tech or other proven approach
– Build interest and consensus
– Start trial / allow optional submission

[Figure: ETD workflow]
• Student Gets Committee Signatures and Submits ETD
• Signed
• Grad School
• Library Catalogs ETD, Access is Opened to the New Research
• WWW
• NDLTD

74

Union catalog: OCLC

• OCLC will expand OAI data provider on TDs.

• Is getting data from WorldCat (so, from many sites!).

• Will harvest from all others who contact them.

• Need DC and either ETD-MS or MARC.

• Has a set for ETDs.
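To make the "Need DC" requirement concrete, here is a minimal sketch of a Dublin Core record of the kind an OAI data provider exposes in the `oai_dc` format. It is an illustration only: the helper `build_dc_record` is hypothetical, the field values are invented, and an ETD-MS or MARC record would carry further thesis-specific elements that are not shown.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Register prefixes so the serialized XML uses dc: and oai_dc:.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def build_dc_record(fields):
    """Build an oai_dc element from (element_name, value) pairs."""
    root = ET.Element("{%s}dc" % OAI_DC_NS)
    for name, value in fields:
        child = ET.SubElement(root, "{%s}%s" % (DC_NS, name))
        child.text = value
    return root


# Invented values for a hypothetical thesis.
record = build_dc_record([
    ("title", "An Example Thesis"),
    ("creator", "Doe, Jane"),
    ("date", "2006"),
    ("type", "Electronic Thesis or Dissertation"),
])
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Keeping the record as a list of pairs allows repeated elements, such as several `creator` entries, which Dublin Core permits.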

77

ETD Union Search Mirror Site in China (CALIS) (http://ndltd.calis.edu.cn – popular site!)

79

VTLS Union Catalog Content Languages

The VTLS NDLTD Union Catalog has data in 6 different languages. These are:
• English
• German
• Greek
• Korean
• Portuguese
• Spanish

Examples follow

80

Full-text Services

• Running since Sept 2005: Scirus

• In beta test: Google Scholar

• Challenges:
– Data quality problems
– Inconsistency in way to get from metadata to the full-text file(s)
– Broadening the coverage since OAI use has not spread as widely as we would like

81

What are we doing?

• Aiding universities to enhance graduate education, publishing and IPR efforts

• Helping improve the availability and content of theses and dissertations

• Educating ALL future scholars so they can publish electronically and effectively use digital libraries (i.e., are Information Literate and can be more expressive) -> support Open Access

83

UK Report of Aug. 2006

• EVALUATION OF OPTIONS FOR A UK ELECTRONIC THESIS SERVICE

• Study report edited by Alma Swan
• Key Perspectives Ltd & UCL Library Services
• EThOS project (Electronic Theses Online Service) - commissioned to develop a model for a workable, sustainable and acceptable national service for the provision of open access to electronic doctoral theses.

84

EThOS: Stakeholders

• Academic registrars

• University administrators (graduate schools)

• Librarians

• Repository managers (3; 2)

• Authors (or potential authors) of theses and dissertations

85

Assessment of the organisational models

Models compared: Distributed model, Centralised model, Mixed architecture model

Viability
– Distributed: Dependent upon individual institutions’ capabilities and resources, which are highly variable
– Centralised: Good, providing service provider selects correct business model and satisfies HEI concerns on rights, liabilities, etc.
– Mixed: Good, providing service provider selects correct business model and satisfies HEI concerns on rights, liabilities, etc.

Disadvantages
– Distributed: Dependent upon individual institutions’ capabilities and resources, which are highly variable. This would lead to a service of patchy quality for at least a decade. Potentially chaotic with respect to standards and consistency levels
– Centralised: HEIs lose control to an extent and may lose some benefits in terms of PR and other institutional-purpose benefits that accrue with local service provision
– Mixed: Offers potential for inconsistencies unless well-managed by hub provider

Advantages
– Distributed: Self-organising, cheap, simple
– Centralised: HEIs need only to provide access to e-theses: central service provider does the rest; Standards applied across the board; Guaranteed consistent access; Scope for added-value services; One interface; a true national collection as well as a national gateway; Easy to hook up to other national or international services.
– Mixed: Gives the greatest flexibility to HEIs to select the most appropriate options; HEIs can retain control of selected elements; Standards applied across the board; Guaranteed consistent access; Scope for added-value services; One interface (multiple sites of supply); National gateway; Easy to hook up to other national or international services.

HEI community views
– Distributed: Strong feeling against this option
– Centralised: Second most popular option
– Mixed: Highest level of support for this option

Comments
– Distributed: No support in the HEI community
– Centralised: Strong support within HEI community
– Mixed: Very strong support within HEI community

86

EThOS Survey: familiar with IPR issues related to e-theses

• 8% know very little

• 30% not very familiar

• 51% familiar

• 11% very familiar

87

EThOS Survey: my institution’s handling of PhD e-theses

• 83% not yet

• 11% from some students

• 5% from most students

• 1% from all students

88

EThOS Survey: my institution’s policy position on PhD e-theses

• 55% no policies yet

• 34% currently planning policies

• 11% has a policy

89

EThOS: Benefits

• Hugely increased visibility of UK doctoral research output

• Resulting in increased usage and impact of UK doctoral research output

• The opportunities for resulting new research efforts and collaborations

90

Summary: Key Ideas

• Theorem 1: Supporters of Open Access should support NDLTD.

• Theorem 2: 5S can guide us to better support of Open Access.

91

Theorem 1: Supporters of Open Access should support NDLTD - 1

• DLs will lead to enormous benefit at all levels, from personal to global.

• An IR is a type of DL, in the middle of the levels (requiring support from below, and providing support for above levels).

• Having a DL at every university (i.e., IR) greatly encourages Open Access.

92

Theorem 1: Supporters of Open Access should support NDLTD - 2

• The easiest way to launch an IR at a university is with ETDs.

• NDLTD is the lead world organization promoting ETD activities.

• NDLTD’s goals are all in support of Open Access and IRs.

93

Theorem 2: 5S can guide us to better support of Open Access - 1

• 5S helps us think formally about Open Access, hence clearly, hence to find focus.

• 5S helps us design and build DLs, hence IRs.

• Societies
– Individuals: members of institution, discipline
– Social influence can promote DL (re)use.
– Economic and political and social issues lead us to a distributed architecture.

94

Theorem 2: 5S can guide us to better support of Open Access - 2

• Distributed infrastructure + services lead us to harvesting (vs. federation, gathering).

• 5S helps make harvesting a success:
– Streams of content flow from individuals.
– Structures: ETD-MS, (browsing) classification
– Spaces: indexes, interfaces
– Scenarios: submission, workflow, harvesting
– Societies (see above)

• More collaboration (social networks)
• Prestige is more widely spread.
• Access is more open.

95

DL Futures

• History

• People, Content, Tools

• Sustainable Infrastructure

• Future Work

• Links

• For More Information

99

People

• Digital librarians

• DL system developers

• DL system administrators

• DL managers

• DL collection development staff

• DL evaluators

• DL users
