Establishing National Digital Repository System employing Harvesting Model Surinder Kumar *Technical Director, NIC, New Delhi [email protected] , 011- 24305503

Digitisation and institutional repositories 2


Page 1: Digitisation and institutional repositories 2

Establishing National Digital Repository System employing Harvesting Model

Surinder Kumar

*Technical Director, NIC, New Delhi

[email protected], 011-24305503

Page 2: Digitisation and institutional repositories 2

IRs…contd

At present, the University of Southampton’s worldwide registry of OAI-compliant open access repositories lists more than 1000 repositories. India accounts for around 50 of these IRs. To make them available as a single virtual archive and to provide seamless search, it is becoming essential to connect research repositories and resource discovery services into a network that forms a National Digital Repository System. Examples are CARL, ARROW, DRIVER, etc.

Page 3: Digitisation and institutional repositories 2

National Digital Repository System

To build an appropriate NDRS, the existing infrastructure is analyzed.

Technology Components
– Requisite hardware
– OS
– IR software such as DSpace, EPrints
– Interoperability among IRs is proven with the development of the OAI-PMH protocol by OAI.

Page 4: Digitisation and institutional repositories 2

Technical Model of NDRS

Alma Swan and Chris Awre describe three models in “Linking UK Repositories”. These are:

– Centralized Model
– Distributed Model
– Harvesting Model

Page 5: Digitisation and institutional repositories 2

Centralized Model

Metadata and content are submitted directly to a central server.

Advantages
– Complete control of the whole process, from article deposition through to the user interface
– Software selection
– Able to manage preservation issues

Disadvantages
– It is an expensive option
– It may surpass the existing institutional repositories

Page 6: Digitisation and institutional repositories 2

Distributed Model

All metadata and content remain in their source locations, and metadata is searched on the fly.

Advantages
– Provides up-to-date metadata, as it gives instant access to the source locations of the metadata
– Much less expensive than the centralized model

Disadvantages
– No enhancement of metadata
– Network dependent
– Not many IRs support Z39.50 or SRU/W
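The on-the-fly search of the distributed model can be pictured with a minimal sketch. It assumes each repository is wrapped in a callable that takes a query and returns matching titles; the two sources below (`repo_a`, `repo_b`) are hypothetical stand-ins, not real Z39.50 or SRU/W clients. The sketch also shows the network dependence noted above: a source that cannot be reached contributes nothing to the result.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def federated_search(sources, query):
    """Send the query to every source in parallel.

    `sources` maps a source name to a callable(query) -> list of titles.
    Returns (hits, failed): hits is a sorted list of (source, title) pairs,
    failed is a sorted list of source names that raised an error.
    """
    hits, failed = [], []
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {pool.submit(fn, query): name for name, fn in sources.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                hits.extend((name, title) for title in future.result())
            except Exception:
                # An unreachable repository simply drops out of the results.
                failed.append(name)
    return sorted(hits), sorted(failed)


# Hypothetical sources used only for illustration.
def repo_a(query):
    titles = ["Open access in India", "Metadata quality"]
    return [t for t in titles if query.lower() in t.lower()]


def repo_b(query):
    raise ConnectionError("repository unreachable")


hits, failed = federated_search({"repo_a": repo_a, "repo_b": repo_b}, "metadata")
```

Nothing is stored centrally here, which is why metadata cannot be enhanced in this model.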

Page 7: Digitisation and institutional repositories 2

Harvesting Model

It is a hybrid model: metadata is harvested into a central searchable server, while content (full text) remains distributed and is provided by the individual repositories. Under this model, a service provider harvests metadata from existing institutional repositories using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). The service provider can enhance the quality of the metadata and provide various services from its centralized server. The metadata can be further exposed via OAI-PMH, SRU/W, or RSS feeds for use by other service providers.
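A harvesting request of this kind can be sketched as follows. This is a minimal illustration, not the implementation of any particular harvester: the base URL `http://example.org/oai` and the sample response are assumptions, while the `ListRecords` verb, the `oai_dc` metadata prefix, the `from` and `resumptionToken` arguments and the XML namespaces come from OAI-PMH 2.0. No network call is made; the response is parsed from a string.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, from_date=None, resumption_token=None):
    """Build a ListRecords request for unqualified Dublin Core (oai_dc)."""
    if resumption_token:
        # resumptionToken is an exclusive argument: no other parameters allowed.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if from_date:
            params["from"] = from_date
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        header = rec.find("oai:header", NS)
        records.append({
            "identifier": header.findtext("oai:identifier", namespaces=NS),
            "datestamp": header.findtext("oai:datestamp", namespaces=NS),
            "deleted": header.get("status") == "deleted",
            "titles": [e.text for e in rec.findall(".//dc:title", NS)],
        })
    token = root.findtext("oai:ListRecords/oai:resumptionToken",
                          default="", namespaces=NS)
    return records, (token.strip() or None)


# Assumed sample response from a hypothetical repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2010-05-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>A sample paper</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

url = list_records_url("http://example.org/oai", from_date="2010-01-01")
records, token = parse_list_records(SAMPLE)
```

A non-empty resumption token means the repository has more records; the harvester would request the next batch with `list_records_url(base_url, resumption_token=token)` until the token comes back empty.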

Page 8: Digitisation and institutional repositories 2

Harvesting Model-advantages

Advantages
– OAI-PMH is a standard protocol which is easy to implement
– Unqualified Dublin Core is mandated for OAI compliance; however, more complex metadata schemas can also be employed
– The institutional archives employ software which supports OAI-PMH
– Harvesting can be carried out by automatic scheduled tasks

Page 9: Digitisation and institutional repositories 2

Harvesting Model-disadvantages

– Only unqualified Dublin Core is mandated for harvesting; it lacks the rich semantics of other metadata schemas.
– The metadata exposed by the services may not always be the latest. Also, changes made to the metadata may not be reflected in the central server.
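The staleness problem is usually reduced by re-harvesting incrementally and merging the changes into the central index. The sketch below assumes a simple in-memory index (a dict keyed by OAI identifier) and a hypothetical record shape with `identifier`, `datestamp` and an optional `deleted` flag. Note that support for deleted records is optional in OAI-PMH, so a repository may never report deletions.

```python
def apply_harvest(index, records):
    """Merge harvested records into the central index.

    Updated records overwrite the stored copy; records flagged as deleted
    are removed. Returns the latest datestamp seen, which can be used as
    the `from` argument of the next scheduled harvest. Datestamps are
    compared as strings, which is valid only if they all use the same
    UTC granularity (for example YYYY-MM-DD).
    """
    latest = None
    for rec in records:
        if rec.get("deleted"):
            index.pop(rec["identifier"], None)
        else:
            index[rec["identifier"]] = rec
        if latest is None or rec["datestamp"] > latest:
            latest = rec["datestamp"]
    return latest


# Hypothetical state of the central server before the harvest.
central_index = {
    "oai:example.org:1": {"identifier": "oai:example.org:1",
                          "datestamp": "2010-01-10", "title": "Old title"},
    "oai:example.org:2": {"identifier": "oai:example.org:2",
                          "datestamp": "2010-01-12", "title": "Withdrawn paper"},
}

# Hypothetical changes returned by an incremental harvest.
changes = [
    {"identifier": "oai:example.org:1", "datestamp": "2010-03-02",
     "title": "Corrected title"},
    {"identifier": "oai:example.org:2", "datestamp": "2010-03-05",
     "deleted": True},
]

next_from = apply_harvest(central_index, changes)
```

Between two harvests the central copy can still lag behind the source, so the harvesting interval determines how stale the metadata may become.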

Page 10: Digitisation and institutional repositories 2

NDRS-Accepted Harvesting Model

It is clear that the OAI-PMH (harvesting) model has many advantages over the other models:

– It has gained worldwide acceptance.
– It makes it easy to share information about scholarly resources and to offer enhanced resource discovery tools.
– It has been adopted by thousands of institutions around the world.

Page 11: Digitisation and institutional repositories 2

NDRS-benefits

A National Digital Repository System would offer a number of benefits to end users as well as to the various stakeholders of the institutions.

Benefits to IR Administrator
– The IR administrator would only maintain the content of the repository while offering metadata to the service provider.
– NDRS would be in a better position to provide long-term preservation through appropriate metadata provision and/or content packaging.
– It would offer enhanced metadata to the end users.

Page 12: Digitisation and institutional repositories 2

NDRS-benefits…contd

End Users as readers and searchers
– NDRS would give end users access to a large number of repositories, rather than their having to access each repository individually.
– It would push content to end users through RSS/Atom feeds.
– It would provide document delivery services to the end users.
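Pushing newly harvested records to readers through a feed can be sketched as below. This is an illustrative RSS 2.0 generator only; the channel title, links and item data are assumptions, and a real service would add dates, identifiers and descriptions per item.

```python
import xml.etree.ElementTree as ET


def build_rss(channel_title, channel_link, items):
    """Return an RSS 2.0 document (as a string) listing the given items.

    `items` is a list of dicts with "title" and "link" keys.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = "Recently harvested records"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, "link").text = item["link"]
    return ET.tostring(rss, encoding="unicode")


# Hypothetical records harvested since the last feed update.
feed = build_rss(
    "NDRS: new records",
    "http://example.org/ndrs",
    [
        {"title": "A sample paper", "link": "http://example.org/ir/1"},
        {"title": "Another paper", "link": "http://example.org/ir/2"},
    ],
)
```

Each item links back to the source repository, which keeps the full text with the individual IR as the harvesting model intends.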

Page 13: Digitisation and institutional repositories 2

NDRS-benefits…contd

End Users as content managers
– NDRS would provide a means to expose authors’ work so as to make it widely available to their peers throughout the globe.
– It would be able to provide preservation and metadata enhancement capabilities to support long-term storage of and access to the content.

Page 14: Digitisation and institutional repositories 2

NDRS-benefits…contd

Content Aggregators
– NDRS would offer added-value services of its own to enhance aggregated metadata and supply this back to the repository concerned.
– It would provide a single point of information for statistics about access and downloads of data.
– It would offer a single point of information for multiple sources of research and other materials to aid discovery.
– It would be able to offer certain collections by adding value-added services on top of them.

Page 15: Digitisation and institutional repositories 2

Impediments in implementing NDRS

– Technical issues: at the data provider level, installation of IR software, servers, server malfunctions, data backup, updating of IR software, etc.; at the service provider level, successful harvesting depends on an error-free network, proper use of Dublin Core metadata fields and data sets, and correct use of datestamps.
– Coordination among IR members
– Federated authentication and authorization
– Long-term preservation, format, migration and access
– Sustainability in providing long-term access to NDRS
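The metadata and datestamp problems mentioned under technical issues can be caught at harvest time with a simple check. The sketch below is an assumption-laden illustration: the record shape and the list of required fields are a hypothetical local policy (OAI-PMH itself makes every Dublin Core element optional), while the two accepted datestamp forms, YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ, are the granularities defined by OAI-PMH.

```python
import re

# Datestamp granularities allowed by OAI-PMH (UTC).
_DATESTAMP = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}Z)?")

# Hypothetical local policy, not part of the protocol.
REQUIRED_FIELDS = ("title", "creator", "date")


def check_record(record):
    """Return a list of problems found in one harvested record.

    `record` is a dict with a "datestamp" string and a "dc" dict that maps
    Dublin Core element names to lists of values.
    """
    problems = []
    if not _DATESTAMP.fullmatch(record.get("datestamp") or ""):
        problems.append("bad datestamp")
    for field in REQUIRED_FIELDS:
        values = record.get("dc", {}).get(field, [])
        if not any(v.strip() for v in values):
            problems.append("missing dc:" + field)
    return problems


good = {"datestamp": "2010-04-06",
        "dc": {"title": ["Ocean currents"], "creator": ["A. Author"], "date": ["2010"]}}
bad = {"datestamp": "06-04-2010",
       "dc": {"title": [" "], "creator": ["A. Author"]}}

good_problems = check_record(good)
bad_problems = check_record(bad)
```

A report of such problems per repository would give the service provider something concrete to send back to IR administrators.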

Page 16: Digitisation and institutional repositories 2

Current Scenarios of Institutional Repositories in India

The Registry of Open Access Repositories (ROAR) lists 52 registered repositories; however, the actual number may be higher, as certain repositories have not yet been registered with ROAR.

Analysis of IRs in India
– Out of 52, 13 were not functional at the time of writing the paper
– A number of them are not being updated
– Looking further, they are not reaching critical mass

Page 17: Digitisation and institutional repositories 2

Current Scenario..contd

As per the 2010 Webometrics survey ranking the world’s open access repositories by visibility, quality and available items [18], seven repositories from India are listed; their details are given in the following table.

Page 18: Digitisation and institutional repositories 2

Sr No.  Rank  Name of IR                                                Date of establishment  No. of records
1       82    Indian Institute of Science                               05-04-2004             19477
2       148   OpenMed, National Informatics Centre                      22-03-2005             2645
3       180   Indian Statistical Institute Digital Library              17-01-2004             188
4       218   Indian Institute of Astrophysics                          11-11-2004             2468
5       245   National Institute of Oceanography Digital Library        06-04-2010             3528
6       278   Raman Research Institute Digital Library                  19-04-2005             3731
7       278   National Aerospace Laboratories Institutional Repository  09-11-2004             3164

Page 19: Digitisation and institutional repositories 2

Current Scenario-service providers

There are nine service providers in the country that harvest data. The majority of them follow OAI-PMH, and the harvesting software used is PKP Harvester. Of the nine, four are not functional, though they are highly cited in the literature.

Page 20: Digitisation and institutional repositories 2

Proposed NDRS

To establish successful, well-populated national-level repositories, we need to look at the prevailing information systems in our country, for example ICMR, CSIR, ICAR, ENVIS, the Department of Atomic Energy and ISRO. The onus should be on these national information systems to ensure that publications arising out of publicly funded research are made available free of cost to researchers.

Page 21: Digitisation and institutional repositories 2

NDRS

[Figure: proposed NDRS architecture. Labels in the diagram: NDRS; national-level nodes ICMR, CSIR, Agriculture, Inflibnet, Social Science and IISc; institutional repositories (IR); OAI-PMH links between the nodes; NDRS services: metadata refinement, document delivery, alert service and RSS (for further processing).]

Page 22: Digitisation and institutional repositories 2

NDRS-Recommendations

– There is a need for a national body in the country, like JISC in the UK, which provides advisory as well as technical services to individual repositories.
– Responsibility should be given to national-level organizations to set up national resource centres that harvest data from their respective institutional repositories.
– Develop strategies to make institutional repositories a permanent and sustainable part of the national and local research infrastructure.
– Provide guidelines to the respective institutional members on mediated or voluntary deposit and on the need for mandatory deposit of papers and dissertations.
– Develop guidelines for metadata entry and the best practices to be followed.

Page 23: Digitisation and institutional repositories 2

Conclusion

– The new challenge is to create an environment based on the OAI protocol so that publicly funded research is made available to the whole community.
– A national-level body is needed so that the development of institutional repositories is more coherent; it could provide advisory services and promote adoption of the guidelines and best practices followed by national-level systems such as DRIVER, DAREnet and HAL.
