This article was downloaded by: [Uniwersytet Warszawski] On: 10 December 2014, At: 02:39 Publisher: Routledge Informa Ltd Registered in England and Wales Registered Number: 1072954 Registered office: Mortimer House, 37-41 Mortimer Street, London W1T 3JH, UK Journal of Electronic Resources Librarianship Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/wacq20 IVDB … for Free! Implementing an Open- Source Digital Repository in a Corporate Library Alicia Verno Published online: 04 Jun 2013. To cite this article: Alicia Verno (2013) IVDB … for Free! Implementing an Open-Source Digital Repository in a Corporate Library, Journal of Electronic Resources Librarianship, 25:2, 89-99, DOI: 10.1080/1941126X.2013.785286 To link to this article: http://dx.doi.org/10.1080/1941126X.2013.785286 PLEASE SCROLL DOWN FOR ARTICLE Taylor & Francis makes every effort to ensure the accuracy of all the information (the “Content”) contained in the publications on our platform. However, Taylor & Francis, our agents, and our licensors make no representations or warranties whatsoever as to the accuracy, completeness, or suitability for any purpose of the Content. Any opinions and views expressed in this publication are the opinions and views of the authors, and are not the views of or endorsed by Taylor & Francis. The accuracy of the Content should not be relied upon and should be independently verified with primary sources of information. Taylor and Francis shall not be liable for any losses, actions, claims, proceedings, demands, costs, expenses, damages, and other liabilities whatsoever or howsoever caused arising directly or indirectly in connection with, in relation to or arising out of the use of the Content. This article may be used for research, teaching, and private study purposes. 
Any substantial or systematic reproduction, redistribution, reselling, loan, sub-licensing, systematic supply, or distribution in any form to anyone is expressly forbidden. Terms & Conditions of access and use can be found at http://www.tandfonline.com/page/terms-and-conditions

IVDB … for Free! Implementing an Open-Source Digital Repository in a Corporate Library




Journal of Electronic Resources Librarianship, 25: 89–99, 2013
Published with license by Taylor & Francis
ISSN: 1941-126X print / 1941-1278 online
DOI: 10.1080/1941126X.2013.785286

IVDB . . . FOR FREE! IMPLEMENTING AN OPEN-SOURCE DIGITAL REPOSITORY IN A CORPORATE LIBRARY

Alicia Verno

In 2011, Boston Biomedical Consultants, Inc. (BBC) created a digital archive of the information collected over its 35-year history in IVD consulting. BBC investigated open-source programs to build and maintain a digital archive of the company’s intellectual property without a large monetary investment from the firm. Using the DSpace Digital Repository Software, the In Vitro Diagnostics Database (IVDb) was created, archiving the company’s library files and providing a searchable online database for staff use. The result has been a user-friendly, robust database that has proven to be a valuable research tool created with little monetary investment (∼$600 in hardware costs).

KEYWORDS Digitization, open-source, controlled vocabulary, digital repository, corporate libraries

Digitization is becoming increasingly popular in today’s libraries and information centers. As more and more users are digital natives and technology is easily accessible to almost everyone, electronic resource delivery is expected by library patrons, and “traditional” library services are commonly seen as outdated and difficult to use. With the economic climate under continued strain, resources for library services are diminishing, and projects such as creating a digital library are being waylaid, even though their value is clearly evident. This is true of libraries of all kinds: Public, academic, special, and corporate librarians are struggling with questions such as “Should I digitize?” “What are my options for digitization software?” and “How on Earth can I manage to digitize without any money in my budget?”

There are a number of open-source software packages available at no cost that allow for inexpensive, in-house digitization by those who are comfortable with computers but are not technology experts. While this article focuses on the endeavors of a small corporate library, the lessons we have learned can be applied to any library that wishes to build a digital library or archive. Digitization is not an unattainable goal; if one is willing to take the time to learn something new and is not afraid to experiment (and probably fail before succeeding), then one can digitize on a budget.

© Alicia Verno
Address correspondence to Alicia Verno, Information Services Manager, Boston Biomedical Consultants, Inc., 410 Totten Pond Road, Suite 300, Waltham, MA 02451. E-mail: [email protected]


DIGITIZATION AND CORPORATE LIBRARIES

While an extensive review of literature discussing the benefits of digitization in corporate libraries would be beyond the scope of this article, there are a multitude of reasons why digitization is beneficial to libraries. As Biswas and Paul (2010) note, there are a number of characteristics that digital libraries have that physical collections do not, including the ability to search physical and digital content simultaneously and the ability for more than one user to access a resource at the same time. They go on to say that digital libraries also preserve valuable documents without taking up physical space, shorten the delivery chain from author to user, and support an online learning environment.

Digitization has been found to be particularly beneficial in the corporate library environment. Modern business professionals work on a variety of tasks that must be completed by a set deadline, and thus they have little time to spend searching for information, leading many to prefer digital resources. In a 2006 study of the Case Corporation’s virtual library services, Xie notes that over half of survey respondents preferred to use electronic resources, while only a fourth used human resources. The reasons for this, Xie said, included “search capability, convenience, familiarity, and to avoid overloading colleagues” (2006, p. 138). Stratigos and Strauss (2001) echo this sentiment, saying that convenient desktop access to library information is a must-have for corporate employees in order for the library to maintain a positive image. In addition, as more work is being completed outside the office setting, having mobile access to content and services is an essential corporate library function (Dugdale & Felix, 2011). For example, Pack (2000) noted that access increased for remote staff at Compaq when the company’s WebLibrary was introduced, as those employees were never before able to use the physical libraries.

Digitization can also be useful in sharing a company’s proprietary information or intellectual property. The Ford Motor Company initiated a digital library for its Ford Research Laboratory (FRL), which included technical reports documenting the progress of company research projects (Primich & Varnum, 1999). Primich and Varnum report that access to FRL reports increased exponentially within a 12-month period after digitization. The project allowed for FRL reports, which were rarely accessed in physical format, to become easily accessible to all Ford employees, showcasing the FRL’s expertise and current projects to the rest of the company.

It is evident that digitization can increase use of information and simplify access for users in any location. In the corporate setting, digitization helps bridge the gap between telecommuters and physical resources while saving employees some of their very valuable time. While there is an abundance of literature telling us why we should digitize, there is not a lot telling us how. Below is a narrative of how a small corporate library undertook a digitization project from start to finish. The focus is on the resources and tools used along the way to simplify the process in the hope that others may learn from our experiences.

Digitization in a Small Corporate Library

Boston Biomedical Consultants, Inc. (BBC) is a consulting firm based in Boston, Massachusetts that specializes in the in vitro diagnostics (IVD) industry. BBC provides market assessment and business-strategy consulting to IVD industry competitors and other interested parties. Since its founding in 1977, BBC has provided its clients with a variety of services to support their strategic planning and business development, including written reports, virtual presentations, and on-site meetings. BBC has a vast internal library containing all of its historical consulting reports as well as supporting materials from 35 years of IVD-related research. The collection is a mix of proprietary materials and public domain information, spanning the company’s history, that are stored in paper and electronic formats. The physical library documents were stored in a rolling file system with no sort of tracking or cataloging system in place; files were simply put into folders on a shelf following a basic organizational scheme based on The Merck Manual (http://www.merckmanuals.com/professional/index.html). The electronic client-based files are arranged by project, and supporting research is grouped by subject and by year on the company server. Files can exist in both physical and electronic form and can exist in multiple places within each format, making locating information extremely difficult.

Over time, it became evident to BBC’s executive staff that information management was a development area for the company. The company’s scope of surveillance across the IVD industry as well as health care, diseases, and emerging technologies has greatly increased over time, and given BBC’s need for current and reliable information sources for client deliverables, the data collection and storage methods in use were proving to be insufficient to manage the explosion of relevant data. With this in mind, in 2010 BBC created a new corporate librarian position with the title of information services manager (ISM). The ISM’s role within BBC is twofold: to provide research support for the analysts and consultants and to manage the BBC library and update the services offered to BBC staff.

Before initiating new services, the ISM interviewed a cross-section of BBC staff to discuss their information needs and possibilities for new services. After speaking with 10 members of the professional staff, it was clear that the current resources available to BBC staff were critically underutilized. The physical files, while extensive and thorough, were perceived as difficult to use because there was no way to tell whether the information required existed there or exactly where it would be located. In addition, the electronic resources available to the staff were also unused due to the lack of in-depth search capabilities and lack of awareness of the resources.

In order to resolve these issues and increase the use of the library materials, the ISM received approval from the executive staff to implement an internal digital archive that would encompass the physical library files as well as electronic resources on BBC’s internal network. The benefits to creating a digital archive for BBC, similar to those discussed above, included

• increased use of company resources due to simplified access;
• easier access to library materials for staff, especially for BBC’s remote employees;
• access for staff to a centralized data source for searching/browsing;
• full-text search capability to assist with data recall; and
• long-term preservation of BBC proprietary materials.

Given that BBC is a small company with limited resources, the ISM chose to investigate open-source digital archive systems to lessen the financial burden of the project on the firm. In lieu of hiring an information technology (IT) professional to manage the systems, the ISM decided to take on this role as an additional cost-saving measure.

Software Selection, Testing, and Installation

BBC’s criteria for software systems, based on the discussions with company staff, included a Web-based interface, full-text search capability, and the ability to input files in a variety of formats. Given these criteria as well as the desire to use an open-source platform, the ISM narrowed the search to two systems: the DSpace Digital Repository System (DSpace), created by the Massachusetts Institute of Technology and Hewlett-Packard, and the Greenstone Digital Library Software (Greenstone), created by the New Zealand Digital Library Project. Both systems are well established (in use for 10 years or more) and release software updates regularly to correct ongoing issues and add new features. In addition, both systems are compatible with multiple file formats, including HTML, PDF, and Microsoft Office files, and allow for full-text searching of all file types. The one major difference between the systems is that DSpace has a more extensive search capability than Greenstone: DSpace allows for exact term, fielded, wild card, proximity, and range searches (Biswas & Paul, 2010). Ultimately, the ISM decided to implement DSpace (version 1.7.1) due to its expanded search abilities as well as its large community of users (there are over 1,000 registered DSpace repositories; DuraSpace, 2012) and availability of fee-based support services if needed (there are currently no U.S.-based companies offering support for Greenstone).

Given that BBC’s existing server equipment did not meet the minimum requirements for the DSpace software, it was necessary to purchase another server to manage the system. A Dell PowerEdge T310 server was selected based on its affordability and technical specifications. As a cost-saving measure, BBC purchased a server with no operating system (OS) installed. Instead of paying for a Windows Server 2008 license, BBC chose to use Ubuntu Server 10.10, an open-source, Linux-based OS. In doing so, BBC saved $800 in start-up costs. However, the ISM had no prior experience with Linux or other command-based OS, so using Ubuntu would likely bring additional challenges.

In order to avoid major errors in installing either Ubuntu or DSpace, the ISM decided to first test the processes using a virtual machine (VM). A VM creates “a completely isolated operating system installation within your normal operating system” (Caprio, 2006). Using a VM allows the user to test software installations and perform other tasks that may harm a computer without causing irreparable damage. VMs can be reset to particular points in time, called “snapshots,” if errors are made, allowing the user to learn from his or her experiences without suffering any consequences. The ISM chose the Oracle VM VirtualBox software from Sun Microsystems, given that it is an open-source VM program. The VM was configured to run Ubuntu Linux through a Windows laptop.

The Ubuntu setup was similar to that of a Windows software installation and posed no major issues; the screen prompts are easy to follow throughout the installation process. Operating Ubuntu, however, was another matter. There are a multitude of Web sites and blogs dedicated to learning Linux, but the Ubuntu official documentation (https://help.ubuntu.com/10.04/serverguide/index.html) and the FOSSWire Ubuntu Cheat Sheet (http://fosswire.com/post/2008/04/ubuntu-cheat-sheet/) proved to be the most useful in adjusting to the differences between Windows and Ubuntu as well as learning basic commands. Additionally, given that the ISM had never worked in a command-line OS such as Linux, using the VM gave the ISM the opportunity to learn command-based tasks without worrying about making mistakes. A simple snapshot restore was all that was needed if an error could not be corrected.

The most difficult adjustment with Ubuntu was the lack of a graphical user interface (GUI) to complete tasks such as browsing the computer’s hard drive and file editing. In a command line–based OS, these tasks can be quite tedious. As a solution to this issue, an IT professional familiar with Linux recommended that the ISM install Webmin, a Web-based administration interface for Unix-based systems. Webmin allows the user to complete administrative tasks (including those mentioned above) through a GUI accessed via a Web browser from any computer on the Ubuntu server’s network. Webmin includes modules for system administration (backups, software updates, etc.), server maintenance (e.g., SQL databases, Web servers, and e-mail servers), networking, managing hardware, and more. File browsing, for example, is structured similarly to Windows/Macintosh OS, making the task simpler to navigate. For those who have used only an OS with a GUI, Webmin adds a level of comfort to using Linux in that it presents the software in a familiar fashion. While the installation of Webmin requires installing multiple software packages as well as editing of text files in Ubuntu, step-by-step directions from Ubuntu Geek (http://www.ubuntugeek.com/install-gui-in-ubuntu-server.html) proved to be more useful than the Webmin documentation (http://www.webmin.com/tgz.html), which seemed to be written for more advanced Linux users.

With Webmin installed successfully, the next step was to install DSpace. Installing DSpace has been noted as a challenge for many digital library administrators. According to Korber and Suleman, in 2008, 12% of messages posted to the DSpace-tech mailing list included the word “installation,” and 10% included the word “configuration” (2008). After studying users installing DSpace, the authors go on to say that “the installation and configuration process of DSpace is too complex for users who are not administrators” (section 3.2, para. 3). They also noted that 8 of 10 participants strongly disagreed with a statement that DSpace is simple to install. While there was a learning curve for BBC, the ISM discovered a few “tricks” to help install DSpace successfully.

Before installing DSpace in the VM, Dietz’s (2011) documentation on DSpace 1.7.1 was thoroughly perused to determine the complexity of the installation as well as what other programs would need to be installed for DSpace to operate properly. The manual includes step-by-step instructions for installing DSpace, including the Linux commands needed to execute each step. For those who are more experienced Linux users, there is also a set of abbreviated instructions. However, given the ISM’s lack of experience, the full-length instructions were used.

Initially, there were some setbacks with the DSpace installation. The process would inexplicably stop about halfway through the steps to build the DSpace directories, though it seemed that all of Dietz’s directions were followed properly. The benefit to using the VM in this case was being able to revert to a previous snapshot and repeat the process. However, the error kept occurring. After some investigation via Web searching and discussions with IT professionals, the issue turned out to be that one of the required software packages (Java JDK 6) was not installed correctly on the VM. The directions from Dietz do not specify the names of the Linux software packages required for DSpace; the manual simply lists the required programs. Being a Linux novice, the ISM inadvertently installed the wrong software package. With the error corrected, the DSpace directories were able to build properly. However, this could have been avoided if the DSpace Manual included the names of the prerequisite software packages (Table 1).

After the directories were built correctly, the DSpace install configuration files required editing to add information about network settings and the SQL database created for DSpace. This is a step that causes difficulty for many users working in the Linux command shell (Korber & Suleman, 2008). The ISM instead used Webmin to edit the DSpace configuration file. As mentioned above, using Webmin for system administration allows for simpler file editing. Using this method, the process was comparable to editing a file in the Notepad program in Windows, and the necessary edits were made within minutes.


Table 1 Software packages required for successful DSpace installation

Program               Unix Software Package
Oracle Java JDK 6     openjdk6-jdk
Apache Maven 2.2.x    maven2
Apache Ant 1.7        ant1.7
PostgreSQL            postgresql
Apache Tomcat 6       tomcat6

Note: Package names and software versions listed correspond to installing DSpace version 1.7.1 on Ubuntu Server 10.10; updated software packages may be required for newer versions of DSpace or Ubuntu. All packages can be installed in Ubuntu with the command “apt-get install <package name>.”
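As a rough illustration of the note above, the following Python sketch assembles the “apt-get install” command from the Table 1 entries without running it. The package names are copied from Table 1 as printed and have not been independently verified; they are release-specific, so each should be confirmed on the target system before use. The function name and dictionary are hypothetical helpers introduced only for this example.

```python
# Package names exactly as listed in Table 1 (DSpace 1.7.1 on Ubuntu Server 10.10).
# Assumption: these names are release-specific and may differ on other systems.
PREREQUISITES = {
    "Oracle Java JDK 6": "openjdk6-jdk",
    "Apache Maven 2.2.x": "maven2",
    "Apache Ant 1.7": "ant1.7",
    "PostgreSQL": "postgresql",
    "Apache Tomcat 6": "tomcat6",
}


def apt_install_command(packages):
    """Return the apt-get invocation as an argument list; nothing is executed."""
    return ["apt-get", "install"] + list(packages)


if __name__ == "__main__":
    # Print the command so it can be reviewed before being run by hand.
    print(" ".join(apt_install_command(PREREQUISITES.values())))
```

Printing the command instead of executing it keeps the sketch safe to run anywhere; the reviewed line can then be run manually with administrator privileges.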

Following the configuration-file editing, the remaining steps in the DSpace install were completed without errors.

After the installation process was successfully completed once, the ISM repeated the process multiple times using the VM. This way, a greater comfort level could be established before attempting the install on the live server. After about a week of testing, DSpace was installed successfully on the Ubuntu server in April 2011 with no technical issues during the process. Through repeated practice, the ISM was able to reduce the installation time of DSpace from over an hour on the first VM install to roughly 30 minutes when the install was completed on the actual server. For comparison, the average installation time for Korber and Suleman’s survey participants was more than 45 minutes (2008).

Using open-source tools such as Oracle VM VirtualBox and Webmin helped the ISM, who had no prior experience with Linux-based software, to be comfortable with the processes involved in a DSpace installation. The VM allowed for trial and error without risking irreparable damage to the server, and Webmin provided a familiar GUI through which to complete tasks that otherwise would have been tedious in the command shell. While installing DSpace is not as easy as point and click, some headaches and frustration can be avoided by using these two open-source tools.

System Customization: DSpace Becomes IVDb

With DSpace installed and running (without blowing up the computer!), the software was ready for customization. During the test process, the ISM decided to give the repository a catchy name to use within BBC to help increase staff buy-in and make the project more distinctive. After some deliberation, the database was renamed from DSpace to the In Vitro Diagnostics Database (IVDb) as a play on the popular Internet Movie Database (IMDb) Web site. Given that DSpace is an open-source program, it was possible to make edits to the source code to replace all instances of the word “DSpace” throughout the site to instead say “IVDb.” Using Webmin, the ISM altered the source code to change headings, breadcrumbs, and other text where DSpace is referenced. For example, instead of saying “Search DSpace” over the search boxes, the system now says “Search IVDb.”

In addition to changing the repository name, BBC also added the company’s logo to the top bar of all pages within DSpace by editing the cascading style sheet (CSS) of the selected XML-based theme to alter the template for each page. Extensive CSS editing was not done by BBC due to the fact that the preexisting Mirage theme sported a color scheme similar to BBC’s logo and consulting reports. However, if desired, CSS editing is greatly simplified through Webmin, much like other file editing discussed above.

Figure 1 The in vitro diagnostics database (IVDb) home page (color figure available online).

Finally, text was added to the home page by editing the “news-xmlui” file saved in the DSpace configuration folder. A few lines were added to the home page to describe the purpose and scope of the project. An additional note about confidentiality was included as well, given that BBC’s proprietary materials are covered by mutual non-disclosure agreements with all BBC clients and employees. With these few simple changes, IVDb was born (Figure 1).

Collection Setup and Data Entry

With the framework of IVDb in place and functioning properly, attention was then turned to data entry and developing the structure of the collections within IVDb. In order to simplify site navigation for the staff, the collections within IVDb mimic the organizational scheme of the physical library. With that in mind, collections were made for BBC’s proprietary documents: consulting reports, e-mail briefings for clients written by BBC staff, background notes pertaining to IVD-related topics, and published documents that were either written by BBC staff or quote BBC staff. Additional collections were also created for research and supporting materials used in the development of BBC deliverables. With this configuration of collections, BBC staff can search either the entire database for items or a particular collection if the user knows what type of document he or she is searching for (e.g., a client report written in 2010), allowing for greater flexibility and increased precision in searching.

Metadata are entered in IVDb using the standard input forms from DSpace. While BBC does not use all the fields, customization of the data entry screens did not seem necessary. Dublin Core metadata fields used by BBC include dc.contributor.author, dc.date.issued, dc.description, dc.description.abstract, dc.publisher, dc.relation.ispartofseries, dc.subject, and dc.title. In addition, references are made in the identifier metadata tag to where the original item is stored, whether it is a physical item in the library or an electronic file saved on BBC’s network server. This is done so that the user can locate the original document if needed for reference or to be used as a template for a new document (in the case of electronic files).

Figure 2 A page from BBC’s controlled vocabulary/taxonomy used in entering subjects and keywords into IVDb (color figure available online).

All cataloging done in IVDb is original because records do not exist for BBC’s proprietary materials. Metadata are manually entered via the DSpace interface, and one or more files are linked to each record. In order to maintain consistency in subjects/keywords, the ISM created a BBC-specific controlled vocabulary. Existing subject heading schemes, such as Library of Congress or Medical Subject Headings, do not contain enough terminology specific to IVD testing to be of use at BBC. With that in mind, the ISM created a list of terms commonly used by BBC with which to develop subject headings. Similar to the approach taken by Harmsen (1998) in developing keywords for searching in an engineering virtual library, the ISM scoured BBC reports and other documents to find company names, testing technologies/techniques, and other IVD-related keywords. From this list of terms, broad topics were also developed to further organize the terminology into a basic taxonomy (Figure 2). The keywords are entered into IVDb with the full taxonomy entry (e.g., broad term–narrow term–keyword). Abbreviations for terms are used where available in order to simplify the manual entry required for cataloging and are indicated in the vocabulary list.
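To make the entry format concrete, here is a minimal Python sketch of how a full taxonomy entry might be assembled from its three levels, with abbreviations substituted where the vocabulary list provides them. The function, the separator choice, and the sample terms are all hypothetical illustrations; they are not taken from BBC’s actual vocabulary or tooling, where entries were typed in by hand.

```python
# Separator between taxonomy levels. The article writes entries as
# "broad term–narrow term–keyword", so an en dash is assumed here.
SEPARATOR = "\u2013"


def taxonomy_entry(broad, narrow, keyword, abbreviations=None):
    """Build a full taxonomy entry, using abbreviations where one is listed."""
    abbreviations = abbreviations or {}
    levels = (broad, narrow, keyword)
    # Replace each term with its abbreviation if the vocabulary defines one.
    return SEPARATOR.join(abbreviations.get(term, term) for term in levels)


if __name__ == "__main__":
    # Hypothetical terms for illustration only.
    sample_abbreviations = {"Nucleic Acid Testing": "NAT"}
    print(taxonomy_entry("Molecular Diagnostics", "Nucleic Acid Testing",
                         "Example Company", sample_abbreviations))
```

Entering every keyword in one consistent format is what allows a single subject search to match all records filed under the same broad or narrow term.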


Before developing IVDb, BBC had two Microsoft Access databases used to list BBC e-mail deliverables and public domain news articles pertaining to IVD. Both databases contained roughly 10 years of entries and were accessed regularly by the staff, making their inclusion in IVDb very beneficial. However, it would be incredibly time-consuming to manually enter all of the items into IVDb individually. DSpace has a batch item-import function available for such tasks, but it requires that each item and its accompanying metadata (in XML format) be packaged together in separate file folders, called Simple Archive Format (SAF) by DSpace (Dietz, 2011). This process, while simple in theory, would have been a tremendous undertaking for BBC given the size of its existing databases.

After doing some Internet research on the subject, the ISM discovered that Peter Dietz, who authored the DSpace Manual, created a tool that automatically generates the format required for batch imports in DSpace (2012). The program, called SAFBuilder, creates SAF packages using data compiled into a spreadsheet (.csv) by the user. The spreadsheet lists the filename as well as any accompanying metadata, and the SAFBuilder program uses these data to create the SAF packages needed to ingest items into DSpace (i.e., a folder for each item containing the file and its related metadata in XML format). For BBC, the existing Access databases, which included the document title, date issued, and file name, were exported into Excel spreadsheets, and the headers were modified to match Dublin Core values. The spreadsheet was then run through the SAFBuilder program, and the generated SAF packages were ingested into IVDb using Dietz’s documentation for the SAFBuilder program. Using this process, BBC was able to add 40,000 records to IVDb in 1 month. The SAFBuilder program is simple to use and is highly recommended for integrating any large groups of items into DSpace as well as existing databases in Microsoft Access or Excel.
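The spreadsheet-to-SAF transformation described above can be sketched in a few lines of Python. This is not the SAFBuilder program itself, only a simplified illustration of the same idea. It assumes a CSV with a “filename” column plus columns named after Dublin Core fields (for example “dc.title” and “dc.date.issued”); the function name, folder naming, and column layout are hypothetical. The per-item layout (the file, a “contents” file listing it, and a “dublin_core.xml” file of dcvalue elements) follows DSpace’s Simple Archive Format.

```python
import csv
import shutil
from pathlib import Path
from xml.etree import ElementTree as ET


def build_saf(csv_path, source_dir, output_dir):
    """Create one Simple Archive Format folder per CSV row (simplified sketch)."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    with open(csv_path, newline="", encoding="utf-8") as handle:
        for index, row in enumerate(csv.DictReader(handle), start=1):
            item_dir = output_dir / f"item_{index:05d}"
            item_dir.mkdir(parents=True, exist_ok=True)

            # Copy the document into the item folder and list it in 'contents'.
            filename = row.pop("filename")
            shutil.copy(source_dir / filename, item_dir / filename)
            (item_dir / "contents").write_text(filename + "\n", encoding="utf-8")

            # Write the remaining columns as Dublin Core values.
            root = ET.Element("dublin_core")
            for field, value in row.items():
                if not field or not value:
                    continue  # skip empty cells and unnamed extra columns
                # 'dc.date.issued' -> element 'date', qualifier 'issued'
                parts = field.split(".")
                element = parts[1]
                qualifier = parts[2] if len(parts) > 2 else "none"
                node = ET.SubElement(root, "dcvalue",
                                     element=element, qualifier=qualifier)
                node.text = value
            ET.ElementTree(root).write(str(item_dir / "dublin_core.xml"),
                                       encoding="utf-8", xml_declaration=True)
```

A call such as build_saf("items.csv", "files", "saf_out"), with hypothetical paths, would produce one numbered folder per spreadsheet row, which is the folder-per-item structure the DSpace batch import function expects.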

Results: IVDb Today

After initial configuration in April 2011, the Information Services team spent about 6 months entering items into IVDb with a focus on documents created in 2010 and 2011. In November 2011, IVDb was demonstrated for the BBC staff and made available for company-wide use. As of October 2012, there are about 58,000 item records in IVDb. Given that some records have more than one item attached, it is estimated that IVDb contains around 100,000 documents. All records are original, and BBC created all metadata using its unique taxonomy for keywords. Using SAFBuilder, 70% of the current records were transferred from existing BBC internal databases, and the remaining records were manually entered. Database entry is ongoing with historical archiving of proprietary material as the main priority. IVDb currently contains 5 years of BBC client reports, 10 years of e-mail briefings, 30 years of articles quoting BBC or authored by BBC, and 10 years of IVD-related news items.

Since its implementation, there have been no technical issues with the DSpace software or the Linux server. System updates are run regularly using Webmin, and backups of original items as well as the DSpace database are run daily to ensure data protection in case of system failure. While multiple updates to DSpace have been released since IVDb’s implementation, BBC has yet to upgrade its software package.
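The article does not describe how BBC's daily backups are implemented. As an illustration only, the sketch below shows one way the file half of such a backup could look in Python: a dated, compressed copy of a directory of original items. The function name `archive_directory` and any paths passed to it are hypothetical, and backing up the DSpace database itself would need the database's own dump tool, which is not shown.

```python
import tarfile
from datetime import date
from pathlib import Path


def archive_directory(source_dir, backup_dir, today=None):
    """Write a dated .tar.gz copy of source_dir into backup_dir; return its path.

    Illustrative only: the directory layout is an assumption, not taken
    from the article.
    """
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    stamp = (today or date.today()).isoformat()  # e.g. 2012-10-01
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / f"{source_dir.name}-{stamp}.tar.gz"
    with tarfile.open(str(target), "w:gz") as archive:
        # Store paths relative to the directory name, not the absolute path.
        archive.add(str(source_dir), arcname=source_dir.name)
    return target
```

Run once a day by a scheduler, a function like this would leave one archive per date, so an earlier copy remains available if a later backup turns out to be damaged.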

BBC staff has not yet been surveyed regarding IVDb, but the following comments were made by BBC Senior Management regarding the project:

My biomedical consulting activity has me tracking a host of diverse topics that often span ‘historical BBC reports’ in addition to the wave of ‘current content’ on health care policy and financing, medical practice, new biomedical technology, macro economic factors, industry events, etc. The IVDb has already impacted my efficiency and that of my professional staff; it is very value-added – and we are still at the beginning of its full implementation. –CEO

I was immediately impressed by the ease-of-use of the IVDb database relative to BBC’s previous data archiving system(s). Additionally, it was clear to me that there was potential longer-term value to the database in the form of new client offerings. –Vice President, Diabetes

The ability to have all relevant information in one centralized, searchable database is a key tool for our organization. Being able to search and sort by key word, date, company, etc. is a great time-saver. In addition, being able to access the information from any location allows for greater ease-of-use. –Vice President, Lab Systems

Having been with the company for nearly 10 years, the IVDb is a significant improvement in BBC’s data tracking, organization, and accessibility to its employees; the creation of this database was a milestone for one of the company’s core capabilities. –Sr. Consultant, NAT

CONCLUSIONS: DIGITIZATION FOR DUMMIES

It is important for libraries to initiate digital archiving/digital library projects in order to stay relevant to users and provide simplified access to information in a variety of formats. Many libraries, particularly small libraries, have shied away from digitization given the perceived high cost and extensive technical knowledge required to initiate these projects. However, the creation of IVDb by BBC proves that any library can digitize on a limited budget. The only monetary investment made by BBC was an inexpensive server to house the DSpace database and scans of paper documents. While the ISM invested many hours in the database configuration and data entry, the cost to the firm beyond the ISM’s salary was minimal (see Table 2). Relying solely on open-source software, BBC was able to create a robust and user-friendly archive for staff use. In addition, BBC was able to manage the project internally without paying for contract IT support. The ISM, who had no formal technical training, was able to become familiar enough with Linux to install and manage DSpace using publicly available resources found on the Internet. BBC’s story proves that with some intuition, a lot of determination, and a little fearlessness, digitization is possible for all. If we can do it, anyone can do it.

Table 2 Total cost of DSpace installation∗

Equipment/Software                      Cost
Dell PowerEdge Server                   $636∗∗
Ubuntu Server software                  FREE
DSpace Digital Repository software      FREE
Webmin Linux Interface                  FREE
Oracle VM VirtualBox                    FREE
SAFBuilder program                      FREE
Total Cost                              $636

∗BBC owned scanning equipment and Adobe Acrobat licenses before implementing this project. Additional costs may arise if these items must be purchased.
∗∗PowerEdge server cost $599 plus shipping and taxes.

REFERENCES

Biswas, G., & Paul, D. (2010). An evaluative study on the open-source digital library softwares for institutional repository: Special reference to DSpace and Greenstone digital library [Electronic version]. International Journal of Library and Information Science, 2(1), 001–010.

Caprio, G. (2006). Virtual machines: Virtualization vs. emulation [Web log comment]. Retrieved from http://www.griffincaprio.com/blog/2006/08/virtual-machines-virtualization-vs-emulation.html

Dietz, P. (2011). DSpace 1.7.1 documentation. Retrieved from https://wiki.duraspace.org/display/DSDOC/All+Documentation

Dietz, P. (2012). Simple archive format packager. Retrieved from https://wiki.duraspace.org/display/DSPACE/Simple+Archive+Format+Packager

Dugdale, S., & Felix, E. (2011). Libraries as hubs in the new workplace. In S. E. Kelsey & M. L. Porter (Eds.), Best practices for corporate libraries (1st ed., pp. 25–46). Santa Barbara, CA: ABC-CLIO.

DuraSpace. (2012). Why use DSpace? Retrieved from http://www.dspace.org/why-use

Harmsen, B. (1998). Tailoring WWW resources to the needs of your target group: An intranet virtual library for engineers. In B. McKenna, C. Graham, & J. Kerr (Eds.), Proceedings of the 22nd International Online Information Meeting (pp. 311–316). Oxford, UK: Learned Information Europe.

Korber, N., & Suleman, H. (2008). Usability of digital repository software: A study of DSpace installation and configuration. Retrieved from http://pubs.cs.uct.ac.za/archive/00000483/

Pack, T. (2000). Fulfilling the vision of the virtual library: The cutting-edge WebLibrary at Compaq Computer Corporation [Electronic version]. Online, 24(5), 42–48.

Primich, T., & Varnum, K. (1999). A corporate library making the transition from traditional to web publishing [Electronic version]. Computers in Libraries, 19(10), 58–61.

Stratigos, A., & Strouse, R. (2001). Going virtual with the corporate library [Electronic version]. Online, 25(2), 66–68.

Xie, H. (2006). Understanding human–work domain interaction: Implications for the design of a corporate digital library [Electronic version]. Journal of the American Society for Information Science and Technology, 57(1), 128–143.

APPENDIX

RESOURCES FOR IMPLEMENTING DSPACE

DSpace web site: http://www.dspace.org/
Current DSpace download: http://sourceforge.net/projects/dspace/files/DSpace%20Stable/1.8.2/
Ubuntu Linux: http://www.ubuntu.com/
DSpace Resources wiki: https://wiki.duraspace.org/display/DSPACE/DSpaceResources
DSpace Tech email list archive: http://sourceforge.net/mailarchive/forum.php?forum=dspace-tech
DSpace installation (Linux commands): https://wiki.duraspace.org/display/DSPACE/Installing+DSpace+1.5+on+Ubuntu+8.10+Server
Webmin Linux administrator interface: http://www.webmin.com/index.html
Oracle VM VirtualBox: https://www.virtualbox.org/
SAFBuilder: https://wiki.duraspace.org/display/DSPACE/Simple+Archive+Format+Packager
