1
The Vietnam Center and Archive
Stephen Maxner, Ph.D.
Director
[email protected]@ttu.edu
2
SARBICA EXECUTIVE MEETING & SEMINAR
“Managing and providing online access to digital collections”
October 1, 2009
3
Presentation Outline
Brief overview of the Virtual Vietnam Archive at Texas Tech University
How we planned the Virtual Archive
How we select collections and materials for digitization
How we digitize and organize information
How we deliver online digital materials
4
Overview: The Virtual Vietnam Archive
Created in 2001
Contains over 3 million pages of digitized historical texts and materials.
Documents, photographs, slides, oral histories, artifacts, films and moving images, audio recordings, maps, finding aids, etc.
We add approximately 15,000 pages/month
Available free to researchers worldwide through the Internet.
5
How we planned the Virtual Vietnam Archive
Defined mission, objectives, and purpose
Researched software and equipment
Developed best practices, policies and procedures for digital project
Provide and manage online materials and access
6
Mission and Objectives of Virtual Vietnam Archive
Increase user access to archive collections.
Increase control over archive collections and the information they contain.
Provide alternate user-friendly format for at-risk, inaccessible, and fragile materials (assist with material conservation and preservation).
Provide innovative user environment.
7
Selecting Software and Equipment
Software
• Data Management and User Access
• Digital Capture and Processing
Computers, Networks, and Storage
Digitization Equipment
8
Software – Archive Management
Cuadra Star Archives Management Software
• Inverse Relational Database
• Unlimited data entry fields
• Cross-platform and cross-database search
• Excellent web-interface for user access
• Fully customizable and scalable
9
Software – Digital Capture Software
Adobe Acrobat – Documents
Adobe Capture – Document text
Adobe Photoshop – Images
Cool Edit Pro (Adobe Audition) – Audio
Pinnacle Studio – Video
Windows Media Encoder – Streaming Video
10
Hardware and Equipment
Servers
Storage
Tape Backup Systems
Desktop computers
Desktop Scanners
Specialized Digital Scanners and Conversion Systems
11
Servers and Digital File Storage
Dell PowerEdge Servers, Multi-Processor, RAID 5
Storage Area Network (SAN) (60 Terabytes)
12
Dell PowerVault Tape Library
13
Documents and Manuscripts
Fujitsu fi4220c
Fujitsu fi4530c
14
Maps and Large Format Items
HP Designjet 815mfp
Epson Expression 10000XL
15
Best Practices, Policies and Procedures
Collection assessment, selection, handling, and scanning
Digital file formats and file quality
Naming conventions
Data and Metadata capture
Information and Digital File Security
Online Access System
16
Evaluate Collections
Evaluate content and usefulness to researchers
Determine which collections will translate well to an electronic environment.
Prioritize: collections of high research or historical value go at the top of the list.
17
Consider Physical Condition
Will handling materials during digitization process cause damage?
Will digitization prevent further damage?
Choose “archive-friendly” scanners
Train scanning staff to properly handle archive documents
18
Digital File Formats and File Quality
Balance digital archive file storage and online access copies
Documents and Manuscripts – PDF
• 300 – 600 DPI (100 DPI for online)
Images – TIFF / JPG
• 300 – 600 DPI (100 DPI for online)
Moving Images – AVI (30fps)
• online WMV (256 kbps)
Audio – WAV (44100 Hz)
• online MP3 (128 kbps)
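The storage-versus-access balance above can be made concrete with a little arithmetic. The sketch below estimates uncompressed sizes only; the page size (8.5 x 11 inches), 24-bit color, and 16-bit stereo audio are assumptions chosen for illustration and do not come from the slides, and the function names are hypothetical.

```python
def raw_scan_bytes(width_in, height_in, dpi, bytes_per_pixel=3):
    """Uncompressed size of a scan: pixel width * pixel height * bytes per pixel.

    bytes_per_pixel=3 assumes 24-bit color (an assumption, not from the slides).
    """
    return round(width_in * dpi) * round(height_in * dpi) * bytes_per_pixel


def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample=16, channels=2):
    """Bit rate of uncompressed PCM audio (as stored in WAV), in kbps.

    16-bit stereo is assumed here; the slides give only the 44100 Hz sample rate.
    """
    return sample_rate_hz * bits_per_sample * channels / 1000


if __name__ == "__main__":
    # Assumed letter-size page scanned at the archive and online resolutions.
    for dpi in (600, 300, 100):
        print(dpi, "DPI:", raw_scan_bytes(8.5, 11, dpi), "bytes uncompressed")
    # Uncompressed WAV bit rate, to compare with the 128 kbps online MP3.
    print("WAV:", pcm_bitrate_kbps(44100), "kbps")
```

Under these assumptions a 300 DPI page is roughly 25 MB uncompressed against about 2.8 MB at 100 DPI, a ninefold difference, and the WAV stream runs at about eleven times the bit rate of the 128 kbps MP3. Real files are compressed, so actual sizes will be smaller, but the ratios show why separate online copies are kept.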
19
Naming Convention
PDF document is saved as a 10 digit number
For example: 2123206063
• 212 = Collection Number
• 32 = Box Number from the Collection
• 06 = Folder Number from the Box
• 063 = Document Number from the Folder (63rd document in the folder)
Retains original order of collection and allows researchers to view electronic files as if in archive
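The naming convention can be sketched in a few lines of code. The field widths (3, 2, 2, 3) follow the example 2123206063 on this slide; the class and function names are hypothetical and are not part of the archive's software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemId:
    collection: int  # 3 digits
    box: int         # 2 digits, box within the collection
    folder: int      # 2 digits, folder within the box
    document: int    # 3 digits, position of the document within the folder

    def __str__(self) -> str:
        # Zero-pad each part so every identifier is exactly 10 digits long.
        return (f"{self.collection:03d}{self.box:02d}"
                f"{self.folder:02d}{self.document:03d}")


def parse_item_id(item_id: str) -> ItemId:
    """Split a 10-digit item number into collection, box, folder, document."""
    if len(item_id) != 10 or not (item_id.isascii() and item_id.isdigit()):
        raise ValueError(f"expected a 10-digit number, got {item_id!r}")
    return ItemId(
        collection=int(item_id[0:3]),
        box=int(item_id[3:5]),
        folder=int(item_id[5:7]),
        document=int(item_id[7:10]),
    )


if __name__ == "__main__":
    item = parse_item_id("2123206063")
    print(item)        # the identifier rebuilt from its parts
    print(item.box)    # box number within collection 212
```

Because every part is zero-padded to a fixed width, sorting the identifiers as plain strings reproduces the collection, box, folder, document order, which is how the scheme retains the original order of the collection.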
20
Metadata Collected
Develop Metadata Collection Plan to ensure consistency
Document Title
Item Number
Number of Pages
Collection Name
Author
Copyright Status
Document Language
21
Metadata Collected
Document Date
Document Date Range
Subject Terms/Keywords
Document Full Text
Document Condition
Physical Location of the Collection
Record Creation and History of Updates
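One way to keep metadata consistent is to give every record the same set of fields and check each record for gaps. The sketch below is an illustration only: the field names are derived from the two slides above, while the types, the class name, and the `missing_fields` helper are assumptions and do not describe the actual database schema.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class DocumentRecord:
    # Field names follow the metadata lists on the slides; types are assumed.
    title: str = ""
    item_number: str = ""
    page_count: Optional[int] = None
    collection_name: str = ""
    author: str = ""
    copyright_status: str = ""
    language: str = ""
    document_date: str = ""
    date_range: str = ""
    subject_terms: List[str] = field(default_factory=list)
    full_text: str = ""
    condition: str = ""
    physical_location: str = ""
    update_history: List[str] = field(default_factory=list)


def missing_fields(record: DocumentRecord) -> List[str]:
    """Return the names of fields that are still empty on a record."""
    return [f.name for f in fields(record)
            if getattr(record, f.name) in ("", None, [])]


if __name__ == "__main__":
    record = DocumentRecord(title="Untitled memo", item_number="2123206063",
                            page_count=4)
    print(missing_fields(record))
```

A real collection plan would probably treat some fields as optional (for example, a document has either a single date or a date range), so the list returned here is a prompt for review rather than a strict validation rule.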
22
Paper Capture (OCR)
OCR is used to capture the words from the document so the words can be copied and then pasted into the database to become search terms.
After the Paper Capture is pasted into the database, it is not saved, because it would make the file size too big.
Paper Capture is also not saved because it can sometimes distort the image online.
23
Information Security: Copyright
Documents that are not copyrighted can be seen online and downloaded
Documents that are copyrighted can be digitized but can only be viewed on site via INTRANET – not the Internet (per Digital Millennium Copyright Act)
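The access rule on this slide can be expressed as a small decision function. This is a hypothetical sketch, not the archive's access system: the names are invented, and it assumes that copyrighted items viewed on the intranet cannot be downloaded, which the slide does not state.

```python
from enum import Enum


class Network(Enum):
    INTRANET = "intranet"  # on-site access
    INTERNET = "internet"  # public online access


def allowed_actions(copyrighted: bool, network: Network) -> set:
    """Return the actions a user may take on a digitized document."""
    if not copyrighted:
        # Non-copyrighted documents can be seen online and downloaded.
        return {"view", "download"}
    if network is Network.INTRANET:
        # Copyrighted documents are viewable on site only.
        # Assumption: no download, since the slide mentions viewing only.
        return {"view"}
    return set()


if __name__ == "__main__":
    print(allowed_actions(False, Network.INTERNET))
    print(allowed_actions(True, Network.INTRANET))
    print(allowed_actions(True, Network.INTERNET))
```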
24
Digital Information Security
Digital Backup Systems
Redundant Systems
Online Storage – on servers and SAN
Magnetic Tapes – SDLT 1 (160/320GB)
• Nightly Backups
• Weekly Backups
• Offsite Storage
Archival (dual layer acrylic) Gold CDs and DVDs
25
Online Access: Simple Search
26
Advanced Search
Allows for multiple keywords
Document and Collection Title
Media format
Date and date span
Language
Date when placed online
Folder and Box view of collection
27
Advanced Search
28
Online Summary Information
29
Online Summary Information Results
30
Search Results
31
Benefits of Digital Online Archives
Researcher access: 3 million pages online in Virtual Vietnam Archive
Researcher use: Virtual Vietnam Archive hosts more than 1.5 million online search sessions/year with downloads of approximately 2 million documents/year
Conservation and Preservation – fewer researchers requesting to handle actual documents. Less Handling = Conservation
32
Thank you very much!
Stephen Maxner, Ph.D.
[email protected]@ttu.edu