Building collections with Greenstone
How to Build a Digital Library
Ian H. Witten and David Bainbridge
Digital Library Collections
There is a distinction between:
  BUILDING collections
  DELIVERING information to users
This is similar to the ‘compile-time’ versus ‘runtime’ distinction in computer programming
Information structures should usually be prepared in advance
Building a Collection
The Collector
  A subsystem that takes you step by step through building a simple collection
  Conceals details behind the scenes
First locate information on your computer or the Web
  Plain text, HTML, Word, PDF, e-mail files, etc.
Plug-ins
Plug-ins are software modules that handle:
  Format conversion
  Metadata extraction
Plug-ins promote extensibility
Greenstone Archive Format
An XML-based file format
File format for:
  Documents
  Metadata
Collection Configuration File
  Defines the structure of a collection
  Governs how the collection is built
  Specifies how the collection will appear to users
Greenstone Extended Capabilities
Extending the Capabilities of Greenstone
  Plug-ins: handle different document and metadata formats
  Classifiers: handle different kinds of browsing structures
  Format statements and macros: govern the user interface content and appearance
Why Greenstone?
Benefits of Greenstone
  General system for constructing and presenting digital collections
  Handles millions of documents, including text, images, audio and video
  User interfaces are identical in the Web-based and CD-ROM versions
  Installs on Windows and Linux
  Access locally or remotely using a web browser
Organization of Collections
Each collection can be organized differently:
  Format of source documents
  Metadata
  Directory structure
  Document structure
  Searching and browsing services
  Presentation
  Auxiliary services
Variation of Source Format
Source documents can be supplied in:
  Plain text
  HTML
  PostScript
  PDF
  Word
  E-mail
  Other file types: images, video, audio
Variation of Metadata
Different types of metadata
Metadata can be supplied differently:
  ‘fields’ in MS Word
  <meta> tags in HTML
  Information coded into file names and directories
  Spreadsheet or other data file
  Explicit metadata format such as MARC
Variation of Directory Structure
Collections can vary in the directory structure in which the information is located
Variation of Document Structure
Document structure can be:
  Flat
  Divided sequentially into pages
  Hierarchical, with title or other metadata available at each level
Variation of Services
Searching:
  Metadata
  Indexes
  Hierarchical levels
Browsing:
  Metadata
  Browser type
Variation of Presentation
Results can be presented to users in various ways:
  Format in which target documents are shown
  Search results page
  Metadata browsers
  Interface language
Variation of Auxiliary Services
A collection may require additional services:
  User logging
  Etc.
Collection Configuration File
Allows Variation
A digital library collection is made by:
  Gathering raw material
  Designing the collection
  Putting design information about the structure and presentation of the collection into the Collection Configuration File
Front Page of Collection
Statement of collection’s purposeStatement of collection’s purpose
Statement of collection’s coverageStatement of collection’s coverage
Explanation of how collection is Explanation of how collection is organizedorganized
Searching Involves Indexes
Searching is provided by indexes built from different parts of the documents:
  Entire documents
  Paragraphs
  Titles
  Sections
  Section headings
  Figure captions
Indexes
Indexes can be created automatically using:
  Documents
  Supporting files
Indexes can be rebuilt automatically when a new document in the same format becomes available:
  A process can wake up, check for new material, and rebuild the indexes
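The "wake up and check for new material" step can be sketched as a polling check on file modification times. This is a minimal illustration, not Greenstone's own mechanism: the function names are hypothetical, and it assumes new material lands under a collection's import directory and that some scheduler (for example cron or the Windows Task Scheduler) calls `check_and_rebuild` periodically.

```python
import os
import time
from typing import Callable


def find_new_files(import_dir: str, last_build_time: float) -> list[str]:
    """Return files under import_dir (recursively) modified after last_build_time."""
    new_files = []
    for root, _dirs, files in os.walk(import_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.getmtime(path) > last_build_time:
                new_files.append(path)
    return sorted(new_files)


def check_and_rebuild(import_dir: str, last_build_time: float,
                      rebuild: Callable[[], None]) -> float:
    """One 'wake up' cycle: call rebuild() only if new material has appeared.

    Returns the timestamp to compare against on the next cycle.
    """
    if find_new_files(import_dir, last_build_time):
        rebuild()  # hypothetical hook, e.g. re-running the import and build steps
        return time.time()
    return last_build_time
```

Comparing modification times keeps the check cheap; a more careful version might also compare file contents, since copying a file can preserve an old timestamp.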
Plug-ins for Indexing
Source documents are converted into a standard XML form for indexing using plug-ins
Standard plug-ins process:
  Plain text
  HTML
  Word
  PDF
  Usenet and e-mail messages
New plug-ins can be written for other document types
Browsing Involves Lists
Browsing involves lists that can be examined by the user:
  Authors
  Titles
  Dates
  Hierarchical classification structures
Classifier Modules
Modules called classifiers are used to create browsers and build browsing structures from metadata:
  Scrollable lists
  Alphabetic selectors
  Dates
  Hierarchies
Programmers can write new classifiers to create novel browsing capabilities
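To make the idea of a classifier concrete, here is a sketch of an alphabetic selector built from Title metadata. It is an illustration in Python, not Greenstone's Perl classifier interface; the function name is hypothetical, and it assumes titles whose first character is not an ASCII letter belong in a catch-all `#` bucket.

```python
from collections import defaultdict


def alphabetic_selector(titles: list[str]) -> dict[str, list[str]]:
    """Group titles into A-Z buckets by first letter; anything else goes under '#'."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        first = title.strip()[:1].upper()
        key = first if "A" <= first <= "Z" else "#"
        buckets[key].append(title)
    # Sort buckets by key, and titles within a bucket case-insensitively.
    return {key: sorted(group, key=str.lower) for key, group in sorted(buckets.items())}
```

Each key would become one tab of the selector, and each bucket the list shown when that tab is chosen.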
Search Terms
Search terms in Greenstone consist of:
  Alphabetic characters
  Digits
Terms are separated by white space
Punctuation acts as white space
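The rule above (terms are runs of letters and digits, with punctuation treated like white space) can be approximated with one regular expression. This is a sketch of the rule as stated on the slide, not Greenstone's actual parser; `search_terms` is a hypothetical name.

```python
import re


def search_terms(query: str) -> list[str]:
    """Split a query into runs of letters and digits; everything else separates terms."""
    # [^\W_] matches any Unicode letter or digit (word characters minus underscore).
    return re.findall(r"[^\W_]+", query)
```

For example, `search_terms("digital-library, (2003) Witten's")` yields `["digital", "library", "2003", "Witten", "s"]`: the hyphen, comma, parentheses and apostrophe all act as separators.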
Two Types of Queries
Query for ALL of the words: Boolean AND
Query for SOME of the words: ranked
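The difference between the two query types can be shown on a tiny in-memory inverted index. This is a minimal sketch under stated assumptions: the index layout and function names are hypothetical, and the ranking uses a plain TF-IDF sum chosen for illustration, not the formula Greenstone's indexer actually uses.

```python
import math
from collections import Counter


def build_index(docs: dict[str, str]) -> dict[str, Counter]:
    """Inverted index: term -> Counter mapping doc_id to term frequency."""
    index: dict[str, Counter] = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, Counter())[doc_id] += 1
    return index


def query_all(index: dict[str, Counter], terms: list[str]) -> list[str]:
    """Boolean AND: IDs of documents that contain every query term."""
    if not terms:
        return []
    matches = [set(index.get(t.lower(), ())) for t in terms]
    return sorted(set.intersection(*matches))


def query_some(index: dict[str, Counter], terms: list[str], n_docs: int) -> list[str]:
    """Ranked: documents containing any term, best first, by a simple TF-IDF score."""
    scores: Counter = Counter()
    for t in terms:
        postings = index.get(t.lower())
        if not postings:
            continue
        idf = math.log(n_docs / len(postings)) + 1.0  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores, key=lambda d: (-scores[d], d))
```

An ALL query returns only documents that satisfy every term, in no particular order of merit; a SOME query returns every partially matching document, ordered so that the ones sharing more (and rarer) terms come first.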
Indexes to Search
In most collections, you can choose different indexes to search
Examples:
  Author and title indexes
  Chapter and paragraph indexes
Usually the full matching document is returned regardless of the index searched
Preferences Page
Allows advanced control over search operation:
  Case-folding and stemming
  Advanced query mode, where users specify Boolean operators
  Large-query interface
  Display of search history
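Case-folding and stemming both make more words match the same index term. The sketch below shows the two switches; it is illustrative only, and the suffix list is a deliberately naive assumption, not the stemmer Greenstone uses.

```python
def normalize(term: str, case_fold: bool = True, stem: bool = False) -> str:
    """Apply optional case-folding and a deliberately naive suffix-stripping stemmer."""
    if case_fold:
        term = term.casefold()
    if stem:
        for suffix in ("ing", "es", "ed", "s"):
            # Only strip when a stem of at least 3 characters remains.
            if term.endswith(suffix) and len(term) - len(suffix) >= 3:
                return term[: -len(suffix)]
    return term
```

With case-folding on, "Libraries" and "libraries" become the same term; with stemming on as well, both reduce to "librari". Turning the options off on the preferences page gives more exact, but narrower, matches.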
Preferences Page
  Specify subcollections to be included in searches
  Specify presentation language
  Customize interface:
    Textual vs. standard interface
    Suppress navigation bar
    Suppress alert system
Using the Collector
The Greenstone Collector
The easiest way to build a simple collection
The Collector allows you to:
  Create a new collection
  Modify or add to an existing collection
  Delete a collection
Starting the Collector
Click the Collector link on the default Greenstone home page
Log in
  When Greenstone is installed, an account called admin is set up with a password chosen during installation
The Collector works through a standard web interface
Creating a New Collection
The Collector’s main purpose is to build a new collection
The structure of a collection is determined when the collection is set up
It is simplest to copy the structure of an existing collection and then edit it
Collection Building Steps
1. Collection Information
2. Source Data
3. Configuration
4. Building
5. Viewing
1. Collection Information
Give the collection a name and provide associated information:
  Title: a short phrase used to identify the collection within the digital library
  Contact e-mail address
  Brief description: sets out the principles that govern what is included in the collection
2. Source Data
Specify the location of the sources
  Clone an existing collection: choose the existing collection from a pull-down menu
  Or create a completely new collection
2. Source Data
In the boxes provided, indicate where the source documents are located
Sources are specified as:
  file://
  http://
  ftp://
file://
  File name on the Greenstone server system: that file will be included in the collection
  Directory name on the Greenstone server: everything in the folder and its subfolders will be included
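The two file:// cases (a single file, or a folder including all its subfolders) can be sketched with the standard library. This is an illustration of the behaviour described above, not the Collector's own code; `files_from_file_url` is a hypothetical name.

```python
import os
from urllib.parse import urlparse
from urllib.request import url2pathname


def files_from_file_url(url: str) -> list[str]:
    """Resolve a file:// source: one file, or every file in a folder and its subfolders."""
    path = url2pathname(urlparse(url).path)  # also handles %-escapes and Windows drives
    if os.path.isfile(path):
        return [path]
    found = []
    for root, _dirs, files in os.walk(path):  # walks subfolders recursively
        for name in files:
            found.append(os.path.join(root, name))
    return sorted(found)
```

A path that does not exist simply yields an empty list here; a real tool would more likely report it as a warning.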
http://
  Web page: the web page will be downloaded
    All pages it links to (and all pages they link to) that reside on the same site, below the URL, will also be downloaded
  URL that leads to a list of files: everything in the folder and its subfolders will be included in the collection
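The "same site, below the URL" restriction is essentially a scope test applied to every link found while crawling. Here is a minimal sketch of that test; the function name is hypothetical, and it assumes "below the URL" means the link's path starts with the directory part of the starting URL.

```python
from urllib.parse import urljoin, urlparse


def in_crawl_scope(start_url: str, link: str, page_url: str = "") -> bool:
    """True if link (resolved against page_url) is on the same site and below start_url."""
    target = urlparse(urljoin(page_url or start_url, link))
    start = urlparse(start_url)
    if target.scheme not in ("http", "https") or target.netloc != start.netloc:
        return False  # different site, or not a web link (e.g. mailto:)
    # Directory part of the starting URL, e.g. "/docs/" for "/docs/index.html".
    base_dir = start.path.rsplit("/", 1)[0] + "/"
    return target.path.startswith(base_dir)
```

Starting from a hypothetical `http://example.org/docs/index.html`, the relative link `guide/ch1.html` is followed, while `/about.html` (above the starting directory) and any link to another host are not.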
ftp://
  File to be downloaded using FTP
  Directory name on the FTP server: downloads everything in the folder and its subfolders
3. Configuration
This step can be bypassed
Allows adjustment of configuration options
The construction and presentation of all collections are controlled by specifications in a special collection configuration file
4. Building
The computer does the work of the building process
Indexes are built:
  For browsing
  For searching
  Following the specifications in the collection configuration file
A status line shows progress
Warnings are shown if files can’t be found
5. Viewing
View the collection that has just been created
E-mail can be sent to the collection’s contact address
  This must be enabled by editing the main.cfg configuration file
Working with Existing Collections
Add more material and rebuild the collection
Edit the configuration file to modify the collection’s structure
Delete the collection
Put the collection on CD-ROM
Adding Material to a Collection
Do not re-specify files that are already in the collection: they would be included twice
If the building process fails, the old version remains unchanged
The structure of the collection can be changed by editing the configuration file
  You may add plug-ins, or an option to a plug-in
Plug-ins & Document Formats
Plug-ins are specified in the collection configuration file
The file name determines the document format
Widely used document formats:
  TEXTPlug
  HTMLPlug
  WORDPlug
  PDFPlug
  PSPlug
  EMAILPlug
  ZIPPlug
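Since the file name selects the plug-in, dispatch amounts to matching file-name patterns against an ordered plug-in list. The sketch below models that with the patterns listed on the following slides; the `PLUGINS` list and `choose_plugin` are hypothetical illustrations, not Greenstone's Perl plug-in code, and matching here is case-insensitive by assumption.

```python
import fnmatch
from typing import Optional

# Hypothetical ordered plug-in list, using the file-name patterns given in this section.
PLUGINS = [
    ("ZIPPlug", ["*.zip", "*.tar", "*.gz", "*.z", "*.tgz", "*.bz"]),
    ("TEXTPlug", ["*.txt", "*.text"]),
    ("HTMLPlug", ["*.htm", "*.html", "*.shtml", "*.shm", "*.asp", "*.php", "*.cgi"]),
    ("WORDPlug", ["*.doc"]),
    ("PDFPlug", ["*.pdf"]),
    ("PSPlug", ["*.ps"]),
    ("EMAILPlug", ["*.email"]),
]


def choose_plugin(filename: str) -> Optional[str]:
    """Return the first plug-in, in list order, whose pattern matches the file name."""
    name = filename.lower()
    for plugin, patterns in PLUGINS:
        if any(fnmatch.fnmatchcase(name, pat) for pat in patterns):
            return plugin
    return None  # no plug-in matches: the file is ignored (with a warning)
```

Because the list is scanned in order, the first matching plug-in wins, which is why the order of plug-in lines in a configuration matters.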
Text Files
TEXTPlug plug-in: *.txt, *.text
  Plain text file
  Title metadata is based on the first line of the file
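Deriving Title metadata from the first line of a text file can be sketched in a few lines. Two details here are assumptions rather than TEXTPlug's documented behaviour: leading blank lines are skipped, and the title is truncated to a hypothetical `max_len` of 100 characters.

```python
def text_title(text: str, max_len: int = 100) -> str:
    """Take Title metadata from the first non-blank line of a plain-text document."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()[:max_len]  # max_len is an assumed limit
    return ""  # empty or all-blank file: no title
```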
HTML Files
HTMLPlug plug-in: *.htm, *.html, *.shtml, *.shm, *.asp, *.php, *.cgi
HTML Files
HTMLPlug plug-in
  Imports HTML files
  Title metadata is extracted from the HTML <title> tag
  Other HTML <meta> tag data can be extracted
  Parses and processes any links in the file
  Links to other files in the collection are trapped and replaced by references to the corresponding document
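Extracting Title from `<title>` and other fields from `<meta>` tags can be sketched with Python's standard `html.parser`. This only illustrates the idea; it is not HTMLPlug, it ignores link processing, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MetadataExtractor(HTMLParser):
    """Collect <title> text and <meta name=... content=...> pairs from an HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: dict[str, str] = {}
        self.title_parts: list[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") and attrs.get("content") is not None:
            self.meta[attrs["name"]] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_html_metadata(html: str) -> dict[str, str]:
    parser = MetadataExtractor()
    parser.feed(html)
    parser.close()
    metadata = dict(parser.meta)
    title = "".join(parser.title_parts).strip()
    if title:
        metadata["Title"] = title
    return metadata
```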
HTML Files
file_is_url
  Optional switch within the HTML plug-in
  Causes URL metadata to be inserted into each document, based on the file-name convention adopted by the mirroring package
  The collection uses this metadata to allow readers to refer to the original source material rather than a local copy
Microsoft Word Files
WORDPlug plug-in: *.doc
  Imports Microsoft Word documents
  Greenstone uses independent programs to convert Word files to HTML
  There are many variants of the Word format; older Word formats are handled by simple text-string extraction
PDF Files
PDFPlug plug-in: *.pdf
  Imports PDF files (Adobe’s Portable Document Format)
  Greenstone uses independent programs to convert PDF files to HTML
PostScript Files
PSPlug plug-in: *.ps
  Imports PostScript files
  Works best when a standard conversion program is already installed on the computer
  Uses a simple text extraction algorithm if no conversion program is present
Email Files
EMAILPlug plug-in: *.email
  Imports files containing e-mail
  Each source is checked for e-mail contents
  Extracts metadata: Subject, To, From, Date
  Deals with common formats: Netscape, Eudora, Unix mail readers
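Pulling Subject, To, From and Date out of a message header can be sketched with Python's standard `email` package. This handles a single message only; splitting a whole mailbox file into messages, as EMAILPlug must for the mail-reader formats listed, is left out, and `email_metadata` is a hypothetical name.

```python
from email import message_from_string


def email_metadata(raw: str) -> dict[str, str]:
    """Extract Subject, To, From and Date header fields from one message."""
    msg = message_from_string(raw)
    return {field: msg[field] for field in ("Subject", "To", "From", "Date")
            if msg[field] is not None}  # missing headers are simply omitted
```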
Compressed & Archived Files
ZIPPlug plug-in: *.zip, *.tar, *.gz, *.z, *.tgz, *.bz
  Relies on standard utility programs being present
Building Collections Manually
Building a Collection
Building a collection: the process of taking a set of documents and metadata information and creating all the indexes and data structures that support the searching, browsing, and viewing operations that the collection offers
Building a Collection
Four phases in building a collection:
  Make: make a skeleton framework structure to contain the collection
  Import: import the documents and metadata, and convert them to a Greenstone standard form
  Build: build the required indexes and data structures
  Install: make the collection operational
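The first three phases correspond to the three Perl scripts used in this section (mkcol.pl, import.pl and buildcol.pl), so they can be chained from a small driver. This is a sketch under assumptions: it presumes setup.bat has already been run so the scripts can be found with `perl -S`, the function names are hypothetical, and by default it only returns the commands without running them. The install phase is a separate file-moving step.

```python
import subprocess


def collection_commands(name: str, creator: str) -> list[list[str]]:
    """Commands for the make, import and build phases, in order."""
    return [
        ["perl", "-S", "mkcol.pl", "-creator", creator, name],
        ["perl", "-S", "import.pl", name],
        ["perl", "-S", "buildcol.pl", name],
    ]


def run_phases(name: str, creator: str, dry_run: bool = True) -> list[list[str]]:
    """Run (or, by default, just return) the commands; stop at the first failing phase."""
    commands = collection_commands(name, creator)
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return commands
```

Stopping at the first failure matters because each phase consumes the previous phase's output: a failed import leaves nothing sensible to build.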
Building Collections Manually
☐ Getting Started
☐ Making a framework for the collection
☐ Importing the documents
☐ Building the indexes
☐ Installing the collection
Getting Started
Locate the command prompt
Go to the directory where Greenstone was installed:
  cd "C:\Program Files\gsdl"
Tell the system where to find the Greenstone files:
  setup.bat
  This sets the variable GSDLHOME to the Greenstone home directory
To return to this directory later:
  cd "%GSDLHOME%"
Make a framework for the collection
Use the Perl program mkcol.pl to ‘make a collection’
Get a description of its usage and arguments:
  perl -S mkcol.pl
  or simply: mkcol.pl
You may leave off the first part (perl -S) if the system recognizes that .pl files are associated with Perl
Make a framework for the collection
  perl -S mkcol.pl -creator emailAddress collectionName
Make a framework for the collection
Examine the file structure:
  cd "%GSDLHOME%\collect\collectionName"
List the directory contents:
  dir
Seven subdirectories are created:
  archives
  building
  etc (contains the collect.cfg file)
  images
  import
  index
  perllib
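A quick way to confirm that the framework was created is to check for the seven subdirectories listed above. This is a hypothetical helper, not part of Greenstone; it assumes the collection lives at GSDLHOME/collect/collectionName and, if no home directory is passed, reads the GSDLHOME variable that setup.bat sets.

```python
import os
from typing import Optional

EXPECTED_SUBDIRS = ("archives", "building", "etc", "images", "import", "index", "perllib")


def missing_subdirs(collection: str, gsdlhome: Optional[str] = None) -> list[str]:
    """List which of the seven expected subdirectories are absent from a collection."""
    home = gsdlhome or os.environ["GSDLHOME"]  # raises KeyError if setup.bat was not run
    coll_dir = os.path.join(home, "collect", collection)
    return [d for d in EXPECTED_SUBDIRS if not os.path.isdir(os.path.join(coll_dir, d))]
```

An empty result means mkcol.pl produced the full skeleton; anything listed suggests the command failed or was run with a different collection name.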
Make a framework for the collection
collect.cfg file
  emailAddress is placed in the creator and maintainer lines
  collectionName is placed in the collection-meta lines
  Plug-ins are inserted
Importing the documents
The collection’s import directory should contain the source material
Drag the directory containing the source material into the import directory
You may drag several source directories and hierarchies
Importing the documents
The import process:
  Brings documents into the Greenstone system
  Standardizes the document format (the way that metadata is specified)
  Standardizes the file structure (that contains the documents)
Importing the documents
To get a list of options for the import program:
  perl -S import.pl
The basic import command is:
  perl -S import.pl collectionName
Importing the documents
You may be in any directory when the import command is issued
  The software works by knowing the collection’s name and the Greenstone home directory
Warnings may appear when files are found without corresponding plug-ins
  These files will be ignored
Building the indexes
Use the program buildcol.pl
Building the indexes
Modify the collect.cfg file to customize the collection’s appearance:
  collectionname
    Web browsers receive this name as the title of the collection’s front page
  collectionextra
    Description of the collection
    Appears under “About this collection” on the collection’s home page
    Enter it as a single line in the editor
Building the indexes
Modify the collect.cfg file to customize the collection’s appearance:
  iconcollection
    Gives the collection an icon image
    Put the location of the image between quotes
    If absent, the collection’s name will be used
    Use _httpprefix_ as a shorthand way of beginning any URL that points within the Greenstone file area
    Example: _httpprefix_/collect/collectionName/images/icon.gif
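A shorthand like _httpprefix_ is resolved by simple text substitution when the page is generated. The sketch below shows that kind of expansion for any _name_ shorthand; it is not Greenstone's macro engine, the function name is hypothetical, and the value "/gsdl" in the usage line is an assumed installation prefix, not a fixed default.

```python
import re


def expand_macros(text: str, macros: dict[str, str]) -> str:
    """Replace _name_ shorthands with values from macros; unknown names are left alone."""
    return re.sub(r"_([A-Za-z][A-Za-z0-9]*)_",
                  lambda m: macros.get(m.group(1), m.group(0)),
                  text)
```

For instance, `expand_macros("_httpprefix_/collect/demo/images/icon.gif", {"httpprefix": "/gsdl"})` gives `/gsdl/collect/demo/images/icon.gif`. Writing the prefix as a shorthand means the configuration keeps working if Greenstone is installed under a different URL.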
Building the indexes
To get a list of options for the build program:
  perl -S buildcol.pl
The basic build command is:
  perl -S buildcol.pl collectionName
Building the indexes
The building process takes about a minute for small collections and can take much longer for very large collections
You may ignore most warning messages
Serious problems will cause the program to terminate
Installing the collection
Building is done in the building directory
The collection must be moved to the index directory before users can see it
  Drag the contents of the building directory to the index directory
  If index already contains files, remove them first
Forgetting to move the contents of building to index is a common mistake
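The manual install step (empty index, then move everything from building into it) can be scripted so it is not forgotten. This is a sketch under the assumption that the collection directory has the layout described above; `install_collection` is a hypothetical helper, not a Greenstone program.

```python
import os
import shutil


def install_collection(coll_dir: str) -> None:
    """Replace the contents of index/ with the contents of building/."""
    building = os.path.join(coll_dir, "building")
    index = os.path.join(coll_dir, "index")
    if not os.listdir(building):
        raise RuntimeError("building/ is empty: run buildcol.pl first")
    # Remove whatever is already in index/ before moving the new files in.
    if os.path.isdir(index):
        shutil.rmtree(index)
    os.makedirs(index)
    for name in os.listdir(building):
        shutil.move(os.path.join(building, name), os.path.join(index, name))
```

Refusing to run on an empty building directory guards against wiping a working index when the build step was skipped or failed.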
Installing the collection
To view the newly built collection:
  If using the Local Library version: restart Greenstone
  If using the Web version: reload the Greenstone home page
Importing and Building
General Information
Two main parts to collection building:
  Importing (import.pl)
  Building (buildcol.pl)
Files and Directories
Collection Specific Directories
GSDLHOME
  collect: all the digital library collections
    collectionName: directory of one collection
      import: original source material
      archives: result of the import process
      building: temporary; contents are manually moved to index
      index: bulk of the information served to users
      (import, archives and building can be deleted)
      etc: contains the collect.cfg file
      images: icons used for the collection
      perllib: Perl programs specific to the collection
Other Greenstone Directories
GSDLHOME
  lib: common software for both the collection server and the receptionist
  bin: programs used for the building process
    script: Perl programs used (mkcol.pl, import.pl, buildcol.pl)
  perllib: Perl modules
    plugins: Perl plug-ins
    classify: Perl classifiers
  cgi-bin: Greenstone runtime system (absent in the Local Library version)
  src: source code in C++
    colservr: the collection server
    recpt: the receptionist
Other Greenstone Directories
GSDLHOME
  packages – source code for external software packages used by Greenstone (indexing and compression program, database manager program, etc.); each package is stored in a directory of its own with a readme file
  bin – executables
  mappings – Unicode translation tables
  etc – configuration files for the entire system, initialization and error logs, user authorization database
  images – user interface images and icons
  macros – small code fragments that drive the user interface
  tmp – temporary files
  docs – documentation for the system
Object Identifiers
A document's permanent name in the system
Remains the same when the collection is rebuilt
Assigned by the import process
Stored as an attribute in the document archive file
Character strings starting with the letters HASH (e.g. HASH0109d3850a6de440c4d1ca2)
Used to name the directory where the archive file is stored
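An identifier survives rebuilding if it is derived from the document itself, for example by hashing its content. The sketch below shows that idea under an explicit assumption: it uses SHA-1 of the bytes, truncated to 24 hex digits and prefixed with HASH. Greenstone's actual hashing scheme and digit count may differ. `make_oid` and `archive_dir` are hypothetical names, and `archive_dir` simply assumes the directory is named exactly after the OID.

```python
import hashlib
from pathlib import Path


def make_oid(content: bytes, digits: int = 24) -> str:
    """HASH-style identifier: same bytes in, same OID out (hypothetical scheme)."""
    return "HASH" + hashlib.sha1(content).hexdigest()[:digits]


def archive_dir(archives: Path, oid: str) -> Path:
    """Directory holding the archive file, assumed here to be named after the OID."""
    return archives / oid
```

Because the OID depends only on the content, re-importing an unchanged document yields the same name. Editing the document yields a new one.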
Plug-Ins
Plug-ins do most of the work of the import process
They operate in the order in which they are listed in the collect.cfg file
An input file is passed to each plug-in in turn until one is found that can process it
If no plug-in can process a file, a warning is printed
Plug-ins determine the traversal of the subdirectory structure in the import directory
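The first-match rule above can be sketched in a few lines of Python. This version makes one simplification explicit: each plug-in claims files by filename suffix, whereas real Greenstone plug-ins apply their own tests. The `Plugin` class and `dispatch` function are hypothetical. HTMLPlug and TEXTPlug appear only as example labels.

```python
import logging
from dataclasses import dataclass
from typing import List, Optional, Tuple

log = logging.getLogger("import")


@dataclass
class Plugin:
    name: str
    suffixes: Tuple[str, ...]  # simplification: claim files by (lowercase) suffix

    def can_process(self, filename: str) -> bool:
        return filename.lower().endswith(self.suffixes)


def dispatch(filename: str, plugins: List[Plugin]) -> Optional[str]:
    """Offer the file to each plug-in in collect.cfg order; the first taker wins."""
    for plugin in plugins:
        if plugin.can_process(filename):
            return plugin.name
    log.warning("no plug-in could process %s", filename)
    return None
```

Listing order matters. If two plug-ins could both handle a file, the one listed earlier in collect.cfg gets it.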
RecPlug – processes directories; recurses through directory structures and passes each name through the plug-in list
GAPlug – processes Greenstone Archive Format documents (in the archives directory structure)
ArcPlug – used during building; processes the list of document OIDs produced during import (the list is stored in the archives.inf file)
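RecPlug's job of turning a directory tree into a stream of file names can be sketched as a recursive walk. Here `handle` stands in for the rest of the plug-in list. The entries are visited in name order, which is an assumption made for reproducibility, and `recurse` is a hypothetical name. In real Greenstone a subdirectory is itself passed back through the plug-in list, so RecPlug picks it up again. The explicit recursion below mimics that.

```python
from pathlib import Path
from typing import Callable, Iterator, Optional, Tuple


def recurse(directory: Path, handle: Callable[[Path], Optional[str]],
            root: Optional[Path] = None) -> Iterator[Tuple[str, Optional[str]]]:
    """Yield (relative path, handler result) for every file under directory."""
    root = root or directory
    for entry in sorted(directory.iterdir(), key=lambda p: p.name):
        if entry.is_dir():
            # A directory goes back through the walk, as RecPlug would see it again.
            yield from recurse(entry, handle, root)
        else:
            yield entry.relative_to(root).as_posix(), handle(entry)
```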
The Import Process
The Import Process
Brings documents and metadata into the system in a standardized XML form
Original material is placed in the import directory
The import process transforms it into files in the archives directory
The original material can then be deleted
The collection can be rebuilt from the archive files
New material is added to a collection by placing it in the import directory and re-executing the import process
The new material finds its way into archives along with the existing files
To keep the source form of a collection:
Do not delete the archives
The "source" form can be augmented and rebuilt later
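The behaviour described above, where new material joins the existing archive files on each import run, can be sketched as follows. This is a simplified illustration. Real import converts each document to the Greenstone Archive Format through plug-ins, whereas here the bytes are copied unchanged under a content-derived HASH name, and the hashing scheme is an assumption. `import_collection` is a hypothetical name.

```python
import hashlib
from pathlib import Path
from typing import List


def import_collection(import_dir: Path, archives_dir: Path) -> List[str]:
    """Copy each source file into archives/ under a content-derived name.

    Returns the OIDs added on this run; files already archived are skipped,
    so earlier archive files stay put and new ones are added alongside.
    """
    archives_dir.mkdir(parents=True, exist_ok=True)
    added = []
    for src in sorted(p for p in import_dir.rglob("*") if p.is_file()):
        data = src.read_bytes()
        oid = "HASH" + hashlib.sha1(data).hexdigest()[:24]  # assumed scheme
        target = archives_dir / oid
        if target.exists():
            continue  # imported on an earlier run
        target.write_bytes(data)
        added.append(oid)
    return added
```

Once a run has finished, the archives directory no longer depends on the import directory. This is why the original material can be deleted but the archives should be kept.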
The Build Process
The Build Process
Creates the indexes and data structures that make the collection operational
Indexes for the whole collection are built all at once
The build process does not work incrementally
Adding new material to archives requires that the entire collection be rebuilt (by running buildcol.pl)
Most collections can be rebuilt overnight
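"Not incremental" means that every build starts from nothing and reads every archived document. The toy word-to-document inverted index below makes that concrete. It is only an illustration: Greenstone uses its own indexing software from the packages directory. `build_index` is a hypothetical name, and the archived documents are modelled as plain strings keyed by OID.

```python
import re
from collections import defaultdict
from typing import Dict, Set


def build_index(archives: Dict[str, str]) -> Dict[str, Set[str]]:
    """Build a word -> {OID} inverted index over the WHOLE collection.

    Every call starts from an empty index and re-reads every document,
    so adding one document means calling this again on all of them.
    """
    index: Dict[str, Set[str]] = defaultdict(set)
    for oid, text in archives.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(oid)
    return dict(index)
```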
Options for Import and Build
Additional Options for Import
Additional Options for Build
Options for Import and Build
To see the options for any Greenstone script, type its name at the command prompt
Several import and build options help with debugging (see Table 6.5 on page 310): verbosity, archivedir, maxdocs, collectdir, out, keepold, debug
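During debugging, the same script is usually re-run with a few options changed, for example higher verbosity or a small maxdocs. The sketch below builds such a command line from a dictionary. It assumes the single-dash `-name value` convention, that boolean options such as keepold and debug are bare flags, and that the collection name comes last. Check these against the script's own usage message. `script_argv` is a hypothetical name.

```python
from typing import Dict, List, Union

Option = Union[str, int, bool]


def script_argv(script: str, collection: str, options: Dict[str, Option]) -> List[str]:
    """Turn {option: value} into an argv list for a Greenstone script.

    Assumptions: '-name value' syntax; True -> bare '-name' flag; False -> omitted.
    """
    argv = [script]
    for name, value in options.items():
        if value is True:        # check bools before ints: bool is an int subclass
            argv.append(f"-{name}")
        elif value is False:
            continue
        else:
            argv += [f"-{name}", str(value)]
    argv.append(collection)
    return argv
```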
Greenstone Archive Documents
Greenstone Archive Format
<!DOCTYPE GreenstoneArchive [
  <!ELEMENT Section (Description,Content,Section*)>
  <!ELEMENT Description (Metadata*)>
  <!ELEMENT Content (#PCDATA)>
  <!ELEMENT Metadata (#PCDATA)>
  <!ATTLIST Metadata name CDATA #REQUIRED>
]>
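The DTD says that a Section holds a Description (zero or more named Metadata elements), then its Content, then any number of nested Sections. The sketch below builds such a tree with Python's standard ElementTree. `make_section` is a hypothetical helper, and it emits only the Section tree, without the DOCTYPE declaration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def make_section(metadata: Dict[str, List[str]], text: str,
                 children: Optional[List[ET.Element]] = None) -> ET.Element:
    """Build one <Section>: Description first, then Content, then sub-Sections."""
    section = ET.Element("Section")
    description = ET.SubElement(section, "Description")
    for name, values in metadata.items():
        for value in values:  # one metadata name may carry several values
            ET.SubElement(description, "Metadata", name=name).text = value
    ET.SubElement(section, "Content").text = text
    for child in children or []:
        section.append(child)
    return section
```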
Document Metadata
Metadata – descriptive information such as author, title, date and keywords
Each item is stored with its metadata name
Metadata is stored at the beginning of the section
Example:
<Metadata name="Title">Freshwater Resources in Arid Lands</Metadata>
Document Metadata
Dublin Core – a metadata standard
New metadata types can be invented
Metadata can be assigned by an automatic process rather than entered manually
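One simple kind of automatic assignment is to take a Title from the document itself, for example from an HTML `<title>` element. The sketch below uses Python's standard `html.parser`. It only illustrates the idea and is not the code of any Greenstone plug-in. `TitleExtractor` and `auto_title` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Optional


class TitleExtractor(HTMLParser):
    """Collect the text of the first <title> element, if any."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self.title: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <TITLE> also matches.
        if tag == "title" and self.title is None:
            self._in_title = True
            self.title = ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def auto_title(html: str) -> Optional[str]:
    """Return the stripped <title> text, or None if there is none."""
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip() if parser.title else None
```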
The Dublin CoreThe Dublin Core
Collection Configuration File
Collection Configuration File
Default Configuration File
Getting the Most Out of Your Documents
Basic Plug-In OptionsBasic Plug-In Options
Document Processing Plug-ins
Assigning Metadata from a File
XML Document Type Definition (DTD)
Example XML Metadata File
Document Type Definition (DTD)
<!DOCTYPE GreenstoneDirectoryMetadata [
  <!ELEMENT DirectoryMetadata (FileSet*)>
  <!ELEMENT FileSet (FileName+,Description)>
  <!ELEMENT FileName (#PCDATA)>
  <!ELEMENT Description (Metadata*)>
  <!ELEMENT Metadata (#PCDATA)>
  <!ATTLIST Metadata name CDATA #REQUIRED>
  <!ATTLIST Metadata mode (accumulate|override) "override">
]>
Example XML Metadata File
<?xml version="1.0" ?>
<!DOCTYPE GreenstoneDirectoryMetadata SYSTEM "http://greenstone.org/dtd/GreenstoneDirectoryMetadata/1.0/GreenstoneDirectoryMetadata.dtd">
<DirectoryMetadata>
  <FileSet>
    <FileName>nugget.*</FileName>
    <Description>
      <Metadata name="Title">Nugget Point Lighthouse</Metadata>
      <Metadata name="Place" mode="accumulate">Nugget Point</Metadata>
    </Description>
  </FileSet>
  <FileSet>
    <FileName>nugget-point-1.jpg</FileName>
    <Description>
      <Metadata name="Title">Nugget Point Lighthouse</Metadata>
      <Metadata name="Subject">Lighthouse</Metadata>
    </Description>
  </FileSet>
</DirectoryMetadata>
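In the example above, a file such as nugget-point-1.jpg matches both FileSets. How the metadata combines is governed by the mode attribute, which defaults to "override" in the DTD. The sketch below applies every matching FileSet in document order, so later values replace earlier ones unless mode="accumulate" appends them. It assumes that FileName is a regular expression that must match the whole file name. `metadata_for` is a hypothetical name, and the input is parsed without its DOCTYPE.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict, List


def metadata_for(xml_text: str, filename: str) -> Dict[str, List[str]]:
    """Collect the metadata that a DirectoryMetadata document assigns to one file."""
    root = ET.fromstring(xml_text)
    result: Dict[str, List[str]] = {}
    for fileset in root.findall("FileSet"):
        patterns = [fn.text or "" for fn in fileset.findall("FileName")]
        if not any(re.fullmatch(p, filename) for p in patterns):
            continue  # this FileSet does not apply to the file
        for md in fileset.findall("Description/Metadata"):
            name, value = md.get("name"), md.text or ""
            if md.get("mode", "override") == "accumulate":
                result.setdefault(name, []).append(value)
            else:
                result[name] = [value]  # override replaces earlier values
    return result
```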
Tagging Document Files
<!--
<Section>
  <Description>
    <Metadata name="Title"> Realizing human rights for poor people: Strategies for achieving the international development targets </Metadata>
  </Description>
-->
(text of section goes here)
<!--
</Section>
-->
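The section tags sit inside HTML comments, so a browser ignores them but a plug-in can still find them. The sketch below pulls sections and their metadata out of a tagged file with regular expressions. It handles flat, non-nested sections only, since nested ones would need a stack. It collapses whitespace in metadata values. `split_sections` is a hypothetical name.

```python
import re
from typing import Dict, List

# One flat section: <!-- <Section> <Description> ... </Description> --> body <!-- </Section> -->
SECTION_RE = re.compile(
    r"<!--\s*<Section>\s*<Description>(?P<desc>.*?)</Description>\s*-->"
    r"(?P<body>.*?)"
    r"<!--\s*</Section>\s*-->",
    re.DOTALL,
)
META_RE = re.compile(r'<Metadata name="(?P<name>[^"]+)">(?P<value>.*?)</Metadata>', re.DOTALL)


def split_sections(html: str) -> List[Dict[str, object]]:
    """Return [{'metadata': {...}, 'text': ...}] for each tagged section."""
    sections = []
    for m in SECTION_RE.finditer(html):
        metadata = {mm.group("name"): " ".join(mm.group("value").split())
                    for mm in META_RE.finditer(m.group("desc"))}
        sections.append({"metadata": metadata, "text": m.group("body").strip()})
    return sections
```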
Classifiers
Format Statements
Format Statements
Examples of Format Strings