Greenstone
Building your own collection
• Overview
• Installation
• Usage
• Building a collection
What is Greenstone?
• A suite of software for serving digital library collections and building new ones.
• It provides a new way of organizing information and publishing it on the Internet or on CD-ROM.
Ways to find information
• Searching
  • E.g., search for particular words in the text ("full-text search")
  • Indexes are built from different parts of the document
• Browsing
  • E.g., browse documents by title
  • Involves lists and classifications
Metadata
• Metadata are descriptive data associated with each document. For example:
<Metadata name="PictureN">boon.jpg</Metadata>
<Metadata name="Height">137feet</Metadata>
<Metadata name="Date">1852</Metadata>
<Metadata name="State">Maine</Metadata>
<Metadata name="Title">Boon Island Light</Metadata>
• Can be used as a searchable index
• Used to generate the browsing structures (lists or hierarchical structures) through "classifiers"
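To make the two uses of metadata concrete, here is a minimal Python sketch that reads the five metadata lines above and groups documents by one field, the way a browsing list might. It is only an illustration, not how Greenstone itself processes metadata: the `<Description>` wrapper and the function names `read_metadata` and `browse_by` are assumptions introduced for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The five metadata lines from the slide, wrapped in a <Description> root
# (an assumption of this sketch) so that they form one well-formed XML document.
SAMPLE = """<Description>
<Metadata name="PictureN">boon.jpg</Metadata>
<Metadata name="Height">137feet</Metadata>
<Metadata name="Date">1852</Metadata>
<Metadata name="State">Maine</Metadata>
<Metadata name="Title">Boon Island Light</Metadata>
</Description>"""


def read_metadata(xml_text):
    """Return a {name: value} dict for the <Metadata> elements in xml_text."""
    root = ET.fromstring(xml_text)
    return {m.get("name"): (m.text or "").strip() for m in root.iter("Metadata")}


def browse_by(documents, field):
    """Group document titles under each value of one metadata field."""
    groups = defaultdict(list)
    for doc in documents:
        groups[doc.get(field, "(unknown)")].append(doc.get("Title", "(untitled)"))
    return {key: sorted(titles) for key, titles in sorted(groups.items())}


boon = read_metadata(SAMPLE)
print(boon["Title"], boon["State"], boon["Date"])
print(browse_by([boon], "State"))
```

With more documents in the list, `browse_by(docs, "State")` would give one group of titles per state, which is roughly what a classifier on the State metadata provides for browsing.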
Greenstone Document Format
• XML format: source documents are converted into a standard XML format by "plugins". Plugins can process plain text, HTML, Word, and PDF documents, and email messages.
• Multimedia documents: either linked to the textual document or accompanied by textual descriptions.
• Multilanguage documents: Unicode is used to represent the character sets consistently.
Why use Greenstone?
• No CGI programming needed
• Built-in server
• A GUI is provided
• Easy to use
• Makes it possible to build a large collection in a short time
Installation
• Download from the www.greenstone.org page.
• Platform: Windows or Unix.
• Local library or web library?
  • The local library has a built-in web server.
  • The web library needs an external web server: configure it and point it to the URL of Greenstone's library executable, such as http://localhost/gsdl/cgi-bin/library.exe
• "Enter Library" or "Restricted Version"?
  • "Restricted Version" is used only when networking software has been installed incorrectly.
  • Windows keeps attempting to dial up your Internet service provider.
  • "Restricted Version" must use a Netscape web browser.
Using Greenstone
• Searching and browsing
  • Punctuation is ignored in search terms
  • Query types: "all" and "some"
  • Icon meanings
• Setting the preferences
  • Case sensitivity, stemming, Boolean queries
  • Change language
  • Change presentation
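The difference between the "all" and "some" query types, and the effect of ignoring punctuation, can be shown with a toy matcher. This is a simplified Python sketch and not Greenstone's search engine: it does no stemming or ranking, it always ignores case (in Greenstone that is a preference), and the names `terms` and `matches` are made up for this example.

```python
import string


def terms(text):
    """Lower-case the text and split it into words, treating punctuation as spaces."""
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    return text.lower().translate(table).split()


def matches(query, document, mode="all"):
    """Return True if the document contains all (or at least some) of the query terms."""
    wanted = set(terms(query))
    present = set(terms(document))
    if not wanted:
        return False
    if mode == "all":
        return wanted <= present        # every query term must occur
    if mode == "some":
        return bool(wanted & present)   # at least one query term must occur
    raise ValueError("mode must be 'all' or 'some'")


doc = "Boon Island Light, Maine (1852)"
print(matches("maine, light!", doc, "all"))
print(matches("maine florida", doc, "all"))
print(matches("maine florida", doc, "some"))
```

The first query matches despite its punctuation; the second fails under "all" because one term is missing, but succeeds under "some".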
BUILDING A COLLECTION
Using "the Collector"
• Easy to use
• Builds collections based on an existing collection, with new content
• Using the Collector alone is not feasible for creating collections with completely new structures
• Building from the command line is preferable
Step-by-step instructions
1. Change to the correct directory
> cd "C:\Program Files\gsdl"
2. Invoke setup.bat, which is needed for each new DOS session
> setup.bat
3. Make a collection
> perl -S mkcol.pl -creator [email protected] Lhouses
Lhouses is the collection name. Now you have a new collection directory called Lhouses.
4. Populate the collection
Copy documents into the Lhouses collection's import directory. This can be done by copy and paste in Windows Explorer or, on the command line, by typing
> cd "%GSDLHOME%\collect\Lhouses"
> xcopy /s document_path\* import
If you have stored all the documents in C:\My Document\LHCollection, then document_path is C:\My Document\LHCollection (put it in double quotes on the command line, since it contains a space).
5. Import the collection
> perl -S import.pl Lhouses
6. Edit the collect.cfg file
This is the configuration file for the collection, located in the collection's etc directory.
• Give the collection a name through collectionmeta collectionname
• Add a description of your collection through collectionmeta collectionextra "barabara…".
• Add a collection icon through collectionmeta iconcollection "_httpprefix_/collect/Lhouses/images/icon.gif", if the image is in the collection's images directory
=> collect.cfg
7. Build the collection
> perl -S buildcol.pl Lhouses
8. Make the collection available over the web
Either select the contents of the building directory and drag them into the index directory,
or remove the index directory (and all its contents) by typing
> rd /s index (on Windows NT/2000)
> deltree /Y index (on Windows 95/98)
and then rename the building directory to index with
> ren building index
Finally, recreate an empty building directory:
> mkdir building
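The last step (remove index, rename building to index, recreate building) can also be scripted in a platform-independent way. The following Python sketch assumes only the directory layout described above; the function name `publish_build` and the file names in the demonstration are hypothetical, and it is not a Greenstone tool.

```python
import shutil
import tempfile
from pathlib import Path


def publish_build(collection_dir):
    """Replace index/ with the freshly built building/, then recreate an empty building/."""
    collection_dir = Path(collection_dir)
    building = collection_dir / "building"
    index = collection_dir / "index"
    if not building.is_dir():
        raise FileNotFoundError(f"no building directory in {collection_dir}")
    if index.exists():
        shutil.rmtree(index)      # same effect as: rd /s index
    building.rename(index)        # same effect as: ren building index
    building.mkdir()              # same effect as: mkdir building
    return index


# Demonstration on a throwaway directory, so no real collection is touched.
with tempfile.TemporaryDirectory() as tmp:
    coll = Path(tmp) / "Lhouses"
    (coll / "building").mkdir(parents=True)
    (coll / "building" / "new.idx").write_text("new")   # hypothetical file name
    (coll / "index").mkdir()
    (coll / "index" / "old.idx").write_text("old")      # hypothetical file name
    publish_build(coll)
    print(sorted(p.name for p in (coll / "index").iterdir()))
```

Refusing to run when building is missing avoids deleting a working index before a new build exists.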
Unix commands
cd ~/gsdl # assuming default Greenstone in home directory
source setup.bash # if you're running the BASH shell
source setup.csh # if you're running the C shell
mkcol.pl -creator [email protected] Lhouses
cd $GSDLHOME/collect/Lhouses
cp -r document_path/* import/
import.pl Lhouses
buildcol.pl Lhouses
rm -r index/*
mv building/* index
Import and Build processes
• The import process converts documents of various formats into the Greenstone Archive Format. import.pl needs to know which plugins are to be used. Plugins parse the imported documents and extract metadata from them. See ex.
• The build process compresses the text, builds full-text indexes according to collect.cfg, and precalculates the appearance of the collection.
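Because importing and building are always run in the same order, the two commands can be wrapped in a small script. This Python sketch uses only the `perl -S import.pl` and `perl -S buildcol.pl` invocations shown in these slides; the function names and the dry-run default are assumptions of the example. Actually running the commands requires a session in which the Greenstone setup script has already been invoked.

```python
import subprocess


def rebuild_commands(collection):
    """Return the import and build commands for a collection, in the order they must run."""
    return [
        ["perl", "-S", "import.pl", collection],
        ["perl", "-S", "buildcol.pl", collection],
    ]


def rebuild(collection, dry_run=True):
    """Print each command; when dry_run is False, also run it and stop at the first failure."""
    for argv in rebuild_commands(collection):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


rebuild("Lhouses")   # dry run: only prints the two commands
```

Calling `rebuild("Lhouses", dry_run=False)` would execute both steps; the index and building directories still have to be swapped afterwards.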
Assigning metadata from a file and building search indexes
• Assign metadata from a single file, metadata.xml
• Make sure the plugin RecPlug is included in collect.cfg and the use_metadata_files option is set.
• Add searching indexes; see collect.cfg
• Move metadata.xml to the import directory
• Import the collection again and rebuild:
> perl -S import.pl Lhouses
> perl -S buildcol.pl Lhouses
> rd /s index (or deltree /Y index)
> ren building index
> mkdir building
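For a collection with many documents, metadata.xml can be generated rather than typed by hand. The Python sketch below assumes the usual Greenstone 2 layout for this file (a `DirectoryMetadata` root containing `FileSet` elements, each with a `FileName` and a `Description` holding `Metadata` items); check it against the Developer's Guide for your version, and note that the DOCTYPE line is omitted here. The file name `boon.html` and the function name `directory_metadata` are hypothetical.

```python
import xml.etree.ElementTree as ET


def directory_metadata(files):
    """Build metadata.xml text from {file name: {metadata name: value}}."""
    root = ET.Element("DirectoryMetadata")
    for file_name, metadata in files.items():
        file_set = ET.SubElement(root, "FileSet")
        ET.SubElement(file_set, "FileName").text = file_name
        description = ET.SubElement(file_set, "Description")
        for name, value in metadata.items():
            item = ET.SubElement(description, "Metadata", name=name)
            item.text = value
    if hasattr(ET, "indent"):     # pretty-printing is available from Python 3.9
        ET.indent(root)
    return '<?xml version="1.0" ?>\n' + ET.tostring(root, encoding="unicode") + "\n"


xml_text = directory_metadata({
    "boon.html": {                # hypothetical source document
        "Title": "Boon Island Light",
        "State": "Maine",
        "Date": "1852",
        "Height": "137feet",
    },
})
print(xml_text)
```

The resulting text would be saved as metadata.xml in the import directory before re-importing.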
Create Browsing Indexes Through Classifiers
• VList, HList, DateList
• Classifiers take a metadata argument, by which the documents are classified and sorted. See collect.cfg
• A hierarchy classifier needs a classification file, which defines the metadata hierarchy. Each entry has three parts: identifier, position in hierarchy, and name of the classification. For example, subheight.txt and substat.txt
• The classification files are put into the etc directory; then rebuild (> perl -S buildcol.pl Lhouses, then rd /s index or deltree /Y index, then ren building index, and finally mkdir building)
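The three-part structure of a classification file is easier to see with a small parser. This Python sketch assumes one entry per line in the form identifier, position in hierarchy (dotted numbers such as 3.1), and a name in double quotes; the height ranges below are invented for illustration and are not the contents of the real subheight.txt.

```python
import shlex

# Hypothetical entries: identifier, position in hierarchy, quoted name.
SAMPLE = '''\
1 1 "Under 50 feet"
2 2 "50 to 100 feet"
3 3 "Over 100 feet"
3.1 3.1 "Over 150 feet"
'''


def parse_classification(text):
    """Return (identifier, position tuple, name) entries sorted by hierarchy position."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        identifier, position, name = shlex.split(line)   # shlex keeps the quoted name whole
        entries.append((identifier, tuple(int(p) for p in position.split(".")), name))
    return sorted(entries, key=lambda entry: entry[1])


def outline(entries):
    """Indent each name by its depth in the hierarchy."""
    return ["  " * (len(position) - 1) + name for _, position, name in entries]


for row in outline(parse_classification(SAMPLE)):
    print(row)
```

A position such as 3.1 places the entry as the first child of entry 3, so the printed outline indents it one level.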
Formatting Output
• Format the document
• Format the lists produced by classifiers and searches
• Add format strings to collect.cfg
• Then rebuild.
Another way of assigning metadata
• Assign metadata from a file called index.txt, using the plugin IndexPlug; see collect.cfg
• Put index.txt in the import directory, modify collect.cfg, then re-import and rebuild the collection.
References:
1. Greenstone Installation Guide.
2. Greenstone Users' Guide.
3. Greenstone Developers' Guide.
4. Documentation from the "Light Houses" group of CPSC 670, Fall 2001.