Automation of documentary information
Definitions:
  information: data with structure and meaning that has the potential to change the receiver’s knowledge
  documentary: recorded knowledge considered worth keeping on a carrier that persists across time and space
  automation: using the computer to optimize management; in documentary systems, mainly the retrieval of information
Automation of documentary information
  D.I. = information kept in ‘documents’ with a mainly textual content component
  variant: documents can also represent real documents, e.g. bibliographic records, with or without abstracts
  automation criteria:
    professional standards
    flexibility in structure
    word recognition
    powerful formatting functions
    …
Automation of documentary information
Professional standards:
  ISO-2709 file format
  MARC (subfields, repeatable fields…)
  XML (!!) -> gaining importance
  variable-length and variable-structure records, e.g. cf. XML: capable of dealing with semi-structured text entities (document elements are optional, repeatable, extensible, definable…)
Automation of documentary information
Typical software characteristics:
  word recognition (as opposed to field)
  flexible structures
  Inverted File (address book of all words) support
  additional features: stopwords, equivalency lists, thesaurus (knowledge systems…), strong formatting, web capabilities etc.
  as opposed to ‘library systems’ = integrated systems for catalogues and administrative data (e.g. loans)
The ISIS-environment
Why ISIS?
  professional, academic concept by UNESCO (UNISIST, CCF, MARC…)
  educational qualities
  multi-platform (DOS, Windows, Unix, WWW, JAVA)
  non-commercial software (free)
  world-wide users’ community (especially Latin America, Eastern Europe, Asia…)
ISIS-history (1)
  I.L.O., 1970s: merging of ‘C.D.S.’ and ‘I.S.I.S.’
  UNESCO:
    1985: produces the PC version ‘Micro CDS/ISIS’ for DOS (as a series of separate programs)
    version 2.0 (1988): added ISIS/Pascal
    1990s: version 2.3 (integrated menu system)
    1992: v.3.0 (networking, multi-user); new: Unix version
    1998: WinISIS (graphical interface, hyperlinks)
    2001-2004: current version 1.5 (among others XML, wizards)
ISIS-history (2)
BIREME (São Paulo, Brazil):
  1995: CISIS, a set of command-line utilities for ISIS database management
  1997: ISIS-DLL, a programming library for graphical operating systems
  1998: WWWISIS, a server for CGI web applications
  2000: WXIS = WWWISIS version 4.0
ISIS-history (3)
Italy + UNESCO: JAVAISIS
  JAVA: platform-independent programming language (runtime environment: free)
  JAVAISIS allows access to (remote) ISIS databases (based on the BIREME WWWISIS server)
  interface: ‘copies’ the WinISIS look-and-feel
  now: searching + basic data-entry functions are available, but slow
ISIS-history (4)
  OPENISIS: re-programming ISIS using the ‘open source’ philosophy (OpenISIS Verein, http://www.openisis.org)
    2002: first initiatives
    2003: first results (but reworking needed)
    2004: version 1.0 expected
  UNESCO also prepares to release the ISIS software as open-source software:
    source code available, non-commercial
    best-known example: the Linux O.S.
    co-ordination and tight control still necessary, to maintain the standards and a certain unity
ISIS-history (5): WEBLIS
  since 2003: a fully integrated, web-based library system based on ISIS is available from FAO
  based on ISIS.DLL and www-isis.exe as a dedicated database server
  functions:
    simple, advanced and thesaurus-based searching
    data entry with lists and validation
    loans circulation with advanced features
The basic concepts (1)
  ISO-2709: a format to precisely describe (bibliographic) records for transport between systems
    the ‘header’ and directory: fixed-format numerical description of the record (how long, which fields/lengths), start position of the text content and field tags
    the text contents, concatenated with a separator character
    the records, separated by a record separator
  used as exchange format, e.g. for all MARC tapes
ISO-2709 example record:
00846000000000277000450000100310000000400040003102300090003512001140004400
3000200158005000200160100001100162100001100173109003300184121006700217122001700284123001500301600000500316220013000321200001700451240004200468250001000510324002800520
332001000548343000500558350000500563#ABT ASSOCIATES INC./AGRICULTUR#AMS#19951205
#Conducting Pan-European research: a preliminary evaluation of a new methodology for European aquaculture research#B#K#Shaw, S.A.#Bailly, D.#^aUniv. Strathclyde^bGlasgow^cUK#3. Annu. Conf. of the European Association of Fisheries Economists#Dublin (Ireland)#10-12 Apr 1991#^aen#Proceedings of the third Annual Conference of the European Association of Fisheries Economists, Dublin, Ireland, 10-12 April 1991#Hillis, J.P.^ed.#^aDublin (Ireland)^bThe Stationery Office#^p163-175#Ir. Fish. Invest. [B. Mar.]#0578-7467#1994#^i42#~
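To make the ‘header’ and directory of the example record concrete, here is a minimal Python sketch that decodes them. It assumes the usual ISO-2709 layout (record length in leader positions 0-4, base address of the data in positions 12-16, directory entries of a 3-digit tag + 4-digit length + 5-digit start position). The function names are illustrative only and do not belong to any ISIS tool.

```python
# Leader (24 characters) and directory copied from the example record.
LEADER = "008460000000002770004500"
DIRECTORY = (
    "001003100000" "004000400031" "023000900035" "120011400044"
    "003000200158" "005000200160" "100001100162" "100001100173"
    "109003300184" "121006700217" "122001700284" "123001500301"
    "600000500316" "220013000321" "200001700451" "240004200468"
    "250001000510" "324002800520" "332001000548" "343000500558"
    "350000500563"
)


def parse_leader(leader):
    """Decode the numeric parts of the 24-character leader."""
    return {
        "record_length": int(leader[0:5]),
        "base_address": int(leader[12:17]),
        "len_of_length": int(leader[20]),  # digits used for a field length
        "len_of_start": int(leader[21]),   # digits used for a start position
    }


def parse_directory(directory, len_of_length=4, len_of_start=5):
    """Split the directory into (tag, length, start) entries."""
    size = 3 + len_of_length + len_of_start
    entries = []
    for i in range(0, len(directory), size):
        chunk = directory[i:i + size]
        tag = chunk[:3]
        length = int(chunk[3:3 + len_of_length])
        start = int(chunk[3 + len_of_length:])
        entries.append((tag, length, start))
    return entries


leader = parse_leader(LEADER)
entries = parse_directory(DIRECTORY, leader["len_of_length"], leader["len_of_start"])
for tag, length, start in entries:
    print(tag, length, start)
```

Each entry starts where the previous one ends, and the base address equals leader + directory + one directory terminator, which is a quick way to check that a record has not been damaged in transport.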
MARC-records
  ‘Machine-Readable Cataloguing’ (IFLA) for international standardization of bibliographic data (-> exchange)
  based on the ISO-2709 file format
  field tags 1-999 (defined by implementation, e.g. UNIMARC, CCF…)
  variable fields and field lengths
  subfields, e.g. ^ade Smet^bEgbert
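As an illustration of the subfield notation, the following Python sketch splits a field value such as ^ade Smet^bEgbert into its subfields. It assumes ‘^’ as the subfield delimiter followed by a one-character subfield code; the function name is hypothetical.

```python
def split_subfields(field, delimiter="^"):
    """Return a list of (code, value) pairs for one field occurrence.

    Text before the first delimiter, if any, is returned with an empty code.
    """
    parts = field.split(delimiter)
    result = []
    if parts[0]:
        result.append(("", parts[0]))
    for part in parts[1:]:
        if part:
            result.append((part[0], part[1:]))
    return result


print(split_subfields("^ade Smet^bEgbert"))
```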
Inverted File support
  the IF contains all searchable elements (terms), sorted alphabetically, and their positions within the database:
    which record
    which field
    which occurrence of that field
    which position in that field (word counting for proximity retrieval)
  the IF in fact represents ‘all possible searches’ already done and saved, except for Boolean set combinations (AND/OR/NOT)
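The idea can be sketched in a few lines of Python. This is a simplified in-memory model, not the ISIS file layout: every word becomes a term with postings of the form (record, field, occurrence, position), and a Boolean AND is just a set intersection of the record numbers found. The sample records and all names are made up for illustration.

```python
from collections import defaultdict


def build_inverted_file(records):
    """records: {record_number: {tag: [occurrence, ...]}} -> {term: [postings]}"""
    postings = defaultdict(list)
    for number, fields in records.items():
        for tag, occurrences in fields.items():
            for occ_no, text in enumerate(occurrences, start=1):
                for pos, word in enumerate(text.split(), start=1):
                    term = word.strip(".,").upper()
                    if term:
                        postings[term].append((number, tag, occ_no, pos))
    # Terms are kept in alphabetical order, as in the IF.
    return dict(sorted(postings.items()))


def search(inverted, term):
    """Return the set of record numbers in which the term occurs."""
    return {posting[0] for posting in inverted.get(term.upper(), [])}


records = {
    1: {200: ["European aquaculture research"], 100: ["Shaw, S.A.", "Bailly, D."]},
    2: {200: ["Fisheries research methods"]},
}
inverted = build_inverted_file(records)

print(inverted["BAILLY"])
print(search(inverted, "research") & search(inverted, "aquaculture"))  # Boolean AND
```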
ISIS-structures: the MST
  the database itself is a binary ISO-2709 file with all the records concatenated
    no separators, binary header
  records can be active or non-active (e.g. changed, logically deleted…)
  all new records are appended at the end of the file, which always grows (-> needs ‘compacting’)
ISIS-structures: XRF
  the Cross-Reference file is a ‘first-phase’ normal index (a list of pointers, i.e. a sequence of relative addresses in fixed-length records) to the records in the MST, for fast access to records
  an XRF file completes a basic ISIS database: M+X = ISIS, cf. the CISIS tool ‘MX’ (BIREME)
  can be reconstructed if absent by special tools (e.g. UNESCO, BIREME)
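The interplay of an append-only master file and a pointer list can be illustrated with a toy model in Python. This is only a sketch of the principle: the class name, the 2-byte length prefix and the dictionary used as pointer list are assumptions made for the example and do not reflect the real .MST/.XRF byte layout.

```python
class TinyMaster:
    """Toy model: append-only record store plus a cross-reference of offsets."""

    def __init__(self):
        self.mst = bytearray()  # stands in for the master file
        self.xrf = {}           # record number -> offset of the active version

    def write(self, number, data):
        """Append a new version at the end; older versions become non-active."""
        self.xrf[number] = len(self.mst)
        self.mst += len(data).to_bytes(2, "big") + data

    def read(self, number):
        """Follow the pointer to the active version of a record."""
        offset = self.xrf[number]
        length = int.from_bytes(self.mst[offset:offset + 2], "big")
        return bytes(self.mst[offset + 2:offset + 2 + length])

    def compact(self):
        """Rewrite the file with active versions only."""
        active = {number: self.read(number) for number in sorted(self.xrf)}
        self.mst = bytearray()
        self.xrf = {}
        for number, data in active.items():
            self.write(number, data)


db = TinyMaster()
db.write(1, b"first")
db.write(2, b"second")
db.write(1, b"first, revised")  # the old version of record 1 stays in the file
size_before = len(db.mst)
db.compact()
size_after = len(db.mst)
print(size_before, size_after, db.read(1))
```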
ISIS-structures
  an ISIS database = at least one combination of MST + XRF plus IF
  all other files are:
    supporting (e.g. Field Definition Table, data-entry FMTs, presentation PFTs…), or
    optional (e.g. stopwords .STW, equivalency list .ANY), or
    derived from the MST: the Inverted File components, defined by the FST
ISIS-structures
The Inverted File:
  ‘nodes’ and ‘leaves’ of a table organised as a B-Tree (for quick positioning in very big files), separate for ‘short’ and ‘long’ entries
    .L01 and .L02
    .N01 and .N02
  an index on the B-Tree (.CNT)
  the postings file: .IFP, containing all alphabetically sorted entries with their postings (record, field, occurrence, position)
  temporary files for sorting (in the WORK directory)
The FDT
  listing of:
    data-entry worksheets (FMT)
    IF definitions (FST)
    presentation formats (PFT)
    the actual fields (tag, name, length, type and repeatability)
  mandatory in CDS/ISIS, WinISIS and JAVAISIS, but not in other family members
The Field Selection Table
  defines the strings to be put into the IF
  three columns:
    1: identifier (a real or ‘alias’ tag)
    2: a method: 8 types (per field, delimiters <> or /, per word, with or without prefixes)
    3: the extraction format, following the ISIS Formatting Language (with all its features, incl. ISIS/Pascal programs)
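A rough Python analogue of the three columns may help. In this sketch the method is given by name (‘field’, ‘angle’, ‘slash’, ‘word’) rather than by the numeric codes of a real FST, and the extraction format is replaced by a plain Python function; both are simplifications made for the example, as are all names.

```python
import re


def extract_terms(text, method, prefix=""):
    """Turn one extracted string into index terms according to the method."""
    if method == "field":      # the whole string is one term
        terms = [text]
    elif method == "angle":    # terms marked as <term>
        terms = re.findall(r"<([^<>]+)>", text)
    elif method == "slash":    # terms marked as /term/
        terms = re.findall(r"/([^/]+)/", text)
    elif method == "word":     # every word is a term
        terms = re.findall(r"\w+", text)
    else:
        raise ValueError("unknown method: " + method)
    return [prefix + term.strip().upper() for term in terms if term.strip()]


def apply_fst(fst, record):
    """fst rows: (identifier, method, extract), where extract(record) -> strings."""
    result = []
    for identifier, method, extract in fst:
        for text in extract(record):
            result.extend((identifier, term) for term in extract_terms(text, method))
    return result


record = {100: ["Shaw, S.A.", "Bailly, D."], 200: ["European aquaculture research"]}
fst = [
    (100, "field", lambda r: r.get(100, [])),  # authors: whole field
    (200, "word", lambda r: r.get(200, [])),   # title: every word
]
print(apply_fst(fst, record))
```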
SYSPAR.PAR
  all ‘settings’ of the software; some of them can be set using ‘configuration’
  line 5: data folder -> contains either the database files or their referral files (DBN.PAR)
    in WinISIS: very much used
    CISIS: CIPAR.CPR
The Formatting Language
  = the real nucleus of the ISIS software and its main development tool
  defines which strings are produced, as either:
    database values (taken from the fields); these can be ‘processed’ values from the database, e.g. fields combined, computations, even data taken from other ISIS databases by the ‘REF’ function
    or ‘literals’: quoted strings, e.g. HTML or XML tags
  in WinISIS: hyperlinks are added to enrich the presentation graphically and functionally
The Formatting Language
FL is used in 5 basic ISIS functions:
  ‘normal’ output of the database: display on screen, printing, or ‘dumping’ to a file
  to define the strings extracted for the IF
  to convert values while importing or exporting records from/to ISO-2709, using a ‘reformatting’ PFT
  for sorting records
  for validation of data entry
FL basics
3 possible element types:
  Vx: values from the fields; can be ‘processed’, even by a program (format exit)
  literals: texts to display
  mode commands and links
Literals: 3 types
  'unconditional', e.g. 'ID='v1
  "conditional", e.g. ("Author: "v300) -> output only once, and only if the field is present
  |repeatable| (with +: not for the first/last occurrence), e.g. | Remarks: |+v500
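The behaviour of the three literal types can be mimicked in Python. This is a simplified emulation that ignores mode commands and spacing rules; the function names are invented for the example and are not part of ISIS.

```python
def unconditional(literal):
    """'...' : always output, whether or not any field is present."""
    return literal


def conditional(literal, occurrences):
    """"..." : output once, and only if the field has at least one occurrence."""
    return literal + "".join(occurrences) if occurrences else ""


def repeatable(literal, occurrences, skip_first=False):
    """|...| : output before every occurrence; skip_first models the '+'."""
    parts = []
    for index, value in enumerate(occurrences):
        if not (skip_first and index == 0):
            parts.append(literal)
        parts.append(value)
    return "".join(parts)


authors = ["Shaw, S.A.", "Bailly, D."]
print(unconditional("ID=") + "42")
print(conditional("Author: ", authors[:1]))
print(repeatable("; ", authors, skip_first=True))
```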
FL basics
Mode commands:
  p/d/h
    P: proofreading
    D: data mode (with . and space)
    H: heading mode -> end user
  l/u: lower/uppercase
  {i/b/f/fs …}: italics, bold, fonts, font size etc.
FL basics: logical routing
  IF condition THEN statements ELSE other statements FI (nesting possible)
  Select Vx
    case 'value' : statements
    case 'value' : statements
    elsecase : statements
  Endsel
FL basics: hyperLINKs
  LINK(('prompt'),'COMMAND ',format)
  commands: e.g. OPENFILE, TEXTBOX
  e.g.
    LINK(('website'),'OPENFILE ',v445)
    LINK(('abstract'),'TEXTBOX ',v600)
  textboxes can also contain images, e.g. LINK(('Show'),'TEXTBOXIMG ',v50)
  can be ‘RCHILD’ -> closes together with the record, e.g. 'TEXTBOXIMGRCHILD ',v50
Assignment 1:
  install WinISIS on your PC
  use the WinISIS wizard to produce:
    an FDT for a catalogue database
    a data-entry worksheet with 1 picklist
    one simple IF-FST
    at least one standard PFT
  enter a few sample bibliographic records
Assignment 2: FL
  enter records with 1) repeatable fields 2) subfields 3) date field etc.
  install the ‘ASFA’ database
  apply basic FL commands to display, using the WinISIS PFT editor, e.g. starting from a ‘decorated’ PFT:
    literals (uncond. ' ' ; cond. " " ; repeat. | |)
    fs, cl, i, b, box() etc.
    modes: mpl, mhl, mdl
Assignment 3: elaboration
  create 2 new databases:
    for users, i.e. names/addresses
    for ‘loans transactions’
  enter some data in both new databases
  develop PFTs for bibliographic display with loans info, using ‘REF(L())’ functions
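The principle behind the REF(L()) combination can be sketched in Python: a lookup step turns a key into a record number in another database, and a reference step formats that record. The sketch assumes that an unknown key yields record number 0 and that record 0 produces no output; the sample data and all names are hypothetical.

```python
def lookup(index, key):
    """Plays the role of L(): key -> record number, 0 if the key is not indexed."""
    return index.get(key.upper(), 0)


def ref(database, number, fmt):
    """Plays the role of REF(): apply a format to a record of another database."""
    record = database.get(number)
    return fmt(record) if record else ""


# Hypothetical loans database with an index on the book identifier.
loans = {1: {"book": "B001", "user": "U7", "due": "2004-05-01"}}
loans_index = {"B001": 1}


def loan_info(book_id):
    """Loan status line to append to a bibliographic display."""
    return ref(loans, lookup(loans_index, book_id),
               lambda record: "on loan until " + record["due"])


print(loan_info("B001"))
print(repr(loan_info("B999")))
```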
Assignment 4: recapitulation
  create a new database ‘ABSTRA’ with only two fields: v1 (title) and v2 (abstract), indexing on the full title and all abstract words
  export the title and abstract fields from the ASFA database using an export FST
  import into ABSTRA
  write an ASFA PFT to display the abstract in a separate textbox, using ABSTRA as external database with the REF function (or write an ABSTRA PFT to display e.g. authors from the ASFA database)
  use the ABSTRA database to test search techniques