
NAVAL AVIATION DEPOT NORTH ISLAND
SAN DIEGO CA 92135

AUNG SAN U.
MATERIALS ENGINEER
CODE 342 BLDG 469
(619) 545-9751  Fax (619) 545-7810

AN EXPERT INFOBASE FOR HAZARDOUS WASTE MINIMIZATION IN AIRCRAFT REPAIR AND REWORK

Materials Engineering Laboratory
Naval Aviation Depot, North Island, San Diego, California 92135

-- PART 1 --

1.0  INTRODUCTION
2.0  NADEP NORTH ISLAND HAZMIN INFOBASE
3.0  HAZMIN INFOBASE'S INFOSTRUCTURE
4.0  THE BASIC SOFTWARE SET
5.0  PARADIGM SHIFTING FOR THE EXPERT INFOBASE
6.0  SOME CHARACTERISTICS OF THE EXPERT INFOBASE SYSTEM
7.0  END-USER INTERFACE DESIGN OPTIONS
8.0  COTERMINOUS WITH PROCEDURE, WITH PROCESS, AND WITH INFOBASE
     REFERENCES

-- PART 2 --

9.0  BASIC BUILDING BLOCKS OF THE HYGEN EXPERT INFOBASE
10.0 SEARCH INDEXING WORDS AND KEY PHRASES
     10.1 STAGE 1
     10.2 STAGE 2
     10.3 STAGE 3
     10.4 STAGE 4
11.0 LOOKUP INDEXING FOR WORDS/PHRASES OF A SPECIFIC INTEREST
     11.1 STAGE 1
     11.2 STAGE 2
12.0 DESIGN AND IMPLEMENTATION OF HYPERTEXT RECORDS WITH REPEATING GROUP OF INFO WITHIN ANOTHER REPEATING INFO GROUP
13.0 OBJECT-ORIENTED CONSIDERATIONS
14.0 SPLITTING FILES
15.0 GENERATING "SLIDING COLUMNS" EXPERT SYSTEMS
16.0 EXAMPLE OF A PORTFOLIO -- INFORMATION BY THE SHOP
17.0 GENERATING KEY PHRASES FOR SEARCH INDEXING
18.0 TRANSLATOR PROGRAMS
19.0 CROSS-RELATING DATABASE DATA TO TEXT, AND VICE VERSA
20.0 DATA STANDARDIZATION, ERROR CORRECTION, ETC.
21.0 INFOBASE SYSTEM DEVELOPMENT ENVIRONMENT
22.0 CONCLUSION

1.0 INTRODUCTION

The need to electronically manage our aircraft maintenance directives and processing procedures, which were generated over many years and number in the thousands, arose when it became necessary to identify the California and Federal Environmental Protection Agency (EPA) restricted materials authorized for shop use by these documents. Our digitized engineering documents are in text and graphics format; the state and federal restricted materials are text or database data files that we received from external organizations; Material Safety Data Sheet (MSDS) information is downloaded from periodically updated CD-ROMs; and the shops' HAZMAT (HAZardous MATerials) Authorized User List (HAUL) data arrive periodically from a mainframe as comma-delimited ASCII downloads. The information administrative system that we need must therefore be able to integrate all these diverse information and data sources and, in addition, have high usability for the end-users. Essentially, this is a requirement to integrate text, graphics and database info at an information organization level beyond that of the basic relational database approach.

The Expert Infobase System technology that we will be describing here successfully meets all the above requirements and can also make use of data in existing relational databases, unlike object databases (which claim data modelling capability superior to relational databases) that do not provide an easy interface for directly importing and working with those data. The relational functionality remains intact for the Expert Infobase, while many object-oriented-like properties are included, and it can efficiently and intuitively handle complex data models that may be highly denormalized. Relational databases become unwieldy, or resort to matrix data management approaches that are counter-intuitive, for highly denormalized data models. Additionally, the Expert Infobase can handle info such as text and graphics files. We will also be describing how database info can be synthesized for hypertextual use. Although the word "expert" is used within its name, the technology, being in a way a superset of the familiar information management technologies, is very general and can be applied to many new types of administrative systems.

We will be describing in detail our Naval Aviation Depot (NADEP), North Island's Hazardous Waste Minimization (HAZMIN) Infobase. Its overall objective is to manage the engineering and materials documentation necessary for the HAZMIN program with appropriateness, correctness, comprehensiveness, completeness and timeliness. Specific objectives are to maintain complete and up-to-date engineering and materials documents; to cross-reference documents with EPA Restricted Materials; to clarify the industrial process flow down to the resolution of individual process operations; to provide complete and up-to-date process and process operation information; and to relate material usage to the generation of waste per process operation. The HAZMIN Infobase was designed for use by materials engineers, industrial engineers, and production personnel of an end-user department. The end-users' need was what drove the product, and the product drove the technology.

Part 1 describes the concept and general structure of the Infobase. Those who have already seen how the HAZMIN Infobase functions will find it easier to follow the topics described within this document. Part 2 of this document deals with details of the technology and the basic building blocks of the Expert Infobase. In some cases, the reader is expected to be familiar with the software under discussion. However, even those readers who are not familiar with the software involved should still obtain an increased understanding of the Infobase concepts that are mentioned in Part 1.


    2.0 NADEP NORTH ISLAND HAZMIN INFOBASE

The HAZMIN Infobase can provide information organized by shop, by HAUL and associated MSDSs, and by Process Operation. For updating, only the appropriate head database table or text document need be updated; the programs that were developed for automatic integration of the various info into the system will do the rest. Material information standardization requirements for the HAZMIN Infobase were reviewed by Toomer et al. [Ref:1]. The infobase can be explored using four arrow keys -- up, down, left and right.


Our prime need is to be able to identify all the state and federal safety and environmentally restricted materials mentioned in our directives and procedures for the repair and rework of aircraft. Many of these were prepared prior to the EPA material restrictions. Using the Expert Infobase technology described in a later section, we can instantaneously search and identify, using a 386/33 MHz or better PC, all the occurrences of the EPA-defined hazardous materials and their corresponding Chemical Abstract Service (CAS) registration numbers within the digitized text documents. Details regarding SEARCH indexing procedures are given in Section 10.0.
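To make the mechanics concrete, here is a minimal sketch in Python of this kind of phrase-oriented SEARCH indexing, built around a hypothetical restricted-materials table of (chemical name, CAS number) pairs. The sample entries, paths and function names are illustrative assumptions, not the actual HyGEN utilities or NADEP data:

import os
from collections import defaultdict

# Hypothetical restricted-materials table: (chemical name, CAS number).
RESTRICTED = [
    ("methylene chloride", "75-09-2"),
    ("trichloroethylene", "79-01-6"),
]

def build_phrase_index(root_dir):
    """Map each restricted phrase and CAS number to the files containing it."""
    index = defaultdict(set)
    for dirpath, _, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, encoding="ascii", errors="ignore") as f:
                text = f.read().lower()
            for phrase, cas in RESTRICTED:
                if phrase in text or cas in text:
                    index[(phrase, cas)].add(path)
    return index

# Build once per reintegration cycle (e.g., overnight); after that,
# any query against the index returns instantly.
# index = build_phrase_index(r"\LESS")

The point of the design is that the expensive pass over the documents happens once per reintegration cycle, not at query time.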

The LOOKUP indexing capability of HyGEN gives us the ability to find, as a group, all the synonyms, abbreviations, acronyms, equivalent codes, etc., for the various restricted chemicals and materials within the directives, the procedures, the HAUL for the shops, and all the MSDSs associated with the HAUL items. The restricted materials data are set up within databases, and we can automatically update the Infobase after updating the appropriate databases. More details about indexed LOOKUP arrangements can be found in Section 11.0.
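The LOOKUP idea can be sketched the same way (again in Python rather than the actual HyGEN engine): every synonym, acronym and equivalent code for a chemical is grouped under one canonical key, so a single query term retrieves the whole variant group at once. The sample group is illustrative:

# Hypothetical semantic group: canonical name -> variant spellings/codes.
LOOKUP_GROUPS = {
    "methyl ethyl ketone": ["MEK", "2-butanone", "78-93-3"],
}

def expand_query(term):
    """Return the full variant group for any member term."""
    term_l = term.lower()
    for canonical, variants in LOOKUP_GROUPS.items():
        group = [canonical] + variants
        if term_l in (v.lower() for v in group):
            return group
    return [term]

# expand_query("MEK")
# -> ['methyl ethyl ketone', 'MEK', '2-butanone', '78-93-3']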

Information such as the HAUL is very difficult to maintain error-free unless it can be compared against other related information, and it requires data maintenance on an on-going basis. The appropriateness and correctness of information may show up only when reviewed within a certain context. This usually means further information integration is required, and the info sources may be databases as well as text documents. The complexity of the Infobase can be better understood by examining its infostructure.

3.0 HAZMIN INFOBASE'S INFOSTRUCTURE

An appropriate and correct overall infostructure is essential for clearly understanding the organization of the engineering and materials documentation, for the efficient collection of relevant information, for the structuring requirements of the matching software programming, for the automatic reintegration of information, and for the system maintenance and enhancement requirements. It should broadly reflect the integrated real-world info models involved with our HAZMIN program. Each of these info models needs to be clearly identified and sufficiently defined, to the necessary level of detail, for correctly implementing the software program structure for the Infobase. The info modelling capabilities of the software chosen to implement the program structure must therefore be capable of effectively realizing the real-world info models involved.


Figure 1 is the overall schematic diagram of the infostructure of our HAZMIN Infobase. The infobase contains those engineering directives that are generated locally: Local Engineering Specifications (LESs) and Local Process Specifications (LPSs). The bottom layers are involved with LESs, tech drawings, manuals, etc., that drive the processing to be performed. However, these documents by themselves may not be able to provide sufficiently detailed directions at the process operations level.

LPSs may have to be generated to supplement existing directions. The real-world info models at the layers below the shaded layer are documents containing black and white text and graphical line drawings. Even ordinary infobases can be organized for this kind of information. The object life of these documents is usually in terms of years before further revisions are made.

The shaded layer of the infostructure deals with process operations. Each industrial process, such as chromic plating, is partitioned into a series of process operation steps. The partitionings are designed to coincide with the operation of major process units such as plating tanks or paint spraying booths. Also, the resolution of the partitioning for a process operation must be sufficient to relate all incoming materials to the waste generated. This level of partitioning can also be used for step-by-step detail documentation of operational procedures for the operators and processing units.

Each process operation should have the list of materials to be used, with their National Item Identification Numbers (NIINs), and the list of references authorizing their use. The HAUL mentioned so far is the "empirical" HAUL; the data for it are provided by each shop's requests for approval to use restricted materials for their processing. To check the validity of the claimed HAUL items, we need to generate the "compiled" HAUL, which will list all the NIIN-identified materials listed within all the opspecs (short for process operation specifications). From the sum total of process operations for a shop, we can determine the list of hazardous materials truly needed for processing in that shop. Other materials within the HAUL for that shop should only be for repair and maintenance of equipment or other facility needs. Any other unaccountable HAUL item is to be deleted. Once this data scrubbing is done, the first reality check is completed. This check should be done for every shop.
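The compiled-HAUL reality check amounts to a set comparison. The sketch below assumes the empirical HAUL and the opspec material lists are available as sets of NIINs; the function names and sample values are illustrative, not the actual HAZMIN programs:

def compiled_haul(opspecs):
    """Union of the NIINs listed across all opspecs for a shop."""
    return {niin for spec in opspecs for niin in spec["niins"]}

def unaccountable_items(empirical_haul, opspecs, facility_niins):
    """Empirical HAUL items not justified by any opspec or facility need."""
    justified = compiled_haul(opspecs) | facility_niins
    return empirical_haul - justified

# Example:
# empirical = {"001234567", "009876543"}
# opspecs = [{"id": "OP-01", "niins": {"001234567"}}]
# unaccountable_items(empirical, opspecs, facility_niins=set())
# -> {"009876543"}   (candidate for deletion after review)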


[Figure 1 here]

Figure 1 -- Schematic diagram of HAZMIN's infostructure. The arcs indicate data pathways. Numbers 1 and 2 indicate reality checking levels. The Infobase is implemented to level 1 thus far.

The real-world info model of an industrial process operation should include a process flow diagram with a series of process operation steps, with IDs and nomenclatures for the steps. An opspec is a textbase record that sometimes includes graphics info. A textbase record is usually loosely structured compared to a database record and can have explicit as well as implicit field names (a proximity character set can be allocated as a field name) and field values of several lines of textual info. In addition, it may be partially or totally treated as free-form text. Examples of textbase software are the Personal Information Managers (PIMs). In most cases, textbases do not have relational capabilities, or if they do, they are limited and slow. The object life of an opspec is dependent on its parent industrial process and may be revised whenever the parent process is revised. If that process is reasonably stable, changes become necessary mainly with technological improvements or regulatory changes, and these events are usually in terms of years. But since the opspec information is linked to shop, HAUL, and reference data, any change in those data will shorten its object life. The info system of choice must be able to detect whenever those changes occur and react appropriately.

Figure 1 also shows the tank solutions and materials data as information elements of the process operations. All hazardous materials used should have their NIINs identified. From the NIIN and SHOP data, the associated HAUL item can be identified. Next to identify are the MSDSs associated with each HAUL item, with the ability to hyperlink to and access those MSDSs. The relationship of HAUL item to MSDS is one-to-many. Although in most cases it is one-to-one, there is often one HAUL item to three or four MSDSs, depending on whether there are three or four product parts for the HAUL item. HAUL is product-level data, whereas MSDS is data at the product-part level. Within each MSDS are ingredient data for the chemicals used. Since there can be more than one chemical ingredient per product part, the ingredient data group repeats for as many different chemicals. Thus, the MSDS has repeating ingredient data groups, which are within the repeating MSDS data groups of the HAUL.

Our HAUL item is uniquely defined by the field values of SHOP[, BLDG[ (building), NIIN[ and CAGE[. The SHOP[ and BLDG[ codes (values) are internal to the organization. Field names ending with "[" are compatible for use within textbase records. CAGE represents the manufacturer identification and is standardized by an external organization, as are the NIIN and the MSDS. In the Department of Defense, MSDSs are uniquely defined by their five-character Serial Number. Each Serial Number is uniquely defined by the NIIN, CAGE and PART NUMBER INDICATOR. There can be MSDSs with several different Serial Numbers but the same NIIN and CAGE, and with differing PART NUMBER INDICATOR values. This is because a product defined by the NIIN can have more than one product part, indicated as A, B, C, etc. The number of product parts for a given NIIN can differ among manufacturers.
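The repeating-group structure just described can be summarized in a sketch. The Python dataclasses below are purely illustrative of the model (MSDSs repeating within a HAUL item, ingredients repeating within an MSDS) and do not reflect any actual schema:

from dataclasses import dataclass, field
from typing import List

@dataclass
class Ingredient:                 # repeating group inside one MSDS
    chemical_name: str
    cas_number: str

@dataclass
class Msds:                       # one per product part (serial number)
    serial_number: str            # unique; from NIIN + CAGE + part indicator
    part_indicator: str           # "A", "B", "C", ...
    ingredients: List[Ingredient] = field(default_factory=list)

@dataclass
class HaulItem:                   # uniquely keyed by SHOP, BLDG, NIIN, CAGE
    shop: str
    bldg: str
    niin: str
    cage: str
    msdss: List[Msds] = field(default_factory=list)   # one-to-many

# Example:
# item = HaulItem(shop="93100", bldg="469", niin="001234567", cage="12345")
# item.msdss.append(Msds(serial_number="ABCDE", part_indicator="A"))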

The real-world data model for the HAUL can be represented by a relational database model. However, the one-to-many relationship of HAUL to MSDSs, especially when full-set MSDSs are several screens in length, requires a design solution for efficient representation of the combined info. The real-world info model for the MSDS is like a textbase record and must also somehow be related to a HAUL database. Careful design is required for an effective Infobase model of the HAUL and MSDS layers, and for the program structure that manages the information. An ideal design would provide an intuitive and "bullet proof" interface for the end-users.


Section 12.0 will discuss the design and implementation of the Expert Infobase model. Not shown in the infostructure diagram are the extra dimensions required for the restricted materials listings of the Federal and California EPAs. We want this info to be fully and thoroughly identifiable within the HAUL, MSDS, Process Operations, LPS and LES info layers of the infostructure.

    4.0 THE BASIC SOFTWARE SET

The Expert Infobase system is an integration of graphics, infobase, textbase, database and expert system. The integration approach is hypertext centric, and the benefits of this kind of approach will be summarized below. Each of the main application programs used will be briefly reviewed.

Ordinary infobases are capable of electronic publication of text and graphics with word indexing; they do not necessarily have the capability of phrase indexing, which is essential for cross-relating with databases. The Expert Infobase can index phrases and thus can cross-relate with databases. In addition, semantic networks of specific groups of words and phrases can be established.

Relational databases can handle normalized data models efficiently, but rapidly become unwieldy for many denormalized real-world info models, which additionally may contain text and graphics information. The Expert Infobase can manage denormalized info models efficiently and intuitively. It can synthesize database info into hypertext files. An additional advantage of converting database data to hypertext info is that rule-based expert systems may be generated to provide instantaneous multiple comparison of options and alternatives based on the key fields, and to rapidly provide multiple perspectives for better situational awareness.

To convert textbase and database report outputs into hypertext reports, three types of objects may need to be automatically generated (a sketch follows the list). They are:

1. Textual report files (with the fields of interest) from a database or textbase that contain the database or textbase records as text records. These records can then be processed into hypertext records. The file is then split, with each hypertext record essentially ending up as a separate text file.

2. The records within the database or textbase to be converted into hypertext records as above, each identified with an appropriate file name. The file name should be traceable to the unique identification of each hypertext record, and for DOS, should not exceed eleven characters and preferably fewer than nine. One way to do this is to add a field to each record with the information regarding the corresponding (ASCII) angle bracket hyperjump. Each hyperjump then should contain the file name to be later assigned to the corresponding hypertext record.

3. Menus listing hyperjumps that will be automatically generated with each reintegration procedure.
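A minimal sketch of these three objects follows, assuming a form-feed record separator and a key on the first line of each record; both are assumptions for illustration, not the actual HAZMIN report layout:

import os

RECORD_SEP = "\f"    # assumed record separator in the report file

def split_report(report_path, out_dir):
    """Write each record to its own file and emit a hyperjump menu."""
    os.makedirs(out_dir, exist_ok=True)
    with open(report_path) as f:
        records = f.read().split(RECORD_SEP)
    menu_lines = []
    for rec in records:
        rec = rec.strip()
        if not rec:
            continue
        key = rec.splitlines()[0].strip()        # assume key on first line
        fname = key[:8] + ".TXT"                 # DOS-safe 8.3 file name
        with open(os.path.join(out_dir, fname), "w") as out:
            out.write(rec)
        menu_lines.append("<%s>  %s" % (fname, key))  # angle bracket hyperjump
    with open(os.path.join(out_dir, "MENU.TXT"), "w") as out:
        out.write("\n".join(menu_lines))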

Textbases themselves can usually process several sentences of field values, but most of them perform poorly, if at all, as relational databases. A textbase in combination with a suitable relational database, however, can have the best of both worlds. Although this combinational usage is itself independent of the Expert Infobase technology, it is nevertheless incorporated within the realm of the Expert Infobase technology. Moreover, the Expert Infobase technology approach provides the extra possibility of the database and textbase textual reports interacting with each other hypertextually.

Table 1 gives the particulars of the software set mentioned in this document. We may now review the major software used to implement the Expert Infobase. HyGEN was designed as a general purpose hypertext system by MAXTHINK Inc. HyGEN can successfully and synergistically integrate a select set of off-the-shelf software with certain specific capabilities, giving the system relational capabilities of not just database to database, but also database to text, and other cross-relating capabilities.

    HyGEN possesses a set of properties essential for integrating all the above application programs' generated objects. These properties are:

1. HyGEN works with ASCII. Many application programs such as word processors, databases and spreadsheets can import and export data/info in ASCII. Data/info can therefore be processed back and forth between the application programs as needed, producing output files that are suitable for coordination by HyGEN.


    2. HyGEN has a SEARCH indexing engine to search index words as well as phrases. It is absolutely essential that phrases can be indexed for efficient interaction with a database where the listing of chemical names and their standardized CAS numbers can be searched against ASCII text files. Note that real-time searching instead of indexed searching is unacceptably slow for Infobase needs.

    3. HyGEN has hyperjump links that are defined by ASCII characters, so the jumps can be automatically generated as programmed and processed by a database or a text editor with a programming language.

    4. The hyperjumps are jumps to objects such as text and graphics files, and to application programs. Text and graphics objects can be in many different directories. Jumps to the application programs mean shelling out to those programs. All these capabilities help to efficiently design info models that may be highly de-normalized and also to allow real-time links to specialized programs such as Computer Aided Design (CAD) programs if preferred.

5. A hyperjump may be programmed to have an object name that can be compounded with prefix, suffix or text strings which may represent other objects. This combination allows efficient design and ease of programming to generate expert systems with multiple referencing qualities.

----------------------------------------------------------------------
TABLE 1: SUPPORTING SOFTWARE SET FOR HAZMIN INFOBASE
----------------------------------------------------------------------
1. HYGEN -- Hypertext RunTime Program; Available as Shareware
2. COLOR-TX -- Program to Generate ASCII Color Hypertext Screens;
   Useful for Quick Setting Up of Menu Screens
3. TRANSTEXT -- Hypertext ASCII Text Editor; Useful for Quick Checking
   of Hyperjump Links and as a Second ASCII Text Editor
4. HOUDINI -- Node Network Analysis Software; Useful for Organizational
   and System Requirement Analyses
       MaxThink Inc., 2425 B Channing #592, Berkeley, CA 94704;
       (510) 540-5508 voice; (510) 548-4686 fax
----------------------------------------------------------------------
5. VEDIT Plus -- ASCII Text Editor; Command Macro Capabilities;
   File Size up to 2Gb
       Greenview Data, Inc., 2773 Holyoke Lane, Ann Arbor, MI 48103;
       1-(800) 458-3348
----------------------------------------------------------------------
6. askSam v.5.1 -- Free Form Text Database
       askSam Systems, P.O. Box 1428, Perry, FL 32347; 1-(800) 3-ASKSAM
----------------------------------------------------------------------
7. Paradox for DOS -- Relational Database with Script Recording
   Capability; can be any mature DOS database with the above function
       Borland International Inc., 1800 Green Hills Road, P.O. Box 66001,
       Scotts Valley, CA 95067-0001; 1-(800) 336-6464
----------------------------------------------------------------------
8. StacKey -- Utility to Automatically Switch Into and Out of
   Application Programs
       Support Group Inc., Lake Technology Park, P.O. Box 130,
       McHenry, MD 21541; 1-(800) 872-4768
----------------------------------------------------------------------
9. Software Carousel -- Task Switcher
       SoftLogic Solutions, One Perimeter Road, Manchester, NH 03103;
       1-(800) 272-9900
----------------------------------------------------------------------
10. HOTSHOT Graphics -- VGA Draw Program
       Symsoft, P.O. Box 10005, Incline Village, NV 89450
----------------------------------------------------------------------


6. HyGEN also has a LOOKUP indexing engine, which can provide semantic networking. It is a kind of integration that provides instantaneous finding of chemicals grouped by acronyms, abbreviations, equivalent chemical names, chemical abstract codes, etc. This capability is very useful when standardization of representing chemical names in ASCII is lacking. We will discuss those aspects in Section 11.0. The standard graphics format for HyGEN is .PCX. HyGEN also provides a set of utilities for SEARCH and LOOKUP indexing and text processing. The text processing utilities are basic; for more advanced text processing, a programmable ASCII text editor is essential.

We use VEDIT PLUS by GREENVIEW Data Inc., an ASCII text editor with a powerful programming language for text processing. It can work interactively with the ASCII report outputs of databases. VEDIT PLUS can handle files of up to 2 GB and can take a CD-ROM download of text files for the MSDSs and process them as programmed. It can generate reports with a heading data group followed by repeating data groups, or in sliding column format, etc. -- formats that efficiently render complex real-world information reporting. These capabilities allow the HyGEN Infobase info model to be designed to present information as intuitively as the real-world info model it represents. Many command macros can be programmed to serve as advanced text and text file processing utilities, such as template editors, filters and translator programs.

Incoming data from different databases are handled by the PARADOX (DOS) database. This is a well-regarded database, and depending on PARADOX's import data format options, various database downloads can be integrated into the Infobase system. Only PARADOX's macro script, and not the PARADOX Application Language (PAL), is used for programming the PARADOX database. PARADOX works very well with VEDIT PLUS (both can export/import ASCII), and between them it is quite easy to perform relational operations of database data with text records, even those requiring logical AND conditions. For the implementation of the Expert Infobase, it was found that additional relating of database tables is required for the hyperjump info and the multiple status codes of the links between various objects. In other words, more, not less, use of relational database capabilities is required. Other mature relational databases with macro script capability may be used instead of PARADOX.


One of the easiest ways to import ASCII text records into databases is to edit them with VEDIT PLUS if needed and then import them into the askSam textbase (by askSam Systems). From askSam, the data may be output in comma-delimited ASCII and imported into other databases such as PARADOX. The HyGEN Infobase can thus essentially import data from the data input front end, or from the information report back end (even including sliding columns format reports). This ability to import data both ways is far superior to databases that can import data only one way, through the data input end. AskSam has many useful text database processing capabilities, and is easier to program than VEDIT PLUS, but there is a limitation of file size when complex programming is required, and also a limitation due to the complexity of the programming itself. AskSam's relational capabilities are somewhat limited and extremely slow. It was found that it is usually better to use askSam, VEDIT PLUS and PARADOX in combination when the need for relating textbase tables occurs. Fortunately, the key fields required for relating textbase tables are usually data in the form of numbers or simple phrases, not information of several sentences. Therefore, a subset of the key fields may be imported into PARADOX, all data relating and sorting done there, and the results used to organize the textbase as required.
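The text-record-to-comma-delimited-ASCII step can be sketched as follows, using the "NAME[" field convention mentioned earlier. The record-boundary heuristic (a repeated field starts a new record) and the sample field names are assumptions for illustration, not askSam's actual behavior:

import csv, re

FIELD_RE = re.compile(r"^(\w+)\[\s*(.*)$")      # e.g. "NIIN[ 001234567"

def records_to_csv(text, fieldnames, out_path):
    """Collect NAME[ fields into rows; a repeated field starts a new record."""
    rows, current = [], {}
    for line in text.splitlines():
        m = FIELD_RE.match(line.strip())
        if not m or m.group(1) not in fieldnames:
            continue
        if m.group(1) in current:               # field repeats: new record
            rows.append(current)
            current = {}
        current[m.group(1)] = m.group(2)
    if current:
        rows.append(current)
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# records_to_csv(open("HAUL.TXT").read(),
#                ["SHOP", "BLDG", "NIIN", "CAGE"], "HAUL.CSV")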

STACKEY by CtrlAlt Associates is another powerful software program we use. It automatically switches into VEDIT PLUS, PARADOX, and askSam, in any sequential order required. The program describes itself as the ultimate batch enhancer. In our Expert Infobase system, it is used to run the programs that update the Infobase automatically.

The software programs described above are the main ones used for the construction of the Infobase. Other software is also used for higher construction speed and for specialized applications. We found that the productivity of system construction could be significantly improved using task switching software; the task switcher that we used is SOFTWARE CAROUSEL by SOFTLOGIC SOLUTIONS. Another helpful program is the TRANSTEXT ASCII hypertext editor produced by MAXTHINK Inc. It does not have a programming language like VEDIT PLUS, nor can it handle files larger than the available RAM. But it can easily edit and immediately use the angle bracket hyperjumps to link up hypertextually with other files, which may then be edited as needed. It is most useful for efficiently editing batch files, command macros and script macros. Another useful MAXTHINK Inc. product is the COLOR-TX software for generating ASCII color hypertext screens that may be used as hypertext menu screens. A good file manager that can handle thousands of files within a directory is very useful; it is even more useful if it can copy or delete a directory with all its subdirectories. We use the POPDOS file manager (but this is no longer marketed by LOGITECH). The software we prefer for editing .PCX drawings is HOTSHOT GRAPHICS v.1.8 and NEOPAINT 2.2; note that HOTSHOT GRAPHICS v.2.+ is optimized for VGA graphics. Incidentally, Windows versions of HyGEN, VEDIT PLUS, askSam and PARADOX are presently available, or will be available later in 1995.

    5.0 PARADIGM SHIFTING FOR THE EXPERT INFOBASE

Database terms such as real-time updating and querying, and ad-hoc querying capabilities, need to be reconsidered with regard to infobases. Moment-by-moment real-time data updating for some databases, such as banking or airline ticketing databases, is highly preferable. And if real-time updating itself does not take inordinately long to accomplish, then real-time updating is the way to set up the database. Only real-time querying can profit from real-time updating. Our HAZMIN info libraries, such as those for the Engineering and Process Specs, the HAUL or the MSDSs, need no moment-by-moment updating because of their much longer object life compared to the reintegration frequency. Other administrative actions determine when any of those objects are made effective or otherwise. Usually, the incremental change of information between reintegration cycles of even a month is less than 1 percent of each type of information library's total content.


From an administrative system vantage, an error of 1 percent or less for not being able to reintegrate moment by moment is far more acceptable than an unintegrated jumble of info in which errors in some key data can run higher than 30 percent. If the system can perform a speedy reintegration cycle, the error due to the lack of real-time updating and querying can be kept to a minimum.

Today's computers and software are much faster than those of ten years ago, and real-time data processing with old equipment can be slower than some of the non-real-time data processing approaches of today. In other words, real-time data processing is not synonymous with instantaneous data retrieval. Instantaneous data retrieval is more likely to be found with infobases than with databases: every permutation of all the mainstream queries of interest can be computed ahead of time (realistically possible when dealing with long-life objects), and all the reports made instantly retrievable at time of query. HyGEN based technology with a sufficiently fast PC can reindex its SEARCH and LOOKUP info and reintegrate a complete infobase overnight, and it provides virtually instantaneous information retrieval at time of use. In contrast, many databases attempting to compute some of the same complex queries in real time could take hours for even a single answer. That is simply unacceptable to many end-users, especially when multiple queries for multiple info or data comparisons are needed.

The HyGEN Expert Infobase appears to lack the ad-hoc querying capability that databases may claim to have. That is true only at the simplest level of Infobase use. Even then, the Infobase can provide a far larger source of integrated information already checked for appropriateness, correctness, completeness and coherency. We can also generate rule-based sliding columns expert systems that can provide hyperjumps to objects of interest within a specific info organizational context. These expert systems, if well placed, can be of value for in-depth info management. The extent of information that an expert infobase can provide normally precludes the need for ad-hoc querying, whereas any other broad info extraction approach is simply uncoordinated, incomplete and unfit for greater integration of information. The databases that support the Infobase have all their ad-hoc querying capabilities available to the Infobase system developers, system administrators and advanced users. And these databases also benefit from the feedback of the integrated and cohesive information that is possible only at the Infobase level. Furthermore, as the complexity and magnitude of the information increases, even experienced database users will face an exacting learning curve to get clear answers with ad-hoc queries. Average users can rapidly get into a muddle with such queries and may simply be wasting time.

Besides the Expert Infobase technology, which integrates text, graphics, textbase and database info at an info organization level one layer above that of the basic relational database approach, there are what are known as expanded databases, which also claim to do data integration. These expanded databases are more suitable for data than for info integration. That is, they provide an interface to several other databases and can also display text and graphics info, but they are not necessarily capable of cross-relating at the data-to-info level or vice versa. Their mode of operation is mainly database centric, and their text processing capabilities are much more limited than those of the Expert Infobase, whose mode of operation is hypertext centric. It could be helpful for anyone seeking information integration above what the expanded database offers to fully understand the key differences between the database and hypertext centric approaches.

The reintegration process usually takes a sampling snapshot of the total information and runs it through the reintegration program. The info snapshot approach may not always be suitable for objects with very short lives, because if the reintegration time is relatively too long, the info snapshot can in theory give a high degree of error in fast-moving and changing info situations. The reintegration cycle time for info of considerable size and with a certain degree of complexity can be tens of hours for both the hypertext centric and the database centric approaches. However, expanded databases usually do not reintegrate information, because any advantage there may be for real-time querying, especially for very short-lived objects, may be effectively nullified after a real-time data reintegration cycle (if possible) using the database centric approach. In the case of the Expert Infosystem, if data reports involving short-lived data objects are required, then direct hyperjumps from it to the database program or engine of interest (including expanded databases) that is doing real-time data processing can be provided. At that level, nothing is different from the usual manner of real-time data processing.

The Expert Infobase approach to information integration operates at a more macro level of info management than even the expanded database. It samples the total info at each reintegration event and provides the end-users with a complete administrative overview of a work project or program. Yet the user still has full downwards compatibility and the benefits of the existing databases or other information processing engines in support of the Infobase. No other current info management approach offers similar capabilities with such flexibility while being comparatively cost effective.

    6.0 SOME CHARACTERISTICS OF THE EXPERT INFOBASE SYSTEM

The much greater info integration capability of the Expert Infobase made some problems within an organization more acute than before. Existing info can come from many different sources, both internal and external to an organization. Consider the following scenario: a key database to be integrated with the infobase has key errors, and is owned by an external source who is not in any hurry to correct the errors. The answer to that kind of problem is that the Infobase system must be designed to identify each and every hyperlink error and to provide error control programs that will replace known errors with the correct data as far as possible; the Infobase is then reintegrated. This capability is also useful for dealing with the unintentional errors that can happen from time to time. The Expert Infobase System can be programmed to show status "flags" for its key hyperlinks, and the system administrator can then take the appropriate action to eliminate the linking deficiencies indicated by the negative flags. After identifying the errors that are not corrected by the data owners of an external organization, those errors and the corresponding correct data can be set up in a database so as to automatically replace the errors within the next incoming data update (which may be text or a database table) with the correct data. The Expert Infobase System is capable of taking care of that.
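A minimal sketch of this error-control idea: known bad values from the external data owner are mapped to corrections and substituted into each incoming update before reintegration, and keys that still fail to resolve are flagged for the system administrator. The correction table and the CSV-like key layout are illustrative assumptions:

# Hypothetical known-bad values mapped to their corrections.
CORRECTIONS = {
    "MEK KETONE": "METHYL ETHYL KETONE",
    "79-01- 6": "79-01-6",
}

def scrub_update(lines, known_keys):
    """Apply known corrections, then flag keys that still fail to resolve."""
    fixed, flags = [], []
    for line in lines:
        for bad, good in CORRECTIONS.items():
            line = line.replace(bad, good)
        fixed.append(line)
        key = line.split(",")[0].strip()   # assume key field first (CSV-like)
        if key not in known_keys:
            flags.append(key)              # "negative flag" for the admin
    return fixed, flags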

The much greater info integration capability of the Expert Infobase also provides the opportunity to integrate info not yet systematically organized but which could be of immense value once incorporated into the Infobase. Usually, the reason such info is not organized systematically is the difficulty of manually pulling together all the relevant info and keeping it error-free, and the difficulty afterwards of keeping the info up to date. Thus, once the power of the Infobase becomes available, more info requirements may arise that need to be synthesized or generated. Complex newly generated info is usually highly error prone, and organization of such info may have to go through a few iterations until the Infobase-compatible methodology becomes sound and adequate. The Expert Infobase technology has not only the power but also the flexibility to promote rapid evolution of effective info models. Although efficient info models can be effected, the higher complexity of integration and the higher possibility of inadvertent errors demand more complex program designs capable of semi-automatically integrating error-free info and checking hyperlinks for correctness and currency. The program design effort needed to meet these requirements can be most exacting, but the returns are possible only with the help of the Expert Infobase technology. A well designed semi-automatic info induction program can be ten to thirty times faster than doing things manually. The semi-automatic approach allows for thorough checking at all key steps during the learning process of designing a new info model.

Object-based technology concepts were never considered during the Infobase development. Yet, as was pointed out by those who saw the product, many end-result properties usually associated with object-based technology are apparent within the HAZMIN Infobase. It is not the purpose here to debate how many of those properties are present, and at what degree of purity. Each system development team will have to decide for itself the theoretical and practical aspects of achieving those end results. Understanding the abstract concepts of object-oriented programming, although generally not necessary, may sometimes help in thinking precisely.

The HAZMIN Infobase design does not allow the end-user to make changes to data; it is strictly read-only. This greatly simplifies programming and networking requirements at the integrated information level, and it is a good idea to keep things as simple as possible at this level. Most people already familiar with databases and macro language programming can easily learn this technology once the appropriate documentation becomes available. There are manuals that come with the software set mentioned, but the producers of that software were unaware that their products could be used for a higher level of system integration, and thus provided no instruction on how to develop such systems. So, in Part 2 of this document, we will provide some of the implementation ideas that may be considered basic for the development of the HAZMIN Infobase System.

There is HAZMIN Infobase on-line documentation for the system administrator regarding the algorithms used for system reintegration and maintenance. Many of the algorithms in this document are adapted from that on-line documentation. Each on-line algorithm has an explanatory hypertext document file, from which the user can hyperjump (via jumps placed within Remark lines) to all the related programs such as batch files, macros and scripts. Since these program files are all in ASCII, they can be loaded into a text editor from within HyGEN and edited as needed. Checking for errors and correcting them could not be easier. In spite of hundreds of program files and database tables, systematic use of the hypertext capabilities keeps them organized for efficient management. System overview capabilities are excellent, and additionally, the system developer can obtain total in-depth details instantaneously at any point within the system.

We have not yet developed an on-line end-user manual. There are hard copy "cheat sheets" that inform the end-users how to use the LES and LPS libraries. Since we are hypertext capable, the end-user manual will eventually be an on-line hypertext manual.

    There are many system analysis approaches. The two approaches that we use will be described here. The first is the Target System Design approach. It requires listing of Issues, Near Term Major Milestones, Intentions, Enabling Software Tool Set, Degree of Fit and Limitations. Table 2 is an example of the Target System Design considerations recorded during the early phase of the HAZMIN Infobase project.

The second system analysis approach follows that of Warnier-Orr [Ref:2]. This is quite a formal and thorough analysis approach and needs to be studied in detail before participating in any discussion about it. We will therefore not go into the details of this analysis, except to note that it requires entity diagrams to be drawn and the entities to be correctly linked to each other. The hypertext approach easily handles the diagrams. The entity diagram analysis concept is practical as long as there are not too many entities, and not too many links among them, to be made manually. Otherwise, and very rapidly, it becomes difficult to keep track of all the links to the entities correctly. Fortunately, there is a node-network analysis software package called HOUDINI that can be adapted for entity diagram analysis -- it will detect all linking errors so that they can be corrected until none are left. This is an excellent program for the analysis of very large systems (thousands of entities and links), but it is not specifically designed for entity diagram analysis, and it is not always obvious how it may be adapted for such analysis.

7.0 END-USER INTERFACE DESIGN OPTIONS

The HyGEN user interface is essentially that of hypertext with its angle bracket hyperjumps. It could not be simpler. In order to design a more interesting interface than just menus with rows upon rows of hyperjump choices, some of the (graphics) menu screens are designed to appear 3-dimensional and conducive to the orientation of the user. Using the metaphors of "bookshelves" and "notebooks" on each shelf, from the main menu (Figure 2), the user selects the bookshelf of interest. It is essentially a disguised form of a decision (selection) tree. Each notebook on a selected shelf (for example, see Figure 3) represents the library of Engineering Specs or HAUL or MSDSs, etc. The collection of bookshelves and notebooks constitutes our knowledgebase regarding our laboratory's requirement for hazardous materials management. The notebooks are also considered as representing classes of information, so by updating the notebooks, all topical information of an info class will be updated.

TABLE 2: TARGET SYSTEM DESIGN CONSIDERATIONS

Issues
------
PROCESS
    Process Control   -- Reproducibility, Repeatability, Uniformity,
                         No Reworks
    Documentation     -- Appropriateness, Correctness, Comprehensiveness,
                         Completeness, Timeliness
    Trackability      -- No Lost Parts
    Yield Enhancement -- All of the Above

HAZMIN
    Inventory Control
    Material Substitution
    Process Substitution
    Waste Segregation
    Waste Analysis
    Document Correlation

Near Term Major Milestones
--------------------------
KEY DATA ELEMENTS BY SHOP:
    Referring and Authorizing LEDs
    Materials Into Shop
    Materials Consumed by each Processing Operation
    Waste Generated by each Processing Operation
    Wastestream Types Out of Shop

1. To identify all in-coming materials into a shop.
2. To identify all materials authorized for shop use.
3. To identify all material usage authorizing documentations for each shop.
4. To identify all material usage for each of the processing operations.
5. To identify waste generated by all the processing operations performed
   by the shop.
6. To identify all wastestream types generated by a shop.
7. To identify HAZMIN opportunities and Disposal Cost Minimization
   opportunities.

Intentions
----------
To facilitate the following services/functions regarding the above issues:

OPERATION Support
    Relating Material Request to Authorizing Document
    Classification and Approval of Incoming Materials
    Identification of Process Operations and Supporting Activities
    Relating Process Operations to Process Units
    Relating Process Operations and Units to Incoming Materials
    TQM of Process Operations
    Tracking of Parts being Processed
    Monitoring of Process Units & Facilities During Standby Mode
    Tracking of Process Units & Facilities During Usage
    Relating Process Operations and Units to Outgoing Waste
    Identifying Standby Mode Outgoing Waste
    Identifying Waste Streams
    Identifying Process Units and their Waste Streams
    Segregation and Management of Waste Streams
    Analysis of Waste Streams
    Linking Applicable HAZ-MAT Regulation to a Waste Stream

PLANNING Support
    Efficacious Process/Material Substitution
    Understanding of HAZ-MAT Regulations

Enabling Software Tool Set
--------------------------
HyGEN + askSam + PARADOX + VEDIT PLUS InfoBase

Degree of Fit
-------------
Operational Visualization & User Interface Aids:
    Book Shelf, Data Plot, Entity Diagram, Flow Diagram, Herring Bone,
    Houdini Node Network Logic, Venn-Diagram, Pie Chart, Warnier-Orr Diagram
Expandability -- OK
Adding, Updating -- OK
Real-Time Updating -- Not Applicable
Relatability -- Adequate

Limitations
-----------

Figure 2 -- HAZMIN Infobase's main menu screen.

[Figure 3 here: "SHELF# 1" screen]

Figure 3 -- An electronic "Book Shelf" with "Notebooks" that may be selected and accessed.

A shop worker can be overwhelmed by all the information in the knowledgebase when all he needs is the data concerning his shop. Fortunately, additional menus can be arranged to provide the user with shop-by-shop information by simply extracting all the information within the notebooks that is identifiable by shop. HyGEN Expert Infobase technology is capable of doing that. We use the metaphor "portfolios" whenever we design specialized interfaces, such as for information listed by shop. The shop worker then simply selects his shop from the portfolio menu and browses hypertextually to find what he is looking for. Many menus below the main menu layer are automatically regenerated during reintegration and are dependent upon the information being updated. The topical info of the portfolio itself needs no updating because the portfolio's hyperjumps are to the corresponding topical info class within the notebooks. Details for the implementation of the shop portfolio are given in Section 16.0.
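As a rough sketch of portfolio generation (not the actual Section 16.0 implementation), the idea is to scan the notebook record files for a SHOP[ field and regenerate one menu of hyperjumps per shop at each reintegration; the paths and field convention here are assumptions:

import os
from collections import defaultdict

def build_shop_portfolios(notebook_dir, portfolio_dir):
    """Regenerate one menu of hyperjumps per shop from the notebook files."""
    by_shop = defaultdict(list)
    for name in os.listdir(notebook_dir):
        path = os.path.join(notebook_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors="ignore") as f:
            for line in f:
                if line.startswith("SHOP["):
                    by_shop[line.split("[", 1)[1].strip()].append(name)
                    break
    os.makedirs(portfolio_dir, exist_ok=True)
    for shop, files in by_shop.items():
        with open(os.path.join(portfolio_dir, shop + ".TXT"), "w") as f:
            f.write("\n".join("<%s>" % n for n in sorted(files)))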

HyGEN's interface design options are minimal compared to those of the Visual Basic or Visual C++ programming languages for Windows. Yet, with some design smarts, many data models that relational databases have difficulty coping with can be effectively handled by the HyGEN Expert Infobase. The reader needs to bear in mind that the data models and info models that can be successfully designed usually depend on a software program's user interface design options. A software program may provide a wide array of design options and yet not provide the design options necessary for synthesizing database info for hypertext (or any other info handling) use. Visual or interface design alone cannot usually be programmed for the processing of text. For text processing, one needs a programming language or a capable programmable ASCII text editor. Our approach of using HyGEN with a programmable text editor could not be simpler, and yet we were able to meet all our design goals. If a Windows equivalent of HyGEN becomes available, then that, with a programming language or an appropriate programmable text editor, may allow more ambitious info models to be designed, but at the expense of complexity and a significant loss of infobase construction speed, all leading towards a greater product cost.

Besides portfolios, we can design hypertext menus of special interest that may be graphics or text files. For example, a Total Quality Management (TQM) cause and effect schematic graphics diagram (see Figure 4) can be designed to have hyperjumps from its item elements to the appropriate notebooks or portfolios. HyGEN offers the basic hypertext functions between .PCX graphics and ASCII text files to allow us to design effective visualization and focusing aids. For those who are interested in the basic design approaches of hypertext interfaces, an article by Parunak may be useful [Ref:3].

    8.0 COTERMINOUS WITH PROCEDURE, WITH PROCESS, AND WITH INFOBASE

Although HyGEN is hypertext centric in how it operates, it is essentially directing hyperjumps to objects such as text and graphics files, and to application programs. We want to know clearly whether all these hyperjumps will still be valid after each reintegration of the Expert Infobase. The terms coterminous with procedure, with process, and with infobase (or database) are the terms used in the Object Database Management System (ODBMS) discipline for checking the relevancy of objects after updates [Ref:4].


Figure 4 -- Total Quality Management (TQM) cause and effect Herring Bone hypertext graphics screen. The squares indicate predefined cursor positions associated with items that are hyperlinked to the corresponding "Notebooks."

Similarly, we can use these checking methods to determine whether any of the hyperjumps may be in error after each reintegration cycle or some other kind of updating process.

For the Expert Infobase, coterminous with procedure can be where the procedure equates to reintegration, or where the procedure concerns the updates of some real-time application program that is hyperlinked to the Infobase, or the generation of objects by some application programs with the objects then being linked to the Infobase.

The hyperjump links involved with the HAUL and the MSDS data, and menus with such hyperjumps, are by design coterminous with procedure (where the procedure is reintegration) and are therefore current and valid with every reintegration.

Hyperjumps may be to an application program object such as a CAD program's graphics file. Normally, that graphics file is then updated independently of the Infobase reintegration, and that new info is regarded as coterminous with procedure, where the procedure is not reintegration but the graphics file updating process.

Now suppose that, instead of directly linking to the CAD program or its run-time program with the graphics of interest, the object (graphics file) of interest is generated by screen grabbing the graphics of interest from the CAD program, and that grabbed object is then linked to the Infobase. That object is regarded as coterminous with procedure, but now the procedure is neither the Infobase reintegration nor the graphics file updating; the procedure is the process of running a program to screen grab the graphics file of interest and link it to the Infobase.

Why would we want to screen grab when we can view the same graphics info directly? By screen grabbing, we can still operate in pure HyGEN mode with no need to run any other application programs. Grabbing a screen is done before any HyGEN session. The LAN-capable HyGEN program (HyNET-LAN) is then not required for working interactively with application programs across the network. Hence, networking requirements are kept very simple, and costly network compatible application programs are not needed. However, there are times when it may be advantageous to operate the application program (such as a CAD program) in real time as a specialty engine. That means direct linking of HyGEN to the application program that may be operated in real time. If networking is required for such cases, more elaborate network compatible methods may have to be considered.

Industrial Process Operation Specs within the HAZMIN Infobase do not change with reintegration, but may change with industrial process changes, with changes of processing units, with changes of the references authorizing the use of restricted materials, and with changes of the permitted container mix of restricted materials authorized for use. These spec objects are considered coterminous with process. The hyperjumps within those hypertext record files can become out of date whenever the changes mentioned above take place. Using MAXTHINK utilities such as ANGLE, BIC and REFALL, the hyperjumps within the opspecs can be checked en masse against appropriate data to determine whether the jumps are currently valid, and corrective actions taken as needed.

Yet in the real world, there are sometimes freaky problems with process updating that make it difficult to automatically determine all info changes. For example, the shop can switch the rinse tank and the plating tank while the physical tank IDs remain the same as before, because the IDs are dependent upon the location on the shop floor. The tanks' ID data will be valid entries into the database but incorrect process info. The checking approach mentioned above will not flag any error. Only the regular users of such info are likely to detect those kinds of errors; they can then notify the system administrator of the need for corrections. To improve detection of such problems, additional safeguards can be programmed into the Infobase.
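The en masse link checking described above can be illustrated with a sketch in the spirit of (but not reproducing) the ANGLE, BIC and REFALL utilities: extract every angle bracket jump from the opspec files and report any target that no longer exists after an update:

import os, re

JUMP_RE = re.compile(r"<([^<>]+)>")     # angle bracket hyperjump targets

def broken_jumps(opspec_dir):
    """Return (source file, dangling target) pairs for corrective action."""
    broken = []
    for name in os.listdir(opspec_dir):
        path = os.path.join(opspec_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors="ignore") as f:
            targets = JUMP_RE.findall(f.read())
        for target in targets:
            if not os.path.exists(os.path.join(opspec_dir, target)):
                broken.append((name, target))
    return broken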

    There are many menus (text) and graphics files that will remain unchanged throughout the life of the Infobase. They are coterminous with infobase.

    REFERENCES

    1. -------- -----------

    Toomer, B. A., I Aung San, "An Expert Infobase System for Tracing Hazardous Materials in Engineering Documents and System Requirements for Material Information Standardization,'' Chemical Data Standard: Databases, Data Interchange and Information Systems -- 2nd Volume, ASTM STP 1298, Charles E. Gragg and Joseph Mockus, Eds., American Society for Testing Materials, Philadelphia, 1995.

    2.

    3.

    Orr, K., "Structured Requirements Definition," Ken Orr and Associates, Inc. 1981.

    Parunak, H. V. D., "Ordering the Information Graph," Chapter 20, Berk, E., Devlin, J., Eds., "Hypertext/Hypermedia Handbook," McGraw Hill Publishing Company, Inc., 1991

    _ -

    4. Stein, R. M., "Object Databases," BYTE, April 1994.

-- PART 2 --

9.0 BASIC BUILDING BLOCKS OF THE HYGEN EXPERT INFOBASE

The topics of Part 2 cover the main basic building blocks of the HyGEN Expert Infobase System, with the HAZMIN Infobase used as an example application. The example building blocks should be helpful when developing an information administrative system using the HyGEN Expert Infobase technology.

Of course, to fully follow the ideas below, the reader would need to be familiar with the appropriate software manuals. The major functions of the macros quoted will be described, but note that they are mostly not part of any macro library, and it is better to assume that all macros will have to be programmed by the system developer. VEDIT PLUS macro names can be recognized by the .VDM extension. askSam database table names can be recognized by the .ASK extension, and askSam macro program names are preceded by colons with no space in between. Many of the examples given below are simplified as compared to the actual HAZMIN Infobase, so as to make the principles of the basic building blocks easier to understand. The HAZMIN Infobase was designed to adequately model the real-world info and has to deal with more intricacies, so the actual algorithms used can sometimes be more complex. The MAXTHINK utilities mentioned in this writing are listed in Table 3. The functions of those utilities are described in the respective MAXTHINK product manuals.


    10.0 SEARCH INDEXING WORDS AND KEY PHRASES

The objective is to SEARCH index the text directories \LESS, \LPSS, \HAUL, \MSDS, \CLOL, \ODS and \SARA. The HyGEN utilities used will index all subdirectories of these directories as well. The program is divided into four stages, sometimes with further program partitioning within each stage to provide better reusability of the subroutine DOS batch files. Keep in mind, however, that these algorithms are more suitable for beginning programmers and may also serve as starting points for advanced programmers to totally replace the batch files or the VEDIT PLUS macros with more efficient custom programmed utilities. Note that the example algorithms are basic. There may be a need for additional algorithms to generate ancillary info to further elaborate the information involved with SEARCH indexing.


    10.1 STAGE 1

Figure 5 is the flow diagram of the first stage SEARCH indexing algorithm. HyGEN needs its indexing related files, configuration files and help files in the same directory where its START.TXT file resides.


Table 3 -- MAXTHINK utilities mentioned in this writing: ANGLE, BIC, BIGSORT, BMI, FS, GW, GWC, JOIN_TXT, KF, KWIC0, KWIC1 and REFALL, from the MAXTHINK products HyGEN and TRANSTEXT.

Figure 5 -- STAGE 1 SEARCH Indexing algorithm. (???? can be LESS, LPSS, HLMS, MSDS, CLOL, ODS or SARA; RAWWORD.TXT and the batch files reside in the \TRU directory unless otherwise stated; the text directory indexed is \????\*.*.)

If the text file or files to be SEARCH indexed also reside in that directory or its subdirectories, all the other files need to be temporarily moved to another directory, say a \NO-INDEX directory. A batch file may be written to move all files other than the files to be indexed. The files to be indexed must remain in their native directories. In the case of the HAZMIN Infobase's \LESS and \LPSS directories, each engineering document to be SEARCH indexed has its own subdirectory where all the text and graphics files for that document are kept together.


Three output files are sought for each text directory with regard to HyGEN related SEARCH indexing: VOCABURY.TXT, VOCABURY.IDX and KWIC. KWIC is renamed KWIC-W so that it does not get inadvertently overwritten by later generated KWIC files. KWIC-W will contain the listing of all the words and numbers (with more than one character) in the text files. Note that the HyGEN GW utility will process as spaces most of the ASCII characters that are not alphanumeric, hyphen, period (decimal) or apostrophe. Phrases to be SEARCH indexed will be added to KWIC-W in the next stage.
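As an illustration of that character handling rule (a sketch only, not the GW utility itself), a modern Python equivalent might be:

    import re

    def gw_words(text):
        # Treat every character that is not alphanumeric, hyphen, period
        # or apostrophe as a space, then keep tokens longer than one character.
        cleaned = re.sub(r"[^A-Za-z0-9.'-]", " ", text)
        return [tok for tok in cleaned.split() if len(tok) > 1]

    print(gw_words("Methyl-Ethyl Ketone (MEK), CAS 78-93-3!"))
    # -> ['Methyl-Ethyl', 'Ketone', 'MEK', 'CAS', '78-93-3']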

During the SEARCH index processing of each text directory of interest, the VOCABURY.* files need to be within the same directory where the MAXTHINK utilities are kept; otherwise some of those utilities will not function correctly. We kept those utilities in the \TRU directory. All the DOS batch files written are also placed within the \TRU directory, and it will be assumed here that many of the relevant input text and data files and utilities reside within the \TRU directory unless otherwise stated. Only one text directory should be processed for SEARCH indexing at a time within one \TRU-like directory. Multiple PCs (or a PC with multiple CPUs) may be set up to do parallel processing. The function and use of the MAXTHINK utilities are described in detail in the HyGEN manual and will not be repeated here.

Figure 5 shows a text processing sequence of batch files IDX-????.BAT and IDX-1.BAT, using the DOS and HyGEN utilities. ???? stands for LESS, LPSS, etc., and different starting parameters corresponding to the different directories are required for IDX-????.BAT. The names of the batch files themselves have no special importance other than for referencing. Note that IDX-1.BAT needs only the RAWWORD.TXT file and no other starting inputs; it is therefore reusable with all the text directories to be indexed. This basic sequence for SEARCH indexing is described in the HyGEN manual. However, the Stackey batch enhancer program can be used to automatically sequence this STAGE 1 program as well as the rest of the stages. Stackey is capable of passing DOS starting parameters to the batch files.

    10.2 STAGE 2

Figure 6 shows the schematic diagram of the STAGE 2 algorithms. The inputs required are KWIC-W, the key phrase listings, and the VEDIT PLUS macro files used to skip often repeated words from indexing. Two outputs are required: KEYFIND.TXT and SKIP-KF.VDM.

Figure 6 -- STAGE 2 SEARCH Indexing schematic diagram.

The first part is combining the ASCII key phrase listing files representing the LESs, LPSs, HAUL and MSDSs libraries with KWIC-W. The key phrase listings may be combined as desired to suit each SEARCH indexing's specific requirements. The key phrase lists may be output from databases as reports or simply stored as ASCII text files. Our batch files output key phrases from selected database tables, and these are appended to KWIC-W. KWIC-W is then renamed KEYFIND.TXT. Because we wanted different combinations of key phrases for SEARCH indexing the different text directories, the names of the SAM????.BAT batch files (where ???? stands for LESS, LPSS, etc.) have to be different. Other ways to combine KWIC-W with the key phrase listing text files are to write a VEDIT PLUS macro or to use MAXTHINK's JOIN_TXT utility. How the key phrases for the various directories may be generated will be covered as a separate topic in Section 17.0.
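A minimal sketch of the combining step, in Python instead of the SAM????.BAT batch files, is shown below; the key phrase listing file names are hypothetical.

    def combine_keyfind(kwic_w="KWIC-W",
                        phrase_files=("LES-KP.TXT", "LPS-KP.TXT"),
                        out="KEYFIND.TXT"):
        # Append the key phrase listings (hypothetical file names) to the
        # KWIC-W word list, then write the result under the standardized
        # KEYFIND.TXT name.
        lines = []
        for name in (kwic_w,) + tuple(phrase_files):
            lines += open(name, "r", errors="ignore").read().splitlines()
        open(out, "w").write("\n".join(lines) + "\n")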

The second part of this stage is to select, copy and rename a VEDIT PLUS macro as the SKIP-KF.VDM macro program file, thus again leading to the need for batch files with separate names for indexing the different directories. To use storage space efficiently, ordinary text files such as those from \LESS and \LPSS need to skip indexing the "glue" words such as "and, be, for, if," etc., that are repeated hundreds of times throughout each directory. However, text files that are outputs of textbases have field names and other idiosyncratic names within every report; these can number in the tens of thousands or more and should additionally be skipped from indexing. Therefore, separate VEDIT PLUS macros for the different directories may be required. To standardize the name of the skip macro, each of these different macros is copied and renamed as needed by each batch file to SKIP-KF.VDM (a locally standardized name). This will allow simpler programming later. Note that the key phrases themselves can contain some of the words to be skipped; HyGEN will still index those phrases with all the skipped words included within them.

10.3 STAGE 3

Figure 7 shows the flow diagram of the STAGE 3 algorithm. The batch file IDX-3.BAT for this stage is usable with all the different directories to be indexed. Three inputs are required. First is the KEYFIND.TXT file from STAGE 2. The second and third input files are the VOCABURY.TXT and VOCABURY.IDX files generated in STAGE 1. Those two files are necessary to run MAXTHINK's KF utility, and they all must be in the same directory. Four output ASCII files are required: VOCAB01.TXT, VOCAB01.IDX, KWIC.TXT and KWIC.IDX.


The KEYFIND.TXT file from STAGE 2 must be sorted alphabetically after adding the different key phrase listings to KWIC-W. But MAXTHINK's BIGSORT utility is case sensitive, so a VEDIT PLUS macro is used to convert all phrases to upper case before sorting. After sorting with BIGSORT, a VEDIT PLUS macro for eliminating duplicates and another macro for skipping words are used. Those who can do advanced programming could probably program a single utility to do all these processing steps. The final KEYFIND.TXT and the VOCABURY.* files are used as inputs for KF to output the KEYMATCH.TXT file. It would be convenient if KF could accept a file input for the skipping of words.
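Such a single utility might look like the following Python sketch; the SKIPWORDS.TXT skip list file name is an assumption, and the skip rule reflects the phrase behavior noted for STAGE 2.

    def prepare_keyfind(src="KEYFIND.TXT", skip_file="SKIPWORDS.TXT"):
        # SKIPWORDS.TXT (hypothetical) holds one "glue" word per line.
        skip = {w.strip().upper() for w in open(skip_file, errors="ignore")}
        entries = set()                      # a set eliminates duplicates
        for line in open(src, errors="ignore"):
            entry = line.strip().upper()     # BIGSORT is case sensitive,
            if not entry:                    # so normalize the case first
                continue
            # Skip single glue words; multi-word phrases keep any skipped
            # words they contain, as with the VEDIT PLUS macros described.
            if " " not in entry and entry in skip:
                continue
            entries.add(entry)
        open(src, "w").write("\n".join(sorted(entries)) + "\n")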

Figure 7 -- STAGE 3 SEARCH Indexing algorithm. (KEYFIND.TXT is processed with the VEDIT PLUS macros \VEDIT\UP-CASE.VDM, \VEDIT\X-DUPLI.VDM and \VEDIT\SKIP-KF.VDM; KF then outputs KEYMATCH.TXT, from which the KWIC.TXT, KWIC.IDX, VOCAB01.TXT and VOCAB01.IDX indexing files are generated.)

The KEYMATCH.TXT file is then processed using the MAXTHINK utilities as shown in Figure 7 to output the KWIC.* files. These steps are explained within the HyGEN manual.

Then, KEYMATCH.TXT is renamed VOCAB01.TXT. This file is used as input to MAXTHINK's BMI utility to output VOCAB01.IDX. These four files are to be used with the HyGEN SEARCH indexing engine.

10.4 STAGE 4

Figure 8 is a schematic diagram for the final assembly of the text directory to be fully SEARCH indexed. The VOCAB01.* and KWIC.* files are moved from \TRU to the text directory that is to be SEARCH indexed. All the files that were temporarily moved to \NO-INDEX are then moved back into the text directory. Some text directories need extra info which is not related to SEARCH indexing but nevertheless needs to be changed with the info update for the text directory. Such information can be taken care of at this point by generating the ancillary info files and reassembling the text directory. For example, the HAUL and MSDS text record files number in the tens of thousands, and programs can be written to generate new menu files that will provide hyperjumps to the new set of text records. Because each text directory may need to be assembled differently from the others, separate batch files are used for this stage.
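As an illustration of such ancillary menu generation, here is a hypothetical Python sketch that emits one hyperjump line per record file; the directory, menu file name and exact jump layout are assumptions patterned on the angle bracket jumps used throughout the Infobase.

    import os

    def build_menu(record_dir=r"C:\HAUL", menu_name="MENU.TXT"):
        # One angle bracket hyperjump per text record file, so that the
        # menu always tracks the newly assembled set of records.
        entries = []
        for name in sorted(os.listdir(record_dir)):
            path = os.path.join(record_dir, name)
            if os.path.isfile(path) and name != menu_name:
                entries.append("%-20s <%s>" % (name, path))
        menu_path = os.path.join(record_dir, menu_name)
        open(menu_path, "w").write("\n".join(entries) + "\n")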


The above four stages of SEARCH indexing essentially describe how databases can be connected to the indexing process. It would be very convenient if there were custom utilities that function the same as the VEDIT PLUS macros mentioned.

    11.0 LOOKUP INDEXING FOR WORDS/PHRASES OF A SPECIFIC INTEREST

For the HAZMIN Infobase, we are interested in LOOKUP indexing our LESs, LPSs, HAUL and MSDSs libraries in terms of the California and Federal EPA restricted materials listings. It is assumed here that SEARCH indexing preceded LOOKUP indexing and that some of the LOOKUP indexing related files were generated during SEARCH indexing. The algorithms will be explained in two stages. Keep in mind that they are somewhat basic example algorithms, and programming variations for LOOKUP indexing are common.

11.1 STAGE 1

Figure 9 shows the flow diagram representing the algorithm of the batch file CLOL-LES.BAT for the case of LOOKUP indexing the LESs text directory in terms of the California EPA restricted materials listing. The VOCAB01.* files (generated during SEARCH indexing) within the text directory to be LOOKUP indexed are copied into the \TRU directory and renamed as VOCABURY.* files. For the example, it is assumed that the California EPA restricted materials data is kept in an askSam textbase table called CLOL-ADJ.ASK. :AUTO-CLOL-KF is the askSam macro program that will output the KEYFIND.TXT file with the listing of the restricted materials.

Figure 8 -- STAGE 4 SEARCH Indexing schematic diagram. (Final assembly of the indexed text directory via the RGP????.BAT batch files; all batch files are in the \TRU directory.)

Figure 9 -- STAGE 1 LOOKUP Indexing algorithm. (The example shown generates the files for CLOL-LESS; CLOL-LES.BAT works on VOCABURY.TXT and related files between the \TRU and \LESS directories.)

This list requires sorting, and BIGSORT is used for that. Any duplicates within the listing are removed by the X-DUPLI.VDM macro.

Next, KEYFIND.TXT and the VOCABURY.* files are inputs to the KF utility, and the resulting KEYMATCH.TXT is processed by the CLLE-UD2.VDM macro. The function of that macro is to format the hyperjump listing for each of the state EPA restricted materials, generate file splitting alignment keys, give file names for each listed material, and then split the KEYMATCH.TXT file into many \CLLE\*.* files. Section 14.0 will discuss more about the implementation of these functions. Typical content of a split file is shown in Figure 10.
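A minimal Python sketch of such a key driven split is given below. The real work is done by the CLLE-UD2.VDM macro; the alignment key marker and the naming scheme here are assumptions.

    import os

    SPLIT_KEY = "@@FILE "   # assumed alignment key line, e.g. "@@FILE CE-00001"

    def split_file(src="KEYMATCH.TXT", dest_dir=r"C:\CLLE"):
        os.makedirs(dest_dir, exist_ok=True)
        out = None
        for line in open(src, errors="ignore"):
            if line.startswith(SPLIT_KEY):
                if out:
                    out.close()
                # The alignment key names the file the next block goes to.
                name = line[len(SPLIT_KEY):].strip()
                out = open(os.path.join(dest_dir, name), "w")
            elif out:
                out.write(line)
        if out:
            out.close()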


Next, we need to quasi-SEARCH index the \CLLE directory. To do that, GW, BIGSORT and GWC are run to generate a new VOCABURY.TXT file that overwrites the older file. We then run BMI to generate a new VOCABURY.IDX, which also overwrites the previous VOCABURY.IDX. With KEYFIND.TXT and the new VOCABURY.* files as inputs, KF is run again to generate a new KEYMATCH.TXT file. This file is copied into \CLLE and renamed CL-LESS. The KEYMATCH.TXT format is standardized for use with the HyGEN SEARCH indexing engine. But here, we want to SEARCH index the information using angle bracket hyperjumps instead of using the search engine's "S" key. The macro CL-LESS.VDM is used to reformat CL-LESS so that its data can be imported into the askSam textbase. Figure 11 shows the beginning of that reformatted file. Notice that the LOOKUP info for 1-TRICHLOROETHANE has an angle bracket hyperjump to the \CLLE\CE-00001 file of Figure 10. Figure 10 has the entire list of viable hyperjumps for looking up 1-TRICHLOROETHANE within the Local Engineering Specifications.

To LOOKUP index our LPSs, HAUL and MSDSs libraries, the algorithms are essentially the same as for the LESs, but with appropriate variations. The files generated for the LPSs, HAUL and MSDSs are placed in \CL-LPSS, \CL-HAUL and \CL-MSDS respectively.

11.2 STAGE 2

This stage's algorithm is shown by the flow diagram of Figure 12. The objective here is to combine the files within \CL-LESS, \CL-LPSS, \CL-HAUL and \CL-MSDS in a sensible manner on the basis of each restricted material. Additional lookup information -- the context for the restriction, dose limits, restricting agencies, equivalent materials, and whatever other information is useful for each restricted material -- also needs to be added in. Seven info inputs are required, as shown in Figure 12. Only two GLOSSARY.* output files are essential for LOOKUP indexing. A single batch file such as CLO-LKUP.BAT can be used to sequence the running of the sub-programs.

When the Figure 11 information is imported into askSam, the "@@" serves as the key for automatically splitting the text into textbase records. For askSam textbase records, an explicit field name can be recognized by a word or a character terminated by the open square bracket "[". The field with the name -[ is for the exact wording used for matching a material in regard to LOOKUP indexing. The SEQ[ field values of the \CL-* directories are the same within each directory but different among those directories, in accordance with the required sort order for their outputs.
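To make the record conventions concrete, here is a hypothetical Python parser for text in this general style -- "@@" separating records, explicit fields ending in "[", and repeatable "%%" implied fields. It illustrates the format only; askSam's own parsing is more elaborate.

    import re

    # An explicit field is a word or a character such as "-", "!" or "$"
    # followed by "[", with its value running up to the closing "]".
    FIELD = re.compile(r"([\w$!&%-]+)\[([^\]]*)\]")

    def parse_records(text):
        records = []
        for chunk in text.split("@@"):
            if not chunk.strip():
                continue
            rec = {"%%": []}                 # the implied field may repeat
            for line in chunk.splitlines():
                line = line.strip()
                if line.startswith("%%"):
                    rec["%%"].append(line[2:].strip())
                else:
                    for name, value in FIELD.findall(line):
                        rec[name] = value.strip()
            records.append(rec)
        return records

    sample = "@@\n-[ COBALT ]  SEQ[ 1 ]\n%% COBALT AND COMPOUNDS\n"
    print(parse_records(sample))
    # -> [{'%%': ['COBALT AND COMPOUNDS'], '-': 'COBALT', 'SEQ': '1'}]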

Figure 10 -- HyGEN screen showing the content of file CE-00001 of the directory \CLLE. This particular file lists the hyperjumps to all the various LESs that contain the 1-TRICHLOROETHANE text string.

  • '"1 1-TRICHLOROETHANE . SEQC 1 A

  • .. (v

    Y U


Figure 12 also shows the outputs from :AUTO_LKUP1 and :AUTO_LKUP2 of CLOL-ADJ.ASK, which are imported into CLO-LKUP.ASK after being processed by the VEDIT PLUS macros. To explain the reason for these actions, the textbase record format of CLOL-ADJ.ASK needs to be reviewed. Figure 13 shows that format. ![ indicates the CAS number of the material. The &[, $[, Source[, %[ and AS-OF[ fields are for other data types supplied by the state EPA, and we need not be concerned with them in regard to the basic LOOKUP indexing process. For Figure 13, the -[ field value is COBALT -- the material to be matched for LOOKUP indexing. A pair of "%%" is used as an implied field name (a unique character set with no ending bracket); there can be more than one of the same per askSam record, and askSam knows how to process implied field info. The number of consecutive lines beginning with %% depends upon how many lines of the information given by the state EPA one wishes to show within the LOOKUP index viewing screen. The important point to realize here is that, in order to maximize matching of a word or phrase, sometimes the value of -[ is edited to be a sub-set of the total phrase or sentences following the %%s. When such an event occurs, the record is marked with "ABV" within the KEY[ field to facilitate the different formatting required for the output reports. There are two records with the same -[ field value of COBALT, and Figure 14 shows the other record.

The askSam program :AUTO_LKUP1 will output reports with the -[, %% and SEQ[ information, with appropriate formatting for records with or without ABV within the KEY[ field, ready to be imported into the askSam textbase CLO-LKUP.ASK. GN-LKUP1.VDM is used for eliminating duplicate records having a varying number of lines of information between the different duplicated groups, and also for deleting blank lines.

The askSam program :AUTO_LKUP2 will output reports with the CAS number for -[ instead of the material name, and with the -[ value for %% instead of the value used by the :AUTO_LKUP1 program. This will allow for LOOKUP indexing materials by their nomenclatures as well as by their CAS numbers. X3L-DUP.VDM is used for eliminating duplicate records, all with three lines of information.

The design of the LOOKUP index viewing screen for COBALT can be observed in Figure 15. The figure shows that only the active LOOKUP item is shown expanded; the other items before or after COBALT may be seen unexpanded if COBALT's LOOKUP info does not take up the entire screen, as is the case in the figure.

With empty data but with all the macro programs included, the CLO-LKUP.ASK textbase table is generated by copying and renaming the STD-LKUP.ASK textbase table. The previous CLO-LKUP.ASK will simply be written over. There are altogether six inputs into the CLO-LKUP.ASK textbase, as shown in Figure 12. The askSam :AUTO-LKUP program for CLO-LKUP.ASK puts out the GLOSSARY.TXT file with text records using the -[ field values for the first level sort and the SEQ[ field values for the second level sort. Imagine now that for COBALT, the output records are sorted in the order corresponding to the two (there must be at least one) report inputs from :AUTO_LKUP1, the report inputs (if any) from the \CL-* directories, and the same number of report inputs from :AUTO_LKUP2 as from :AUTO_LKUP1, which in this example is two.

Figure 13 -- An askSam textbase record for looking up COBALT within the -[ field.

Figure 14 -- Another askSam textbase record for looking up COBALT within the -[ field, but with a different qualification for the restriction of materials, as indicated by the words following the %% implied field.

Figure 15 -- LOOKUP index viewing screen with COBALT as the active (expanded) item; adjacent unexpanded items include CLORAZEPATE DIPOTASSIUM, COAL TAR and COAL TARS.

These reports, if viewed closely one after another, will have the group material title -- COBALT -- needlessly interposed between the reports. Removing these redundant COBALT words results in the display of Figure 15. The GN-LKUP2.VDM macro looks for an alphanumeric character starting at column one (as required by the HyGEN LOOKUP indexing engine) and deletes all such title lines except the material name at the very top of a particular material group.
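A Python sketch of that cleanup rule follows (the actual macro is GN-LKUP2.VDM; the column one convention is the HyGEN LOOKUP indexing engine requirement just mentioned).

    def drop_redundant_titles(lines):
        # A group title is any line whose first column is alphanumeric.
        # Keep only the first of each run of identical titles.
        out, last_title = [], None
        for line in lines:
            if line[:1].isalnum():
                if line == last_title:
                    continue            # redundant interposed title
                last_title = line
            out.append(line)
        return out

    demo = ["COBALT", "  <jump A>", "COBALT", "  <jump B>"]
    print(drop_redundant_titles(demo))
    # -> ['COBALT', '  <jump A>', '  <jump B>']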


Examining Figure 15, the hyperjumps associated with COBALT, ((2,2 ... and COBALT AND COMPOUNDS are generated when outputting info from :AUTO_LKUP1. The hyperjumps (on the left side of the figure) are to the California EPA List of Lists restricted materials information (text) file, and will highlight the lines with the respective CAS numbers for those materials. There are only four \CL-* directories in the example used, but there are hyperjumps to five files on the right side of the figure. One of them (distinguishable by CP-00046) is really to the COBALT OXIDE file for matching within the LPSs library. The extra hyperjump happens because COBALT is a word within COBALT OXIDE. COBALT CARBONYL and COBALT OXIDE both contain the word COBALT and are also listed for LOOKUP indexing. But there is no extra hyperjump to the COBALT CARBONYL file, because there is no match for COBALT CARBONYL anywhere within any of the files in the \CL-* directories, and so there simply is no LOOKUP hyperjump file for COBALT CARBONYL.

After running GN-LKUP2.VDM, the output file GLOSSARY.TXT is copied into the \CLOL directory for keeping the California EPA info. Then the BMI utility is run with \TRU\GLOSSARY.TXT as input. The resulting GLOSSARY.IDX is then copied into \CLOL. This essentially completes the basic LOOKUP indexing process. Just press the "L" key when in the California EPA info library to invoke the LOOKUP indexing. In most cases, we would like to have additional information to support the LOOKUP index capabilities, and ancillary programs to generate the extra information files become necessary. Examples of such files are the lists of state and federal EPA materials with matches within the LESs, LPSs, HAUL and MSDSs libraries.

The same procedure as above applies to the LOOKUP indexing of our LESs, LPSs, HAUL and MSDSs libraries in terms of the Federal EPA restricted materials or any other restricted materials listing. Figure 16 graphically shows the state and federal EPA restricted materials to be LOOKUP indexed against the Engineering and Process Specs, the HAUL, and the MSDSs (associated with the HAUL items) libraries. From the pre-defined cursor positions of this graphics screen, we can hyperjump to the libraries of interest and view any of the LOOKUP indexed information -- and that is, more or less, semantic networking.

Figure 16 -- Relationship of Federal, State and other restricted materials lists to be LOOKUP indexed against the Engineering and Process Specs, the HAUL and MSDS document libraries.

    12.0 DESIGN AND IMPLEMENTATION OF HYPERTEXT RECORDS WITH REPEATING GROUP OF INFO WITHIN ANOTHER REPEATING GROUP OF INFO


As already mentioned in Section 3.0, MSDS text records have repeating ingredient data groups, and these MSDS records themselves may be many-to-one to a HAUL item record. The real-world data model for HAUL can easily be represented by a relational database model for structured data. The real-world info model for the MSDS is more at home as a textbase record than as a database record, and we must also be able to relationally link it to an appropriate HAUL item record.

The HAUL data is a straightforward importation into PARADOX from a mainframe, using the ASCII comma delimited approach. PARADOX can sort and sequentially number the HAUL items within the HAUL database table. These items are uniquely defined by the SHOP[, BLDG[, NIIN[ and CAGE[ key fields and sorted in that order as well. It then outputs an ASCII comma delimited text file of the key fields and the sequence number field data. Using a VEDIT PLUS macro, angle bracket hyperjumps to file names containing the sort numbers can be synthesized and the file imported back into PARADOX. The new database table is then related to the HAUL database table. Henceforth, the resulting database table, say HAUL-DAT.DB, will have all the HAUL data as well as the hyperjumps for each HAUL item.
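A Python sketch of the hyperjump synthesis step is shown below; the input and output file names and the record file naming pattern built from the sort number are assumptions, since the actual work is done by a VEDIT PLUS macro.

    import csv

    def add_hyperjumps(src="HAULKEYS.TXT", out="HAULJMPS.TXT"):
        # src (hypothetical name): comma delimited SHOP, BLDG, NIIN, CAGE,
        # SEQ rows exported from PARADOX. The output adds one angle
        # bracket hyperjump per row for re-importation into PARADOX.
        with open(src, newline="") as fin, open(out, "w", newline="") as fout:
            writer = csv.writer(fout)
            for shop, bldg, niin, cage, seq in csv.reader(fin):
                jump = r"<C:\HAUL\H%05d.TXT>" % int(seq)  # assumed naming
                writer.writerow([shop, bldg, niin, cage, seq, jump])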

The HMIS CD-ROM (MSDS info for US Government use only) default-set (larger than a sub-set but smaller than a full-set) download of MSDSs is an ASCII text file of records with explicitly defined field names, and some field values can be multiple rows of many sentences. By using a VEDIT PLUS macro, the file is processed to be imported into askSam. The records are then sorted by the MSDS SERIAL-NUMBER[ and output as an ASCII text file. A VEDIT PLUS macro can then process that file to set up the file splitting alignment keys and the names of the files to be split, and then split and place these text record files into the appropriate directories. Since each MSDS item has a unique five character serial number, we can conveniently use that for the naming of files instead of using a sort sequence number name. The hyperjumps synthesized will jump to those files. Each serial number must correspond to a unique set of NIIN[, CAGE[ and Part Number INDICATOR[ field values; otherwise, there is an error. The same macro may add hyperjumps to the corresponding full-set MSDS data within the CD-ROMs. Then we have MSDS hypertext records with file names that can be the same as their five character serial numbers. A sub-set of the MSDS with only the fields of interest for relational purposes, sorted by SERIAL-NUMBER[, is downloaded from the askSam textbase table, and that file is imported into PARADOX. In a similar manner as for HAUL, we can synthesize all the hyperjumps for the MSDS items and generate a new MSDS database table, say MSDS-DAT.DB. And from those hyperjumps, we should be able to jump to the default-set MSDS text record files.


    HAUL-DAT.DB and MSDS-DAT.DB are then related with each other using the NIIN and the CAGE as key fields. The resulting new database table, say HLMS-DAT.DB, may now output a file, say HLMS.TXT with the sort order of SHOP[, BLDG[, NIIN[, CAGE[ and SERIAL-NUMBER[ fields.

Remember that the HAUL items are uniquely defined by the SHOP[, BLDG[, NIIN[ and CAGE[ fields, and that there can be more than one record with such a unique sort group (the to-be-file name) because of the one-to-many relation to the MSDSs. Also, the consecutive hypertext records with their to-be-file names within HLMS.TXT will be in sequence. Furthermore, from the sort order for


HLMS.TXT, the hypertext records within the group with the same to-be-file name will be sorted by the MSDS SERIAL-NUMBER[. Suppose there is a one-to-three group, and the top portion (heading group) of each hypertext record is for HAUL while the bottom portion is for a sub-set of the MSDS, as shown in Figures 17 to 19. The three records have identical information regarding HAUL but different information regarding the sub-set of the MSDS. A VEDIT PLUS macro can be written to detect identical heading groups within the same to-be-file name group and, except for the very first heading group, delete the other heading groups that lie between the records. After running this macro, HLMS.TXT will only have the consolidated one-HAUL-to-many-sub-MSDS hypertext records, as shown by the example in Figure 20.
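A Python sketch of the consolidation idea follows, with the records abstracted as (to-be-file name, heading group, sub-set MSDS) tuples rather than raw HLMS.TXT text -- an assumption made to keep the sketch short.

    def consolidate(records):
        # records are already in HLMS.TXT sort order. Keep one heading per
        # run of identical (file name, heading) pairs and collect the
        # trailing sub-set MSDS groups under it.
        out, prev = [], None
        for name, heading, msds in records:
            if prev and prev[0] == name and prev[1] == heading:
                prev[2].append(msds)     # same HAUL item: merge the MSDS
            else:
                prev = [name, heading, [msds]]
                out.append(prev)
        return out

    demo = [("H00001", "HAUL item A", "MSDS 1"),
            ("H00001", "HAUL item A", "MSDS 2"),
            ("H00002", "HAUL item B", "MSDS 3")]
    for name, heading, groups in consolidate(demo):
        print(name, heading, groups)
    # H00001 HAUL item A ['MSDS 1', 'MSDS 2']
    # H00002 HAUL item B ['MSDS 3']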


Why do all this? Because most databases are not designed to output reports with a heading group of one type of data and trailing groups of another type of data. Once the database software manufacturers understand the need for this capability and upgrade their products accordingly, consolidation processes such as that for HAUL and the sub-set of the MSDS can be done much more quickly and easily.

A suitable VEDIT PLUS macro may now split HLMS.TXT into individual hypertext record files and place them in the appropriate directories. A general approach for splitting files will be covered in Section 14.0. The actual HAZMIN Infobase infostructure for relating HAUL with its MSDSs is more complex, because error-correction capabilities were built into the system. The HAUL data has a large percentage of errors and has yet to be scrubbed clean by its owners in another department (and maybe eventually will be). Meanwhile, known errors can be automatically corrected at every Infobase reintegration. The error-correction design is relatively advanced and is outside the scope of these descriptions of the basic building blocks.

Each HAUL item must have at least one MSDS item. If a successful hyperlink to the associated MSDS exists, we can easily hyperjump to that MSDS info file. Figure 21 is a typical format of an MSDS info file that we can hyperjump to. However, the information for this example file is obtained from a non-HMIS-CD-ROM MSDS source, because the HMIS CD-ROM's proprietary stipulations prevent us from showing its contents here. Note that the MSDS for the diluted HCl has two ingredients, and therefore there are repeating groups of ingredient data, dependent on the number of ingredients.

The hypertext centric design described in this section can efficiently and intuitively handle the situations where MSDS data groups may repeat within a HAUL item record, and where these MSDS data records may themselves have repeating ingredient data groups. An important design idea used here is to merge only a sub-set of the MSDS with a HAUL item's data and to rely on the hyperjump to access the full-set or less-than-full-set (default-set) MSDS text record, which may be several pages long. The sub-set of the MSDS to be merged with the HAUL item should be as brief as possible; besides the identifying fields for the MSDS item, only the fields useful for cross-referencing the HAUL and MSDS info should be included in it. That way, even if a HAUL plus MSDS hypertext record has many repeating groups of sub-set MSDSs merged to a HAUL item, it is still easy to see the total combined info, which is also maximized for SEARCH or LOOKUP indexing.

Figure 17 -- A HAUL plus sub-set MSDS hypertext record. (The HAUL heading group carries fields such as NIIN[, CAGE[, TEK-AUTH[, DATE[, FSC[, MFG[ and NOMEN[.)