SOFTWARE—PRACTICE AND EXPERIENCE, VOL. 25(11), 1223–1241 (NOVEMBER 1995)

Design and Implementation of Heterogeneous Distributed Multimedia System using Mosaic GSQL

SUNIL MAGAVI, JOHNNY WONG AND PRAKASH BODLA

Department of Computer Science, Iowa State University, Ames, IA 50011, U.S.A. (email: {magavi, wong, bodla}@cs.iastate.edu)

SUMMARY

With more and more computer users having access to the Internet, it is important for them to be able to access the vast amount of information available in an efficient and easy manner. This project explores one of the possibilities for providing an integrated system that lets the user access different types of media, such as text, image, audio and video, stored in different databases on remote machines, without having to know about the underlying mechanism involved in accessing the database. Such a system must hide the location of the remote data, retrieve the information, process the results and present them to the user. We have used the National Center for Supercomputing Applications (NCSA) Mosaic’s Gateway Structured Query Language (GSQL), which enables a system developer to build input forms, query the remote database in a relatively easy manner and present the results to the user. With the help of this prototype, the development time to build such an application is considerably reduced compared with developing the same application using existing Graphical User Interface (GUI) tools such as the X-Window system.

KEY WORDS: distributed database; data transparency; heterogeneous distributed database system; Mosaic; GSQL; World Wide Web; multimedia; hypermedia

INTRODUCTION

In this paper, we describe the design and implementation of a prototype client–server model for a Heterogeneous Distributed Multimedia System (HDMS) and address the question of how this prototype can provide an integrated solution for transparent access to heterogeneous information (text, graphics, sound and video) repositories. We introduce the National Center for Supercomputing Applications (NCSA) Mosaic and Gateway Structured Query Language (GSQL), and show how these can be used to design and implement our prototype in an efficient and easy manner.

Client–server architecture is a method of separating a database application into two parts. These parts can run on different computers, communicating over a network or the Internet. The client executes application software and interacts with the user, while accessing database information across the network. The server executes the database software and handles the functions which control the data required by the client. Client–server architecture makes it easy for one database to be shared by many remote clients. This allows the server to perform the various database management tasks while the client manipulates the data locally. In this way, users can access data globally while the local database management systems retain their autonomy.

CCC 0038–0644/95/111223–19 © 1995 by John Wiley & Sons, Ltd. Received 12 July 1994; Revised 20 March 1995

1224 S. MAGAVI ET AL.

A distributed database is a network of databases, stored on multiple computers, that appears to the user as a single database. Each database is controlled by its own local database management system, but communications software transparently links them together.

In order to access information from more than one database, there must exist a system which hides the location of such information, fetches the information, assembles the results, and presents the results to the user. The distributed database system,¹ in theory, preserves the autonomy of such information without imposing any changes on existing databases. This includes hiding the heterogeneity of file systems, data models, database languages and data semantics, as well as the hardware and the operating systems on which the databases run. The system should appear as a single integrated database.

In a typical distributed database configuration, the client and server portions of the database management system are located on different computers. This type of configuration allows for a division of labour between the client machine and the server machine. The server (usually a mainframe computer or a high-speed RISC workstation) must have sufficient memory, disk storage and processing power to execute and administer the database. Clients (usually PCs, Macintoshes or workstations) need only enough memory to execute an application or tool which accesses the database over a network.

Hypertext is a simple concept for organizing and viewing information. It is basically the same as regular text – it can be stored, read, searched, or edited – with an important exception: hypertext contains connections within the text to other documents. That is, it is a method of representing information where selected words in the text can be ‘expanded’ at any time to provide other information about the word. These words are links to other documents, which may be text, files, pictures, or audio and video clips. Any hyperlink can point to any document available anywhere on the Internet (see Figure 1). Each of the documents is located at a node, and the links are thus cross-references to these nodes. By selecting a hyperlink, a user in effect goes from one node to another.

Hypermedia is hypertext with a difference: hypermedia documents contain links not only to other pieces of text, but also to other forms of media such as sounds, images and movies. Images themselves can be selected to link to sounds or documents. Hypermedia simply combines hypertext and multimedia.

World Wide Web

The World Wide Web (WWW) is officially described as a ‘wide-area hypermedia information retrieval initiative aiming to give universal access to a large universe of documents.’ A good introduction to Mosaic and the World Wide Web is given in Reference 2.

Mosaic is a global hypermedia browser that allows a user to discover, retrieve and display documents and data from all over the Internet. It was developed at the National Center for Supercomputing Applications (NCSA).

Mosaic is a client for Hypertext Transfer Protocol (HTTP) servers; HTTP is the transfer protocol of WWW servers. The WWW is a ‘universe of network-accessible information’, delivered by HTTP servers, each presiding over its own information base. The WWW has hypertext conventions built into its documents and transfer protocol, to make the web easy for anyone to roam, browse and contribute to. In addition, Mosaic can also communicate with more traditional information servers such as File Transfer Protocol (FTP), Gopher, Wide Area Information Servers (WAIS), and Network News Transfer Protocol (NNTP).

Global hypermedia means that information located around the world is interconnected in an environment that allows a user to travel through the information by clicking on hyperlinks – terms, icons, or images in documents that point to other, related documents.

Figure 1. The basic hypertext model

Mosaic also has extensible multimedia capabilities. File types that Mosaic cannot handle internally, such as Moving Picture Experts Group (MPEG) movies, PostScript documents and Joint Photographic Experts Group (JPEG) images, are automatically sent to external viewers.

NCSA Mosaic was originally designed and programmed for the X-Window system. It has since been extended to both the Apple Macintosh and Microsoft Windows platforms.

Users interested in learning more about Mosaic, HTML and other topics can obtain information from the following Uniform Resource Locators (URLs):

HTML: An introductory reference, ‘A Beginner’s Guide to HTML’, is available at:
http://www.ncsa.uiuc.edu/General/Internet/WWW/HTMLPrimer.html

Browsers: Several browsers are available over the Internet through Telnet. An up-to-date list of these is available at:
http://info.cern.ch/hypertext/WWW/FAQ/Bootstrap.html

WWW Servers: To learn more about WWW servers, consult the WWW server primer available at:
http://www.vuw.ac.nz/who/Nathan.Torkington/ideas/www-servers.html

GSQL: Information regarding GSQL is available at:
http://www.ncsa.uiuc.edu/SDG/People/jason/pub/gsql/starthere.html

WWW FAQ: Frequently Asked Questions about the WWW are available at:
http://siva.cshl.org/~boutell/www-faq.html

Although Mosaic has many advantages, it has some limitations. The greatest limitation is the difficulty of finding resources on a particular topic or subject. Another limitation is that users of distributed hypermedia systems can be overwhelmed by the large number of links.

GATEWAY STRUCTURED QUERY LANGUAGE

Previous work on the design and implementation of remote database access using the Open Software Foundation Distributed Computing Environment (OSF/DCE) has been reported in Reference 3. That experience has enabled us to gain a better understanding of the design and implementation issues involved in remote data access.

Gateway Structured Query Language (GSQL) is a gateway program that provides a form interface in Mosaic to Structured Query Language (SQL).⁴ GSQL comprises two programs, gsql.c and sqlmain.c, together with an SQL specification file called a PROC file, which is used to create a form (see Figure 2) that allows a user to enter inputs.

As GSQL is still under development, it does not have the full capability of standard SQL. In addition, GSQL as distributed supports only Sybase as a backend database server.

gsql.c is a C program which is called by an HTTP server on behalf of Mosaic. This program takes a PROC file as input, creates the frontend in which the user selects the various fields of the SQL statement, forms an SQL statement and finally calls sqlmain.c to execute the SQL statement. The PROC file contains commands that gsql.c uses to map components of the SQL query string to widgets (fields, buttons, pulldown menus, etc.) for user input and selection. The mapping is done by declaring a variable for each phrase of the SQL query string that requires user input. This is usually a database field.
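The idea of mapping form variables onto phrases of an SQL query string can be sketched as follows. This is a minimal illustration in Python, not the C code of gsql.c and not the real PROC file syntax: the `$name` template notation, the function names `quote_literal` and `build_sql`, and the quoting rule are all assumptions made for the example. The table and column names follow the course table used in the prototype.

```python
from string import Template

# Hypothetical stand-in for a PROC file: an SQL template whose $variables
# correspond to form widgets filled in by the user.
QUERY_TEMPLATE = Template(
    "SELECT $columns FROM course WHERE course.num = $course_num"
)


def quote_literal(value):
    """Quote a user-supplied string as an SQL literal, doubling single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_sql(form_inputs):
    """Substitute the user's form inputs into the template to form the SQL text."""
    for column in form_inputs["columns"]:
        # Column names come from checkboxes or menus; reject anything unexpected.
        if not column.isidentifier():
            raise ValueError("invalid column name: %r" % column)
    columns = ", ".join("course." + c for c in form_inputs["columns"])
    return QUERY_TEMPLATE.substitute(
        columns=columns,
        course_num=quote_literal(form_inputs["course_num"]),
    )
```

Calling `build_sql({"columns": ["num", "title"], "course_num": "COM S 586"})` yields a SELECT statement restricted to the two chosen columns and the given course number.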

Initially, the HTTP server is started and an HTTP URL is requested from it (a Uniform Resource Locator is the network extension of the basic file name concept, which is explained in greater detail in the following sections). The URL actually refers to a shell script, which executes the gsql.c code. gsql.c takes the PROC file (named in the shell script) and, from the user input and the mappings, builds the complete SQL statement. gsql.c then executes sqlmain.c, passing the SQL statement as a command line argument. sqlmain.c is now responsible for sending the SQL statement to the database server (see Figure 3) and receiving the results. The backend server queries the database and sends the results back to the client, sqlmain.c.

Some of the fields returned by the server may actually be hyperlinks pointing to different documents in the network or even to non-ASCII files such as image (.gif extension), audio (.au extension) and video (.mpg extension) files.

The shell script which is called by the HTTP server contains the following lines:

    /home/grad1/magavi/gsql/httpd_1.1/sql/gsql
        /home/grad1/magavi/gsql/httpd_1.1/sql/course.proc $QUERY_STRING

where gsql is the C executable and course.proc is the PROC file used to create the front end for our Course Modules database prototype. This shell script is invoked through the HTTP server as:

    http://shazam.cs.iastate.edu:13228/shell.script

The shell script executes gsql.c and passes the PROC file to it; gsql.c then parses the file and sends Hyper Text Markup Language (HTML) commands to Mosaic to create the form. Thus for each PROC file there needs to be a shell script which the HTTP server can execute.

Figure 2. The Forms interface using a PROC file

Figure 3. The interface with backend server

Since gsql.c is invoked through a shell script, the PROC file is totally hidden from the client. This is important since the PROC file could contain information needed to gain access to the database (such as logins and passwords). The PROC files should also be placed in a directory that is protected from user access via FTP or httpd (NCSA httpd is an HTTP/1.0 compatible server for making hypertext and other documents available to web browsers). The HTTP server can be configured to deny access to specific directories.

gsql.c program

This program

1. is executed by the HTTP server;
2. processes the PROC file to form the frontend;
3. forms the SQL statement when the user makes selections in the frontend;
4. executes sqlmain.c and passes the SQL statement as a command line argument.

PROC files

A PROC file contains

1. code in HTML format which, when executed, will create the frontend;
2. the path of the sqlmain.c executable, so that it can be executed by gsql.c.

Backend program

This program

1. processes the SQL query;
2. can be written in C or Perl;
3. is specified in the PROC file and is invoked by GSQL when the user submits input through the form created from the PROC file;
4. consists of three main programs:
(a) sqlmain.c. The backend executes this program, which passes the SQL statement formed by gsql.c to the embedded SQL program.
(b) sqlutil.c. This contains the user-defined functions.
(c) executesql.c. This is the client program which connects to the remote database server using Transmission Control Protocol/Internet Protocol (TCP/IP). The remote program is an embedded SQL (Pro*C) program which gets the SQL query from sqlmain.c, executes it and sends the results back to sqlmain.c. The results can then be converted to HTML format for display in Mosaic.

DESIGN OF THE HETEROGENEOUS DISTRIBUTED DATABASE SYSTEM

In our prototype, we used GSQL to design a client–server model in which an Oracle database server runs on a DEC workstation and the clients run on workstations or PCs.

The heterogeneity of the system comes from the fact that, although the data stored in the database is text, the individual fields in a record may be textual information or the names of image, audio and/or video files which are stored on remote machines.

The users (on the client side) use the input forms provided by GSQL to query the database and then use the results to retrieve the information from the remote machines. The users, however, need have no idea of where the data is stored or what underlying mechanism is used in retrieving these files.

Figure 4. Design of the HDMS

In the case of image, audio and/or video files, the user has the option of executing these files locally through Mosaic or downloading them onto the local file system. We have modified some Mosaic configuration files to include ‘execution of shell scripts within Mosaic’. Using this capability, if a hyperlink retrieves the executable name of a software package installed on the user’s machine, the user can then execute the software package from Mosaic. Note that this paper does not discuss any of the security aspects of using software over a network. We have used a network simulation package, COMNET, to illustrate this aspect of our implementation.

Design objectives

The following are some of the major design objectives:

1. A modular design should be followed, so that it is possible to integrate other modules without major changes to the design.
2. The user and/or the system developer should be able to perform standard database operations such as SELECT, INSERT, DELETE, CREATE table and DROP table.
3. The backend should be easy to create and maintain, and should support fast retrieval of database records.
4. The program should fully exploit the flexibility of Dynamic Embedded SQL (this will be discussed in the next section).
5. The fact that the backend may physically be located on a remote machine should be transparent to the frontend (client) user.
6. The location of files such as image, audio and video files, and the underlying mechanism used in retrieving them, should be transparent to the user.
7. The backend should be developed independently of the frontend, i.e. it should be capable of interfacing with any frontend that sends an SQL statement to the backend for data retrieval.

Database design issues

The database is the backend of the design. It should allow more than one client to access the database concurrently. This concurrency is achieved by ‘forking’: the main process waits to receive connection requests from clients and forks a child process to handle each client’s request. The backend uses Dynamic Embedded SQL to execute the SQL statements (see Figure 5). The following modules are implemented:

1. Select Module: selects the specified records from the database.
2. Add Module: adds records to individual tables.
3. Delete Module: deletes records from individual tables.
4. Create Table Module: creates a table in the database.
5. Drop Table Module: drops a table from the database.

The following steps are required to be performed by the backend.

1. Start the server and wait for a client’s request. On receiving a request:
2. Fork a process to handle the client’s request.
3. Receive the SQL statement from the client.
4. Execute the SQL statement (using Pro*C).
5. Store the results in a file and ship the results back to the client.
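The backend steps listed above can be sketched roughly as follows. This is an illustrative Python sketch under several assumptions, not the prototype's Pro*C server: `sqlite3` stands in for Oracle, the names `run_sql`, `handle_client` and `serve` are hypothetical, the results are returned directly rather than stored in a file first, and the statement is assumed to arrive in a single `recv` call. The forking loop relies on `os.fork` and is therefore Unix only.

```python
import os
import signal
import socket
import sqlite3


def run_sql(db_path, statement):
    """Step 4: execute one SQL statement; return rows as tab-separated lines."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(statement).fetchall()
        conn.commit()
    finally:
        conn.close()
    return "\n".join("\t".join(str(v) for v in row) for row in rows)


def handle_client(conn, db_path):
    """Steps 3 to 5: receive the statement, execute it, ship the results back."""
    with conn:
        statement = conn.recv(65536).decode("utf-8")
        try:
            reply = run_sql(db_path, statement)
        except sqlite3.Error as exc:
            reply = "ERROR: %s" % exc
        conn.sendall(reply.encode("utf-8"))


def serve(host, port, db_path):
    """Steps 1 and 2: wait for connections and fork one child per client."""
    signal.signal(signal.SIGCHLD, signal.SIG_IGN)  # let the OS reap children
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        listener.bind((host, port))
        listener.listen()
        while True:
            conn, _addr = listener.accept()
            if os.fork() == 0:          # child: serve this client, then exit
                try:
                    listener.close()
                    handle_client(conn, db_path)
                finally:
                    os._exit(0)
            conn.close()                # parent: return to accept()
```

Keeping `handle_client` separate from the accept loop means the per-client logic can be exercised on any connected socket without starting the forking server.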

Database structure

The database engine is Oracle running on a DEC workstation. The database consists of one main table called the ‘course table’, which has the following fields:

1. NUMBER – unique course number; it is a hyperlink.
2. TITLE – title of the course.
3. DEPARTMENT – department which offers the course.
4. UNIVERSITY – university which offers the course.
5. CREDITS – number of course credits.
6. TEXTBOOK – course textbook.
7. INSTRUCTOR – instructor handling the course.
8. SOFTWARE – software packages taught and/or used in the course.
9. HARDWARE – hardware platform required to run the above software.
10. ABSTRACT – course abstract.
11. COMMENTS – general comments about the course.

Figure 5. Backend database design

Dynamic Embedded SQL

The SQL data language is a non-procedural language, i.e. most statements are executed independently of preceding or following statements. This is in contrast to procedural languages like C, Pascal and FORTRAN, which are based on constructs such as ‘loops’, ‘branches’ and ‘if/then’ pairs. To utilize the full capability of both SQL and procedural languages, SQL statements can be embedded in procedural languages.

The Oracle Relational Database Management System (RDBMS) includes the Pro*C⁶ tool, which is designed to convert a C program that embeds SQL statements into one that can access and manipulate data in an Oracle database. As a precompiler, Pro*C converts the EXEC SQL statements found in its input file to appropriate Oracle calls in an output file. The output file can subsequently be compiled, linked, and executed in the normal manner for C programs.

Dynamically defined statements are SQL statements which are not known at compile time; the statement can, and probably will, change from execution to execution. Statically defined statements, in contrast, are known at compile time and are hard coded into the program. In our prototype, all SQL statements are executed dynamically, as nothing is known about these statements until run time.
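The distinction can be illustrated with a short sketch. It uses Python's `sqlite3` module as a stand-in for Pro*C and Oracle, which is an assumption made purely for illustration; the function name `execute_dynamic` is hypothetical. The point it shows is that a dynamically executed statement arrives as text at run time, so the program must discover whether it returns rows, and which columns, only after preparing it.

```python
import sqlite3


def execute_dynamic(conn, statement):
    """Run a statement whose text is only known at run time.

    Returns (column_names, rows). Both are empty for statements that
    produce no result set, such as INSERT, DELETE, CREATE or DROP.
    """
    cursor = conn.execute(statement)
    if cursor.description is None:      # no result set: not a query
        conn.commit()
        return [], []
    names = [column[0] for column in cursor.description]
    return names, cursor.fetchall()
```

A static statement, by contrast, would appear literally in the program source with its column list and host variables fixed in advance, so no run-time inspection of the result shape would be needed.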

IMPLEMENTATION ISSUES

Our prototype should achieve the two most important goals of an HDMS, namely the transparency and the autonomy of the local databases, and present the entire system as one integrated database from the user’s perspective. In order to demonstrate these aspects, we have stored all text, image, audio and video files on separate remote machines and included the names of these files as part of the records in our main database.

Transparency and autonomy

As stated earlier, the database may contain the names of files (image, audio, video, etc.) which are actually located on remote machines. However, the location of these files is transparent to the user. In order to achieve this transparency, we use the concept of Uniform Resource Locators (URLs).

URLs can be thought of as a networked extension of the standard filename concept: not only can a user point to a file in a directory, but that file and that directory can exist on any machine on the network and can be served via any of several different methods. A URL might not even point to something as simple as a file: URLs can also point to queries, documents stored deep within databases, the results of an archie search (a network service that searches FTP sites for files) or a finger command (a service that responds to queries and retrieves user information remotely), etc. The design of these addresses (URLs) is as fundamental to the WWW as hypertext itself. This flexibility allows the web to envelop all the existing data in FTP archives, news articles, and WAIS and Gopher servers. As an example, a file called ‘foobar.mpg’ on the HTTP server ‘ftp.yoyodyne.com’ in directory ‘/pub/files’ corresponds to this URL:

    http://ftp.yoyodyne.com/pub/files/foobar.mpg

Since for our prototype we know where the various files are located, we attach the appropriate URL to each file name before displaying it in Mosaic. However, the user sees only the file name on the display. For example, suppose all the video files (having the .mpg extension) are stored on a remote machine, say ‘class1.iastate.edu’, and a file is stored as ‘/srmagavi/VIDEO/foobar.mpg’; then the URL prefix for video files on this machine will be:

    http://class1.iastate.edu/srmagavi/VIDEO/

where http (Hypertext Transfer Protocol) is the low-overhead WWW protocol used to access the file.

Thus when the user requests any video file, GSQL attaches the appropriate URL to that file name. When the user selects the file through Mosaic, the HTTP server retrieves this file from the remote machine specified by the URL, and the file is either executed or downloaded, depending on the user’s selection. The user is thus unaware of the location of the data; from the user’s point of view, it appears to be a local file.

Autonomy is achieved by the fact that the repositories have complete control over the data stored locally and can decide who may access them and what operations can be performed on them.

Note. The user/system developer has two options for storing the file names:
(a) Just the file name is stored. In this case the URL is attached to the file name based on the file extension.
(b) The file name along with its URL is stored in the database.
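The two storage options can be sketched as follows. This is a hedged Python illustration, not the prototype's code: the function names `to_url` and `to_html` are hypothetical, the `.mpg` prefix is taken from the example above, and the `.gif` and `.au` prefixes are placeholder hosts invented for the example.

```python
import os.path
from html import escape

# The .mpg prefix comes from the example in the text; the other two
# prefixes are hypothetical placeholders.
URL_PREFIX_BY_EXTENSION = {
    ".mpg": "http://class1.iastate.edu/srmagavi/VIDEO/",
    ".gif": "http://images.example.edu/IMAGE/",
    ".au": "http://audio.example.edu/AUDIO/",
}


def to_url(stored_value):
    """Return the full URL for a stored field, or None if it is plain text."""
    if "://" in stored_value:           # option (b): URL stored with the name
        return stored_value
    extension = os.path.splitext(stored_value)[1].lower()
    prefix = URL_PREFIX_BY_EXTENSION.get(extension)
    if prefix is None:
        return None
    return prefix + stored_value        # option (a): attach URL by extension


def to_html(stored_value):
    """Render a field as HTML; linked files display only the file name."""
    url = to_url(stored_value)
    if url is None:
        return escape(stored_value)
    label = url.rsplit("/", 1)[-1]
    return '<A HREF="%s">%s</A>' % (escape(url, quote=True), escape(label))
```

With either option the rendered anchor shows only `foobar.mpg`, which is how the location of the file stays hidden from the user.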

Integrated system

The data retrieved from the database as a result of the user’s query is displayed in Mosaic as HTML. Since the location of the data is transparent and the repositories are autonomous, all this information appears to the user as if it were local data. The user is also unaware of the underlying mechanism involved in sending SQL statements to, and receiving the results from, the remote database. Mosaic likewise hides the process of executing the remote files or downloading them. To the user, this appears as one integrated system providing all services required for an application.

Executing shell scripts inside Mosaic

As stated earlier, we have modified some Mosaic configuration files to include ‘execution of shell scripts within Mosaic’. This section discusses the implementation. We first explain how Mosaic’s multimedia configuration works and then how this mechanism is used to execute a remote software package on the user’s local machine.

When a hyperlink (or anchor) points to a data file that is not HTML or plain text, such as an image or sound file, Mosaic attempts to use an external program to display the image or play the sound. If Mosaic cannot find an appropriate external viewer, it prompts the user for a filename under which to save the data file, in case the data is needed outside Mosaic. For example, when a user fetches a file, say test.tar, from a remote FTP site using Mosaic, Mosaic brings up a window in which the user types the name of the file and specifies the path where it is to be stored. However, if it is a GIF file, Mosaic displays it just as if the command xv file.gif had been executed on the local machine, where xv is an image viewer program for the X-Window system. Mosaic uses a two-step process to determine what external viewer to use:

1. The Multipurpose Internet Mail Extensions (MIME) type of the incoming file is determined, either according to its file extension or as specified by the document’s server (MIME is a mechanism for specifying and describing the format of Internet message bodies). If Mosaic must rely on the file extension, it uses either a built-in default list or a user-configurable extension map file.

2. Mosaic matches the incoming file’s MIME type to an external viewer.

Determining MIME type of incoming file

If the MIME type must be determined from the file extension and the user has not configured a mapping for that extension, Mosaic uses the default list, in which file extensions are mapped to the corresponding MIME types. For example, all GIF files (having a .gif extension) are mapped to the MIME type image/gif. However, if the user has specified a mapping for the file extension, then Mosaic uses the user-configured extension map file, namely ‘.mime.types’. The extension map files, which map file extensions to MIME types, are set up in the following way.

1. Mosaic is configured to map some extensions to MIME types by default. To turn off these defaults, the X resource useDefaultExtensionMap can be set to false. However, this is not advisable, since individual defaults can instead be overridden locally as explained below.
2. There can be a global extension map; the X resource globalExtensionMap gives the filename. The default is /usr/local/lib/mosaic/.mime.types. This is generally the location for system or site-wide configuration.
3. There can also be a personal extension map, called .mime.types. This is used for local user configurations.
4. Entries in the personal extension map take precedence over entries in the global extension map, which in turn take precedence over the built-in defaults.
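The precedence rule in the list above can be sketched as a three-level lookup. The following Python sketch is illustrative only and is not Mosaic's implementation: the function names are hypothetical, the built-in table is a small assumed subset, and extension map lines are assumed to have the form of a MIME type followed by one or more extensions.

```python
import os.path

# Small illustrative subset of built-in defaults (an assumption for this sketch).
BUILTIN_DEFAULTS = {"gif": "image/gif", "mpg": "video/mpeg", "au": "audio/basic"}


def parse_extension_map(text):
    """Parse lines of the form '<MIME type> <ext> [<ext> ...]' into a dict."""
    mapping = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()    # drop comments and blank lines
        if not line:
            continue
        mime_type, *extensions = line.split()
        for extension in extensions:
            mapping[extension.lower()] = mime_type
    return mapping


def mime_type_for(filename, personal, global_map, builtin=BUILTIN_DEFAULTS):
    """Look up a file's MIME type: personal map, then global map, then defaults."""
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    for table in (personal, global_map, builtin):   # highest precedence first
        if extension in table:
            return table[extension]
    return None
```

Because the personal map is consulted first, a user can override a site-wide or built-in mapping for a single extension without disabling the defaults as a whole.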

Matching incoming file’s MIME types to external viewers

A mailcap file is a configuration file that maps MIME types to external viewers. The mailcap files are set up in the same manner as the extension map files, but the filenames are .mailcap instead of .mime.types. For example, the MIME type image/gif is mapped to the external viewer xv. This is entered in the .mailcap file as:

    image/gif; xv %s

where ‘%s’ is the name of the file pointed to by the hyperlink.

Using the above information, we modified the configuration to execute remote software packages on the user’s local machine. This is done in the following manner:

1. An entry ‘application/x-csh csh’ is placed in the user or system extension map to associate the extension .csh with the type application/x-csh, so that a file with extension .csh, such as test.csh, whether accessed from a remote or the local machine, is mapped to the required MIME type. We stored this entry in the .mime.types file.
2. An entry is placed in the user or system mailcap that appears as:

    application/x-csh; csh -f %s

This was added to the .mailcap file. Thus, csh -f will be used as the ‘viewer’ for the document, which means the shell script, whatever it happens to contain, will be executed on the client’s host.
3. Meanwhile the file on the remote machine, which is pointed to by the hyperlink, should contain the executable name of the software package, and the file must have a .csh extension.
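The second step of the viewer selection, matching a MIME type to a viewer command, can be sketched as follows. This Python sketch is an illustration under simplifying assumptions rather than Mosaic's own logic: the function names are hypothetical, only the first two semicolon-separated fields of each mailcap line are used, and escaped semicolons are not handled. It only builds the command string and does not run it.

```python
import shlex


def parse_mailcap(text):
    """Parse '<MIME type>; <command>' lines into a dict of type -> command."""
    viewers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split(";")]
        if len(fields) >= 2 and fields[1]:
            viewers[fields[0].lower()] = fields[1]
    return viewers


def viewer_command(viewers, mime_type, filename):
    """Return the viewer command for a file, or None if no viewer is configured."""
    template = viewers.get(mime_type.lower())
    if template is None:
        return None
    # Substitute the (shell-quoted) file name for the %s placeholder.
    return template.replace("%s", shlex.quote(filename))
```

Given the two entries quoted in the text, a file of type application/x-csh named test.csh maps to the command `csh -f test.csh`, which is what causes the script to run on the client's host.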

Client–server model for GSQL

GSQL as available for Mosaic provides just the bare framework for programmers to build applications. The GSQL developers provide the gsql.c program, which parses the user inputs according to a PROC file and builds an SQL statement. The programmer is then responsible for writing the backend program, namely sqlmain.c, which receives the SQL statement as an input parameter. This backend program can then contact other hosts, using any appropriate communication protocol, to perform some function.

In our implementation, the backend program contacts the Oracle server, sends the SQL statement and waits for the results of the query. The results are then displayed in Mosaic as HTML. The gsql.c and sqlmain.c programs run on one host, an HP workstation, and the Oracle server runs on a DEC workstation (see Figure 6).

Design modularity

As seen in Figure 6, there are three distinct modules:

1. gsql.c. This parses user inputs and forms the SQL statement.
2. SQL Query. This sends the SQL statement to the server, receives the results and displays them. It determines the file extensions and, if the URL is not included in the file name, it attaches the appropriate URL to the file name.
3. Oracle Server. This consists of two modules: one for receiving the SQL statement from the frontend and the other for executing this SQL statement using Dynamic Embedded SQL.

Figure 6. Client–server model

The modularity lies in the fact that an application can replace the ‘SQL Query’ module on the frontend (client) and the SQL receiving module on the backend (server) completely, without requiring any change in the design of the other modules. For example, the two modules mentioned above currently contain code that uses the TCP/IP communication protocol. They can be replaced with modules that use the Remote Procedure Call (RPC) communication protocol. Both applications can still use the module which uses Dynamic Embedded SQL to execute the SQL query.
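One way to picture this replaceability is to let the frontend depend only on the shape of the transport, not on its protocol. The following Python sketch is an assumption about how such a boundary could look, not a description of the prototype's C interfaces; the in-memory transport stands in for a TCP/IP or RPC module.

```python
from typing import Callable

# A transport is any callable that takes an SQL string and returns the reply text.
Transport = Callable[[str], str]


def make_in_memory_transport(tables):
    """Stand-in transport that answers from a dict instead of a network peer."""
    def transport(sql):
        table = sql.split()[-1]  # toy lookup: last word of 'SELECT * FROM <table>'
        return "\n".join(tables.get(table, []))
    return transport


def run_query(sql, transport: Transport):
    """Frontend logic: relies only on the Transport signature, not on TCP/IP or RPC."""
    reply = transport(sql)
    return reply.splitlines()
```

Swapping the communication protocol then amounts to passing a different callable to run_query; the query-forming code and the code that executes the statement on the server are left unchanged.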

Because gsql.c parses the PROC files and forms the SQL query independently of the other modules, the user can modify gsql.c to create new functions and write PROC files to test whether the SQL query is generated correctly before contacting sqlmain.c.

Modifying GSQL

The gsql.c program as available on Mosaic is written to form SQL statements which are Sybase specific. Since we wanted to experiment with the server running on a different database, the GSQL (in particular gsql.c) had to be modified to produce SQL statements specific to Oracle.

Since gsql.c is essentially a parser which maps the widget types to the user inputs and forms the SQL statement, we tailored it so that it reads the user input and stores the information in a temporary buffer, which is later concatenated in the order required by Oracle SQL statements. One specific example of the change is that ‘blank spaces’ in text are changed to a ‘+’ for Sybase, whereas they are left as ‘blank spaces’ in Oracle SQL. We also modified sqlmain.c to incorporate an interactive session to get the server name or Internet Protocol (IP) address and port number from the user.
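The two changes described above, buffering the parsed pieces and leaving blanks untouched for Oracle, can be sketched in a few lines. This is a Python illustration under assumed names (encode_text_value, assemble, CLAUSE_ORDER); the actual buffer layout in gsql.c is not given in the text.

```python
# Assumed fixed order in which buffered clauses are concatenated.
CLAUSE_ORDER = ("SELECT", "FROM", "WHERE")


def encode_text_value(value, dialect):
    """Encode a text field for a backend: '+' for blanks in Sybase mode, unchanged for Oracle."""
    if dialect == "sybase":
        return value.replace(" ", "+")
    if dialect == "oracle":
        return value
    raise ValueError(f"unknown dialect: {dialect!r}")


def assemble(clauses):
    """Concatenate buffered clauses in the fixed order, skipping any that are empty."""
    return " ".join(f"{kw} {clauses[kw]}" for kw in CLAUSE_ORDER if clauses.get(kw))
```

Because the clauses are keyed by keyword, the parser may fill the buffer in whatever order the form fields arrive; only assemble needs to know the order the SQL grammar expects.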

Adding routines

The only SQL statement that can be formed with the GSQL as distributed is SELECT. The following is an example of a SQL SELECT statement formed by GSQL.

    SELECT course.num, course.title, course.dept, course.univ, course.credits,
        course.textbook, course.instructor, course.software, course.hardware,
        course.abstract, course.comments
    FROM course
    WHERE course.num = 'COM S 586'

In order to improve the flexibility of using GSQL with the Oracle database, routines for inserting and deleting records as well as creating and dropping tables from the database were added to GSQL. The gsql.c program can now parse PROC files and produce SQL INSERT and DELETE statements. As mentioned earlier, a PROC file has a shell script which is executed by the HTTP server. Since there are separate PROC files for insert and delete operations, each of these now has a separate shell script. The user can choose among these operations from the initial form, as seen in Figure 7. The following are examples of the INSERT, DELETE, CREATE TABLE and DROP TABLE statements formed by GSQL.

    INSERT INTO course (course.num, course.title, course.dept, course.univ,
        course.credits, course.textbook, course.instructor, course.software,
        course.hardware, course.abstract, course.comments)
    VALUES ('COM S 552', 'Principles of Operating Systems', 'COM S',
        'IOWA STATE UNIVERSITY', '3',
        'ADVANCED OPERATING SYSTEMS CONCEPTS- MAEKAWA AND OLDEHOEFT',
        'WONG JOHNNY', 'NONE AVAILABLE', 'NA',
        'Theory and implementation of operating system concepts',
        'Pre Requisite COM S 352 or equivalent')

    DELETE FROM course WHERE course.dept = 'COM S' AND course.credits < 3

    CREATE TABLE hospital (id char (10), lastname char (20), firstname char (20),
        sex char (1), ssnum char (11), address char (30), city char (10),
        state char (2), birthday char (8))

    DROP TABLE course

The last statement removes the table course.
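How such statements might be formed from parsed form fields is sketched below in Python. The helper names are hypothetical and the code is not a transcription of the routines added to gsql.c; it only reproduces the shape of the INSERT and DELETE examples above, with every inserted value treated as text.

```python
def quote_literal(value):
    """Quote a value as an SQL string literal, doubling embedded single quotes."""
    return "'" + str(value).replace("'", "''") + "'"


def build_insert(table, record):
    """Form an INSERT statement from a dict of column name -> value."""
    columns = ", ".join(f"{table}.{col}" for col in record)
    values = ", ".join(quote_literal(v) for v in record.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values})"


def build_delete(table, conditions):
    """Form a DELETE statement from (column, operator, literal) triples joined by AND."""
    if not conditions:
        # Refuse to emit an unconditional DELETE that would empty the table.
        raise ValueError("at least one condition is required")
    where = " AND ".join(f"{table}.{col} {op} {lit}" for col, op, lit in conditions)
    return f"DELETE FROM {table} WHERE {where}"
```

The sketch does not validate table or column names; a production system would normally pass values to the database as bind parameters instead of splicing them into the statement text.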

TESTING THE PROTOTYPE

We tested the prototype under the following conditions:

1. The different file types (text, image, audio and video) are stored on different machines on the network.

    (a) image: at vincent1.iastate.edu/nisbett
    (b) audio: at las2.iastate.edu/wong
    (c) video: at las3.iastate.edu/archie
    (d) text: at shazam.cs.iastate.edu/magavi (The text contains hyperlinks to the software package COMNET.)

Figure 7. The Initial form for user selection

2. An Oracle server is started on a DEC workstation. This now waits for the client to send the SQL statements over the network using TCP/IP.

3. Mosaic is started on the HP workstations and, using the INSERT routine in GSQL, the records with the file names are inserted into the database. Some of them are just the file names and some are complete URLs.

4. We selected the records using the SELECT routine in GSQL, and then executed the individual files through Mosaic. Mosaic was successful in retrieving these remote files and executing them locally.

5. We also successfully tested the other database routines, namely DROP/CREATE table and DELETE records.

CONCLUSION

Summary

Our hypertext-based prototype yields two gains: a simple integration strategy that preserves repository autonomy and handles data transparency, and a powerful tool with which system developers can build such an application efficiently and easily.

The GSQL thus meets the needs of this project, that of providing to the user an ‘Integrated heterogeneous distributed multimedia system’. When compared with existing GUI software like the X-Window system, GSQL is extremely easy to design and implement with. Building a similar application using the X-Window system is extremely difficult, as the system developers would have to simulate the concept of hyperlinks and the other attractive features of Mosaic, such as local execution of remote files. Table I compares our prototype using GSQL with a GUI tool like the X-Window system.

In addition to its ease and modularity in design, GSQL allows hyperlinks to be part of the data retrieved from the database. This is a very important feature, as a system administrator need not store all the information on one computer, but can store it in a distributed manner and provide appropriate hyperlinks to it. This capability gives our prototype a definite edge over other GUI implementations in terms of multimedia data retrieval and storage. The new routines for inserting and deleting records and for creating and dropping tables add further flexibility.

It is easy and simple to use GSQL to develop frontends, and the user need not be concerned about the backend. We also tested this on a PC running MS-Windows: by specifying the URL on the PC, we obtain the same form as on the UNIX machine and are able to perform the same functions as on the UNIX platforms, assuming that the httpd server is already running on one of the UNIX machines. This is because ‘forms capability’ is provided on the PCs; such a facility will later be released for the Macintosh.

Future work

One problem we encountered arises when a new record is to be inserted into a database that has more than one table. Since all the tables are linked by some common field(s), a record must be inserted into each table, even if it holds only NULL values. The reason is that when a SELECT is done on the database, a join on all the tables has to be done using the common field(s) in order to search for a specific field. If there are records in all but one table (assuming a record was not inserted in one of the tables), then the search fails, even though a record matching the given search criteria exists. If there are a large number of tables in a database, should the user then have to insert into all the tables in order to insert a record into one? If so, it defeats the purpose of user flexibility, and a way has to be found to overcome this disadvantage.
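The failure mode can be reproduced in a few lines. The sketch below uses Python's built-in sqlite3 module as a stand-in for the Oracle server, and a hypothetical two-table split (course and material) that is not the schema of the prototype. It also shows a left outer join as one possible remedy; this is our illustrative suggestion only, and whether the Oracle version used by the prototype offers an equivalent is not stated here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE course (num TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE material (num TEXT, textbook TEXT);
    INSERT INTO course VALUES ('COM S 552', 'Principles of Operating Systems');
    -- deliberately no matching row in material
""")

# Plain (inner) join on the common field: the course row is lost.
inner = conn.execute(
    "SELECT c.num, c.title, m.textbook FROM course c "
    "JOIN material m ON m.num = c.num WHERE c.num = ?",
    ("COM S 552",),
).fetchall()

# Left outer join: the course row survives, with NULL for the missing field.
outer = conn.execute(
    "SELECT c.num, c.title, m.textbook FROM course c "
    "LEFT OUTER JOIN material m ON m.num = c.num WHERE c.num = ?",
    ("COM S 552",),
).fetchall()
```

Here the inner join returns no rows although the course exists, which is the behaviour described above, while the outer join returns the course with an empty textbook field and so would not require a placeholder record in every table.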

We should also look into the possibility of placing the tables of one database on different machines while still providing the same functionality; that is, the user should still be able to use all the functions without having to locate these tables.

Table I. Comparing GSQL with other client/server designs using GUI

| Issues | GSQL | Other GUI (such as the X-Window system) |
|---|---|---|
| Development time | Extremely easy to build a user input form | Takes considerably longer to build a similar input form |
| Data transparency | The WWW protocols used through Mosaic provide data transparency | The systems developer has to develop such protocols to provide data transparency |
| Local execution | Remote files like image, audio and video can be executed locally without any programming effort | Such a facility is not built in and has to be developed if needed |
| Design flexibility | Provides only five types of widgets | Extensive types of widgets; complex widgets can be created from existing widgets |
| Modularity | Provides excellent modularity in design. Both the frontend and the backend can be built separately and tested independently. It is also very easy to incorporate modules into existing designs | Would require more development time to build a comparable modular design. It is also very difficult to build and test each of the modules separately |
| Forming SQL statements | gsql.c is a parser which takes the PROC file and user inputs and forms the SQL statement | The developer has to write a parser to build the SQL statement from the user inputs |
| Data types | Since the data is to be displayed on Mosaic, the fields can be hyperlinks pointing to other types of data like images, audio and video | Does not have the facility to provide hyperlinks from the data retrieved from the backend |
| Learning curve | Extremely easy to learn and to build user input forms | Considerably steeper than that of GSQL. The user has to be proficient in C and/or C++ |
