Databases Deepen the Web
Thanaa M. Ghanem and Walid G. Aref, Purdue University

Online databases continually generate Web content that users can only access through direct database queries.


116 Computer

WEB TECHNOLOGIES

The Web has become the preferred medium for many database applications, such as e-commerce and digital libraries. These applications store information in huge databases that users access, query, and update through the Web. Database-driven Web sites have their own interfaces and access forms for creating HTML pages on the fly. Web database technologies define the way that these forms can connect to and retrieve data from database servers.

The number of database-driven Web sites is increasing exponentially, and each site is creating pages dynamically, pages that are hard for traditional search engines to reach. Such search engines crawl and index static HTML pages; they do not send queries to Web databases.

The information hidden inside Web databases is called the “deep Web” in contrast to the “surface Web” that traditional search engines access easily.

DATABASE CONNECTIVITY

Querying directly in Structured Query Language (SQL) is one of the most common ways to access a database. As the number of database servers and query interfaces increased, application developers needed a standard method to access different databases. To this end, Microsoft developed the Open Database Connectivity (ODBC) standard. The ODBC interface defines

• a common way to connect and log on to a database management system (DBMS);

• a standardized representation for data types; and

• libraries of ODBC API function calls that let an application connect to a DBMS, execute SQL statements, and retrieve results.

A program can use ODBC to read data from a database without targeting a specific DBMS. All the program needs is the vendor-supplied ODBC driver to link to the required database.
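The connect, execute, and retrieve cycle described above can be sketched in Python. This is not ODBC itself: the sketch uses Python's DB-API, with the standard-library `sqlite3` module standing in for a vendor-supplied driver. The `books` table, its rows, and the helper names `fetch_titles` and `open_demo_database` are made up for illustration.

```python
import sqlite3


def fetch_titles(conn, max_price):
    """Run one parameterized query and return the matching titles.

    Only generic DB-API calls (cursor, execute, fetchall) are used here,
    so the function is not written against one particular DBMS.
    """
    cur = conn.cursor()
    cur.execute(
        "SELECT title FROM books WHERE price <= ? ORDER BY title",
        (max_price,),
    )
    return [row[0] for row in cur.fetchall()]


def open_demo_database():
    """Assumption: an in-memory SQLite database stands in for a real DBMS."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?)",
        [("Databases", 30.0), ("Networks", 55.0), ("Compilers", 42.5)],
    )
    conn.commit()
    return conn


conn = open_demo_database()
titles = fetch_titles(conn, 45.0)
print(titles)  # ['Compilers', 'Databases']
conn.close()
```

Only `open_demo_database` knows which database sits behind the connection, which mirrors the role the article assigns to the driver. One caveat: the `?` placeholder style is what `sqlite3` uses, and other DB-API drivers may expect a different style.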

ODBC levels of conformance

To enable applications and drivers to implement portions of the ODBC API specific to their needs, the standard defines conformance levels for drivers in both the API and the SQL grammar.

The ODBC API defines three conformance levels: A core set of functions corresponds to the functions in the X/Open and SQL Access Group Call Level Interface specification; level 1 and level 2 functions extend the core set. The ODBC SQL grammar also has three conformance levels: minimum, core, and extended. Each higher level provides more fully implemented data definition and data manipulation language support.

After ODBC’s success, Microsoft introduced OLE DB, an open specification designed for accessing all kinds of data, not only data stored in DBMSs.

Java Database Connectivity

The Java platform subsequently called for a new connectivity standard, and the result was Java Database Connectivity (JDBC). The JDBC interface provides functionality comparable to ODBC's. JDBC-ODBC bridges let Java programs connect to databases through existing ODBC drivers when no native JDBC driver is available.

DATABASE-TO-WEB CONNECTIVITY

A Web database environment consists mainly of a Web browser (the client), a Web server that understands HTTP, and a DBMS that understands SQL.

Database-to-Web connectivity requires a layer between HTTP and SQL that can translate between them. Several connectivity technologies have emerged that differ primarily in which part of the architecture sends queries to the database. Figure 1 gives a simple view of system architectures based on either two-tier or three-tier technologies.

Two-tier technologies

A two-tier architecture accomplishes the database-to-Web integration in a client tier, consisting of a Web browser and Web server, and a server tier, consisting of the DBMS. Three technologies prevail in this architecture.

Common gateway interface. CGI is a standard interface through which a Web server passes a request to an external program. In this architecture, the CGI program runs on the server tier and handles all the transformations between HTTP and SQL. The Web database access form includes a link to the CGI program. When a client submits the HTML form, the Web server extracts the query parameters and forwards them to the CGI program on the server.

The CGI program reads the parameters, formats them appropriately, and sends a query to the database. When it receives the query result, the CGI program formats it as HTML pages and sends them back to the Web server.

Perl and JavaScript are two among many popular languages for writing CGI programs.
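The translation a CGI program performs (read the form parameters, build a query, format the result as HTML) might look like the following Python sketch. It is a simplification under stated assumptions: `handle_request`, the `category` form field, and the `products` table are hypothetical, an in-memory SQLite database stands in for the DBMS, and the query string is passed as an argument, whereas a real CGI program would typically receive it through the `QUERY_STRING` environment variable and write its response to standard output.

```python
import html
import sqlite3
from urllib.parse import parse_qs


def handle_request(query_string, conn):
    """Translate form parameters into SQL and the query result into HTML."""
    params = parse_qs(query_string)
    # Hypothetical form field "category"; a missing field becomes "".
    category = params.get("category", [""])[0]

    # A parameterized query keeps user input out of the SQL text itself.
    rows = conn.execute(
        "SELECT name, price FROM products WHERE category = ? ORDER BY name",
        (category,),
    ).fetchall()

    # Escape every value before it is placed into the page.
    items = "".join(
        f"<li>{html.escape(name)}: {price}</li>" for name, price in rows
    )
    return (
        "<html><body>"
        f"<h1>{html.escape(category)}</h1>"
        f"<ul>{items}</ul>"
        "</body></html>"
    )


# Assumption: made-up demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT, name TEXT, price INTEGER)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("books", "SQL & You", 20), ("books", "Web Basics", 15), ("music", "Jazz", 10)],
)

page = handle_request("category=books", conn)
print(page)
```

Because every response is built from the submitted parameters, no static page exists for a crawler to index, which is the property the article attributes to deep Web content.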

Java applets. Applets are Java programs that Web browsers can load dynamically and execute on the client’s Web browser. Applets use JDBC to connect directly to the DBMS via sockets and thus do not require a Web server to relay their queries.

Server side includes. SSI is code written inside an HTML page and processed by the Web server. When a client requests the HTML page, the Web server executes the embedded scripts, which in turn use ODBC or JDBC to read data from the DBMS. The Web server then formats the data into HTML pages and sends them back to the browser.

Examples of SSI scripting languages are ASP, ASP.NET, JSP, PHP, and ColdFusion.
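The idea of code embedded in a page and expanded by the server can be illustrated with a toy page processor in Python. The `<!--#sql ... -->` directive is invented for this sketch and is not the syntax of ASP, JSP, PHP, or any real SSI implementation. `render_page` and the `staff` table are likewise hypothetical, and SQLite stands in for the DBMS.

```python
import html
import re
import sqlite3

# Hypothetical directive syntax, invented for this sketch:
#   <!--#sql SELECT ... -->
DIRECTIVE = re.compile(r"<!--#sql\s+(.*?)\s*-->", re.DOTALL)


def render_page(template, conn):
    """Replace each embedded directive with the rows its query returns."""

    def run(match):
        # The query text comes from the page author, not from the client.
        rows = conn.execute(match.group(1)).fetchall()
        return "".join(
            "<tr>"
            + "".join(f"<td>{html.escape(str(value))}</td>" for value in row)
            + "</tr>"
            for row in rows
        )

    return DIRECTIVE.sub(run, template)


# Assumption: made-up demo data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (name TEXT, room INTEGER)")
conn.executemany("INSERT INTO staff VALUES (?, ?)", [("Ada", 101), ("Lin", 204)])

template = (
    "<html><body><table>"
    "<!--#sql SELECT name, room FROM staff ORDER BY name -->"
    "</table></body></html>"
)
page = render_page(template, conn)
print(page)
```

The browser only ever receives the expanded HTML; the directive and the query stay on the server.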

Three-tier technologies

Three-tier architectures add a middleware tier, the application server, between the client and server tiers. The middleware handles all application operations and connections for the clients, including data transfer between the Web server and DBMS.

Application servers offer other functions such as transaction management and load balancing. Some implementations merge the application and Web server into one Web application server.

Examples of application servers include IBM WebSphere, Oracle 9i, and Sun ONE.
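As a rough illustration of the middleware role, the Python sketch below puts a small class between callers and the database: clients invoke an operation, and the class owns the connection and wraps each operation in a transaction. The `ApplicationServer` class, its methods, and the `stock` and `orders` tables are hypothetical, SQLite stands in for the server-tier DBMS, and load balancing is not modeled.

```python
import sqlite3


class ApplicationServer:
    """Hypothetical middleware: clients call methods, never SQL directly."""

    def __init__(self):
        # Assumption: in-memory SQLite stands in for the server-tier DBMS.
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute(
            "CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)"
        )
        self._conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
        self._conn.execute("INSERT INTO stock VALUES ('widget', 5)")
        self._conn.commit()

    def place_order(self, item, qty):
        """Record an order and reduce stock as one transaction."""
        try:
            with self._conn:  # commits on success, rolls back on error
                self._conn.execute(
                    "INSERT INTO orders VALUES (?, ?)", (item, qty)
                )
                cur = self._conn.execute(
                    "UPDATE stock SET qty = qty - ? WHERE item = ? AND qty >= ?",
                    (qty, item, qty),
                )
                if cur.rowcount != 1:
                    raise ValueError("insufficient stock")
        except ValueError:
            return False
        return True

    def stock_level(self, item):
        row = self._conn.execute(
            "SELECT qty FROM stock WHERE item = ?", (item,)
        ).fetchone()
        return row[0] if row else 0

    def order_count(self):
        return self._conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]


server = ApplicationServer()
first = server.place_order("widget", 3)   # enough stock: committed
second = server.place_order("widget", 4)  # only 2 left: rolled back
print(first, second, server.stock_level("widget"), server.order_count())
# True False 2 1
```

The failed second order leaves no row in `orders`, because the insert and the stock update succeed or fail together.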

DEEP WEB SEARCH ENGINES

A July 2000 survey by BrightPlanet (www.brightplanet.com) showed that the deep Web includes about 550 billion pages and that public information hidden in it is 400 to 500 times larger than what users can access through the surface Web. Access to information in the deep Web comes only through database interfaces and queries.

Comprehensive information retrieval requires simultaneous searching of multiple surface and deep Web resources. Deep Web search engines aim to identify, retrieve, and classify such content. They work by rephrasing the query and sending it simultaneously to multiple databases in real time.
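The rephrase-and-fan-out step can be sketched as follows. This is only an illustration of the general idea, not how any of the products named in this article work: the two sources, their schemas and rows, and the functions `search_source` and `deep_search` are all hypothetical, and in-memory SQLite databases stand in for remote Web databases. Identification and classification of sources are not modeled.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sources: each has its own schema, so the same keyword must be
# rephrased into a different SQL statement for each database.
SOURCES = {
    "library": {
        "setup": [
            "CREATE TABLE books (title TEXT)",
            "INSERT INTO books VALUES ('Deep Web Search'), ('Cooking')",
        ],
        "query": "SELECT title FROM books WHERE title LIKE ?",
    },
    "journal": {
        "setup": [
            "CREATE TABLE articles (headline TEXT, body TEXT)",
            "INSERT INTO articles VALUES ('Indexing', 'crawling the web')",
        ],
        "query": "SELECT headline FROM articles WHERE body LIKE ?",
    },
}


def search_source(name, keyword):
    """Open one source, send it the rephrased query, and tag the hits."""
    source = SOURCES[name]
    # The connection is created inside the worker thread because sqlite3
    # connections are, by default, tied to the thread that opened them.
    conn = sqlite3.connect(":memory:")
    try:
        for statement in source["setup"]:
            conn.execute(statement)
        rows = conn.execute(source["query"], (f"%{keyword}%",)).fetchall()
        return [(name, row[0]) for row in rows]
    finally:
        conn.close()


def deep_search(keyword):
    """Query every source concurrently and merge the results."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(search_source, name, keyword) for name in SOURCES]
        hits = [hit for future in futures for hit in future.result()]
    return sorted(hits)


results = deep_search("web")
print(results)  # [('journal', 'Indexing'), ('library', 'Deep Web Search')]
```

Each hit is tagged with its source so that merged results can still be traced back to the database that produced them.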

Current commercial products in this area include BrightPlanet’s Deep Query Manager (DQM2), Quigo Technologies’ Intellisonar (www.quigo.com/intellisonar.htm), and Deep Web Technologies’ Distributed Explorit (www.deepwebtech.com/dexpl.shtml).

In the research community, EduMed is a project at Purdue University (www.cs.purdue.edu/edumed/). EduMed helps users find and query online medical databases uniformly. It is built on top of an extensible prototype multimedia database management system called VDBMS (www.cs.purdue.edu/vdbms/).

We expect deep Web search engines and technologies to improve rapidly and to dramatically affect how the Web is used by providing easy access to many more information resources.

Thanaa M. Ghanem is a graduate research assistant in the Department of Computer Sciences at Purdue University. Contact her at [email protected].

Walid G. Aref is an associate professor in the Department of Computer Sciences at Purdue University. Contact him at [email protected].

January 2004 117

Figure 1. Database-to-Web connectivity: (a) two-tier technologies and (b) three-tier technologies.
Two-tier architecture: client tier (Web browser + Web server; applets or SSI) and server tier (DBMS; CGI).
Three-tier architecture: client tier (Web browser + Web server), middleware tier (application server), and server tier (DBMS).
Link labels in the original figure: HTTP, ODBC, SQL.

Editor: Sumi Helal, Computer and Information Science and Engineering Dept., University of Florida, P.O. Box 116125, Gainesville, FL 32611-6120; [email protected]