Extending Data Processing Capabilities of Relational ...janikow/publications/2003/Extending Data...Extending Data Processing Capabilities of Relational Database Management Systems

Extending Data Processing Capabilities of RelationalDatabase Management Systems.

Igor WojnickiUniversity of Missouri – St. Louis

Department of Mathematicsand Computer Science

8001 Natural Bridge RoadSt. Louis, MO 63121–4499

phone: (314) 516-6353fax: (314) 516-5400

e-mail: [email protected]

Cezary Z. JanikowUniversity of Missouri – St. Louis

Department of Mathematicsand Computer Science

8001 Natural Bridge RoadSt. Louis, MO 63121–4499

phone: (314) 516-6352fax: (314) 516-5400

e-mail: [email protected]

Abstract Relational Database Management Sys-tems proven to be robust and efficient for storingand retrieving data. However, they have limita-tions. Sophisticated data processing (a rule-basedprocessing with inferences) is possible but hard toimplement. To overcome these problems, severalmethods have been developed. Data can be processedby a client application (a client-side processing)or by a RDBMS server (a server-side processing).These methods overcome the main limitations, butsuch solutions are not flexible, hard to modify or al-ter. This paper presents a novel approach, a case ofserver-side processing with the use of a declarativelanguage. A program, written in Prolog, is decom-posed into relations, and stored in the database. Itcan be easily managed, modified or altered. It hasaccess to database relations, and it also has an abil-ity to create relations on demand, so-called dynamicviews or Jelly Views. Jelly Views serve as tempo-rary relations holding data obtained as the result ofthe program. The communication interface betweenthe client and the database is preserved, it is stillSQL.

Keywords: RDBMS, Prolog, deductivedatabase, inference

1 The Relational Model

The relational model represents a database asa collection of relations [1, 2, 3]. Each relationis characterized by its name and attributes. Aset of attribute values for a certain relation iscalled a tuple. A value of the attribute is de-termined by the attribute domain (data type).A domain is a set of atomic (indivisible) val-ues. A relation schema is defined as a relationname (R) and a list of attributes(Ai):

R(A1, A2, . . . , An)

The domain is defined for each attributedom(Ai). A relation state (or just relation)of the relation schema R(A1, A2, . . . , An), de-noted by r(R), is a set of tuples t1, . . . , tm,where each tuple is an ordered list of n valuest =< v1, . . . , vn >, and each value vi is an ele-ment of dom(Ai) or null (does not exist). Thevalues in tuple t are referred to as t[Ai], so bythe attribute’s name. Relations represent factsabout entities or facts about relationships be-tween relations. A relational database schemais a set of relation schemas: S = R1, . . . , Ro

2 RDBMS Boundaries

Each Relational Database Management Sys-tem uses two languages: Data Definition Lan-

guage for defining a relational database schemaand Data Manipulation Language for storingand retrieving data.

SQL has evolved as a high level, robust andcompact DDL and DML query language. Thepower of SQL lies in its declarative nature.A query can be expressed in a logical formwith little care for procedural processing. Itgives meaningful and highly abstractive wayfor querying. But here is a limitation. SQLoffers an abstract interface allowing to build so-phisticated queries, but more complicated dataprocessing is out of reach. In particular, thefollowing disadvantages can be identified:

• lack of more sophisticated, i.e. rule-baseddata processing,

• a query can not be recursive,

• searching for acceptable solutions is lim-ited.

Rule-based processing requires rules, usu-ally the a form of decision trees. Decisionsare based on information acquired from adatabase. At each node of such a tree, thedatabase is queried and the replay is comparedwith appropriate values (or set of values) in thenode. Depending on the comparison, the nextnode of the tree is taken and the process contin-ues. Rule-based processing is a core part of themost Expert and DDB (Deductive Database)systems [4, 5, 2].

Recursive queries, in general, regard search-ing for data which is stored in some tree-like orgraph-like way. The Traveling Salesman Prob-lem can be an example. A relation holds infor-mation about distances between neighboringcities, there is no data redundancy. A relaxedquestion is: find the shortest path between twogiven cities. If the question was asked withknown number of cities on the path, it wouldbe issued as an ordinary query. However, usu-ally the number of cities is not known in ad-vance. SQL3 (known also as SQL99) standardallows recursive queries. However, this featureis very rarely implemented (it is implementedin DB2 [6]).

The Acceptable Solutions problem is anotherapplication where RDBMS can not be applieddirectly. It can be also called a Reverse Ag-gregation. Let’s use the same Traveling Sales-man Problem. The question would be: find allpaths shorter then some value. Each path is alist of cities. As a result, the replay would be aset of paths. Each path would be shorter thanthe given in the query value.

There are several applications where theseproblems appear:

• phone billing: this usually involves so-phisticated rule-based processing, cost ofconnections is calculated basing on differ-ent variables: day time, number of activephone numbers of the customer, callingplan, length of the call etc.; rules are sub-ject to change,

• flight planes, trip planning (the TravelingSalesman Problem in general),

• budget planning, alternative flight/triproutes.

These problems can be tackled using eitherclient-side processing, or server-side processingwith functions.

Client-side processing is based on serializa-tion of queries and taking actions accordinglyto partial replies. The process is as follows: theclient sends a query, the RDBMS replies, de-pending on the reply the client sends anotherquery. The logic is usually hard-coded into theclient. There are a few drawbacks of such asolution:

• the inference process involves manyqueries,

• queries are not optimized by the RDBMS(series of simple queries and travelthrough a decission tree instead of fewmore complex queries),

• with the decision tree hard-coded into theclient, it is hard to modify the logic,

• only dedicated client software is able toconduct the inference process, and thus

the principle of a database as a universalsource of information fails.

Such a solution is implemented in expert sys-tems, with an external RDBMS data source.

Server-side processing seems to be more ro-bust, but there is a need for support fromRDBMS in the form of additional languages,usually procedural ones. Using such a lan-guage, a function can be written. A user querycan use such a function as a data-source. Com-paring to client-side processing, the number ofqueries is reduced, a function is stored and pro-cessed by the RDBMS so it is optimized, andthe database remains an universal source of in-formation. Still, there are a few drawbacks:

• the function’s language is RDBMS spe-cific, can be used only for dedicatedRDBMS,

• logic, i.e. decission tree, is hard-codedwithin the function code, inflexible, hardto modify,

• the input and output for the function haveto be known when the function is defined.

3 Relational Model vs Logic

In general, knowledge processing involves twokinds of data: extensional and intensional.

Extensional knowledge is just bare facts,while intensional defines relationships amongfacts. Logic can be used to express extensionaland intensional knowledge as well [7]. To for-malize syntax for writing logical sentences, thealphabet has to be stated. Data (or knowl-edge) is stored as predicates. A predicate p isdefined as a symbol denoting appropriate rela-tion among n number of arguments:

p/n

where n is called its arity. A fact compoundof n values, concerning predicate p is named asimple clause:

p(a1, . . . , an)

where, < a1, . . . , an > is a tuple (ordered list ofvalues). There is an analogy between a relationand a clause. (in terms of the Relational Modelsee Sec. 1) Both of them describe facts usingtuples. In the case of the Relational Model, avalue in the tuple can be referenced using ap-propriate attribute name or its position, whilefor the Logic Programming only position canbe used. Concluding, any Relational Model tu-ple can be represented as a simple clause, socompatibility between the relational databasesand logic programs has been stated.

Furthermore logic programs can be con-structed with so-called complex clauses. Sucha clause doesn’t express facts explicitly, butdefines relationships among them (intensionalknowledge), with a conclusion.

con(C1, . . . , Cn) ← c1(Ci . . . , Cj) 41 . . .

. . .4r cm(Ck, . . . , Cl).

Here, con is the concluding clause with arityn; 4 are logical operators joining predicatescq (preconditions). Predicates such as cq arematched against their clauses. If such a matchis found, appropriate facts are set and the con-cluding tuple is passed as a new fact. As aresult, a non-deterministic conclusion can bedrawn; a single complex clause can generatemultiple conclusions, and new facts. Function-ality of complex clauses is not supported by theRelational Model.

Logic Programs provide a uniform languagefor representation of databases and even more,enabling additional expressive power in theform of intensional knowledge.

4 Logic Programs in RDBMS

The main proposed improvement to RDMBS isadding capabilities to store and process inten-sional knowledge, along with the extensionalone. The inference engine will be coupledwith the data source, and available to a clientthrough standard SQL queries [8]. This ap-proach addresses the major disadvantages of

current RDBMS (see Sec. 2): limited rule-based processing, recursive queries, searchingfor acceptable solutions.

To keep the architecture flexible and easy tomodify, the following assumptions are taken:

• logic programs are used to construct aview called a Jelly View,

• the view is created on demand,

• the query language is preserved, the viewis transparent – not different from otherRDBMS views.

As a result, RDBMS has its functionality ex-tended toward those of deductive databases(DDB).

The chosen syntax is that of the Prolog lan-guage. This provides all necessary mechanismsfor declaring intensional knowledge, and it alsoprovides some features of procedural languagesthat allow to control the inference process. Inturn, this allows to create logic programs andoptimize them. A Prolog language programis similar to pure logical programs, with sometechnical differences [9, 7] (see Sec. 3). To ac-complish the coupling of the Prolog programwith RDBMS, that program must be repre-sented in a way complying with the RelationalModel, including the normal forms. There arefollowing challenges to face:

• Internal Matching — matching betweenrelations and clauses, enabling access todata (extensional knowledge) from thelogic program,

• External Matching — stating the goal forthe prolog program to generate a JellyView,

• Logic Program itself, which should be de-composed to meet the normal form re-quirements.

These challenges can be met by the threemain parts of a Prolog program: extensionalknowledge (Internal Matching), intensionalknowledge (Logic Program) and goal (ExternalMatching).

Let’s define a behavior of the system. Allqueries requiring a Jelly View have to be pro-cessed. Once such a query appears (detectedby External Matching routines), the Logic Pro-gram should be built. All extensional data isavailable to the Logic Program through the In-ternal Matching mechanism. Then the infer-ence engine generates the necessary Jelly View.Upon the completion of the query, all dynamicdata (the Jelly View) is assumed to be nolonger valid and thus it is removed. If a querydoesn’t refer to Jelly View, it is passed on tothe RDBMS, and the response is passed to theuser. In other words, the system in transparentto the user.

5 Decomposition

Decomposing the three main components ofa Jelly View : External Matching, InternalMatching and Logic Program, is a crucial prob-lem. They must comply with the relationalmodel. One possible approach is illustrated inthe ER diagram in Fig. 1.

External Matching matches a relation nameand a clause name. If a relation name is foundin table-clause.table, a Jelly View for thisrelation is generated. The Jelly View has at-tributes defined by the entity argument. Togenerate the view, the clause name is speci-fied (table-clause.clause) and subsequentlyused as goal for the inference engine. The ar-ity of the goal predicate is calculated from thenumber of arguments (the relation argument).

Internal Matching is a bridge between theinference engine and the RDBMS, it handlesextensional knowledge. A predicate can bemapped on a relation. If the inference en-gine requests a simple clause dealing with thepredicate named clause-table.name of ar-ity clause-table.arity, the clause is gen-erated from the data stored in the relationclause-table.table. This is the way exten-sional knowledge stored in the RDBMS is ac-cessed by the inference engine.

The goal for the logic program has to bespecified. It is held by the clause relation.

table-clause

table

clause

clause

N Mconsists_of

clause order

name

clause-tableN M

has

clause table

EXT Matching

Program

INT Matching

arity

1 Nhas

argument

name

1

1

has

logical operator symbol

1 Npreconditioned

order

argument

1 Nhas

name

position

type

order

Figure 1: External Matching, Internal Matching and Logic Program Entity Relationship Diagram.

Each logic program consists of certain num-ber (one or more) of complex clauses. Eachtable-clause is an ordered list of complexclauses (the relation clause). A single clausehas a number of named arguments. This clauseis either a concluding clause or a preconditionpredicate. These are not differentiated becausethey are represented by the same relation. Asin Prolog, such a construct always has a logi-cal operator on the right (the relation logicaloperator). The concluding clauses and pre-condition predicates are not distinct in terms ofdecomposition., preconditioned relationshipis used to assign preconditions to the conclud-ing clause. As shown, any complex clause de-fined in Sec. 3 can be decomposed into suchentities.

6 Conclusions

Extending knowledge processing capabilities ofRDBMS makes them applicable to new do-

mains and situations. Combining speed and ef-ficiency of a RDBMS with a server-side declar-ative programming gives a powerful, robust,and flexible environment, not only for datastorage and retrieval, but also for sophisticatedprocessing such as drawing conclusions and dis-covering new knowledge. A client applicationissues ordinary SQL queries, all the processingis done on the server side. Intensional knowl-edge (rules) is stored within RDBMS as a de-composed Prolog program. This architecturemakes the system flexible and transparent tothe client. Any rule can be altered individually,with changes applied only to appropriate rela-tions, using ordinary SQL queries. This can bedone without altering any client application.As a result, RDBMS can have DDB function-ality, preserving communication methods andthe Relational Model.

This approach can have applications inmany fields of Computer Science, including:

• Data Mining and Data Discovery,

• rule systems, e.g., telephone billing sys-tems, tax systems,

• Expert Systems, including those self-teaching (RDBMS could now posses Ex-pert System capabilities),

• Artificial Intelligence, Knowledge Discov-ery tools.

References

[1] E. F. Codd. A relational model of datafor large shared data banks. CACM,13(6):377–387, 1970.

[2] Ramez Elmasri and Shamkant B. Navathe.Fundamentals of Database Systems. Addi-son Wesley, 2000.

[3] Jeffrey D. Ullman and Jennifer Widom. Afirst course in Database systems. Prentice-Hall Inc., 1997.

[4] Marcia A. Derr, Shinichi Morishita, andGeoffrey Phipps. The glue-nail deductivedatabase system: Design, implementation,and evaluation. VLDB Journal, 3(2):123–160, 1994.

[5] Raghu Ramakrishnan, Divesh Srivastava,S. Sudarshan, and Praveen Seshadri. TheCORAL deductive system. VLDB Jour-nal: Very Large Data Bases, 3(2):161–210,1994.

[6] Srini Venigalla Netsetgo. Expanding re-cursive opportunities with sql udfs in db2v7.2. Technical report, International Busi-ness Machines Corporation, 2002.

[7] Ulf Nilsson and Jan Małuszyński. Logic,Programming and Prolog. John Wiley &Sons, 1990.

[8] Igor Wojnicki and Antoni Ligęza. An in-ference engine for rdbms. In 6th Interna-tional Conference on Soft Computing andDistributed Processing, Rzeszów, Poland,2002.

[9] Michael A. Covington, Donald Nute, andAndre Vellino. Prolog programming indepth. Prentice-Hall, 1997.

Documents

Extending Data Processing Capabilities of Relational ...janikow/publications/2003/Extending Data...Extending Data Processing Capabilities of Relational Database Management Systems