9
Cooperative Querying in Relational Databases Claudiney V. Ramos Departamento de CiCncia da ComputaGgo Universidade Federal de Minas Gerais 3 1270-901 Belo Horizonte MG Jose L. Braga Departamento de InformAtica Universidade Federal de Visosa 3657 1-000 ViGosa MG Brasil Brasil cvramos@ dcc.ufmg.br zeluis 0 dpi.ufv.br Alberto H. E Laender Departamento de CiCncia da Computaqgo Universidade Federal de Minas Gerais 3 1270-901 Belo Horizonte MG Brasil laender 0 dcc.ufmg.br Abstract This paper proposes an architecture for cooperative ac- cess to databases, and describes a cooperative interface f o r querying relational databases (i.e., an intelface that assists non-experienced and casual users to get useful answersfrom a relational database) which has been implemented based on this architecture. The architecture comprises two knowl- edge bases which store rules that describe the application domain and that guide the process of query formulation and answering. Queries are formulated using a subset of SQL which does not require the user S knowledge of the language syntax and of the database structure. The interface uses the rules stored in the knowledge bases to expand the queries and to provide more relevant answers to the user 1 Introduction Traditional database management systems (DBMSs) pro- vide a number of different facilities for accessing a database. Users may pose queries to a database by using a query language (such as SQL, QBE or QUEL) or a form-based interface or by embedding commands in an application pro- gram [7]. Such facilities, however, are not in general user- oriented, and therefore do not provide means for helping users to formulate and submit queries in a cooperative man- ner in order to extract the required information from the database [9]. For instance, when there is no answer for a query, a DBMS does not usually look for an alternative one which may satisfy the users’ information needs. This lack of cooperativeness imposes limitations to the widespread use of DBMSs [l], making them hard to be used by non- experienced and casual users, since they have to learn many details about the database structure and the DBMS itself. In this paper, we propose an architecture for cooperative access to databases and describe a cooperative interface for querying relational databases which has been implemented based on this architecture. The term cooperative interface is used in this paper to refer to interfaces that assist non- experienced and casual users to get useful answers from a database [ 111. In this way, users and the interface become partners in the querying process with the aim to extract more relevant information from the database. In general, cooperative interfaces, as described in the literature [ 111, are tailored to specific application domains, and thus they use the domain language as the language for interacting with external users. Such an approach makes it possible for the users to interact with the system in the native problem language, so achieving some level of cooperative- ness. Our work is based on a knowledge-based architecture, and makes use of rules for describing the database struc- ture and the cooperative querying strategies that guide the interaction with and the data extraction from the database. Some works in the literature, such as [2] and [4], address the cooperativeness problem in databases by improving the communication process between the users and the database system. In [2], the concept of neighborhood inference is used as a mechanism to define semantic contexts through which it is possible to find an answer to a query based on 0-8186-8052-0197 $10.00 0 1997 IEEE 190

[IEEE Comput. Soc 17th International Conference of the Chilean Computer Science Society - Valparaiso, Chile (10-15 Nov. 1997)] Proceedings 17th International Conference of the Chilean

  • Upload
    ahf

  • View
    213

  • Download
    0

Embed Size (px)

Citation preview

Cooperative Querying in Relational Databases

Claudiney V. Ramos Departamento de CiCncia da ComputaGgo

Universidade Federal de Minas Gerais 3 1270-901 Belo Horizonte MG

Jose L. Braga Departamento de InformAtica

Universidade Federal de Visosa 3657 1-000 ViGosa MG

Brasil Brasil cvramos@ dcc.ufmg.br zeluis 0 dpi.ufv.br

Alberto H. E Laender Departamento de CiCncia da Computaqgo

Universidade Federal de Minas Gerais 3 1270-901 Belo Horizonte MG

Brasil laender 0 dcc.ufmg.br

Abstract

This paper proposes an architecture for cooperative ac- cess to databases, and describes a cooperative interface f o r querying relational databases (i.e., an intelface that assists non-experienced and casual users to get useful answers from a relational database) which has been implemented based on this architecture. The architecture comprises two knowl- edge bases which store rules that describe the application domain and that guide the process of query formulation and answering. Queries are formulated using a subset of SQL which does not require the user S knowledge of the language syntax and of the database structure. The interface uses the rules stored in the knowledge bases to expand the queries and to provide more relevant answers to the user

1 Introduction

Traditional database management systems (DBMSs) pro- vide a number of different facilities for accessing a database. Users may pose queries to a database by using a query language (such as SQL, QBE or QUEL) or a form-based interface or by embedding commands in an application pro- gram [7] . Such facilities, however, are not in general user- oriented, and therefore do not provide means for helping users to formulate and submit queries in a cooperative man- ner in order to extract the required information from the database [9]. For instance, when there is no answer for a query, a DBMS does not usually look for an alternative one

which may satisfy the users’ information needs. This lack of cooperativeness imposes limitations to the widespread use of DBMSs [l] , making them hard to be used by non- experienced and casual users, since they have to learn many details about the database structure and the DBMS itself.

In this paper, we propose an architecture for cooperative access to databases and describe a cooperative interface for querying relational databases which has been implemented based on this architecture. The term cooperative interface is used in this paper to refer to interfaces that assist non- experienced and casual users to get useful answers from a database [ 111. In this way, users and the interface become partners in the querying process with the aim to extract more relevant information from the database.

In general, cooperative interfaces, as described in the literature [ 111, are tailored to specific application domains, and thus they use the domain language as the language for interacting with external users. Such an approach makes it possible for the users to interact with the system in the native problem language, so achieving some level of cooperative- ness. Our work is based on a knowledge-based architecture, and makes use of rules for describing the database struc- ture and the cooperative querying strategies that guide the interaction with and the data extraction from the database.

Some works in the literature, such as [2] and [4], address the cooperativeness problem in databases by improving the communication process between the users and the database system. In [ 2 ] , the concept of neighborhood inference is used as a mechanism to define semantic contexts through which it is possible to find an answer to a query based on

0-8186-8052-0197 $10.00 0 1997 IEEE 190

concepts not explicitly mentioned in a query. This allows one to define a multilevel representation of a database. By navigating through these levels, it is possible to derive more relevant answers to the users’ queries. In this representa- tion, knowledge is encoded in a type abstraction hierarchy, that can be built for relations, domains, and specific values for tuples. In [4], a rule-based system approach has been proposed in which the representation of the database is en- coded in rules. Such rules are characterized either as domain dependent or domain independent, and two knowledge rep- resentation levels are considered: the object level, where relations, attributes, domains, and integrity constraints are described, and a meta-level which describes how the object level rules are used.

In our work, we use some of the ideas proposed in [2] and [4] to build a knowledge base that guides the process of cooperative database querying, and adopt a subset of SQL [8] for query formulation that eliminates most of the language features that make it difficult to be used by non- experienced and casual users. This SQL subset allows users to formulate queries by specifying only a list of attributes and selection conditions that they must satisfy in order to be retrieved from the database. Join conditions, when required, are automatically generated based on rules stored in the knowledge base [ 101.

This paper is organized as follows. Section 2 presents the proposed architecture for cooperative database querying. Section 3 describes the cooperative querying interface which has been implemented based on this architecture. Section 4 presents some examples that illustrate the major features of the implemented interface. Finally, Section 5 concludes the paper.

2 An architecture for cooperative database querying

This section describes the architecture proposed for sup- porting cooperative querying in relational databases. The goal of this architecture is to serve as a basis for the im- plementation of an interface that allows non-experienced and casual users to query relational databases without prior knowledge of its structure and of the corresponding appli- cation domain. Such an interface works as a partner in the querying formulation process, assisting the users in the cor- rect specification of their queries and in their reformulation whenever an empty or unsatisfied answer is obtained.

The proposed architecture is shown in Figure 1. The cooperative interface module consists of

A knowledge base (KBI) on the application domain that describes the database schema. This knowledge base stores rules that encode information on relations,

relation attributes, attribute domains, relation keys, and domain integrity constraints.

0 A knowledge base (KB2) containing rules that describe the transformations that may be applied to the users’ queries to derive alternative answers. KB2 may also contain general rules required, for example, to deal with different types of user, to classify and treat errors, to select appropriate dialogues according to external factors and situations, and so on. In addition to that, specific concepts required to query relaxation [3], such as topics of interest and attribute value ranges, are also described in this knowledge base.

0 An inference engine that uses both KB1 and KB2 to obtain the rules that must be applied to transform the users’ queries whenever an alternative answer is re- quired.

0 An user interface that provides cooperativeness at the external level. This interface defines the communica- tion process between users and the database.

1 User Interface

Cooperative interface

inference Engine

Figure 1. An Architecture for cooperative database querying.

In this architecture, the user interface has a strong inter- action with the inference engine which, in its turn, interacts with both knowledge bases during the entire querying pro- cess. In doing so, the inference engine only returns an empty answer to the user as a final message when all possibilities in KB1 and KB2 have been exhausted. It is important to note that KB2 can work as an active knowledge base since it may also contain rules for knowledge acquisition from subsequent interactions with the users.

The main goal of the proposed architecture is to improve the quality of the interaction between users and the database, aiming at increasing the users’ level of satisfaction with the

191

answers obtained for their queries. The basic idea behind this architecture is to embed in the interface rules that pro- vide the users with a better understanding of the database querying process. Moreover, it provides a great level of in- dependence with regards to external factors, since the users are fully insulated from modifications in cooperative behav- ior patterns which are restricted to only one of the knowledge bases.

In addition, another important goal of this architecture, and therefore of the interface described in this paper, is to make the use of query languages like SQL [8] easier, reliev- ing users from the burden of having to know their syntax and the database structure, as well as some cumbersome details usually not understandable by non-experienced and casual users. The immediate benefits of these goals are a safer database querying process and an increased user satisfac- tion, desmystifying the use of databases by non-experienced and casual users.

Thus, in the context of this paper, cooperativeness is considered from two viewpoints:

Cooperativeness in the use of a query language, which allows users to reach a higher level of competence in query formulation without necessarily having to under- stand in detail the language features;

Cooperativeness in the use of a database, which makes it easier to users to query a database without knowing details of its structure, such as relation names, attribute names and domains, integrity constraints, etc.

These are two key issues that must be addressed to im- prove the facilities for database access by non-experienced and casual users, in order to make this task easier, more reliable. and even funnier to them.

3 Overview of the interface

Our interface adopts a subset of SQL [8] for query for- mulation. After getting an SQL expression from the user, the interface activates the inference engine that processes the query using the rules stored in KB1 and KB2. In this process, the interface interacts with the user guided by rules stored in KB2, until it gets an answer that finally satisfies the user’s need.

3.1 SQL subset

Considering the basic format

SELECT <attribute list> FROM <relation list> WHERE <selection condition>

of the SQL SELECT command [SI, the following simplifi- cations were made:

1.

2.

3.

4.

5.

6.

3.2

The union, intersection, and difference operators are not allowed.

Embedded queries [7], i.e. queries using the struc- ture IS IN (SELECT . . .), are not allowed. They must be replaced by appropriate terms introduced in the <selection condition> by means of the connective AND [12].

The <selection condition> can only be a conjunction of terms. Thus only the connective AND is allowed.

All terms in a <selection condition> must be in the format <attribute> <operator><expression>, where <expression> is an attribute or a constant, and <operutor> is one of the relational operators 2, 5, #, =, > or <. When a term is a join condition, the only operator allowed is the equality, which means that only equijoins [7] can be specified in our interface. However, it is important to note that the join condi- tions (equijoin conditions) are automatically generated by the interface and need not be specified by the users.

All attribute and relation names in the database schema must be unique. So any expression R1.AI = R2.A2 can be unambiguously written as A1 = A2.

There is also no need for specifying the FROM clause. The interface is able to identify from the schema defini- tion the relation names that correspond to the attributes specified in the SELECT command. This partly jus- tifies our assumption of unique names, since the inter- face requires from the user only the specification of the <attribute list> and the conditions that the correspond- ing attribute values must satisfy in order to appear in the final answer.

The inference engine

The inference engine is the component of the architecture responsible for controlling the querying process. It uses knowledge from both KB 1 and KB2, making it possible to rewrite the original queries in order to get more relevant answers to the user. The high level algorithm in Figure 2 shows how the inference engine works. A more detailed description of this algorithm can be found in [ 101.

The first step in the querying process is the checking of the attribute list and the selection condition specified by the user. Having received the corresponding SQL command, the interface proceeds by checking its sintax and semantic consistency according to the rules stored in the knowledge base KB 1. In case that some of the specified attributes do

192

BEGIN {engine}

GET the SQL query from the user

CHECK the query syntax and semantic consistency using schema

WHILE the query is not correct DO BEGIN {query correction}

PRINT errors and possible correction actions ACCEPT corrections

CHECK the query syntax and semantic consistency using schema

definition rules

definition rules

END {query correction}

BEGIN {expansion}

EXPAND the list of attributes by using the concept of topic of interest

EVALUATE the query

IF the query has an answer

THEN show the answer to the user

ELSE BEGIN {relaxation}

RELAX the selection condition by using the concept of

attribute proximity

EVALUATE the query

IF the query has an answer

THEN show the answer to the user

ELSE BEGIN {abstract hierarchy searching}

SEARCH for alternative concepts using type

EVALUATE the query

IF the query has an answer

THEN show the answer to the user

ELSE PRINT “no answer”

abstraction hierarchies

END {abstract hierarchy searching}

END {relaxation}

END {expansion}

END {engine}

Figure 2. High-level description of the infer- ence engine algorithm.

not appear in any relation defined in the database schema, the interface interacts with the user to help him to find the correct attributes he is interested in and to reformulate his query accordingly. Having finished the query checking, the interface generates the join conditions using rules stored in KBl and KB2. Whenever possible, the interface also expands the list of attributes by using rules in KB2 that define semantic associations between attributes based on the concept of topic of interest [4, 51. At the end of this first step, a correct SQL query has been generated and it is passed to the database management system to be executed. This query may be different from the initial one, since it may contain attributes or conditions not initially specified by the user.

In the second step, the generated query is executed by the database management system and the resulting answer presented to the user. This answer contains all the attributes specified by the user as well as those added by the interface based on related topics of interest.

A third step is performed when either there is no answer to the generated query or the user is not satisfied with the resulting answer. The goal of this third step is to generate alternative answers to the user by modifying the original query. The query modifications are based on rules stored in KB2 that define the concepts of attribute proximity and neighborhood inference [2]. Such rules allow, respectively, the relaxation of selection conditions (for example, for en- larging the range of values that some attributes may have in the answer) and the use of new relations corresponding to alternative concepts defined in terms of type abstraction hierarchies. The examples discussed in Section 4 will illus- trate these concepts more clearly.

3.3 Structure of the knowledge bases KB1 and KB2

As we mentioned earlier, the knowledge bases KB 1 and KB2 store the information needed to establish a cooperative interaction between the user and the database. KBl con- tains the rules that describe the database relational schema and KB2 contains the rules that describe high level abstract relationships between database objects described in KB 1, such as the concepts of topic of interest [4, 51 and type abstraction hierarchy [2]. The rules stored in KB2 guide the process of query transformation that enhances the query answers by returning additional and more relevant informa- tion. The rules in both knowledge bases are described using the syntax of Prolog predicates [6] .

3.3.1 The knowledge base KB1

The knowledge base KB 1 describes the database schema by means of a set of Prolog predicates that define the relations, attributes of each relation, the relation keys, the attribute domains, and domain integrity constraints. These predicates are used to check the semantic consistency of the user’s query, i.e. if all relation and attribute names specified in the query are defined in the database schema and if the query satisfies the integrity constraints.

Thus given the notation R(A1, A2, ..., An), where R is the name of a relation defined in the database schema and A;, 1 I: i 5 n, is an attribute name in R, the following predicates must be included in KB 1 :

RI: For each relation in the database, rel(R) , where R is

R2: For each attribute of each relation, attrib( R, A) , where

the relation name.

A is the name of an attribute of the relation R.

193

R3:

R4:

R5 :

attrib(flightfdeparture) attrib(flight,farrival) key(fliflight,jlight#)

For the primary key of each relation, key( R, K ) , where R is the name of the relation and li’ is the set of at- tributes that compose its primary key.

For each attribute of each relation, domain(A, D), where A is the attribute name and D is its domain name.

domain(flight#,integer) constraint(fdeparture>O) constraint(fdeparture~24)

For each attribute subjected to a domain constraint, constraint(<condition>), where <condition> is a condition that constraints the attribute values. The <condition> must have the format attribute <op> value, where cop> is one of the relational operators >, <, f, 2,s or =.

flight(flight#,fdeparture,farrival,ff rom,fto,fprice,fcompany#) train(train#,tdeparture,tarrival ,tfrom,tto,tprice,tcompany#) company(company#,compname,compaddress,compphone#) passenger(pass#, passname,passaddress,passphone#) flight-reservation(fticket#,frfIight#,frdate) train-reservation(tticket#,trtrain#,trdate)

3. Predicates that define the concept of proximity between attribute values. These predicates define intervals that can be used to relax conditions on relation attributes, thus allowing for the expansion of the query answers. In our interface, the relaxation of select conditions is allowed only for numeric attributes [lo].

4. Predicates that define the possible query transforma- tions. Theses predicates are defined in terms of the predicates of KB1 and other predicates of KB2, and are used to expand the query answers, thus improving the information provided to the users by the interface.

A topic of interest relates attributes of a same relation and is used to define the relative importance of an attribute in the context of another one. This means that every time an attribute A related to a topic T is used in the attribute list of the SELECT command, all the other attributes related to A by the same topic T must also be retrieved. Thus the following predicates must be defined in KB2 for this purpose:

R6: For each topic of interest, topic(T), where T is the defined topic.

R7: For each attribute of each defined topic, top-attrib(T,A), where A is the attribute related to the topic T.

For our running database example, whose schema is shown in Figure 3, the following predicates must be in- cluded in KB2 to define the topic “schedule”:

Figure 3. Relational schema.

topic(schedu1e) top-attrib(schedule, fdeparture) toD-attrib(schedu1e. farrival)

3.3.2 The knowledge base KB2

The knowledge base KB2 is a higher level knowledge base with regards to KB 1, and contains predicates that define abstract relationships between database objects described in KB 1. These predicates define query transformations that may be applied to the input queries in order to generate answers containing more relevant and useful information to the users. In fact, these predicates are meta-predicates [ 121 in the sense they are in a higher level when compared to those in KB 1. These higher level predicates include:

1. Predicates that define topics of interest [4,5] that relate attributes of a same relation.

2. Predicates that define the concept of neighborhood in- ference among relations [2]. These predicates define the relations that can be used as alternative ones when the execution of a query results in an empty answer.

To allow the relaxation of query conditions, the following predicate is used in KB2:

R8: interval(A,V), where V is the acceptable interval for relaxation the range of a condition on the attribute A , if there is no answer to the initial query’.

In what follows, we present the predicates used to define in KB2 the possible transformations that may be applied to a query in order to expand its answer.

R9: top-int(q,t):-appears(q,a), top-attrib(a,t). This rule states that if the attribute a appears in query q and this attribute is related to topic t (by the predicate top-attrib), then the topic t is a topic of interest (related topic) to query q. This means that all the attributes related to t must also be retrieved and exhibited in the

’It should be noticed that integrity constraints cannot be violated when relaxing attribute conditions.

194

answer. For example, if appea rs(q, fdepartu re), top-attrib(fdeparture,schedule) then top-int(q,schedule).

R10: relev-attrib(q, ca):-rel(r), top-int(q,t), attrib(a,r),

R11

attrib-top(a, t). This rule says that if the query q refers to the relation r, and a is an attribute of r related to the topic t , then it is relevant to the query q to show the attribute a. For example, if rel(Jlight), top-int(q,schedule), attrib(fdeparture,jlight), attrib-top(fdeparture,schedule) then relev-attrib(q,Jlight,fdeparture).

add-attrib(q,a):-rel(r), relev-attrib(q,r',a), i sa (c i ) . This rule states that if the attribute a of r' is relevant to query q, and the query q references a relation r that is a specialization of r , then the attribute a can be added to the list of attributes that must be retrieved. In short, this rule defines the inheritance of attributes from relation r' to relation r, which are related by an is-a relationship [7]. For example, if rel(Jlight), relev-attrib(q,Jlight, fdeparture), isa($ight, transportation) then add-attrib(q,fdeparture).

R12: relev-rel(q, r1 ,q):-rel(r), isa(r1, r), isa(r2,r). This rule states that if the query q references relation r1, and r1 and 1-2 are specializations of r, then it is relevant to exhibit the attribute values of r2 instead of the values of the attributes of r1. For example, if rel( transportation), isa(Jlight, transportation), isa( train, transportation) then relev-rel(q,flight, bus).

R 1 3 : relev-value(q, v): -appears(q,a), interval(a, v). This rule states that if the query q references an attribute a, and this attribute has a value v as a relaxation interval, then v is relevant for q. For example, if appears( q, fdepartu re), intewal(fdeparture, 2), then relev-value(q, 2).

R 14: join(a 1 , rl ,a2, r&-rel(rl), rel(r2), key(a I , r1 ), foreignkey(a1 ,az).

This rule states that if a1 and a2 are attributes of rela- tions TI and r2, respectively, and if a1 is referenced by the foreign key a2 of 13, then relations r1 and 1-2 can be joined by using a1 and a2 as join attributes. For example, if rel(Jlight), rel(Jlight- reservation), key(Jlig ht#,Jlig ht), foreignkey(Jlight#, f$ight#) then join(Jlight#,$ight, f$ight#,&ht-reservation).

R15: imp-attrib(a, b):-topic(t), top-attrib(a,t), top-attrib(b, t). This rule states that if there exists a topic t , and a and b are attributes related to this topic, then the attribute a is important to the attribute b. For example, if topic(schedule), top-attrib(fdeparture,schedule), top-attrib(farriva1,schedule) then imp-attrib(fdeparture,farrival).

4 Query examples using the interface

In this section, we present some query examples that il- lustrate the major features of our interface. The examples are based on the database instance presented in the Appendix. Additional examples can be found in [lo].

Query 1: 1 ? - select~IflightXl,where([departure~O3:0011). departure. Attribute not defined

Relational schema is as follows: flight(flightX,fdeparture.farrival,ffrom,fto,fprice,fcompanyX~ r r a i n ( t r a i n H . t d e p a r t u r e . i a r r i v a l . t f r o m . t ~ " y H 1 co~any(companyt,compname,compaddress,compphanetI passenqer(passH.paisname.pasladdress.pas~ph~~~H1 flight-~e~ervation(fti~k~tH,f~fllghtR,frdatel t r a i n - r e s e i v a t i o n i t t i c k e t b , t r t r a i n # , t r d a t e )

Original query: SELECT flight# FROM flight WHERE fdeparture > 09:OO

SELECT flight*, [departure, farrival FROM flight WHERE fdeparture > 09: f l f l

flight# fdeparture farrival 5 1 O : f l f l 14:OO 6 09:30 11:OO 8 1O:OO 11.30 9 12:OO 16:OO

1 0 13:OO 18.00 11 12:30 13.30

Enter new condition: (fdeparture > 09:OOl

Modified query with additional attributes:

*,**.************** Query Answer ..... f f f . ** f . f ******

In this query, a non-defined attribute was used in the WHERE clause. The interface informs this fact to the user, shows him the schema describing the structure of each re- lation, and asks him to define another selection condition. The interface then adds the attributes fdeparture and farrival to the attribute list of the command SELECT, since they are related to the same topic, and executes the query.

195

Query 2: I ? - selecriifdeparture.name1). This query involves relations: passenger. flight These relations are not directly related

The sets of related relations in the schema are: [ f l i g h t , f l i g h t - r e s e r v ~ t i ~ ” , p ~ ~ ~ ~ ~ g ~ ~ , = ~ ~ ~ ” y l [ f l i g h t . f l i g h t - r e s e r v ~ t i ~ ~ , p = ~ ~ ~ ~ g ~ ~ l

Original query: which set do you want to usel [flight,flight-reservation.passengeri

SELECT fdeparture, passname FROM flight, flight-reservation, passenger

SELECT fdeparture, farrival, passname FROM flight, flight-reservation. passenger

Modified query with additional attributes:

************ f . f f f f * Query Answer . . . . . . . . . . . . . . . . . . . . fdeparture farrival passname 07:OO 08:30 Jose Silva 0 9 : o o 10:30 Jose silva 07:OO 08:30 Maria Joaquina 0 9 : O O 10:30 Maria Joaquina 08:30 09:30 Joao Soares

In this query, attributes from relations flight and passenger were used in the attribute list of the command SELECT. However, it is not possible to directly relate (join) these two relations. Thus the interface informs this fact to the user, shows him the possible paths between these two relations, and asks him to choose one of them. It then mod- ifies the query and executes it.

Query 3: 1 ? - ~electilpassname,d~tef,fromfl.where~Lfdeparture~2S:OOl~~. Constraint violation: [fdeparture>25.001 Constraint defined: O=< fdeparture =e 24 Enter a new Condition: (fdeparture>Ol.OOI Original query: SELECT passname, fdate, ffrom FROM flight-reservation, passenger, fllght WHERE fdeparture > 08:OO

SELECT passname. fdate, fdeparture, farrlval, ffrom. fto FROM flight-reservation. passenger, fllght WHERE fdeparture > 08:OO

Modified query with additional attributes:

I****************** Answer ~ * * f * * * f * * * * f * * * f * * *

Jose Silva 2 1 / 0 9 / 9 5 09:OO 10.30 Belo Horizonte Sa0 Paul0 Maria Joaquina 18/11/95 0 9 : O O 10 30 Belo Horizonte Sa0 Paul0 Joaa Soares 20/12/95 08:30 09:30 Belo Hozimnte R I O de JanelrO

passname fdate fdeparture farrival ffrom ft0

In Query 3, the condition specified violates a domain integrity constraint, expressed by 0 5 fdeparture 5 24. The interface informs this to the user and asks him to define another condition for the attribute, then proceeding by executing the modified query. It is worth noting that the attributes fto, fdeparture, and farrival were added to the attribute list of the modified query based on the concept of topic of interest.

Query 4: I 7 - select~[fdate.ffroml,wh~~~~lfdeparture,lO:OOl~~. Original query: SELECT fdate, ffrom FROM flight-reservation. flight WHERE fdeparture > 1O:OO

SELECT fdate, fdeparture. farrival. ffrom. fto FROM flight-reservation. flight WHERE fdeparture > 1O:OO

SELECT fdate. fdeparture, farrival. ffrom. fto FROM f light-reservation. flight WHERE fdeparture > 08:OO

Modified query with additional attributes:

Modified query with relaxed canditions-

*..**ffff. .***f**t* Query hSwer . . . . . . . . . . . . . . . . . . . . fdate fdeparture farrival ffrom fto

21/09/95 09:OO 10:30 Belo Horizonte Sao Paulo 18/11/95 09.00 10:30 Belo Horizonte Sa0 Paulo 20/12/95 08.30 0 9 ~ 3 0 Belo Horizonte Rio de Janeiro

Query 4 was correctly specified by the user and the inter- face proceeds by executing it. In the first step, it modifies the

query by expanding the list of attributes based on the con- cept of topic of interest. Attributes f to , fdeparture, and f arrival were added to the query in this step. As the modified query does not have an answer in the database (see Appendix), a second transformation step is carried out. In this step, the selection condition of the WHERE clause is relaxed by replacing the condition fdeparture > 1 0 : 00 by fdeparture > 08: 00. The final query then results in an approximate answer to the user.

Query 5: I ? - ~electi[flight#,ffrom,fpricel,wherei~fdeparture~16:001~~. original query: SELECT flight#, ffrom. fprice FROM flight WHERE fdeparture > 16:OO

SELECT flight#, fdeparture, farrival, ffrom, fto, fprice FROM flight WHERE fdeparture > 16:OO

Modified query with additional attributes:

Modified m e m with relaxed conditions: . - SELECT flight#, FROM flight WHERE fdeparture

fdeparture, farrival. ffrom, fta, fprice

z > 14:OO Modified query with alternative relations: SELECT trainti, tdeparture, tarrival, tfrom, tto, tprice FROM train WHERE fdeparture > 16:OO

f f f , ** f . * f , f t t *** t t Query Answer . . . . . . . . . . . . . . . . . . . . train# tdeparture tarrival tfram tto tprice

4 17:OO 17:OO Bela Horizonte Salvador 40.00 5 18:OO 16:OO Belo Horizonte Port0 Alegre 60.00

In this last example, the query was also correctly spec- ified by the user. The interface then modifies the query by expanding the list of attributes with the attributes f to, fdeparture, and farrival. However, this modified query does not have an answer in the database (see Ap- pendix). Thus a second transformation step is performed by the interface by relaxing the selection condition of the WHERE clause. The new modified query also does not have an answer in the database, and a new and final trans- formation step is performed to replace the relation specified in the query. For this, the interface uses the concept of neigh- borhood inference and the relation flight is then replaced by the relation train. This transformation requires the at- tribute names in the SELECT and WHERE clauses to be modified accordingly. The modified query is then executed, finally generating an approximate but relevant answer to the user.

5 Concluding remarks

This paper proposes an architecture for cooperative querying in relational databases. This architecture ex- tends the functionality of relational DBMSs providing non- experienced and casual users with a friendlier environment for database querying. The architecture uses two knowl- edge bases, one containing rules that describe the database schema and the other one containing rules that describe the transformations that may be applied to the users’ queries in order to provide cooperativeness when querying the database.

196

Based on this architecture, we have implemented a coop- erative interface for querying relational databases that uses a subset of SQL for query formulation. Through this interface, non-experienced and casual users can formulate queries to a relational database without having to know details of the SQL syntax and the database structure. The interface assists the user in the query formulation and, whenever required, modifies the input query in order to generate alternative or more relevant answers to the user.

The interface has been implemented using some of the ideas proposed in [2] and [4]. However, our approach, based on a two-level knowledge base architecture [ 121, provides a more general framework, with a richer set of query trans- formation rules, which considerably improves the database querying process by non-experienced and casual users.

References

[ 13 Cha, S. K. Kaleidoscope: A Cooperative Menu- Guided Query Interface (SQL Version). IEEE Trans. on Knowledge and Data Engineering 3 , l (March 1991).

[2] Chu, W. W., Chen, Q. and Lee, R. Cooperative Query Answering via Type Abstraction Hierar- chy. In Deen, S.M. (ed.). Cooperating Knowl- edge Based Systems. Springer-Verlag, London, 1991.

[3] Chu, W. W., Chen, Q. and Page Jr., T. W. CoBase: Cooperative Distributed Databases. Proc. ofthe 6th Brazilian Symposium on Databases, Manaus, Brazil, 1991.

[4] Cuppens, F. and Demolombe, R. Cooperative Answering: a Methodology to Provide Intelli- gent Access to Databases. Proc. of the 2nd In- ternational Conference on Expert Database Sys- tems, Tysons Corner, Virginia, 1988.

[5] Cuppens, F. and Demolombe, R. Extending An- swers to Neighbor Entities in a Cooperative An- swering Context. Decision Support Systems 11, 1 (January 1991).

[6] Clocksin, W.F. and Mellish, C.S. Programming in Prolog, 4th ed. Springer-Verlag, Berlin, 1995

[7] Elmasri, R. and Navathe, S. B. Funda- mentals of Database Systems, 2nd ed. Ben- jaminKummings, Redwood City, California, 1994.

[9] Motro, A. FLEX: A Tolerant and Cooperative User Interface to Databases. IEEE Trans. on Knowledge and Data Engineering 2, 2 (June 1990).

[ 101 Ramos, C. V. A Cooperative Interface for Rela- tional Database Querying. MSc Thesis, Departa- mento de CiCncia da ComputaqBo, Universidade Federal de Minas Gerais, Belo Horizonte, 1996. (in Portuguese)

[ 111 Rettig, M. Cooperative Software. Communica- tions ofthe ACM 36 ,4 (April 1993).

[12] Silva, E. C., Laender, A. H. E and Braga, J. L. Relational Database Query Optimization Driven by a Multi-level Knowledge Base. Re- vista Brasileira de Computapio 4, 4 (1990). (in Portuguese)

[8] Melton, J. and Simon, A. R. Understanding the new SQL: a Complete Guide. Morgan Kauf- mann, San Francisco, California, 1993.

197

Appendix: the database instance

company# compname compaddress 01 TAM AV. Paulista 1100 02 VASP AV. Getulio Vargas 1250 03 TRANSBRASIL AV. Janio Quadros 2500 04 VARlG AV. Brasil 2254 05 RFFSA AV. Andradas 1001 06 CVRD AV. Tancredo Neves 1200

flight

compphone#

011-334-5674 011-557-8798

01 1-212-1444

01 1-447-5231 031 -489-9977 01 1-454-651 2

I “ I I

pass# 01 02 03 04

passname passaddress passphone# Jose Silva AV. dos Leoes 45 011-332-3322

Maria Joaquina Rua dos Tigres 84 031-441-4141 Joao Soares Rua dos Alpes 8 01 1-443-3443 Ana Antonia Rua dos Jacares 1004 01 1-21 2-21 22

flight-reservation I fticket# I frfliaht# I frdate I

09/21 195 10/22/95

02 03

train-reservation Ricket#

04/20/95

05

198