
Technical Report: September 2008

A Model for Context Aware Relational Databases*

Ioannis N. Roussos¹, Timos Sellis¹,²

¹ School of Electrical and Computer Engineering, National Technical University of Athens, Hellas

² Institute for the Management of Information Systems (R.C. “Athena”), Hellas

{iroussos, timos}@dblab.ece.ntua.gr

* This work has been submitted to the 12th International Conference on Extending Database Technology (EDBT 2009).

ABSTRACT

Context aware information, i.e. information that depends on a set of external parameters like location, time, user’s profile etc., is commonplace in real world data management scenarios. There is a growing need to uniformly represent, store and manage information entities with different instances (values) or even different schemas (attributes) under varying circumstances as defined by context. Considerable effort has been directed towards context acquisition from sources and context modeling. However, so far, no data model or language for context aware data has been proposed which can be considered a natural analogue to the relational model or algebra. In this work, we argue for the need for a conceptual model for context aware information and propose a context aware data model based on the definition of multi-schema relations. Finally, we present a set of operations that extend the relational algebra in order to incorporate the notion of context aware data management in the relational model.

1. INTRODUCTION

In recent years, the emergence of mobile computing, pervasive computing, ambient intelligence and context aware applications has created the need for new data management paradigms. Applications in those domains must handle and manage information that depends on a set of external parameters like location, time of day, user preferences and other sensed or derived data. For example, the answer to a simple query of the form “Find me a restaurant” may depend on (i) the location of the user posing it, so that only nearby restaurants are returned, (ii) whether he is alone, with colleagues or with friends, (iii) whether it is winter or summer, (iv) whether it is raining and what kind of transportation is available to him, (v) his personal gastronomic preferences, etc.

All the external parameters that affect the information returned to a user as a result of a query are modeled through the notion of context. We can designate two categories of context: the context of the user that requests some information and the context of the information itself. According to a general definition of context [2], a user’s context is defined as “any information that can be used to characterize the situation of an entity. An entity could be a person, a place or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves”. Accordingly, information is characterized by context that either specifies some important features of the information itself, such as the location of a restaurant, or labels the information according to the context of users that may request it, e.g. a nice restaurant for couples. In each application, the business logic determines which information in particular is considered as context.

In Table 1 we present a possible high level classification of context and basic context types based on the classification presented in [3]. User-centric context includes anything that characterizes a user who requests some kind of information. Environmental context comprises parameters that describe the relationship of a user or an information entity to the environment. Finally, system-related context specifies the characteristics of the device or the network which are used to request and browse the information. All the context types shown in Table 1 affect (i) what will be returned to a user as a response to a query, i.e. the exact information returned, and (ii) how it will be returned, i.e. the schema of the returned results.

Without using any of the complex representations or models for context proposed in the literature, we can think of context as an ordered set of any number of the above parameters. For the rest of the paper we will call those external parameters context attributes. For example, in a mobile application context could be defined as <Location, Time, Vehicle>, with some context instances <Athens, 9:30, train> or <Route 66, 11:00, on foot>.
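As a minimal illustration of this view of context, the following Python sketch represents the example context <Location, Time, Vehicle> as a named tuple. The class name MobileContext and its lowercase field names are assumptions made for the sketch and are not part of the model.

```python
from typing import NamedTuple


class MobileContext(NamedTuple):
    """Hypothetical context schema <Location, Time, Vehicle>."""
    location: str
    time: str
    vehicle: str


# The two context instances mentioned in the text.
athens = MobileContext("Athens", "9:30", "train")
route66 = MobileContext("Route 66", "11:00", "on foot")

# The field names play the role of the context attributes,
# and their order is fixed, as in an ordered set.
context_attributes = MobileContext._fields
```

Here the tuple type stands for the context definition and each tuple value for one context instance.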

Table 1. Context Categories and Types

Context Category | Type | Examples
User-Centric Context | User’s profile | Age, nationality, educational background.
User-Centric Context | User’s dynamic behavior | Activity, task, situation or intention.
User-Centric Context | User’s physiology and emotional state | Body temperature, heart rate, happiness, stress.
Environmental Context | Spatial information and spatial relationships | Location, orientation, distance from points of interest, traffic conditions.
Environmental Context | Temporal information | Current date or time.
Environmental Context | Physical environment | Temperature, lighting, noise level.
System-related Context | Device characteristics | Screen resolution, memory, computational power, operating system.
System-related Context | Network characteristics | Network bandwidth, neighboring devices.

As also defined in [2], a system is context-aware if “it uses context to provide relevant information and/or services to the user, where relevancy depends on the user’s task”. The main tasks of context aware applications are (a) to characterize the provided information using context and (b) when a user with a specific context C requests some kind of information, to retrieve the appropriate information for context C, i.e. to match the user’s context with the information’s context. Additionally, more complex queries containing arbitrary restrictions on context attributes may result in answers with data from multiple contexts. For example, a user may request all restaurants in Europe with a specific property, e.g. “a large wine vault”, resulting in restaurants from Greece, Italy and Spain, even though they are defined in different contexts.

As Relational Database Management Systems are the most widely used data management systems and provide the most mature database technology today, in this work we focus on the relational model. We assume that information is stored as records in the relations of a DBMS, but our discussion holds for other data models as well. Specifically, when using a relational DBMS, the main tasks of a context aware application translate into two requirements on the system: the ability to characterize the context of each record (tuple) and the ability to retrieve all records for a specific input context. The information stored in a record is valid only within the context where it is defined.

From a data management perspective, we can consider modeling context aware information as modeling relations with different instances or even different schemas in different contexts. For example, relation Restaurant is populated with a completely different set of tuples when context attribute location is “Athens” than when it is “Paris”. Furthermore, for a specific relation R, a set of attributes may make sense in only some contexts. For example, an attribute “distance from the beach” of relation Restaurant holds potentially useful information if context is “Mykonos” or “Saint Tropez”, as they are coastal cities, but has no meaning at all if context is “Munich”, while attribute “Oktoberfest special offers” should be defined only if context is “Munich”.

In practice, although a lot of research has focused on modeling context, efficient acquisition of context and system architectures for mobile, pervasive or context aware computing, the aspect of managing context aware data is delegated to the applications. To the best of our knowledge, so far no structured data model or language for context aware data, which can be considered a natural analogue to the relational model or algebra, has been proposed. We believe that context and context aware information should be incorporated in Database Management Systems as first class citizens in order to allow the uniform management of any type of context and context aware information.

Our contributions can be summarized as follows:

1. We illustrate the need for an inherently context aware data model and identify the requirements that must be satisfied by such a model.

2. We elevate context and context aware relations to first class citizens of the database through a context aware data model.

3. We facilitate the management of context aware information by defining a set of operations over context aware relations that extend the (traditional) relational algebra.

The rest of the paper is organized as follows. In Section 2 we argue about the need for a conceptual model for context aware information and discuss related previous work. Section 3 provides a running example that is used throughout the paper. In Section 4 we present a context aware data model and in Section 5 a basic set of operations. Finally, in Section 6 we present an overall example in order to evaluate our operations and in Section 7 we discuss future directions of our research and our conclusions.

2. MOTIVATION AND RELATED WORK

Our work is motivated by numerous application scenarios and environments. It is easy to find examples of context aware applications not only from the area of mobile computing and location based services, but also from systems with a large number of heterogeneous data sources with related information and semantics, or even enterprises with complex business logic, which leads to vertical or horizontal partitioning of relations according to context. Moreover, advances in web and communication technologies form a global environment for applications and content providers, where the result of a query may depend on localization discrepancies, grouping of users according to some characteristics or personalization and user preferences. In order to provide a uniform solution to the above problems, a model is required that allows a dynamic, declarative definition of the instances and the type of information returned according to context.

In practice, no solution has been proposed for the uniform treatment of context and context aware data that can be considered compatible with the relational model. Therefore, management of context and context aware data is done through procedural methods at the application layer. Furthermore, using the relational model to solve the above problems is not straightforward; even simple requests translate into very complex queries or multiple queries over the relations of a database. Each application uses an implementation-specific solution that depends on the design decisions about how context is modeled in the database and how context aware relations are represented. For example, an application might use a set of database relations for each context aware relation or embed context as part of the primary key of each relation. Either way, context aware queries are dynamically prepared by checking the user’s context and creating a set of appropriate queries that are sent to the database. The results are merged afterwards, again in the application layer. As in the case of data management without a DBMS and a declarative query language, implementing a part of the context aware data modeling and management at the application layer is (i) error prone, (ii) implementation specific and (iii) a hindrance for maintenance. Furthermore, it obstructs query optimization techniques and, in the long run, results in worse system performance.

The same arguments hold for using the object oriented model or a semistructured data model for representing and querying context aware information. Even though they are more flexible in representing multiple instances or incompatible schemas of the same information entity, they lack the semantics and the operations for managing context and context aware information. In both models, like in the case of the relational model, there is no inherent support for multiple “versions” of the same entity according to a set of external parameters, so context modeling and management of context aware information must be explicitly implemented by each application.

With respect to handling data that differ under various contexts, database research has already dealt with some specific types of context. Temporal databases [1] deal with context that is a simple parameter <Time>, <Date> or <Date Interval>, while spatial databases [6] deal with context of the form <Location> or <Spatial Interval>, and spatiotemporal data models [7] with context of the form <Time, Location>. All of those models are oriented to the semantics of the context attribute, which they try to incorporate as a first class citizen in the database. However, even though they deal implicitly with a form of multiple instances in multiple contexts, they do not treat multiple instances of a relation explicitly in the data model, nor do they propose a uniform and complete solution for relations with multiple instances. Also, the case of multiple schemas is not considered at all. The same holds for general approaches for metadata management [13].

View mechanisms [8] allow for a formal treatment of multiple instances and even multiple schemas for a specific relation. In a solution using views for representing context aware data, only a single global relation exists with all attributes defined in at least one context (in a “universal” relation fashion). For each context, a view is defined over the global relation in order to select the appropriate tuples and project the defined schema for this context. There are a number of drawbacks though in such a solution:

1. The semantics of context are lost as it is either represented as a simple key attribute with no distinction among other attributes, or in the worst case it is only part of the view description or stored in the data dictionary as schema information.

2. Choosing the correct view can only be done in the application layer and not through a first order query language with data scope, as for example the SQL language.

3. Queries over multiple contexts need complex view joins in order to be evaluated.
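The view-based solution and its drawbacks can be made concrete with a small sketch using Python's standard sqlite3 module. The table, view and function names, as well as the two sample rows, are hypothetical; the attributes are borrowed from the Restaurant example of the introduction.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE restaurant_global (
    name TEXT,
    location TEXT,              -- context stored as an ordinary column
    distance_from_beach REAL,   -- meaningful only in coastal contexts
    oktoberfest_offer TEXT      -- meaningful only in Munich
);
INSERT INTO restaurant_global VALUES
    ('Taverna', 'Mykonos', 0.2, NULL),
    ('Brauhaus', 'Munich', NULL, '2-for-1');
CREATE VIEW restaurant_mykonos AS
    SELECT name, distance_from_beach
    FROM restaurant_global WHERE location = 'Mykonos';
CREATE VIEW restaurant_munich AS
    SELECT name, oktoberfest_offer
    FROM restaurant_global WHERE location = 'Munich';
""")

# The mapping from context to view lives in application code,
# not in the query language (drawback 2 above).
VIEW_FOR_LOCATION = {
    "Mykonos": "restaurant_mykonos",
    "Munich": "restaurant_munich",
}


def restaurants_for(location):
    """Return (column names, rows) of the view chosen for a location."""
    view = VIEW_FOR_LOCATION[location]
    cursor = conn.execute(f"SELECT * FROM {view}")
    columns = [description[0] for description in cursor.description]
    return columns, cursor.fetchall()
```

In this sketch the context survives only as a plain column and as a Python dictionary, and a query spanning both locations would have to combine the two views explicitly.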

Research in the area of multidatabase systems and query languages over multidatabases [5,9,19] can be considered the closest research area to our work, as it has focused on answering queries over multiple schemas. Using the multidatabase perspective, we could model context aware information as a set of contexts forming “data spaces” with a number of relations defined in them. Each context can be defined as a database and each relation schema of a context relation as a relation in the corresponding context, with the constraint that relation schemas of the same context relation will have the same name in different databases. The main problems with the multidatabase approach are that in order to pose a query, a user should know in advance all the possible schemas in the multidatabase and, also, that there is no feature for referencing and querying all the schemas of a context relation as an atomic entity. Another drawback is that even if variables over relations are allowed, relations that are referenced by the same variable must have the same schema. Moreover, even if we could easily extend multidatabase languages in order to overcome those problems, all multidatabase operations accept as input a set of relations that may reside in the same or different databases, but always return a single database as an output. Thus, they prevent any implementation of a language with operations that have context relations both as domain and scope.

A formal framework for reasoning upon a subset of a global knowledge base can be found in [4], which is extended in the Local Relational Model [12] to allow for inconsistent databases and to support semantic interoperability in the absence of a global schema. Examples of how context can be used for partitioning an information base into manageable fragments of related objects can be found in [10,18]. In [14], multidimensional semistructured information entities are presented, which assume different facets, with varying content (value and/or structure), under different contexts. The work in [15] shows how context can be used to represent and query valid time of data and histories of semistructured databases. In [16], a model and algorithms for managing context aware preferences are presented. Finally, [11] argues for the necessity of a context aware relational model and proposes a conceptual model for context aware information and some possible operations over context, but no work followed on completing this model or on proposing a corresponding data model. With the exception of [11], none of the proposed solutions focuses on a model that is downward compatible with the relational model, resulting in models with difficult or even impossible implementations over relational database systems.

A final note on related work concerns the notion of context. This has been extensively studied by several research disciplines, such as cognitive science, linguistics, artificial intelligence and software engineering. Numerous models for context have been proposed, including modeling contextual information using key-value pairs, object oriented models, ontologies, graphical models, logic-based models and meta-model approaches for context. An extended survey of context modeling approaches can be found in [17]. The focus of our work is on modeling context aware information and data models incorporating context, so we keep a general, relational definition of context. Using more complex context models is orthogonal to our work and can be studied independently.

In closing, we summarize the requirements that we set for a context aware data model:

• A context relation Rc is defined with respect to a context definition C and may have a different schema defined for each different context instance of context C.

• Even if a context aware relation Rc has the same schema in some context instances, it may have a different instance, i.e. set of tuples, in each of them.

• Operators (as well as a context aware query language) should be defined on context aware relations and not on a specific instance or schema.

• In general, results of queries over context relations are also context relations with multiple schemas and/or instances in multiple context instances.

• Operators must allow the reference to any instance or schema of a context aware relation and the treatment of each instance or schema as if it were a completely different relation.

Apart from the above basic requirements, we are also interested in context aware data models that have a feasible relational implementation, non-intrusive with respect to the database engine, and a corresponding query language that is downward compatible with the relational algebra.

In the following sections, we propose a model that covers all of the above requirements and a core set of operations for the efficient manipulation of context aware data. Before that, we give an illustrative example in order to point out some crucial issues.

3. ILLUSTRATIVE EXAMPLE

Consider a worldwide electronic marketplace where retail companies, suppliers or even independent users sell products. The system allows each individual seller to dynamically construct a structured schema for his products, catalogue (based on product categories), collaborating vendors and other basic, predefined relations. The schema of each such relation is built using some global predefined attributes with well-defined semantics like, for example, product name, price, delivery time, description, quantity, address, etc. A seller creates his schema by iteratively (a) choosing from the global pool of relations which relation he wants to create, (b) adding attributes from a set of predefined attributes for this relation and (c) confirming predefined constraints, like “NOT NULL” or foreign keys. An example of a simple initial schema created by supplier SA is presented in Figure 1, where pID and cID are global unique identifiers for Products and Categories.

Figure 1. Example of a simple schema from supplier SA.

Users are characterized by a set of metadata defining their context, e.g. current user location, date and time, browsing device, etc. We assume that all those metadata are automatically retrieved by the system from the user profile or are sent from the users in real time, while they browse the site. That way, the system is aware of or can derive the context of each user. The definition of context and what kind of metadata are part of context is global and the same for all users.

Throughout this example, context is defined as <Location, Date, Browsing Device Capabilities> with an example context instance <Greece.Athens, 01/01/2008, High-end PC> for someone browsing the marketplace from Athens on New Year’s Day, using his recently bought desktop PC. Using context, sellers can offer specialized services or products for specific users. That means that suppliers can define many data instances for each relation, namely one for each dynamically specified context. The definition of multiple instances per relation is achieved through the definition of multiple schemas, one for each instance that a supplier wants to define. For example, some products from supplier SA are sold in Italy but not in the USA, so he creates for relation Product one schema for context <Italy, *, *> and a second schema for context <USA, *, *>, where ‘*’ is a special value of the domain of context attributes representing the whole domain.

Moreover, even a relation’s schema, i.e. the set of defined attributes, may change based on context, reflecting different conceptual logic under different circumstances. For example, in some countries internet products are tax free, so VAT is not defined, while in others the definition of VAT is obligatory and receipts must include it. In an automated system like the one of our example, in order to be able to model and use complicated policies according to what is part of the description (i.e. schema) of an object, we need a multi-schema relation model.

The proposed implementation of the abovementioned functionalities is through the fully automated generation of the marketplace for each single user according to his context. The significant detail in this process is that the automated generation of multiple schemas and the query facility is not accomplished through the use of some procedural language in the application or database layer (like C++ or PL/SQL), but through the use of a context aware model. So, for example, the presentation of the available iPod offers for Greece could be written inside the system using the simple query:

σ name = “iPod” (σc Country = “GREECE” Product)

where σc is the operation for selecting specific relation schemas from a context relation using conditions on context.
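To give a rough feeling for how such a query could be evaluated, the following Python sketch models a context relation as a list of (context, tuples) pairs. The representation, the function names select_context and select, and the sample products and prices are assumptions of this sketch; it is not the formal definition of the operations.

```python
# A context relation sketched as a list of (context, tuples) pairs.
# Each context is a dict over context attributes, each tuple a dict
# over the attributes of the relation schema valid in that context.
product = [
    ({"Country": "GREECE"},
     [{"name": "iPod", "price": 199}, {"name": "Radio", "price": 30}]),
    ({"Country": "ITALY"},
     [{"name": "iPod", "price": 189}]),
]


def select_context(context_relation, condition):
    """Sketch of σc: keep the relations whose context satisfies the condition."""
    return [(ctx, rows) for ctx, rows in context_relation if condition(ctx)]


def select(context_relation, condition):
    """Sketch of σ: filter the tuples of every relation that is still present."""
    return [(ctx, [row for row in rows if condition(row)])
            for ctx, rows in context_relation]


# σ name = "iPod" (σc Country = "GREECE" Product)
result = select(
    select_context(product, lambda ctx: ctx["Country"] == "GREECE"),
    lambda row: row["name"] == "iPod",
)
```

Note that in this sketch the result is again a list of (context, tuples) pairs, so the output has the same shape as the input.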

Our interest is to propose solutions in order to be able to model and automatically implement in the database layer context aware systems like the one presented here, i.e. without using additional programming modules in the application layer. This requirement naturally leads to the need for a uniform, simple data model of relations with multiple schemas for different contexts.

4. A CONTEXT AWARE DATA MODEL

In order to support what is described in the previous sections, we first present a simple model for context and then proceed to define context relations, i.e. relations with multiple instances and schemas depending on context.

4.1 Context Definition

We can think of context as a multidimensional parameter that determines the scope of relation instances or relation schemas. A context is formally defined through the definition of a context schema Cs, i.e. the set of attributes that compose it. We call an assignment of values to a given context schema Cs, a context instance Ci.

In order to support context as a first class citizen, regardless of the specific implementation that we choose, a set of operations must be defined. We want to be able to define new context schemas and handle context instances through the use of context specifiers. A context specifier Cp can store a specific context instance Ci or a set of context instances {Ci, Cj,…, Cm}. We deliberately choose this general definition of context specifiers as sets of context instances, in order to allow potential future complex representations of context over our model. For example, according to the work presented in [14], context specifiers are defined as syntactic constructs that are used to specify sets of context instances, called worlds in the scope of their model, and a set of operations on context specifiers are defined. Those definitions can be easily incorporated in our general set-based model for context.

DEFINITION 4.1. Context Attributes, Schemas and Specifiers

Let c_att be a countably infinite set of attribute names (context attributes) and c_dom a countably infinite set of domains called the underlying context domains. Without loss of generality, each attribute A ∈ c_att takes values from the set DOM(A) ⊆ c_dom, where DOM is a mapping on c_att. All domains are enhanced with a special value * for representing the whole domain. Context attributes refer to the available set of attributes that are used in a specific database schema to represent context.

A context schema Cs is defined as a set of context attributes Cs = {c_att1, c_att2, …, c_attn}. A context instance Ci is defined as an assignment of values {c_v1, c_v2, …, c_vn}, with c_vi ∈ DOM(c_atti), to the context attributes of Cs. A context specifier Cp, defined with respect to a context schema Cs, is a set of context instances of Cs.

In order to distinguish context schemas and context instances from ordinary sets, throughout the next sections we represent them using the ‘<…>’ notation instead of ‘{…}’. A context instance with the special value ‘*’ in some of its context attributes, represents a context that holds for any value assigned to the specific attributes. For example, if we assume the context schema Cs2 of Table 2, then context instance Ci2 refers to the year 2005 regardless of the location. More formally, for any set of values Vi: Vi ⊆ {*}.

The special value ‘*’ is important, as we assume that any context instance of a context schema Cs can be extended to any context schema Cs’ that includes Cs (Cs’ ⊃ Cs) by assigning the value ‘*’ to the additional attributes. For example, given a context schema Cs = <Date> and a context instance Ci = <2008>, any information defined in Ci is also defined for any location, so it is also defined in Ci’ = <2008, *> with respect to Cs’ = <Date, Location>. Finally, we assume that any information defined without explicit context, such as data stored in traditional relational databases, is implicitly defined using an empty context schema Cs = ∅. That means, according to our definitions, that for any context schema Cs’ = <c_att1, c_att2, …, c_attn>, information without explicit context is labeled using context instance <*, *, …, *>. This conclusion is consistent with our argument that traditional information is defined for any possible context, as is the case, for example, with non-temporal data, which are valid for any point in time. Further examples of context schemas, context instances and context specifiers are presented in Table 2. Context specifier Cp3 refers to users {jack, mary} when using as device a “PC” or to user “george” when he uses any valid device, i.e. any device in the domain of context attribute Devices. In order to have more compact representations of context specifiers in the next sections, a specific context specifier of the form {<A1, B>, …, <An, B>} will be written as <{A1, …, An}, B>.
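The role of ‘*’ and the compact notation can be sketched in Python as follows. The helper names extend, covers and expand are hypothetical, context instances are modeled as plain tuples, and expand accepts a set in any position, which is slightly more general than the notation introduced above.

```python
from itertools import product

WILDCARD = "*"


def extend(instance, schema, target_schema):
    """Extend a context instance over `schema` to a larger `target_schema`,
    assigning '*' to every attribute that `schema` does not contain."""
    values = dict(zip(schema, instance))
    return tuple(values.get(attribute, WILDCARD) for attribute in target_schema)


def covers(general, specific):
    """True if `general` agrees with `specific` on every attribute,
    where '*' in `general` stands for the whole domain."""
    return all(g == WILDCARD or g == s for g, s in zip(general, specific))


def expand(*components):
    """Expand compact notation such as <{A1, ..., An}, B> into the
    context specifier {<A1, B>, ..., <An, B>}."""
    sets = [c if isinstance(c, (set, frozenset)) else {c} for c in components]
    return set(product(*sets))


# <2008> over <Date> becomes <2008, *> over <Date, Location>.
extended = extend(("2008",), ("Date",), ("Date", "Location"))

# Information without explicit context (empty schema) is labeled <*, *>.
no_context = extend((), (), ("Date", "Location"))

# <{jack, mary}, PC> stands for {<jack, PC>, <mary, PC>}.
pc_users = expand({"jack", "mary"}, "PC")
```

Under these assumptions, covers(("*", "2005"), ("Athens", "2005")) holds, matching the reading of Ci2 as the year 2005 regardless of location.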

We can have more complex (rich) definitions for context, like for example context attributes defined as ontologies or an object oriented definition of context, but for the purposes of this work we will keep this simple definition. The semantics of each context domain can be modeled through the definition of special operations. For example, in the case of the context domain of time intervals we can assume that there are additional operations like BEGIN(T), END(T), EQUALS(T1, T2), OVERLAPS(T1, T2), MERGE(T1, T2), etc.
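As an illustration of such domain-specific operations, the sketch below gives one possible reading of BEGIN, END, EQUALS, OVERLAPS and MERGE for time intervals. The operations are only named in the text, so their exact semantics here (closed intervals over integer years, MERGE defined only for overlapping intervals) are assumptions of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed time interval [start, end]; integer years are an assumption."""
    start: int
    end: int


def begin(t):
    return t.start


def end(t):
    return t.end


def equals(t1, t2):
    return t1 == t2


def overlaps(t1, t2):
    # Closed intervals overlap when neither ends before the other starts.
    return t1.start <= t2.end and t2.start <= t1.end


def merge(t1, t2):
    """Smallest interval covering both inputs, defined here only when they overlap."""
    if not overlaps(t1, t2):
        raise ValueError("cannot merge disjoint intervals")
    return Interval(min(t1.start, t2.start), max(t1.end, t2.end))
```

For example, under these assumptions merge(Interval(2003, 2005), Interval(2005, 2008)) yields Interval(2003, 2008).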

Table 2. Examples of context schemas, context instances and context specifiers

Context Schema Cs | Context Instance Ci | Context Specifier Cp
Cs1 = <Language> | Ci1 = <Italian> | Cp1 = {<French>, <Greek>}
Cs2 = <Location, Date> | Ci2 = <*, 2005> | Cp2 = {<*, 2003>, <Athens, 2004>}
Cs3 = <Users, Devices> | Ci3 = <david, PDA> | Cp3 = {<george, *>, <jack, PC>, <mary, PC>}

As we define a context schema Cs in the same way as a relational schema, a context specifier Cp corresponds to a relation instance of this schema. More generally, a context specifier Cp can be defined as any arbitrary query over a relation with all possible instances of Cs. Similarly, a context instance Ci corresponds to a valid tuple of an instance of this relation schema. Thus, there is a straightforward definition of an algebra for context through the use of the relational model and the relational algebra. This definition can be expressed with relational operations and more specifically, we can define the following:

i) Set operations on context specifiers.

ii) Projection of specific context attributes from a context specifier. For example, using the Cp3 context specifier of Table 2, Cp4 = π Users Cp3 will result in a new context specifier Cp4 = {<george>, <jack>, <mary>}.

iii) Evaluation of conditions over context specifiers by using the relational σ operation over the context instances of a context specifier Cp. A context specifier satisfies the select condition if at least one context instance is returned using the select operation, i.e. if the result of the select operation is not the empty set.
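The three kinds of operations listed above can be sketched in Python by treating a context specifier as a set of tuples over its context schema. The function names project, select and satisfies are hypothetical, and the sketch treats ‘*’ as an ordinary value, which is a simplification.

```python
SCHEMA = ("Users", "Devices")

# Cp3 of Table 2 as a set of context instances.
cp3 = {("george", "*"), ("jack", "PC"), ("mary", "PC")}


def project(specifier, schema, attributes):
    """π: keep only the listed context attributes of every instance."""
    positions = [schema.index(attribute) for attribute in attributes]
    return {tuple(instance[p] for p in positions) for instance in specifier}


def select(specifier, schema, predicate):
    """σ: keep the context instances that satisfy the predicate."""
    return {instance for instance in specifier
            if predicate(dict(zip(schema, instance)))}


def satisfies(specifier, schema, predicate):
    """A specifier satisfies a condition if σ returns a non-empty result."""
    return bool(select(specifier, schema, predicate))


# i) Set operations come directly from Python sets.
union = cp3 | {("david", "PDA")}

# ii) Cp4 = π Users Cp3
cp4 = project(cp3, SCHEMA, ["Users"])

# iii) Does Cp3 contain an instance whose device is "PC"?
uses_pc = satisfies(cp3, SCHEMA, lambda c: c["Devices"] == "PC")
```

Because ‘*’ is compared literally here, a condition such as Devices = "PDA" is not satisfied by <george, *>; a fuller treatment would have to decide how the whole-domain value interacts with selection.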

4.2 Context Relations

We define a context relation Rc with respect to a context schema Cs as a set of relation schemas {R1, R2, …, Rn}. Each relation schema is associated with a context specifier Cp, defined with respect to context schema Cs. We require that all context specifiers of relation schemas in a context relation Rc are disjoint. Each relation schema Ri is defined for a specific set of context instances, as defined by the corresponding context specifier, and each context instance Cj has at most one specific relational schema Rj associated with it. If two context specifiers had a context instance Ci in common, then requesting the relation schema defined in Ci would result in two possibly incompatible relation schemas. A conceptual view of a context relation is presented in Figure 2.

DEFINITION 4.2. Context Relation Schema

We assume a countably infinite collection of attribute names. Each attribute Ai is characterized by a name and a domain dom(Ai). A relation schema R is a finite set of attributes and a relation instance is a finite subset of the Cartesian product of the domains of the attributes of the relation schema.

A context relation schema Rc is an (ordered) pair of (i) a reference to a context schema Cs and (ii) a finite set of (ordered) pairs of relation schemas and context specifiers defined with respect to context schema Cs:

Rc = < Cs, {<R1(A11, A12, …, A1n), Cp1>, <R2(A21, A22, …, A2k), Cp2>, …} >

Context specifiers {Cp1, Cp2, …} are defined with respect to the same context schema Cs = {c_att1, c_att2, …, c_attm}. All the defined context specifiers are disjoint, i.e. for every Cpi, Cpj ∈ {Cp1, Cp2, …} with i ≠ j, Cpi ∩ Cpj = ∅.
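A minimal Python sketch of this definition is given below. The function make_context_relation, the tuple-based representation and the example attribute names are assumptions of the sketch. Disjointness is checked as plain set intersection, exactly as written in the definition, so an overlap that is only implied by the whole-domain value ‘*’ is not detected here.

```python
from itertools import combinations


def make_context_relation(context_schema, schema_specifier_pairs):
    """Build the pair <Cs, {<R1, Cp1>, <R2, Cp2>, ...}> after checking
    that all context specifiers are pairwise disjoint."""
    for (_, cp_i), (_, cp_j) in combinations(schema_specifier_pairs, 2):
        common = cp_i & cp_j
        if common:
            raise ValueError(f"context specifiers overlap on {sorted(common)}")
    return (tuple(context_schema), list(schema_specifier_pairs))


# Hypothetical Product relation: VAT is defined only in the Italian context.
CONTEXT_SCHEMA = ("Location", "Date", "Device")
product = make_context_relation(CONTEXT_SCHEMA, [
    (("pID", "name", "price", "VAT"), {("Italy", "*", "*")}),
    (("pID", "name", "price"), {("USA", "*", "*")}),
])
```

Passing an empty list of pairs is accepted as well, which corresponds to a context relation with an empty set of relation schemas.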

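[Figure 2: a context relation Rc drawn as a set of relation schemas, each attached to a context: R1 with attributes A1, A2, A3 in context C1; R2 with attributes A1, A3, A4, A5 in context C2; R3 with attributes A4, A5, A6 in context C3; …]

Figure 2. Conceptual View of a context relation Rc.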

Attributes in different relation schemas may have the same name, i.e. refer to the same element of the set of possible attribute names. Note that the definition of a context relation schema allows for context relations with an empty set of relation schemas, which is convenient during the construction of context relations and for managing intermediate query results. Also, a context relation Rc could be defined with Cs = ∅ and at most one relation schema R1 with Cp1 = ∅ , which is a natural incorporation of traditional non-context relations to our model.

DEFINITION 4.3. Context Relation Instance

A context relation instance of a context relation schema Rc is an (ordered) pair of (i) a reference to context schema Cs and (ii) a finite set of (ordered) pairs of relation instances and context specifiers defined over Cs. If, assuming a relation Ri, I(Ri) is the instance of Ri, i.e. the set of tuples in Ri, then:

I(Rc) = < Cs, {<I(R1), Cp1>, <I(R2), Cp2 >, …} >

where {R1, R2, …} are the relation schemas defined in Rc and {Cp1, Cp2, …} are the corresponding context specifiers.

For convenience, in the rest of the paper, we will refer to a context relation instance as a context relation.
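Definitions 4.2 and 4.3 can be sketched as a small data structure. The following is only an illustrative sketch, not part of the model itself: the names `Relation`, `ContextRelation`, `overlaps` and `STAR` are hypothetical, and a context specifier is assumed to be given as one entry per context attribute, each entry being either the wildcard `*` or a finite set of values.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple, Union

STAR = "*"  # assumed wildcard: matches every value of a context attribute
# One entry per context attribute: "*" or a finite set of admissible values.
Specifier = Tuple[Union[str, FrozenSet], ...]


def overlaps(cp1: Specifier, cp2: Specifier) -> bool:
    """True if the two specifiers share at least one context instance."""
    for a, b in zip(cp1, cp2):
        if a == STAR or b == STAR:
            continue  # a wildcard intersects everything
        if not (a & b):
            return False  # no common value for this context attribute
    return True


@dataclass
class Relation:
    attributes: Tuple[str, ...]
    tuples: List[tuple] = field(default_factory=list)


@dataclass
class ContextRelation:
    context_schema: Tuple[str, ...]
    schemas: List[Tuple[Relation, Specifier]] = field(default_factory=list)

    def add(self, relation: Relation, cp: Specifier) -> None:
        """Attach a relation to a specifier, enforcing pairwise disjointness."""
        if len(cp) != len(self.context_schema):
            raise ValueError("specifier does not match the context schema")
        if any(overlaps(cp, other) for _, other in self.schemas):
            raise ValueError("context specifiers must be pairwise disjoint")
        self.schemas.append((relation, cp))


product = ContextRelation(("Supplier", "Location", "Date"))
product.add(Relation(("PID", "Name", "Price", "Qty", "CID")),
            (frozenset({"SA"}), frozenset({"Greece"}), frozenset({2008})))
product.add(Relation(("PID", "Name", "Price", "CID")),
            (frozenset({"SB"}), frozenset({"Greece"}), frozenset({2007, 2008})))
```

Under this reading, a context relation with an empty context schema accepts exactly one relation schema with the empty specifier, which mirrors the way traditional relations are incorporated into the model.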

DEFINITION 4.4. Context Database Schema

A context database schema DBc is defined as a nonempty finite set of context relation schemas.

Revisiting the example of Section 3 and simplifying it in order to illustrate some key features of our model and operations, we define the context database schema e-Marketplace = {Product, Category}. Both context relation schemas Product and Category are defined with respect to the same context schema: Cs = <Supplier, Location, Date>, where Supplier and Location are Strings and Dates are only years in this simple example. An example instance of context database e-Marketplace is presented in Figure 3. Context relation Category has only one schema for all possible context instances and has the same semantics as a traditional relation, as it is defined for any context. Attribute PID is the key in all schemas of context relation Product and is a unique identifier for products. Attributes Name, Price and the foreign key CID are defined in all schemas of Product, while attributes VAT and Qty (available quantity) are defined in some of them. We can see that under different contexts the same real world product may have different values. For example, the product with PID = 5 is called ‘iCD’ in the UK, ‘myCD’ in Greece and is not sold by any supplier in the USA. Also, the price of any product depends on the supplier selling it and the country where it is sold.

Context Relation Product

<SA, Greece, 2008>
PID  Name      Price  Qty  CID
1    ipod      140    250  12
2    walkman   35     180  12
3    mouse     22     20   11

<SA, UK, 2008>
PID  Name      Price  VAT  CID
2    walkman   43     19   12
3    mouse     28     8    11
5    iCD       47     19   12

<SB, Greece, {2007,2008}>
PID  Name      Price  CID
1    ipod      160    12
2    walkman   35     12
5    myCD      44     12

<SB, UK, 2008>
PID  Name      Price  VAT  CID
1    ipod      180    8    12
3    mouse     22     8    11
4    keyboard  30     19   11

<SA, Greece, 2007>
PID  Name      Price  CID
1    ipod      110    12
3    mouse     20     11

<SB, USA, 2008>
PID  Name      Price  VAT  Qty  CID
1    ipod      140    19   95   12
2    walkman   46     8    140  12
3    mouse     22     8    220  11

Context Relation Category

<*, *, *>
CID  Name
11   computers
12   music players

Figure 3. Example instance of context database e-Marketplace

5. OPERATIONS In this section, we define a basic set of operations over our context aware model. In the following definitions, we will use the notation R, S for traditional, non context relations, Rc, Sc for context relations and {R1, R2, … , Rn} for relation schemas of context relation Rc. A relation schema is always referenced using the context relation it belongs to, but for ease of use in some examples, we use Ri implying Rc.Ri. If Cs is a context schema, then CRi, or more simply Ci, is a context specifier in Cs and relation schema Ri is defined over CRi. Also, assuming a relation R, I(R) is the instance of R, i.e. the set of tuples in R, and Schema(R) represents the schema of R. Finally, in the presentation of the operations we use the notation operation(INPUT) → OUTPUT, in order to show the type of input(s) of an operation and the type of output. For example, (σc CONDITIONS Rc → Rc΄) means that the σc operation accepts as input a context relation Rc and its output is a new context relation Rc΄. If not explicitly stated, it is assumed that the output context relation is defined with respect to the same context schema as the input context relation.

We will begin in Section 5.1 by extending the basic operations of the relational model, namely select, project, cartesian product and set operations, in order to define them over context relations. In Section 5.2, we define all the additional fundamental operations that are not covered by the relational model, but are needed for the completeness of our model. In Section 5.3, we show that our set of operations is downwards compatible with the relational algebra. In Section 5.4 we present two additional operations. Even though they could be left out of a core model, they increase the expressive power of any language built over our proposed model. Finally, in Section 5.5 we present a set of high level operations that can be practically useful to context aware applications and we show how they can be expressed through the use of our fundamental operations.

5.1 Extended Relational Operations In this section, we discuss how to extend the basic relational operations and define them over context relations. In this way, we allow for a uniform treatment of the different relation schemas defined for a context relation through the use of a single query. The operands of those operations are context relations and the result is also a new context relation. We define the extended relational operations in such a way that if we extend every traditional relation to its straightforward context aware counterpart, all the properties of the relational operations will still hold and the result of any expression written in relational algebra will not change. Given any valid context schema Cs, we can extend a traditional relation R by defining a context relation Rc with a single relation schema R1, such that Schema(R1) = Schema(R), I(R1) = I(R) and the context specifier for R1 is C1 = <*,..,*>. Using a set of additional operations that are defined in Section 5.2, we show in Section 5.3 that our set of operations is consistent and downwards compatible with the relational algebra.

(a) Project: π {A1, …, An} Rc → Rc΄ The project operation is executed for every relation schema Ri of input context relation Rc where all projected attributes A = {A1,…, An} are defined, while relation schemas in which at least one of the requested attributes in A is not defined are not returned at all. If there is no relation schema in Rc where all attributes in A are defined, then the operation is not valid. For example, in the database of Figure 3, π{VAT, Qty}Product will result in a context relation with a single relation schema: < Result(VAT, Qty), <SB, USA, 2008> >.

(b) Select: σ CONDITIONS Rc → Rc΄ The result of a select operation over a context relation Rc is a new context relation, with each relation schema Ri having only those tuples that satisfy the select condition. Valid select conditions are those that are also valid in the relational model, while relation schemas in which at least one of the attributes referenced in CONDITIONS is not defined are not returned at all. If there is no relation schema in Rc with all attributes in CONDITIONS defined, then the operation is not valid.

Context Relation Result

<SA, UK, 2008>
PID  VAT
2    19
5    19

<SB, UK, 2008>
PID  VAT
4    19

<SB, USA, 2008>
PID  VAT
1    19

Figure 4. Result of σ operation

Returning to the example database of Figure 3, assume a query requesting all products that have a high VAT, i.e. a VAT of more than 10%: Result = π {PID,VAT} (σ(VAT > 10)Product). The result of this query is presented in Figure 4.
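The extended project and select operations, and the query of Figure 4, can be traced with a short sketch. It is an illustration under simplifying assumptions rather than a reference implementation: a context relation is represented as a dictionary from a specifier label to a (schema, tuples) pair, the function names `project` and `select` are hypothetical, and the caller passes the attributes referenced by the select condition explicitly.

```python
# Context relation Product of Figure 3: specifier label -> (schema, tuples).
PRODUCT = {
    "<SA, Greece, 2008>": (("PID", "Name", "Price", "Qty", "CID"),
        [(1, "ipod", 140, 250, 12), (2, "walkman", 35, 180, 12), (3, "mouse", 22, 20, 11)]),
    "<SA, UK, 2008>": (("PID", "Name", "Price", "VAT", "CID"),
        [(2, "walkman", 43, 19, 12), (3, "mouse", 28, 8, 11), (5, "iCD", 47, 19, 12)]),
    "<SB, Greece, {2007,2008}>": (("PID", "Name", "Price", "CID"),
        [(1, "ipod", 160, 12), (2, "walkman", 35, 12), (5, "myCD", 44, 12)]),
    "<SB, UK, 2008>": (("PID", "Name", "Price", "VAT", "CID"),
        [(1, "ipod", 180, 8, 12), (3, "mouse", 22, 8, 11), (4, "keyboard", 30, 19, 11)]),
    "<SA, Greece, 2007>": (("PID", "Name", "Price", "CID"),
        [(1, "ipod", 110, 12), (3, "mouse", 20, 11)]),
    "<SB, USA, 2008>": (("PID", "Name", "Price", "VAT", "Qty", "CID"),
        [(1, "ipod", 140, 19, 95, 12), (2, "walkman", 46, 8, 140, 12),
         (3, "mouse", 22, 8, 220, 11)]),
}


def project(rc, attrs):
    """Extended project: keep only schemas where every requested attribute is defined."""
    out = {}
    for cp, (schema, rows) in rc.items():
        if all(a in schema for a in attrs):
            idx = [schema.index(a) for a in attrs]
            # Set semantics: duplicates are removed; sorting gives a stable order.
            out[cp] = (tuple(attrs), sorted({tuple(r[i] for i in idx) for r in rows}))
    if not out:
        raise ValueError("no relation schema defines all projected attributes")
    return out


def select(rc, attrs, predicate):
    """Extended select: schemas missing an attribute of the condition are dropped."""
    out = {}
    for cp, (schema, rows) in rc.items():
        if all(a in schema for a in attrs):
            out[cp] = (schema, [r for r in rows if predicate(dict(zip(schema, r)))])
    if not out:
        raise ValueError("no relation schema defines all attributes of the condition")
    return out


# Result = π{PID,VAT}(σ(VAT > 10) Product)
result = project(select(PRODUCT, ["VAT"], lambda t: t["VAT"] > 10), ["PID", "VAT"])
for cp, (schema, rows) in result.items():
    print(cp, schema, rows)
```

Only the three schemas that define VAT survive the selection, which is why the schemas for Greece do not appear in Figure 4.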

(c) Cartesian Product: Rc × Sc → Tc Cartesian Product is a binary operation between context relations defined with respect to the same context schema. Given two context relations Rc and Sc, the result of Rc × Sc is a new context relation Tc with a relation schema defined for every pair of relation schemas in Rc and Sc that are defined with respect to the same context. That means that the Cartesian product is not applied between all the relation schemas of the input context relations, but only to those pairs that are compatible with respect to the context in which they are defined. As there is a possibility that two relation schemas of the input context relations have overlapping but not equal context specifiers, the formal definition that follows is given with respect to every atomic context instance Ci in Rc and Sc and not to the context specifiers. For each atomic context instance Ci that is defined in both Rc and Sc, and the corresponding relation schemas Ri, Si defined in Ci, a relation schema Ti is defined in Tc such that CTi = Ci and Tc.Ti = Rc.Ri × Sc.Si.

Context Relation Result

<SA, Greece, 2008>
PID  Price
1    140

<SA, UK, 2008>
PID  Price
(no tuples)

<SB, Greece, {2007,2008}>
PID  Price
1    160

<SB, UK, 2008>
PID  Price
1    180

<SA, Greece, 2007>
PID  Price
1    110

<SB, USA, 2008>
PID  Price
1    140

Figure 5. Result of example query with × operation

Using Cartesian Product, we can request the product ID and price of Products in the category “music players” that cost more than 50 Euros:

Result = π PID, Price (σ (Price > 50) AND (Category.Name = “music players”) AND (Product.CID = Category.CID) (Product × Category))

The result of the query is presented in Figure 5. Note that as the only relation schema of context relation Category is defined in <*,*,*>, it can be paired with any relation schema of Product.
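The context-wise pairing performed by the extended Cartesian Product can be sketched as follows. This is a hedged illustration: specifiers are assumed to be tuples with one entry per context attribute (a `frozenset` of values or the wildcard `*`), so that the set of atomic context instances shared by two specifiers is itself a specifier, computed by the hypothetical helper `meet`. Attribute names are qualified by hand to avoid clashes.

```python
from itertools import product as pairs

STAR = "*"  # assumed wildcard: every value of a context attribute


def meet(cp1, cp2):
    """Specifier covering the context instances shared by cp1 and cp2, or None."""
    out = []
    for a, b in zip(cp1, cp2):
        if a == STAR:
            common = b
        elif b == STAR:
            common = a
        else:
            common = a & b
            if not common:
                return None  # the specifiers are disjoint on this attribute
        out.append(common)
    return tuple(out)


def cartesian(rc, sc):
    """Pair only those relation schemas whose specifiers share context instances."""
    tc = {}
    for (cr, (r_attrs, r_rows)), (cs, (s_attrs, s_rows)) in pairs(rc.items(), sc.items()):
        ct = meet(cr, cs)
        if ct is not None:
            tc[ct] = (r_attrs + s_attrs, [r + s for r in r_rows for s in s_rows])
    return tc


fs = frozenset
product_rel = {
    (fs({"SA"}), fs({"Greece"}), fs({2008})):
        (("Product.PID", "Product.CID"), [(1, 12), (3, 11)]),
    (fs({"SB"}), fs({"Greece"}), fs({2007, 2008})):
        (("Product.PID", "Product.CID"), [(5, 12)]),
}
category = {
    (STAR, STAR, STAR):
        (("Category.CID", "Category.Name"), [(11, "computers"), (12, "music players")]),
}
tc = cartesian(product_rel, category)
```

Because the specifiers inside each input are disjoint, the specifiers produced for different pairs are disjoint as well, so the result is again a valid context relation under the assumed representation.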

(d) Set operations: Rc {∪, ∩, −} Sc → Tc In contrast to the other basic relational operations, where the operation over context relations results from naturally extending the basic one, in the case of set operations we have two possible semantics which are both consistent with the traditional relational algebra: the strong and the weak semantics. In order to clarify and better justify our definitions, we present both, even though we keep only the weak semantics in the end.

Strong Semantics {∪str, ∩str, −str}: Given two context relations Rc and Sc, defined with respect to the same context schema, and a set operation SetOp, the relational version of SetOp is executed for any pair of relation schemas {Ri, Sj} of Rc and Sc that are context union-compatible. Two relation schemas {Ri, Sj} are context union-compatible iff i) they are union-compatible, ii) they are defined with respect to the same context schema and iii) their context specifiers are not disjoint: CRi ∩ CSj ≠ ∅ .

The result of a strong set operation between two context relations Rc and Sc is a new context relation Tc, defined with respect to the same context schema as Rc and Sc. The relation schemas of Tc are created from the subset of relation schemas in Rc and Sc that are context union-compatible. For each relation schema Tk in Tc that results from context union-compatible relation schemas {Ri, Sj} of Rc and Sc, Tk = Ri SetOp Sj and CTk = CRi ∩ CSj.

Weak Semantics {∪weak, ∩weak, −weak}: A remark about strong semantics is that the result of the operations is not always what intuition dictates. For example, Tc = Rc ∪str Sc will keep in Tc only those relation schemas {Ri, Sj} of Rc and Sc that are context union-compatible and not all relation schemas from Rc and Sc. Intuitively, in the case of union operation, in the result we would want i) the union of all relation schemas from Rc and Sc that are context union-compatible and ii) all relation schemas that are defined for a context specifier Cp solely in one of the input context relations.

In order to define the three weak set operations, we assume that two context relations Rc and Sc, defined with respect to the same context schema Cs, are given as input. The result of a weak union (∪weak) between two context relations is a new context relation Tc, defined with respect to context schema Cs. For each atomic context instance Ci of Cs defined in either Rc or Sc:

• If Ci is defined in both Rc and Sc and relation schemas Ri, Si, defined in Ci, are union-compatible, then a relation schema Ti is defined in Tc with CTi = Ci, Schema(Ti) = Schema(Ri) = Schema(Si) and Tc.Ti = Rc.Ri ∪ Sc.Si.

• If Ci is defined in both Rc and Sc and relation schemas Ri, Si, defined in Ci, are not union-compatible, then no relation schema is defined for context instance Ci in Tc.

• If Ci is defined only in context relation Rc (Sc respectively), then a relation schema Ti is defined in Tc with the same schema as Ri (Si respectively), such that Tc.Ti = Rc.Ri (= Sc.Si respectively).

Weak intersection (∩weak) is defined in the same way as strong intersection. Finally, weak minus operation (Rc −weak Sc) is defined in the same way as weak union, with the exception that if for a context instance Ci there is a relation schema defined only in Sc, then no relation schema is defined for context instance Ci in Tc.

In order to summarize the differences between the strong and the weak semantics for set operations, let’s assume that two context relations Rc and Sc, defined with respect to the same context schema Cs, are given as input. Context schema Cs has 3 possible context instances: {C1, C2, C3}. Relation schema Rc.R1 is defined in {C1, C2}, Sc.S1 in {C2} and Sc.S2 in {C3}. In order to simplify the example, Schema(R1) = Schema(S1). In Table 3 we present the result of each one of the basic set operations between Rc and Sc, when using the strong or the weak semantics. Strong semantics are summarized in one row as they have the same result when relation schemas for only one or for both input context relations are defined in a specific context.

We can see that when we are using the weak semantics, the resulting extended set operations are more powerful than their strong counterparts. Moreover, as we will show in Subsection 5.2(a), strong set operations can be written using the weak set operations and an additional primitive operation of our model. As the weak semantics result in more powerful operators and also satisfy our requirement for downwards compatibility with the relational algebra, we keep the weak semantics for the definition of the extended set operations.

Table 3. Summary of Set Operations.

Cs                                   C1    C2          C3

Input Context Relations
  Rc                                 R1    R1
  Sc                                       S1          S2

Result, Strong Semantics for Set Operations
  Op = {∪, ∩, −}strong                     R1 Op S1

Result, Weak Semantics for Set Operations
  ∪weak                              R1    R1 ∪ S1     S2
  ∩weak                                    R1 ∩ S1
  −weak                              R1    R1 − S1
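The weak set operations and the example of Table 3 can be reproduced with a few lines of code. The sketch below is an illustration under stated assumptions: a context relation is a dictionary keyed by atomic context instance, a relation is a (schema, set of tuples) pair, two relations are treated as union-compatible when their schemas are equal, and all function names are hypothetical. The helper `strong` expresses a strong operation as a selection of the common contexts over the weak result.

```python
def union_compatible(r, s):
    # Assumption for this sketch: identical attribute lists.
    return r[0] == s[0]


def weak_union(rc, sc):
    tc = {}
    for ci in rc.keys() | sc.keys():
        if ci in rc and ci in sc:
            if union_compatible(rc[ci], sc[ci]):
                tc[ci] = (rc[ci][0], rc[ci][1] | sc[ci][1])
            # Not union-compatible: no relation schema is defined for ci.
        else:
            tc[ci] = rc[ci] if ci in rc else sc[ci]
    return tc


def weak_intersection(rc, sc):
    # Defined in the same way as strong intersection.
    return {ci: (rc[ci][0], rc[ci][1] & sc[ci][1])
            for ci in rc.keys() & sc.keys()
            if union_compatible(rc[ci], sc[ci])}


def weak_minus(rc, sc):
    tc = {}
    for ci in rc:  # contexts defined only in Sc contribute nothing
        if ci not in sc:
            tc[ci] = rc[ci]
        elif union_compatible(rc[ci], sc[ci]):
            tc[ci] = (rc[ci][0], rc[ci][1] - sc[ci][1])
    return tc


def strong(weak_op, rc, sc):
    """Strong variant: keep only contexts defined in both inputs."""
    common = rc.keys() & sc.keys()
    return {ci: rel for ci, rel in weak_op(rc, sc).items() if ci in common}


# The setting of Table 3: R1 defined in {C1, C2}, S1 in {C2}, S2 in {C3}.
R1 = (("A",), frozenset({(1,), (2,)}))
S1 = (("A",), frozenset({(2,), (3,)}))
S2 = (("A",), frozenset({(9,)}))
Rc = {"C1": R1, "C2": R1}
Sc = {"C2": S1, "C3": S2}

print(sorted(weak_union(Rc, Sc)))         # contexts with a schema under weak union
print(sorted(weak_intersection(Rc, Sc)))  # contexts under weak intersection
print(sorted(weak_minus(Rc, Sc)))         # contexts under weak minus
```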

5.1.1 Extended Relational Operations Properties It is easy to prove that the defined extended relational operations of Section 5.1 have all the common algebraic properties that also hold for the relational operations. For example, assuming that A, B are selection conditions, we can show that:

• σA(σB(Rc)) = σB(σA(Rc)) = σA AND B(Rc)

• σA(Rc × Sc) = σC AND D AND E(Rc × Sc) = σE(σC(Rc) × σD(Sc)), where (A = C AND D AND E), C contains attributes only from Rc, D contains attributes only from Sc and E contains the part of A that contains attributes from both Rc and Sc.

• σA(Rc ∪ Sc) = σA(Rc) ∪ σA(Sc)

• σA(Rc ∩ Sc) = σA(Rc) ∩ σA(Sc) = σA(Rc) ∩ Sc = Rc ∩ σA(Sc)

• π{a1, …, an}(σA(Rc)) = σA(π{a1, …, an}(Rc)), when A ⊆ {a1, …, an}

• π{a1, …, an}(π{b1, …, bn}(Rc)) = π{a1, …, an}(Rc), when {a1, …, an} ⊆ {b1, …, bn}

The proofs are omitted due to lack of space.

In contrast, there is one property of the relational algebra that does not hold for our set of extended operations under specific circumstances. Assume the scenario in which some of the attributes in A (or B) are not defined in a specific context Ci, while all of the attributes in B (respectively A) are defined in Ci. In that case, even though the equality holds for any other scenario, it is possible that:

σA OR B(Rc) ≠ σA(Rc) ∪ σB(Rc)

The reason for the inequality is that in the result of the expression σA OR B(Rc), the relation schema for context Ci is omitted, as at least one attribute of the condition is not defined in Ci, while in the result of the latter expression we may have some information returned for Ci due to the expression σB(Rc). We believe that this inconsistency does not essentially affect the behavior of our proposed set of operations with respect to its compatibility with the relational algebra, as it is produced only (i) in the case of disjunctive select conditions, which in many definitions are not even part of the core relational model and, most importantly, (ii) when attributes of the select condition are not defined in specific relation schemas, which would result in a non valid relational expression anyway. Moreover, in Section 5.3 we show that for any given valid relational query, i.e. queries whose select conditions do not include attributes that are not defined in the input relations, our extended set of operations is downwards compatible with the relational model. Furthermore, when drifting away from the traditional relational model and focusing on a wider multi schema relational model, like the one presented here, we can see that the extended set operations that use the weak semantics are essential for defining a complete algebra over context relations.

5.2 Management of Relation Schemas As context relations include more than one relation schema, we need a basic set of additional operations that are not available in the relational algebra, for referencing and managing relation schemas of a context relation. The most prominent, Schema_Select and Map, are also very useful in expressing many non primitive operations like the set operations which use strong semantics, merge and split operations, cross context cartesian product and other high level operations like the ones presented in Section 5.5. The rest of the presented operations are not of the same importance, but are necessary for completeness.

(a) Schema_Select: σc CONDITIONS Rc → Rc΄ This operation permits the selection of a subset of a context relation’s schemas based on a select condition over (i) its context specifiers, e.g. (location = “Paris” AND date = 2008), or (ii) its relation schema, e.g. ‘VAT is defined’. Valid conditions over relation schemas are of the form ‘Att is defined’, which evaluates to true if a relation schema includes attribute Att in its definition, and ‘Att is NOT defined’, which evaluates to true if a relation schema does not include attribute Att in its definition. If the select condition is not satisfied by any relation schema of Rc, a context relation with no relation schemas is returned.

For example, if we want to see only the products available in 2007, the query ‘σc (Date = 2007) Product’ over the database of Figure 3 will return a context relation with only the two relation schemas for <SA, Greece, 2007> and <SB, Greece, {2007,2008}>. Moreover, the query (σc (Location = Greece) AND (Qty is defined) Product) will result in a context relation with the single relation schema for <SA, Greece, 2008>.
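The two Schema_Select examples above can be checked with a small sketch over the schemas of Product in Figure 3. It is illustrative only and rests on assumptions: specifiers are given as finite value lists per context attribute (no wildcards), a specifier satisfies a context condition when at least one of its context instances does, and the names `schema_select`, `defined` and `not_defined` are hypothetical.

```python
from itertools import product

CS = ("Supplier", "Location", "Date")

# Schemas of context relation Product (Figure 3): specifier -> attribute names.
PRODUCT_SCHEMAS = {
    (("SA",), ("Greece",), (2008,)): ("PID", "Name", "Price", "Qty", "CID"),
    (("SA",), ("UK",), (2008,)): ("PID", "Name", "Price", "VAT", "CID"),
    (("SB",), ("Greece",), (2007, 2008)): ("PID", "Name", "Price", "CID"),
    (("SB",), ("UK",), (2008,)): ("PID", "Name", "Price", "VAT", "CID"),
    (("SA",), ("Greece",), (2007,)): ("PID", "Name", "Price", "CID"),
    (("SB",), ("USA",), (2008,)): ("PID", "Name", "Price", "VAT", "Qty", "CID"),
}


def instances(cp):
    """Enumerate the atomic context instances covered by a specifier."""
    for values in product(*cp):
        yield dict(zip(CS, values))


def schema_select(rc, ctx_cond=lambda c: True, defined=(), not_defined=()):
    """Keep schemas whose specifier and attribute list satisfy the condition."""
    out = {}
    for cp, attrs in rc.items():
        if (any(ctx_cond(c) for c in instances(cp))
                and all(a in attrs for a in defined)
                and not any(a in attrs for a in not_defined)):
            out[cp] = attrs
    return out


# σc (Date = 2007) Product
in_2007 = schema_select(PRODUCT_SCHEMAS, lambda c: c["Date"] == 2007)
# σc (Location = Greece) AND (Qty is defined) Product
greek_qty = schema_select(PRODUCT_SCHEMAS,
                          lambda c: c["Location"] == "Greece",
                          defined=["Qty"])
```

When no schema satisfies the condition the function returns an empty dictionary, matching the empty context relation described in the text.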

Finally, any strong set operation SetOps can be written using the corresponding weak set operation SetOpw as follows:

Rc SetOps Sc = σc CND (Rc SetOpw Sc) where CND is the condition selecting only contexts in CRc ∩ CSc.

(b) Map: Map (Rc, c_att, ctx_expression) → Tc Using the Map operation, we are able to alter the values of the context attributes of each context instance in Rc. This operation, together with the following Add and Delete context attribute operations, is crucial for context aware applications that want to query context relations with different context schemas. Using those operations, input context relations can be transformed in order to have compatible context schemas and similar context specifiers for their relation schemas, so that they can be joined or used together in set operations.

The input attribute c_att must be the name of a context attribute in the context schema Cs of Rc or the operation is not valid. Expression ctx_expression can be a constant in the domain of c_att, including *, or any function over c_att. The result of the operation is a new context relation Tc, identical to the input context relation Rc, except for the values of the context specifiers of every relation schema in Rc, in which c_att = ctx_expression. For example, if context relation Rc has only a single relation schema R1 with context specifier C1 = <1998>, then Map(Rc, Date, Date+1) results in a new context relation Tc with a single relation schema T1 = R1, defined in CT1 = <1999>.
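The Date+1 example can be written down as a short sketch. It is an illustration under assumptions that go beyond the text: specifiers are tuples of `frozenset` values or the wildcard `*`, a function applied to `*` leaves it unchanged, and the sketch raises an error if two relation schemas end up with the same specifier (the operation's behavior in that case is not specified here). The function name `map_ctx` is hypothetical.

```python
def map_ctx(rc, context_schema, c_att, expr):
    """Rewrite the values of context attribute c_att in every specifier of rc."""
    if c_att not in context_schema:
        raise ValueError(f"{c_att} is not a context attribute")
    i = context_schema.index(c_att)
    tc = {}
    for cp, relation in rc.items():
        if callable(expr):
            # Function over c_att: applied to each value; "*" is kept (assumption).
            new = cp[i] if cp[i] == "*" else frozenset(expr(v) for v in cp[i])
        else:
            # Constant in the domain of c_att, or the wildcard itself.
            new = "*" if expr == "*" else frozenset({expr})
        new_cp = cp[:i] + (new,) + cp[i + 1:]
        if new_cp in tc:
            raise ValueError("two relation schemas mapped to the same specifier")
        tc[new_cp] = relation  # the relation schema itself is unchanged
    return tc


# Rc has a single relation schema R1 with C1 = <1998>.
rc = {(frozenset({1998}),): (("PID", "Name"), [(1, "ipod")])}
tc = map_ctx(rc, ("Date",), "Date", lambda date: date + 1)
print(tc)
```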

Map and the following add and delete context attribute operations might seem like update operations that should not be defined as data manipulation operations. However, the special role of context in our model for labeling relation schemas makes them an important part of the basic set of algebraic operations, as they are needed in order to be able to produce all possible valid transformations over context relations.

(c) Add Context Attribute: Add_Cxt_attr (Rc, c_att = c_val) → Tc Context relation Tc is created by adding context attribute c_att to the context schema Cs of Rc. Each context specifier in Tc results from the corresponding context specifier in Rc by adding attribute c_att with c_val as the default value, or ‘*’ if c_val is missing.

(d) Delete Context Attribute: Del_Cxt_attr (Rc, c_att) → Tc Context relation Tc is created by removing context attribute c_att from the context schema Cs of Rc and from each context specifier in Rc. If two or more relation schemas of the resulting context relation Tc have the same or overlapping context specifiers after removing context attribute c_att, they are merged into a single relation schema using a union operation that has the same semantics as the weak union operation defined in Section 5.1(d). If those relation schemas are not union-compatible, then the Del_Cxt_attr operation is not valid.

As an overall example of add and delete context attribute operations, the result of the following query over the context database of Figure 3 is presented in Figure 6: Result = Add_Cxt_attr( Del_Cxt_attr( π{PID,Name} Product, Supplier), Device=‘PC’)

(e) De-Contextualization: deCxt (Rc) → R This operation results in a single (non context) relation and its purpose is to return relational data that will be used in a traditional relational system. Input context relation Rc should have only one relation schema, which is the one returned. If it contains more than one schema, the operation fails. It is the only operation that does not result in a context relation, so it is naturally applied as the final operation after the relation schemas of an input context relation have been filtered through schema selection σc and/or merged using set operations, Map and Add or Delete context attribute operations. For example, in order to show products from supplier SA sold in the UK during 2008 using an application developed on top of a relational DBMS, we could use the following query: deCxt (σc (Supplier = SA) AND (Location = UK) AND (Date = 2008) Product)

(f) Contextualization: Cxt (R, Cs, Cp) → Rc This operation is the opposite of the De-Contextualization operation. Given a relation R, a context schema Cs and a context specifier Cp defined with respect to Cs, a context relation Rc is returned with Rc = < Cs, {<R, Cp>} >.

Context Relation Result

<Greece, 2008, PC>
PID  Name
1    ipod
2    walkman
3    mouse
5    myCD

<UK, 2008, PC>
PID  Name
1    ipod
2    walkman
3    mouse
4    keyboard
5    iCD

<Greece, 2007, PC>
PID  Name
1    ipod
2    walkman
3    mouse
5    myCD

<USA, 2008, PC>
PID  Name
1    ipod
2    walkman
3    mouse

Figure 6. Map operation

5.3 Compatibility with the Relational Algebra Assume a given set of non-context relations R = {R, S, T, …} and any valid query Q of the relational algebra containing selections, projections, cartesian products and set operations over the relations in R. We transform the set of relations in R to their context aware counterparts Rc = {Rc, Sc, Tc, …} through the use of the Contextualization operation, with context schema Cs = ∅ and context specifier Cp = ∅. Effectively, what follows would also hold for any given context schema Cs = <c_att1, c_att2, …, c_attn>, if Cp = <*,*,…,*>, or for any arbitrary assignment of values Cpi provided that all context relation counterparts of R are created with the same context specifier Cp = Cpi. We transform query Q to Qc by replacing the relational operations with the extended relational operations of our model and the relations in R with their context aware counterparts in Rc. It is straightforward to prove that as long as Q is a valid query of the relational algebra: Q(R) = deCxt(Qc(Rc)), which essentially means that our extended set of operations is downwards compatible with the relational model. We omit the proof due to lack of space.

5.4 Weak versions of select and project An interesting problem arising from the definition of context relations with multiple schemas is how to evaluate the query in a relation schema of Rc, where an attribute referenced (i) in the project clause or (ii) in one of the conditions of a select operation, is not defined. By observing this schema in isolation from the context relation, the query is invalid and this relation schema should not be returned at all, as in the definition of extended select and project operations of Section 5.1. In contrast, in order to fetch as much data as possible, we may also want a second interpretation of relational operations that will allow us to evaluate the given query even in those schemas where it is not considered valid by strict relational algebra evaluation rules. For example, if we request attributes {a, b, c} in a relation schema where only attributes {a, b} are defined, we may want the tuples that evaluate the select condition to true to be returned even if attribute c is not defined. Either way, a model for context relations should support both interpretations.

Evaluation of not-defined attributes in select conditions is less simple, as we have to define a set of evaluation rules for select conditions that include not-defined attributes. A straightforward approach to this problem is to define a special not-defined value (NDF) and a 4-valued logic extending the 3-valued logic (true, false, null). For example, we can define that (NDF AND null) results in NDF or that (NDF AND false) results in false. For comparison, the approach of the relational model to the evaluation of queries is that any condition with an NDF value results in NDF, and this is the reason why a query that references an attribute that is not defined in a relation schema is always not defined.

We define a 4-valued logic over {TRUE, FALSE, NULL, NDF} by extending the 3-valued logic for nulls. The intuition underlying our definitions is that NDF represents a stronger lack of knowledge than NULL values. The new evaluation rules for clauses of the form (A op B) or (op A) that contain NDF values are presented in Table 4.

Table 4. 4-valued Logic for NDF Values

A op B                 Result
NDF {AND, OR} NDF      NDF
NOT NDF                NDF
NDF AND TRUE           NDF
NDF AND FALSE          FALSE
NDF AND NULL           NDF
NDF OR TRUE            TRUE
NDF OR FALSE           NDF
NDF OR NULL            NULL
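The rules of Table 4 can be encoded compactly as a precedence order among the four truth values. The sketch below is one possible encoding, not a definition taken from the text: the orders FALSE, NDF, NULL, TRUE for AND and TRUE, NULL, NDF, FALSE for OR are chosen so that every row of Table 4 is reproduced, and the combinations that do not involve NDF are assumed to follow the usual 3-valued logic for nulls. The names `V`, `and4`, `or4` and `not4` are hypothetical.

```python
from enum import Enum


class V(Enum):
    TRUE = "TRUE"
    FALSE = "FALSE"
    NULL = "NULL"
    NDF = "NDF"  # attribute not defined in the relation schema


def and4(a, b):
    # The first value in this order that appears among the operands wins.
    for v in (V.FALSE, V.NDF, V.NULL):
        if v in (a, b):
            return v
    return V.TRUE


def or4(a, b):
    # Per Table 4, NDF OR NULL is NULL, so NULL precedes NDF here.
    for v in (V.TRUE, V.NULL, V.NDF):
        if v in (a, b):
            return v
    return V.FALSE


def not4(a):
    # NULL and NDF are left unchanged by negation.
    return {V.TRUE: V.FALSE, V.FALSE: V.TRUE}.get(a, a)


for b in V:
    print(f"NDF AND {b.name} = {and4(V.NDF, b).name}, "
          f"NDF OR {b.name} = {or4(V.NDF, b).name}")
```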

(a) Force Evaluation Project: πF {A1, …, An} Rc → Rc΄ The result of the πF operation is the same as the extended project operation π when all the requested attributes are defined in a specific relation schema. In relation schemas where at least one requested attribute is not defined, the subset of the requested attributes that are defined is returned. If none of the requested attributes is defined in a specific schema, then it is not returned at all. For example, in the database of Figure 3, πF {VAT, Qty} Product will result in a context relation with four relation schemas: { <Result(Qty), <SA, Greece, 2008> >, <Result(VAT), <SA, UK, 2008> >, <Result(VAT), <SB, UK, 2008> >, <Result(VAT, Qty), <SB, USA, 2008> > }

(b) Force Evaluation Select: σF CONDITIONS Rc → Rc΄ The result of the σF operation is the same as the extended select operation σ, except in the case of conditions with attributes not defined in specific schemas. When σF is used, the conditions are evaluated according to the evaluation rules of the 4-valued logic for NDF values. If the select condition is evaluated to NDF for a specific tuple, then this tuple is not returned in the result. If none of the attributes in the select condition is defined in a specific relation schema, then this relation schema is not returned at all.

Returning to the example database of Figure 3, let’s assume a query requesting all products except those that have both a high VAT, i.e. more than 10%, and a small available quantity (Qty), i.e. fewer than 200 products available for sale:

Result = σ NOT ( (VAT > 10) AND (Qty < 200)) Product

Context Relation Result

<SA, Greece, 2008>
PID  Name     Price  Qty  CID
1    ipod     140    250  12

<SA, UK, 2008>
PID  Name     Price  VAT  CID
3    mouse    28     8    11

<SB, UK, 2008>
PID  Name     Price  VAT  CID
1    ipod     180    8    12
3    mouse    22     8    11

<SB, USA, 2008>
PID  Name     Price  VAT  Qty  CID
2    walkman  46     8    140  12
3    mouse    22     8    220  11

Figure 7. Result of σF operator

If we use the extended select operator σ, the query can be evaluated to true only for the relation schema with Cp = <SB, USA, 2008> and the result will include only the tuples with PIDs 2 and 3. However, if we consider the semantics of the query that we posed, we would also want to retrieve tuples from schemas with only attribute VAT or attribute Qty defined. Using the σF operator and due to the evaluation rules of 4-valued logic, the user would get the desired result that is presented in Figure 7.
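The result of Figure 7 can be traced step by step with a small sketch of force evaluation select. The code is an illustration under simplifying assumptions: tuples are dictionaries restricted to PID, VAT and Qty for brevity, the schema of each relation is read off its first tuple, the data contain no NULLs so only TRUE, FALSE and NDF occur, and the helper names are hypothetical.

```python
NDF = "NDF"  # marker for an attribute that is not defined in a relation schema

# Context relation Product of Figure 3, restricted to PID, VAT and Qty.
PRODUCT = {
    "<SA, Greece, 2008>": [{"PID": 1, "Qty": 250}, {"PID": 2, "Qty": 180},
                           {"PID": 3, "Qty": 20}],
    "<SA, UK, 2008>": [{"PID": 2, "VAT": 19}, {"PID": 3, "VAT": 8},
                       {"PID": 5, "VAT": 19}],
    "<SB, Greece, {2007,2008}>": [{"PID": 1}, {"PID": 2}, {"PID": 5}],
    "<SB, UK, 2008>": [{"PID": 1, "VAT": 8}, {"PID": 3, "VAT": 8},
                       {"PID": 4, "VAT": 19}],
    "<SA, Greece, 2007>": [{"PID": 1}, {"PID": 3}],
    "<SB, USA, 2008>": [{"PID": 1, "VAT": 19, "Qty": 95},
                        {"PID": 2, "VAT": 8, "Qty": 140},
                        {"PID": 3, "VAT": 8, "Qty": 220}],
}


def compare(t, att, test):
    """A comparison over a not-defined attribute evaluates to NDF."""
    return test(t[att]) if att in t else NDF


def and4(a, b):
    if a is False or b is False:
        return False  # NDF AND FALSE = FALSE
    return NDF if NDF in (a, b) else True  # NDF AND TRUE = NDF


def not4(a):
    return NDF if a == NDF else not a  # NOT NDF = NDF


def condition(t):
    # NOT ((VAT > 10) AND (Qty < 200))
    return not4(and4(compare(t, "VAT", lambda v: v > 10),
                     compare(t, "Qty", lambda q: q < 200)))


def force_select(rc, attrs, cond):
    out = {}
    for cp, rows in rc.items():
        # Schema read off the first tuple; assumes non-empty relations.
        if not any(a in rows[0] for a in attrs):
            continue  # none of the condition's attributes is defined here
        out[cp] = [t for t in rows if cond(t) is True]  # NDF tuples are dropped
    return out


result = force_select(PRODUCT, ["VAT", "Qty"], condition)
for cp, rows in result.items():
    print(cp, [t["PID"] for t in rows])
```

For instance, the walkman in <SA, Greece, 2008> has Qty = 180, so the conjunction evaluates to NDF AND TRUE = NDF and the tuple is dropped, whereas the ipod with Qty = 250 yields NDF AND FALSE = FALSE, which the negation turns into TRUE.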

5.5 High Level Operations In practical scenarios, context aware applications may need some more advanced, high level operations in order to efficiently process context aware information. We will briefly present three of the most interesting high level operations here and show how they can be expressed using the fundamental operations presented in Sections 5.1 and 5.2.

(a) Merge and Split operations can be thought of as the analogue, for context aware relations, of the fold and unfold operations that are used in temporal databases. Instead of applying those operations only to dates and time, we can extend them in order to work over any set of context attributes. The basic operation can be defined as Merge (Rc, Ri, Rj). Using this operation, we can merge two union-compatible relation schemas Ri, Rj of a context relation Rc into one, in order to present, or use in following queries, similar information from multiple contexts. The result of the operation is a new context relation Tc that is identical to the input context relation Rc for every relation schema except for the two relation schemas Ri and Rj, which are replaced by a new relation schema resulting from their union. If Ri, Rj are not union-compatible, then the operation is not valid. We can write a Merge operation as follows:

Merge (Rc, Ri, Rj) = TMP1 ∪ TMP2
TMP1 = Map (σc Context ≠ Ctx(Rj) Rc, CRi, CRi ∪ CRj)
TMP2 = Map (σc Context ≠ Ctx(Ri) Rc, CRj, CRi ∪ CRj)

where, in order to simplify the presentation, the select conditions in the Schema_Select σc operations are an abbreviation of the conditions that would select any relation schema except the one for Rj and Ri respectively, while the map operations are also an abbreviation of the set of map operations that are needed to map all the context attributes of Ri or Rj to their union.

Conversely, the operation Split (Rc, Ri, Ci1, Ci2) can be defined as the opposite of the Merge operation. It splits a relation schema Rc.Ri with context specifier CRi into two identical relation schemas Ri1 and Ri2 with non-empty context specifiers Ci1 and Ci2, such that CRi = Ci1 ∪ Ci2 and Ci1 ∩ Ci2 = ∅. The Split operation can also be written using the union operation of Section 5.1, if we define an empty temporary context relation TMP1 with only one relation schema, which is identical to the schema of Ri:

Split (Rc, Ri, Ci1, Ci2) = Rc ∪ TMP1
TMP1 = < Cs, { <{}, Ci1>} >
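A minimal Python sketch of the intended effect of Split follows. It does not reproduce the union-based expression above; it builds the result directly. The dictionary layout (context specifier as a frozenset of context states, mapped to attributes and tuples), the function name split, and the reading that both resulting relation schemas carry the tuples of Ri are assumptions of this illustration.

```python
def split(rc, ci, ci1, ci2):
    """Split the relation schema stored under ci into copies under ci1 and ci2.

    rc maps a context specifier (a frozenset of context states) to a pair
    (attributes, rows). This layout is an assumption of the sketch.
    """
    if not ci1 or not ci2:
        raise ValueError("Ci1 and Ci2 must be non-empty")
    if ci1 | ci2 != ci or ci1 & ci2:
        raise ValueError("Ci1 and Ci2 must partition CRi")
    attrs, rows = rc[ci]
    # Every other relation schema is carried over unchanged.
    tc = {c: r for c, r in rc.items() if c != ci}
    # Assumption: both new relation schemas are identical copies of Ri.
    tc[ci1] = (attrs, set(rows))
    tc[ci2] = (attrs, set(rows))
    return tc


ATTRS = ("PID", "Price")
GR07 = frozenset({("Greece", 2007)})
GR08 = frozenset({("Greece", 2008)})
US08 = frozenset({("USA", 2008)})

rc = {
    GR07 | GR08: (ATTRS, {(2, 35), (5, 44)}),
    US08: (ATTRS, {(2, 46), (3, 22)}),
}

tc = split(rc, GR07 | GR08, GR07, GR08)
```

The two validity checks mirror the conditions CRi = Ci1 ∪ Ci2 and Ci1 ∩ Ci2 = ∅ stated above.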

(b) Cross Context Cartesian Product: Rc.Ri ×c Sc.Sj → Tc.T1

This is a special version of the Cartesian Product operation, which allows the join of any two relation schemas irrespective of their context specifiers. Recall that the extended Cartesian Product is an operation over context relations in which only those pairs of relation schemas that are compatible with respect to the context in which they are defined are joined. The cross context Cartesian Product is a very useful operation, as it allows the comparison of tuples in different contexts. The most common scenario is the cross context Cartesian Product between relation schemas of the same context relation (Rc = Sc), in order to compare different versions of the same tuple under different contexts. Assume two relation schemas Ri and Sj of context relations Rc and Sc, with context specifiers Ci and Cj respectively. The result of a cross context Cartesian Product between Ri and Sj is a new context relation Tc with a single relation schema T1 = Ri × Sj and context specifier CT1 = Ci ∪ Cj. Using the same abbreviations as in the case of the Merge operation, we can express the cross context Cartesian Product as follows:

Rc.Ri ×c Sc.Sj = Map (σc Context = Ctx(Ri) Rc, CRi, CRi ∪ CSj ) × Map (σc Context = Ctx(Sj) Sc, CSj, CRi ∪ CSj )
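As a rough illustration, the following Python sketch computes the result of a cross context Cartesian Product directly, rather than through the Map and Schema_Select expression. The representation of a relation schema as a pair of attributes and tuples, the "R." and "S." attribute prefixes, and the function name cross_context_product are assumptions of this sketch; the tuples come from RTmp1 in Figure 8.a.

```python
from itertools import product


def cross_context_product(ri, ci, sj, cj):
    """Return a context relation with the single relation schema T1 = Ri x Sj.

    ri and sj are pairs (attributes, rows); ci and cj are their context
    specifiers (frozensets of context states). The result is stored under
    the specifier Ci ∪ Cj. The layout is an assumption of the sketch.
    """
    attrs_r, rows_r = ri
    attrs_s, rows_s = sj
    # Prefix attribute names so both sides stay distinguishable (assumption).
    attrs = tuple(f"R.{a}" for a in attrs_r) + tuple(f"S.{a}" for a in attrs_s)
    rows = {r + s for r, s in product(rows_r, rows_s)}
    return {ci | cj: (attrs, rows)}


ATTRS = ("PID", "Price")
GR08 = frozenset({("Greece", 2008)})
US08 = frozenset({("USA", 2008)})

greece_2008 = (ATTRS, {(5, 44)})
usa_2008 = (ATTRS, {(2, 46), (3, 22)})

tc = cross_context_product(greece_2008, GR08, usa_2008, US08)
```

Here the two inputs are defined in different contexts, so the ordinary extended Cartesian Product would not pair them; the cross context version pairs every tuple of one with every tuple of the other.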

6. EVALUATION EXAMPLE

In this section, we illustrate some of the proposed operations on a more realistic scenario and justify why an inherently context aware model is needed, as opposed to relational implementations.

Using the context aware database of Figure 3, supplier SA has realized that in recent years his profits from products that cost less than 50 Euros have dropped significantly, because many customers in the UK buy the products from abroad. He found that the problem originates especially from countries where he doesn’t sell those products, while one or more other suppliers offer them at prices lower than his prices for the UK. So, he wants to find, for each country and year, those products that cost less than 50 Euros, are offered by at least one supplier but not by him, and cost less than his products for the UK in the same year. The first step is to find all products for each country and year that are sold only by other suppliers and cost less than 50 Euros:

RSA = Del_Cxt_attr (π PID ( σc (Supplier = SA) Product), Supplier)
RNotSA = Del_Cxt_attr (π {PID, Price} ( σc (Supplier ≠ SA) Product), Supplier)
Rothers = σc (Country ≠ UK) ( π PID (RNotSA) − RSA )
RTmp1 = σ (Price < 50) ( (RNotSA × Rothers) )

(a) Context Relation RTmp1

Context <Greece, 2008>
PID  Price
5    44

Context <Greece, 2007>
PID  Price
2    35
5    44

Context <USA, 2008>
PID  Price
2    46
3    22

(b) Context Relation RTmp2

Context <*, 2008>
PID  Price
2    43
3    28
5    47

(c) Result

Context <Greece, 2008>
PID  Price
5    44

Context <USA, 2008>
PID  Price
3    22

Figure 8. Results for the Evaluation Example

In RSA we kept only the relation schemas for supplier SA and in RNotSA those for all other suppliers. Using the Del_Cxt_attr operation, we modified their context schema to <Location, Date>, in order to make relation schemas for the same country and year union compatible, so that we can subtract RSA from RNotSA and store in Rothers the PID of all products sold only by other suppliers outside the UK. The final Cartesian product is needed in order to keep both PID and Price information. The result relation RTmp1 is presented in Figure 8.a. The next step is to isolate all relation schemas of supplier SA in the UK in order to join them with the temporary result RTmp1:

RTmp2 = Map ( Del_Cxt_attr ( π {PID, Price} ( σc (Supplier = SA) AND (Location = UK) Product ), Supplier ), Location, * )

Context attribute Location was mapped to ‘*’ so that information for the UK is joinable with that of other countries for the same year. In the example database of Figure 3, supplier SA has defined only one relation schema for the UK, so the result RTmp2, which is presented in Figure 8.b, has a single relation schema for 2008; in a real world database he would have one relation schema for each year. Finally, in order to retrieve the products that are the answer to his analysis query, the following operation is applied:

Result = π RTmp1.PID, RTmp1.Price ( σ (RTmp1.Price < RTmp2.Price) AND (RTmp1.PID = RTmp2.PID) (RTmp1 × RTmp2) )

So, from the result, which is presented in Figure 8.c, he concludes that the products that possibly caused his loss of profit in 2008 were the product with PID=5 in Greece and the product with PID=3 in the USA. By merging all the operations presented, we can write the data analysis query of supplier SA as a single expression that uses our proposed operations. Moreover, this expression is always the same, regardless of the number of suppliers or relation schemas defined in our context aware database. Even if we had information about products in 100 countries and for the last 20 years, the query would not have to change. The same holds for database updates with new relation schemas. In contrast, if we had used a relational implementation and relational algebra for querying the information, we would not be able to write this query as a single operation. We would have to process the intermediate results externally using a procedural language and pose multiple queries to the database, for example in order to implement the minus operation over a varying number of defined relation schemas. Moreover, any query created would be valid only for the specific set of relation schemas defined and would depend on the number of suppliers, countries and years defined in the database.
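The last step of the example (the product of RTmp1 with RTmp2, followed by the selection and projection) can be replayed on the data of Figure 8 with a short Python sketch. It is only an illustration under stated assumptions: a context is modeled as a (location, year) pair, '*' is taken to match any value, the result context keeps the more specific value, and empty relation schemas are dropped from the result. The helper names compatible, combine and cheaper_elsewhere are hypothetical.

```python
WILDCARD = "*"


def compatible(c1, c2):
    """Two contexts are joinable if each component is equal or a wildcard."""
    return all(a == b or WILDCARD in (a, b) for a, b in zip(c1, c2))


def combine(c1, c2):
    """Keep the more specific value of each context component."""
    return tuple(b if a == WILDCARD else a for a, b in zip(c1, c2))


# Context relations of Figure 8.a and 8.b: context -> set of (PID, Price).
r_tmp1 = {
    ("Greece", 2008): {(5, 44)},
    ("Greece", 2007): {(2, 35), (5, 44)},
    ("USA", 2008): {(2, 46), (3, 22)},
}
r_tmp2 = {(WILDCARD, 2008): {(2, 43), (3, 28), (5, 47)}}


def cheaper_elsewhere(left, right):
    """Pair context-compatible schemas, then select and project."""
    result = {}
    for c1, rows1 in left.items():
        for c2, rows2 in right.items():
            if not compatible(c1, c2):
                continue
            rows = {
                (pid1, price1)
                for pid1, price1 in rows1
                for pid2, price2 in rows2
                if pid1 == pid2 and price1 < price2
            }
            if rows:  # assumption: empty relation schemas are not kept
                result.setdefault(combine(c1, c2), set()).update(rows)
    return result


result = cheaper_elsewhere(r_tmp1, r_tmp2)
```

Under these assumptions, result contains the tuple (5, 44) for <Greece, 2008> and the tuple (3, 22) for <USA, 2008>, which agrees with Figure 8.c. The relation schema for <Greece, 2007> takes no part in the result because RTmp2 holds data only for 2008.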

7. CONCLUSION AND FUTURE WORK

In this work, we illustrated the evident need for incorporating the notion of context and context aware data management into database management systems. We identified the requirements for a context aware model and the main features that a context aware data management system should provide. Our key observations are that: i) context is used to partition the instances of a relation, ii) a relation may have different schemas in different contexts, and iii) complete management of context aware information cannot be supported by relational algebra per se, without the use of procedural methods in the application layer. Consequently, based on those requirements, we defined a context aware data model, which inherently supports context and context aware relations as first class citizens. Finally, we defined a set of operations over context aware relations that extend the relational algebra using a downwards compatible approach.

Our ongoing work focuses on query languages over the context aware data model and on the implementation of the proposed model. We have defined a downwards compatible extension of SQL for the efficient manipulation of context aware data. The language provides the full data manipulation capabilities of SQL, while all operations are defined over context relations. Our intention is to realize a non-intrusive implementation that would require minimal additions to the query engine of a traditional relational Database Management System. We are currently comparing different implementations of context aware relations over the relational model, which, due to space constraints, will be presented in future publications, and investigating possible alternatives for native storage of context aware data. Indexes over context aware data will also be considered in a future native storage scheme.

Regarding the query engine, the definition of a query language is strictly linked to translating queries of the extended SQL into query plans composed solely of the operations defined in this paper. Our immediate concern is to identify the best query plans and to optimize queries expressed in the extended SQL. Finally, we are trying to address the very interesting issues that arise with regard to the complexity and completeness of languages over our model through the definition of an extended relational calculus, which will help us understand the fundamental properties of such a model.

