2001 - HOWARD HOW LEUNG LOUIE, A Framework for Trust Management in Mediated Query Systems

  • 8/7/2019 2001 - HOWARD HOW LEUNG LOUIE, A Framework for Trust Management in Mediated Query Systems

    A Framework for Trust Management in

    Mediated Query Systems

    BY

    HOWARD HOW LEUNG LOUIE

    B.S. (University of California, Davis) 1999

    M.S. (University of California, Davis) 2001

    THESIS

    Submitted in partial satisfaction of the requirements for the degree of

    MASTER OF SCIENCE

    in

    Computer Science

    in the

    OFFICE OF GRADUATE STUDIES

    of the

    UNIVERSITY OF CALIFORNIA

    DAVIS

    Approved:

    _________________________

    _________________________

    _________________________

    Committee in Charge

    2001

    Acknowledgements

    The work in this thesis is a result of considerable effort on my part, which would not have

    been possible without the support of many people. I thank my parents, Ton Been and Chung Ping

    Louie. I also thank my brother, Kenneth, who supported me throughout this writing.

    I thank my advisors, Michael Gertz and Premkumar Devanbu, for giving me the

    opportunity to work on this project and for teaching and guiding me throughout this thesis.

    I also thank Hewlett-Packard and Boeing for their generous financial support. The

    summer I spent at Hewlett-Packard Laboratories with Troy Shahoumian, Pankaj Garg, Jerremy

    Holland, Vijay Machiraju, Mohamed Dekhil, and Klaus Wurster was a fun and memorable

    experience.

    Table of Contents

    1 Introduction

    1.1 Motivation

    1.2 Requirements

    1.3 Contributions

    1.4 Structure of the thesis

    2 Background

    2.1 Software life cycle management

    2.2 Trust management

    2.3 Mediated query systems

    3 Infrastructure

    3.1 Data model

    3.2 Trust model

    3.2.1 Trust types

    3.2.2 Flow of trust metadata

    3.3 Trust authorities

    3.4 Trust broker

    3.4.1 Trust broker schema

    3.4.2 Trust broker services

    3.5 Mediator

    3.6 Client

    3.7 Individual component knowledge

    4 Formulation of trust in queries

    4.1 Overview

    4.2 Conceptual model

    4.3 Query language extensions

    4.4 Pragmatic issues

    5 Effect of trust metadata on query processing

    5.1 Overview of query processing

    5.2 Changes to mediation in query processing

    5.3 Integration into mediation

    6 Conclusions and future work

    References

    Table of Figures

    Figure 1.1 Mediated query system

    Figure 3.1 Infrastructure overview diagram

    Figure 3.2 Overlap of trust statements for wrapper DTD

    Figure 3.3 Overlap of specifiers for mediator DTD

    Figure 3.4 Properties of components known to other components

    Figure 5.1 Steps to process a query for MQS without trust extensions

    Figure 5.2 Steps to process a query for MQS with trust extensions

    1 Introduction

    With the advent of the Web, software life cycle management has the potential to improve

    by leaps and bounds. The old method of installing software from a CD is slowly being

    phased out in favor of purchasing and installing software directly from the Web. Commercial

    products such as Marimba [Mar98] and research prototypes such as the Software Dock [Hal99]

    go a step further by managing the software life cycle, from installation to retirement,

    directly over the network.

    The impetus for our information systems research stems from the desire to build a

    software life cycle management system that is both scalable and secure. Previous research

    prototypes such as Software Dock are scalable but have not considered security issues. Our

    research aims to address issues of trust in retrieving software life cycle management

    data.

    1.1 Motivation

    We will present a scenario to motivate our research. Consider a user Joe working in an

    organization ABC. Joe currently uses the Java Development Kit (JDK) 1.1 [Sun] and would like

    to upgrade to JDK 1.2. There are many configurations of JDK 1.2 for as many platforms.

    Variations include the operating system version, standard or enterprise edition, and the presence

    or absence of advanced cryptography. Joe would like to obtain the correct software configuration

    description for his workstation. He submits a query, which states his own desktop configuration,

    plus company ABC's trust policies, to a software configuration information system. The

    information system then retrieves the requested configuration data, perhaps by pulling data from

    one or more sources. The configuration data retrieved both satisfies the trust policies of

    organization ABC and solves Joe's upgrade problem.

    Our goal is to provide a framework to allow the trust portion of Joe's queries to work.

    Joe trusts the information system to respect the trust policies of ABC. He knows that the data he

    gets back satisfies the trust constraints, and has been annotated to let him know how the query

    result satisfies the trust constraints.

    Our research is a hybrid of trust management and data quality research. Trust

    management has been described as deciding whether requested actions should be allowed. Data

    quality research aims to provide clients of information systems a certain degree of confidence

    about their data. Trust management systems such as REFEREE [CFL97] allow for general

    assertions regarding the properties of Web sources and are not designed for any specific

    information system. REFEREE itself is simply a platform with a language and evaluation

    environment for trust policies. Integrating the language and trust policy evaluation environment

    into some information system requires more research.

    Information systems that allow for requirements on data often do so under the viewpoint

    of data quality. Systems such as those described in [NLF99] allow for specifying a certain degree of

    quality necessary in the query result. Such systems, however, contain metadata regarding quality

    that is centrally administered and somewhat static. Distributing the task of creating metadata

    would make such frameworks more scalable, dynamic, and responsive to information resource

    changes over time.

    The framework envisioned by our research combines decentralized and flexible creation of

    metadata assertions with policies for specifying requirements on data, all within the

    information systems paradigm.

    1.2 Requirements

    We outline our requirements for the framework below. The requirements are divided into

    four broad categories and we discuss each one in turn.

    First and foremost, we must allow decentralized assertions of trust for information

    sources. Decentralization allows for scalability and dynamism. Other frameworks, such as

    [Kha96], also decentralize their approach to prescribing assertions. If we allow distributed

    components to state assertions, we are no longer limited to almost static metadata. At the same

    time, the decentralized sources of assertions automatically decouple the producers and consumers

    of those trust assertions. Thus, assertions, once created, may be utilized by multiple consumers.

    Any individual that desires the advantage of using trust assertions made by others in

    collecting data should be provided a conceptual model to formulate trust requirements. The

    requirements must be independent of the data content requested. When the trust requirements are

    independent of the data, the trust requirements may be specified separately from the data

    requirements, which is important from an administrative viewpoint. In our software life cycle

    application, not every user of the information system may know when or why trust is

    granted. The user may simply rely on a central security administrator to provide the trust

    requirements. Therefore the data content and trust requirements must be independent of each

    other.

    The trust model developed must specifically be compatible with the XML data model

    [BPS00]. There are two reasons for this. First, we are motivated to address the software life

    cycle management problem. Although other languages and schemas have been developed to

    describe software configurations [HHW98, HHW98b], a schema for software configurations has

    also been defined using the XML data model, in the form of a DTD for software configurations

    [HHW99]. Second, the XML data model is an industry standard for data exchange and data

    integration. It is flexible enough to handle data for all kinds of applications from all types of

    heterogeneous information sources.

    We choose to build our framework for trust on top of the mediated query system (MQS)

    [DD99] due to its advantages of flexibility, dynamism, and transparency. Therefore, our trust

    model must be easily integrated with existing mediation frameworks. The flexibility,

    dynamism, and transparency of MQS must not be hindered. Mediators provide a

    value-added service by making the disparate data from information sources more useful as a

    whole [Wie92]. Our additions must not limit the current functionality of MQS in any way.

    1.3 Contributions

    Our contributions include the following:

    1 An architecture to enhance MQS with trust extensions.

    2 A formalization of the notion of trust assertions, including their types and semantics.

    3 A description of a central entity that collects trust assertions and transforms the assertions into trust metadata.

    4 A model to conceptualize trust requirements, including a language to express the requirements.

    5 An outline of mediation extensions that utilize trust metadata for the benefit of trust-aware data integration.

    Our architecture extends, rather than replaces, the MQS architecture. We formalize the notion

    of trust authorities, which are the producers of trust assertions, by defining their abilities and

    identifying their knowledge. The infrastructure is decentralized because it allows independent

    producers of trust assertions. Since the architecture is decentralized, it is also scalable and

    dynamic. It is dynamic because producers may join or leave the system at will; they are not

    bound to the system. It is scalable because there is no upper limit to the number of trust

    authorities.

    Borrowing from paradigms found in real-world organizations, trust authorities form trust

    assertions. The trust assertions are similar to accreditation for academic institutions, or ratings of

    the strength of insurance companies. The trust assertions are a certification of some aspect of the

    data from information sources. The semantics for the use of trust assertions are also defined.

    Usage of trust assertions is non-destructive, so we may have no limit on the number of trust

    assertion consumers.

    We design a trust broker to store and convert trust assertions into trust metadata.

    Management of the trust metadata is handled by the logically centralized trust broker. The

    decentralized producers encourage dynamic trust metadata that is responsive to changing sources.

    The conceptual model we provide to clients for specifying trust requirements is

    independent of the data content requested. The advantage of the data request and trust

    requirements independence, as Section 1.2 points out, is that the trust requirements may be reused

    for many queries. Another advantage of the independence is that the query language extensions

    are not bound to any particular query language. The query language extensions also allow for

    specifying a liberal or a conservative application of trust requirements.

    Finally, we outline and give examples of how mediation may be extended to include

    trust. The additional considerations from the trust metadata assist in eliminating sources not

    trusted according to the client's requirements, and in resolving conflicts according to those

    requirements.

    1.4 Structure of the thesis

    Chapter 2 provides background information on relevant research and technologies.

    Chapter 3 discusses the infrastructure that supports the trust model. We detail the functionality of

    a trust broker to collect and manage the trust assertions. We show how assertions of trust may be

    formulated in a decentralized and scalable manner. Chapter 4 examines the conceptual model of

    trust available to clients. The same chapter also specifies the language that clients use to

    express their trust criteria. Chapter 5 gives an outline of how the mediator

    makes use of trust metadata in its mediation. Finally, Chapter 6 gives the conclusions and future

    work.

    2 Background

    Our research draws on many related technologies. The areas of trust management,

    software life cycle management, and mediated query systems provide the basis for our work, and

    in return we make a contribution related to those fields. Section 2.1 gives the background on

    software life cycle management and related works. Section 2.2 covers trust management. This

    includes policy languages and paradigms for trust. Section 2.3 explains the principles of mediated

    query systems and gives some examples of existing systems.

    2.1 Software life cycle management

    Software life cycle management is concerned with the management of software, from the

    delivery of the software to retiring the software at the client site. Hall's Ph.D. thesis [Hal99] is

    the first to formalize the notion of software life cycle management. He provides a framework

    within which he divides software life cycle management into identifiable, distinct stages: release,

    retire, install, activate, deactivate, reconfigure, update, adapt, and remove. Hall also architected

    the Software Dock [HHH97], which takes full advantage of the software life cycle framework.

    The Software Dock uses agents to facilitate software life cycle management. Agents

    travel to and from release docks (representing software producers) and field docks (representing

    software consumers). The agents learn of software releases from the release docks, and make

    changes at the consumer side through the field docks as necessary. A wide-area event system

    provides communication between docks and agents, providing notification of changes.

    Marimba [Mar98] has a product called Castanet, which handles delivery, update,

    management, and repair of custom and shrink-wrapped applications over the network. In the

    Castanet model, the application server provides all the file and directory artifacts and

    registry changes one needs to configure applications. Encryption, user authentication, and

    application authentication provide the necessary security measures. The Castanet infrastructure

    deployment tools are proprietary and organizations need to buy into Castanet products before

    deploying their software.

    Microsoft and Marimba together have created the Open Software Description (OSD)

    format [HPT97]. Currently a W3 Consortium proposed standard, OSD has a vocabulary for

    describing relationships between software components with their various versions. OSD is

    related to Microsoft's Channel Definition Format (CDF) [Ell97], which is used to "push" software

    systems for automatic installation and update.

    The Desktop Management Task Force, a consortium that sets personal computer management

    standards, has created a software management interface called the Desktop Management

    Interface (DMI) [DMI98]. The DMI is a common interface to manage applications. The

    Management Information Format (MIF), also part of the DMI specification, was created to

    describe computer systems. The Common Information Model (CIM) [CIM98] is an object-

    oriented model for describing the systems, and replaces the MIF.

    2.2 Trust management

    Research on trust management has been done in the public key infrastructure (PKI)

    space and the Web space. Regardless of the domain, the central issue of trust remains the

    same: why do we believe or adopt a given piece of data or code?

    Trust in most systems is assumed to be a property we assign to an entity. This property

    asserts that undertaking an operation using the entity will not violate the security and integrity of

    the underlying system in any way [CFL97]. The entity that is assigned trust may be, for example,

    some data, a process, or a person. Policies for assigning trust may vary. Usually policies are

    based on the limitations and intended security of the system. Examples of policies that can be

    enforced, depending on the abilities of the underlying system, are 1) trust all, 2) trust only if

    criteria such as authentication are met (examples are Microsoft Authenticode [Auth] and public key

    infrastructure [PKI00]), and 3) don't trust. In some cases, don't trust means that if the entity is

    executable code, there is some ability to monitor or modify the code [ET99].

    PolicyMaker [BFL96, BFL96b], PGP [Zim94], and hierarchical public key

    infrastructures (PKI) [Win98] each offer a different approach to trust. Hierarchical PKI assumes

    some central omnipotent certificate authority that everyone trusts. PGP assumes an ad hoc

    approach in which we trust our trusted friends to vouch for keys. PolicyMaker allows for writing

    trust policies, and advocates binding actions to keys (instead of identity to keys), thus anyone

    holding the key may perform the corresponding action.

    In the Web space, REFEREE [CFL97] allows for specifying trust policies and provides

    an environment to safely evaluate compliance of policies and actions with the specified policy.

    REFEREE builds on the infrastructure supported by PICS [RM96]. PICS is a W3C

    recommendation for labeling anything with a Uniform Resource Identifier (URI) [BF98] on the

    Web. For example, Web resources are described using PICS labels. The PICS labels define

    properties of the resource (e.g. executable code has been virus checked). The labels are made by

    rating services. Users specify trust policies that state which rating services are trusted and the

    requirements imposed on the PICS labels. Users are not necessarily aware of the resources PICS

    labels describe, but rely on the rating services with the PICS labels to select trusted resources.

    Labels may be collected in a label bureau. PICS labels are machine readable so programs can be

    written to automatically categorize, filter, etc. labeled Web resources.
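
    The label-based selection described above can be sketched in a few lines. This is only an

    illustration in Python, not the PICS label syntax or the REFEREE policy language; the Label

    record, the rating service names, and the virus_checked property are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Label:
    """Simplified stand-in for a PICS label (hypothetical structure)."""
    resource: str          # URI of the labeled resource
    rating_service: str    # service that issued the label
    properties: dict = field(default_factory=dict)


def trusted_resources(labels, trusted_services, required):
    """Return resources that a trusted rating service labels with
    every required property value."""
    selected = set()
    for label in labels:
        if label.rating_service not in trusted_services:
            continue  # policy: ignore labels from untrusted rating services
        if all(label.properties.get(k) == v for k, v in required.items()):
            selected.add(label.resource)
    return sorted(selected)


# A label bureau modeled as a plain list of labels.
bureau = [
    Label("http://example.org/a.exe", "rater-1", {"virus_checked": True}),
    Label("http://example.org/b.exe", "rater-2", {"virus_checked": True}),
    Label("http://example.org/c.exe", "rater-1", {"virus_checked": False}),
]

result = trusted_resources(bureau, {"rater-1"}, {"virus_checked": True})
print(result)
```

    The user never inspects the resources directly; the policy names the trusted rating services

    and the required label values, and the labels do the rest.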

    Trust Management on the World Wide Web [KR98] outlines the basic elements of trust

    on the Web and the implications of trust management for future Web applications. The authors define

    trust management as a "framework for decentralizing security decisions that helps developers and

    others in asking 'why' trust is granted rather than immediately focusing on 'how' cryptography

    can enforce it" [KR98]. The paper states the issues in describing principles, principals, policies, and

    pragmatics of trust management infrastructures. It also highlights the need for people working on

    and using the Web to help turn the Web into a Web of trust.

    2.3 Mediated query systems

    Our approach to enabling trusted query results is built on the foundation of

    mediated query systems (MQS). MQS allows for using a single, dynamic schema to access

    multiple, dynamic and heterogeneous sources of data. The MQS architecture is divided into three

    layers: the application, integration, and data sources layer [DD99]. Mediators at the integration

    layer provide a single interface for clients at the application layer and perform integration of data

    sources with heterogeneous data models [Wie92]. Rules and constraints written by application

    domain experts constitute the simple intelligence mediators have for data integration. Clients

    formulate queries based on the mediated schema. The unmaterialized schema is composed of

    views corresponding to information sources and other mediators. The exported schema of a

    mediator may be used as a query interface for other mediators; thus, mediators may be stacked.

    To enable mediators to handle heterogeneous data models and schemas, wrappers at each

    data source provide a uniform data model to the integration layer. Thus the mediation component

    itself (exclusive of the wrappers) deals only with a uniform data model. Figure 1.1 depicts

    multiple MQS with clients and multiple data sources. Wrappers are not shown but are assumed

    wherever there are sources.

    Figure 1.1 Mediated query system. [Diagram: clients, mediator, and sources linked by query flow and data flow.]

    Query decomposition and data integration are the two principal tasks for mediators.

    During the decomposition of queries, mediators use rules to form subqueries to send to sources.

    When the result objects are returned from sources, mediators perform data integration using rules

    to create objects for export.
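
    The two tasks can be sketched as follows. This is a minimal illustration in Python under

    stated assumptions, not the design of any particular MQS: the Wrapper and Mediator classes,

    the sources, and the attribute names are hypothetical, and the "rules" are reduced to a list

    that records which attributes each source can supply.

```python
class Wrapper:
    """Presents one source's records in a uniform, dict-based data model."""

    def __init__(self, name, records):
        self.name = name
        self.records = records  # list of dicts keyed by attribute name

    def query(self, attributes, key, value):
        """Answer a subquery: project attributes from records where key == value."""
        return [
            {a: r[a] for a in attributes if a in r}
            for r in self.records
            if r.get(key) == value
        ]


class Mediator:
    def __init__(self, rules):
        # rules: list of (wrapper, set of attributes that wrapper can supply)
        self.rules = rules

    def decompose(self, wanted):
        """Form one subquery per source that supplies a wanted attribute."""
        plan = []
        for wrapper, supplied in self.rules:
            attrs = [a for a in wanted if a in supplied]
            if attrs:
                plan.append((wrapper, attrs))
        return plan

    def answer(self, wanted, key, value):
        """Decompose the query, send the subqueries, integrate the results."""
        result = {key: value}
        for wrapper, attrs in self.decompose(wanted):
            for partial in wrapper.query(attrs, key, value):
                for a, v in partial.items():
                    result.setdefault(a, v)  # earlier rules take precedence
        return result


vendor = Wrapper("vendor", [{"product": "JDK", "version": "1.2", "platform": "Solaris"}])
mirror = Wrapper("mirror", [{"product": "JDK", "checksum": "abc123"}])
mediator = Mediator([(vendor, {"version", "platform"}), (mirror, {"checksum"})])

answer = mediator.answer(["version", "checksum"], "product", "JDK")
print(answer)
```

    The client sees a single exported object even though its parts come from two sources.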

    Rules can be specified in a number of languages [BRU96], including the Mediator

    Specification Language (MSL) [GPQ95], datalog [Ull97], and Object Query Language (OQL)

    [Cat96]. In general, rules are really queries. The body of the rule selects from the source and the

    head of the rule forms objects, which the mediator exports. Datalog is a Prolog-like, logical,

    rule-based pattern matching language. MSL is a variant of datalog that allows for querying

    unstructured as well as structured data. In contrast, datalog can only be used to query structured

    data. OQL is an object-oriented version of the Structured Query Language (SQL).

    Rules are used in object fusion. Object fusion entails constructing a result object (e.g.

    Object Exchange Model (OEM) [PGU95] object or XML document) from data gathered from

    querying multiple sources [PAG96]. Sometimes the source data is inconsistent or

    redundant. For inconsistencies, conflict resolution is necessary. For example, rules may provide a

    priority that favors objects with the most recent date subobject. This type of conflict resolution

    always favors the most recent date. Also, to avoid retrieving the same data twice, or even to

    avoid inconsistencies, the rule may say to retrieve a subobject from a secondary source only if a

    primary source does not already have the same subobject.
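
    The two fusion rules just described can be sketched as follows. This is an illustration in

    Python rather than MSL or any rule language used by an actual mediator; the function names and

    the sample objects are hypothetical, and the more recent object is simply treated as the

    primary source.

```python
from datetime import date


def resolve_by_date(candidates):
    """Conflict resolution: keep the object with the most recent date subobject."""
    return max(candidates, key=lambda obj: obj["date"])


def fuse(primary, secondary, subobjects):
    """Build a result object, consulting the secondary source for a
    subobject only when the primary source does not supply it."""
    result = {}
    for name in subobjects:
        if name in primary:
            result[name] = primary[name]
        elif name in secondary:
            result[name] = secondary[name]
    return result


# Two sources describe the same object and disagree on the version.
from_a = {"date": date(2001, 3, 1), "version": "1.2.1", "size_kb": 20480}
from_b = {"date": date(2000, 11, 15), "version": "1.2.0", "vendor": "Sun"}

newest = resolve_by_date([from_a, from_b])
older = from_b if newest is from_a else from_a
fused = fuse(newest, older, ["version", "size_kb", "vendor"])
print(fused)
```

    The conflicting version is taken from the more recent object, while the vendor subobject,

    which only the older object has, is still carried into the result.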

    Some examples of MQS include TSIMMIS [CGH94], HERMES [Sub], Information

    Manifold [LRO96], and InfoSleuth [BBB97]. TSIMMIS provides for the rapid, declarative

    generation of mediators and wrappers that integrate diverse and dynamic data from multiple

    heterogeneous sources. Some of the chief contributions of the TSIMMIS project include the

    Object Exchange Model (OEM) [PGU95], the Mediator Specification Language [PGU95], and

    wrapper- and mediator-generators [GPQ95]. HERMES provides a general, declarative language

    for creating extensible mediators. Such mediators allow for incremental integration of new

    systems into existing mediator systems. InfoSleuth is an agent-based information retrieval and

    processing system. Information Manifold uses descriptions of source content and capability to

    efficiently prune the set of available sources and thus allows for scaling up to hundreds of

    information sources.

    3 Infrastructure

    In this chapter, we present an infrastructure that allows for the management of trust and

    its application in the mediation of data. After a brief introduction to the components in our

    infrastructure, we present our data model in Section 3.1. Section 3.2 states the trust model used

    throughout the rest of the thesis, including notions of trust types and the flow of trust metadata.

    Section 3.3 discusses trust authorities and trust statements. Section 3.4 details the trust broker,

    which includes its schema for storing trust metadata and services for manipulating trust metadata.

    Section 3.5 examines the schema extensions to the mediator. Section 3.6 discusses how clients

    specify trust requirements for queries and their associated semantics. Finally, Section

    3.7 examines the knowledge required of each component in order to fit this framework

    together.

    Our infrastructure builds on a mediated query system (MQS) infrastructure [DD99]. As

    discussed in the background section, the MQS infrastructure provides mediators through which

    clients submit queries and the mediator integrates a query result for the client from various

    sources. We introduce trust extensions to the mediator, as well as the notions of trust

    authorities (TA) and a trust broker (TB). Trust authorities validate sources according to their own

    special standards and publish trust metadata on those sources. The trust authorities are known to

    clients through mediators. Mediators are known only to the client. The trust broker provides the

    separation of concerns for managing trust metadata. The trust broker provides a direct trust

    metadata collection and dissemination service for those mediators with trust extensions.
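
    As a rough picture of how these roles relate, the sketch below models a trust broker as a

    store that collects statements from trust authorities and hands them to a mediator on request.

    It is written in Python and is only an assumption-laden illustration: the TrustBroker class,

    its collect and lookup methods, and the statement strings are hypothetical, and the broker

    schema and services defined in Section 3.4 take precedence over anything shown here.

```python
from collections import defaultdict


class TrustBroker:
    """Collects trust statements and serves them to mediators (hypothetical API)."""

    def __init__(self):
        # source name -> list of (authority, statement) pairs
        self._metadata = defaultdict(list)

    def collect(self, authority, source, statement):
        """Record a statement that a trust authority publishes about a source."""
        self._metadata[source].append((authority, statement))

    def lookup(self, source, trusted_authorities):
        """Return statements about a source made by authorities the client trusts."""
        return [
            statement
            for authority, statement in self._metadata.get(source, [])
            if authority in trusted_authorities
        ]


broker = TrustBroker()
broker.collect("TA-1", "source-A", "data reviewed")
broker.collect("TA-2", "source-A", "data unreviewed")
broker.collect("TA-1", "source-B", "data reviewed")

statements = broker.lookup("source-A", {"TA-1"})
print(statements)
```

    Keeping collection and lookup in one component is what lets the mediator stay unaware of how

    many trust authorities exist or when they publish.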

    3.1 Data model

    Our research focuses on extending the MQS where the MQS is based on the XML data

    model. Many sources with heterogeneous data models and schemas are integrated into the MQS.

    Wrappers convert data from each source into a common XML data model. The wrappers have

    different schemas that each conform to a document type definition (DTD) [BPS00]. The


    mediator supports a DTD and exports valid XML data to the client. A DTD can be used to

    specify a grammar or a schema. XML standards such as SAX, document object model (DOM)

    [ABC98], and InfoSet [Cow00] allow the mediator to process the disparate data from sources

    before returning the query result to the client.

    Other data models we have considered for the mediator include the relational model. It is

    possible to develop this theory for the relational model and for the object-oriented model. Since

    the XML data model can be used to represent relational and object-oriented data, we only

    concentrate on the XML data model.

    We assume that the mediator and wrappers in the MQS are all required to support a DTD.

    The DTD may be different for each wrapper. If the DTD varies from the wrapper to the mediator

    then we assume the mediator has methods available to transform XML data from a wrapper's

    schema to the internal schema of a mediator. XSL [ABC00] programs are an example of how one

    can translate from the wrapper DTD to the mediator DTD. Henceforth, we assume MQS

    wrappers and sources are treated as a single component, and each wrapper exports all of its

    source's schema and data. We assume that different mediated systems with the same data model

    and different mediated schemas will nevertheless use identical wrappers if the source happens to

    be shared.

    3.2 Trust model

    Determining whether data from Web-based sources may be trusted or not can be a

    difficult problem. We seek to provide assertions about trust for data. These assertions, in the

    form of trust statements, may be processed by MQS components in order to provide a trusted

    query result. To provide more flexibility in trust statements, we allow trust statements to be

    classified by trust types. The notion of a type for trust statements allows different applications to

    define their own meanings and intentions for trust. For example, in a mission-critical application,

    the trust type may state that published trust statements have legal ramifications.


    Trust statements form the basis of our trust model. A trust statement is a 4-ary

    tuple <source, trust authority, trust type, qualifier>. The semantics of the trust statement is

    that the trust authority asserts that the data at source satisfying qualifier are trusted with

    respect to trust type.
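A trust statement can be sketched as a small immutable record. The following is a minimal illustration only: the field names, the class name TrustStatement, the describe helper, and the example URIs are our assumptions and are not part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustStatement:
    """One trust statement with its four components (names are assumed)."""
    source: str           # wrapper URI of the information source
    trust_authority: str  # URI of the asserting trust authority
    trust_type: str       # trust type name, e.g. "Approved"
    qualifier: str        # XPath expression selecting the trusted elements

    def describe(self) -> str:
        """Spell out the semantics of the statement in words."""
        return (f"{self.trust_authority} asserts that data at {self.source} "
                f"satisfying {self.qualifier} are trusted with respect to "
                f"{self.trust_type}")


# Hypothetical example values, for illustration only.
example = TrustStatement(
    source="http://wrapper.example.org/software",
    trust_authority="http://ta.example.org/acme",
    trust_type="Approved",
    qualifier="/Software/Licensing",
)
```

Making the record immutable means two statements with identical components compare equal, which is convenient when statements are kept in a set.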

    3.2.1 Trust types

    There can be many reasons why a source is trusted. These reasons are expressed via the

    trust type. Trust types allow a trust statement to specify how or why a source is trusted. A trust

    type is a pair <TY, TY-URI>, where TY is a single English word, and TY-URI is a Uniform

    Resource Identifier (URI) [BF98] specifying where the definition of the trust type TY may be

    found. By having an English word denote the trust type, the intuitive meaning of the trust type is

    immediately available. Also, by having a document written in a natural language detailing the

    specific meaning of the trust type, the exact definition of the trust type can be examined at

    any time. By having multiple trust types available within the MQS, more flexibility is allowed for

    trust statements. This allows the most appropriate trust type to be utilized by a component

    wishing to publish the trust statement.

    A set of trust types and their definitions are created for each application by consensus

    among trust authorities. Trust statements are based only on trust types from the set. New trust

    types may be added and old trust types may be deleted if they are no longer appropriate. When a

    trust type is deleted, trust statements based on that trust type are also deleted. Adding and removing

    from the set of trust types is done by a centralized component, the trust broker. Adding and

    removing trust statements is also guarded by the trust broker. No other component may directly

    add or remove trust statements.
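The guarding role of the trust broker described above can be sketched in a few lines. This is a toy in-memory model under our own assumptions: the class and method names are hypothetical, and a real trust broker would persist this state rather than hold it in Python objects.

```python
class TrustBroker:
    """Toy in-memory broker guarding trust types and trust statements."""

    def __init__(self):
        self.trust_types = {}    # TY -> TY-URI
        self.statements = set()  # (source, TA, TY, qualifier) tuples

    def add_trust_type(self, ty, ty_uri):
        self.trust_types[ty] = ty_uri

    def add_statement(self, source, ta, ty, qualifier):
        # Trust statements may only use trust types from the agreed set.
        if ty not in self.trust_types:
            raise ValueError(f"unknown trust type: {ty}")
        self.statements.add((source, ta, ty, qualifier))

    def remove_trust_type(self, ty):
        # Deleting a trust type also deletes every statement based on it.
        self.trust_types.pop(ty, None)
        self.statements = {s for s in self.statements if s[2] != ty}
```

Because every change goes through the broker's methods, the cascade from a deleted trust type to its trust statements happens in one place.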

    The sample trust types below cover some data quality [NLF99] issues. Trust in these

    examples is based on the quality of data. The Insured100K example below shows that a

    monetary value can be placed on the trust a trust authority has for a source's data. Since the trust


    type can be arbitrarily defined, the basis and definition of the trust type may provide for

    guarantees, insurance, and other methods of demonstrating trust.

    Approved: The source has high reliability and excellent reputation. Reputation is based on professional experience. Reliability is based on experimental methods.

    Insured100K: The trust authority will guarantee that all data from a source is 99.99 percent reliable. The trust authority will insure for any losses due to the use of any information from this source for up to 100K US dollars.

    Audited: The source's data has been regularly audited and verified for timeliness. The trust authority regularly verifies that 95 percent of data is less than one week old.

    3.2.2 Flow of trust metadata

    A layout of the flow and availability of trust metadata is crucial to understanding the

    behavior and properties of the system. Mediators notify the trust broker of new sources added to

    their MQS. The trust broker notifies trust authorities of the new source. Trust authorities evaluate

    the source according to their proprietary standards and, if it passes, publish trust statements

    for the source to the subscribed trust broker. This is done via a push model from trust authorities

    to the trust broker. Mediators with trust extensions retrieve a set of trust authorities from their

    trust broker and make the trust authorities available to their clients. Application-level software

    allows the user to submit queries using a subset of all the available trust authorities and a

    subset of all available trust types. It also allows the user to submit additional requirements for

    the trust authorities and trust types. The additional requirements allow the user to state the trust

    relationships among trust authorities and trust types.
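The flow just described can be traced with a small simulation. The sketch below is illustrative only: all class and method names are hypothetical, the approves callback stands in for a trust authority's proprietary evaluation, and the qualifier "/" is assumed to denote the whole document.

```python
class TrustBroker:
    def __init__(self, trust_types):
        self.trust_types = list(trust_types)
        self.authorities = []  # registered trust authorities
        self.statements = []   # (source, TA, TY, qualifier) tuples

    def register(self, authority):
        self.authorities.append(authority)

    def notify_new_source(self, source):
        # Called by a mediator; the broker forwards the notification.
        for authority in self.authorities:
            authority.on_new_source(source, self.trust_types, self)

    def push(self, statement):
        # Called by trust authorities (push model).
        self.statements.append(statement)

    def available_authorities(self):
        return [a.name for a in self.authorities]


class TrustAuthority:
    def __init__(self, name, approves):
        self.name = name
        self.approves = approves  # stand-in evaluation: (source, ty) -> bool

    def on_new_source(self, source, trust_types, broker):
        # Publish one statement per trust type the source passes.
        for ty in trust_types:
            if self.approves(source, ty):
                broker.push((source, self.name, ty, "/"))


class Mediator:
    def __init__(self, broker):
        self.broker = broker

    def add_source(self, source):
        self.broker.notify_new_source(source)

    def authorities_for_clients(self):
        return self.broker.available_authorities()
```

A mediator that calls add_source triggers the whole chain: the broker notifies each registered trust authority, and any resulting statements arrive back at the broker without the mediator being involved.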


    Figure 3.1 Infrastructure overview diagram

    Solid arrows indicate flow of data. Dashed arrows indicate flow of trust metadata.

    The wide arrow indicates the trust authority evaluates the source.

    3.3 Trust authorities

    The MQS with trust extensions needs a component, the trust authority, to

    validate sources. Trust authorities are responsible for assigning trust properties to

    sources.

    Trust authorities correspond to well-known entities such as the W3C, the Department of

    Defense, or Microsoft. Each trust authority has a software agent on the Web with a unique URI

    representing the trust authority. The term trust authority refers to both the

    software agent and the real-world entity it represents. Trust authorities are always known to the

    trust broker. Mediators know of trust authorities after the mediator queries the trust broker for

    available trust authorities. Clients know of trust authorities after they query the mediator for

    available trust authorities.

    Trust authorities make assertions about sources with respect to trust by publishing trust

    statements. Trust authorities receive notification of new sources and the available trust types

    from the trust broker. The source notification information is actually a pointer to the wrapper

    corresponding to the source. Hence trust authorities know wrappers. When a trust authority


    receives notification of a new source from the trust broker, the trust authority evaluates the

    source. For each available trust type, the trust authority determines whether it should publish a

    trust statement using that trust type. A subset of the available trust types is thus selected. The trust

    statements are then pushed to the trust broker.

    Trust authorities may not always want to certify that a whole source is trusted. Rather,

    they may prefer to specify trust on a finer granularity. Elements in a document provide the ideal

    granularity for association with trust statements. Elements are able to represent an entire object.

    For example, the root element of a software configuration document represents all possible

    configurations of a software product. The Properties element and its subelements describe all properties

    of the software. The Licensing element contains all information regarding licensing of the

    software. We can point to arbitrary elements in a document by using XPath [CD99] expressions.

    The semantics are that the trust statement is only valid for the elements designated by the XPath

    expression. The trust statement also applies recursively to any subelements, but does not extend

    to elements referenced in attributes, although that may be potentially extended with further

    research. Hence, XML documents are modeled as trees, not as graphs.
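The recursive coverage of a qualifier can be illustrated with Python's standard xml.etree.ElementTree module. This is a sketch under stated assumptions: the sample document and the function name trusted_elements are hypothetical, and ElementTree supports only a subset of XPath, with paths written relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical software configuration document.
DOC = """
<Software>
  <Properties><Name>Editor</Name><Version>1.0</Version></Properties>
  <Licensing><License>GPL</License></Licensing>
</Software>
"""


def trusted_elements(root, qualifier):
    """Return the tags of all elements covered by a qualifier.

    The qualifier selects elements; coverage then extends recursively
    to every subelement of each selected element.
    """
    covered = []
    for selected in root.findall(qualifier):
        # Element.iter() yields the element itself and all its descendants.
        covered.extend(el.tag for el in selected.iter())
    return covered


root = ET.fromstring(DOC)
```

Calling trusted_elements(root, "./Licensing") covers the Licensing element and its License subelement, while the Properties subtree stays outside the statement.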

    Trust authorities may publish trust statements for arbitrary elements from the wrapper's

    DTD. However, the elements specified by XPath expressions from different trust authorities may

    be intersecting. Figures 3.2 A and B display two ways XPath expressions normally may specify

    overlapping document regions. We do not need to consider Figure 3.2A in our scenarios because

    we only consider XML document trees and not graphs. For intersections of the type shown in

    Figure 3.2B, the semantics are that the trust statement TS2 for the subtree associated with XPath

    expression 2 co-exists with TS1. Neither TS1 nor TS2 overrides the other.

    The set of trust statements is allowed to change over time. Trust authorities have

    primitive operations to publish or revoke trust statements. Generally, updating trust statements


    requires a revoke operation followed by a publish operation. The operations are pushed to the trust

    broker for execution.

    If two trust statements point to the same element, for the same source and trust type from

    a single trust authority, then the more recently published trust statement replaces the prior trust

    statement. Otherwise, the later trust statement is added to the current set of trust statements without

    replacing any of them. Once a trust statement is added to the current set of trust statements, there is no longer

    any notion of chronological ordering of trust statements. Only during the operation of adding a

    trust statement to the current set do we consider the yet-to-be-added trust

    statement more recent and all the trust statements in the current set

    less recent.
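The publish and revoke primitives, together with the replacement rule, can be sketched with a plain set. This is a simplification we assume for illustration: two statements are taken to point to the same element only when their qualifier strings are identical, and the class and method names are hypothetical.

```python
class StatementSet:
    """Current set of trust statements, with no chronological ordering."""

    def __init__(self):
        self._current = set()  # (source, TA, TY, qualifier) tuples

    def publish(self, source, ta, ty, qualifier):
        # Publishing an identical statement replaces the earlier copy,
        # which for a set means its size does not change.
        self._current.add((source, ta, ty, qualifier))

    def revoke(self, source, ta, ty, qualifier):
        # Revoking a statement that is not present is a no-op.
        self._current.discard((source, ta, ty, qualifier))

    def update_qualifier(self, source, ta, ty, old_qualifier, new_qualifier):
        # An update is a revoke followed by a publish.
        self.revoke(source, ta, ty, old_qualifier)
        self.publish(source, ta, ty, new_qualifier)

    def __len__(self):
        return len(self._current)

    def __contains__(self, statement):
        return statement in self._current
```

Since the set keeps no timestamps, the only moment at which "more recent" has any meaning is inside publish, which matches the semantics stated above.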

    Figure 3.2 Overlap of trust statements for wrapper DTD

    Figure A: XPath expression 1, XPath expression 2. Figure B: XPath expression 1 (TS1), XPath expression 2 (TS2).

    From Section 3.2 we have the structure of the trust statement as <source, trust authority, trust type, qualifier>. The source corresponds to the wrapper URI of an information

    source. Trust types have been discussed in detail in Section 3.2.1. The qualifier is the XPath

    expression selecting a set of elements from the wrapper's DTD. It is beneficial for the qualifier to be based on the wrapper DTD and not on the source schema because sources are heterogeneous

    and may have different data models. We avoid the problem of qualifiers for many different data

    models by using a uniform data model exported by wrappers. Complete coverage of the DTD is

    not necessary. Qualifiers may simply specify the elements of the wrappers that are trusted, without covering the whole DTD.


    No negative trust statements are allowed. This means that trust types are always defined

    in a positive fashion, i.e. there are no trust types that are used for defamatory or distrustful

    assertions. The presence or absence of trust statements is the only representation of trust. When a

    trust statement exists for a source, it means the associated trust authority trusts that source. When

    a trust statement does not exist for a source, it means the trust authority does not know or does not

    trust that source, but either way the source is not trusted. Conflicting statements are not possible.

    Two trust statements with the same source and qualifier are either duplicates or use different

    trust types. Since no negative trust statements are allowed, no two trust statements can conflict

    with each other.

    Our semantics are flexible because trust authorities may publish and rescind trust

    statements to fit their trust assertions. Future research may add the benefit of timestamps and

    expiration to explore the different semantics allowed by the chronological ordering of trust

    statements.

    3.4 Trust broker

    While the mediator is a well understood component for the retrieval and integration of

    diverse data, the trust broker will assist the mediator in selectively deciding how data from

    various sources will be retrieved and relatively ordered in the presentation to the client, based on

    client specifications. The trust broker is a logically centralized component, separated from the

    mediator, dealing only with trust metadata.

    The goal of the trust broker is to provide the mediator with the most up-to-date, accurate

    and complete information service regarding the trust metadata associated with sources. The

    responsibilities of the trust broker, in pursuit of those goals, are (1) to provide value-added client-

    parameterized processing on trust metadata with respect to sources provided by the mediator and

    (2) to obtain the most up-to-date, accurate and complete trust metadata regarding sources.


    3.4.1 Trust broker schema

    The trust broker's schema is very structured and rigid. Object-oriented and XML schemas

    are unnecessary. Thus, we choose to use the simplest data model for the trust broker's schema,

    the relational model. In this section we present the structure of the trust broker schema. The

    schema may be specified as a set of relations. The trust broker schema is application-independent

    and is therefore constant across all applications of this framework.

    The trust broker is the manager of trust types. It records new trust types, and deletes old

    ones. Thus the trust broker has knowledge of all the trust types in the MQS with trust extensions.

    The trust types are communicated from the trust broker to mediators or trust authorities as

    necessary. The structure required to hold information about trust types is AvailableTY (TY, TY-

    URI).

    The trust broker is the point of contact for trust authorities. New trust authorities are

    added to the MQS when the trust broker provides a handle to the trust authorities that allows the

    trust authorities to push trust statements to the trust broker. The list of trust authorities is

    provided when requested by mediators. The structure to record information about trust

    authorities is AvailableTA (TA, TA-URI).

    Lastly, the trust broker stores trust statements. Trust statements are pushed to the trust

    broker from trust authorities. The trust broker signs the trust statements before storing them. The

    trust statements are processed and sent to mediators when requested by mediators. Add and

    delete operations on trust statements are supported. Trust statements are valid until their

    requested revocation by the authoring trust authority. Revocation of a trust statement results in its

    deletion.

    Without the prerequisite trust types and trust authorities, there are no trust statements.

    Therefore, trust types and trust authorities are less dynamic than trust statements. In a

    closed environment where the number of trust authorities is limited, there may be limited or no

    dynamism in trust authority participation. That is, no new trust authorities join the MQS and no

    [...]

    return receives new trust statements from trust authorities concerning source. These trust

    statements are added to TAstatement. The trust broker may receive zero or more trust

    statements from zero to all trust authorities as a result of pushing a source notification to trust authorities. There is no upper limit to how many trust statements the trust broker

    may receive for each notification pushed out, since trust authorities are

    allowed to send more than one trust statement for the same source. Multiple trust statements

    from the same trust authority for the same source may simply have variations in the trust type or

    qualifier.

    The requests the trust broker receives from trust authorities may be either to add a trust

    statement or to remove a trust statement from the TAstatement relation. On the delete request,

    wildcards are allowed in place of any of the trust statement attributes except the TA attribute.

    Trust authorities may only request trust statements to be deleted if they were the original author

    of the trust statement. The template for the SQL-like query the trust broker executes to add trust

    statements is:

    insert into TAstatement (source, TA, TY, qualifier) values (<source>, <TA>, <TY>, <qualifier>)

    An example SQL-like query the trust broker executes to remove trust statements is:

    delete from TAstatement where TY = <TY> and TA = <TA>
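The add and delete requests can be tried out against an in-memory SQLite database using Python's standard sqlite3 module. The sketch below is illustrative: the column types, the function names, and the use of None as the wildcard are our assumptions, while the attribute names follow the insert template above.

```python
import sqlite3


def open_broker_db():
    """Create an in-memory database holding the TAstatement relation."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table TAstatement "
        "(source text, TA text, TY text, qualifier text)"
    )
    return conn


def add_statement(conn, source, ta, ty, qualifier):
    conn.execute(
        "insert into TAstatement (source, TA, TY, qualifier) "
        "values (?, ?, ?, ?)",
        (source, ta, ty, qualifier),
    )


def remove_statements(conn, ta, source=None, ty=None, qualifier=None):
    """Delete statements authored by `ta`; None acts as a wildcard.

    The TA attribute is mandatory, so an authority can only ever
    remove statements it authored. Returns the number of rows deleted.
    """
    clauses, params = ["TA = ?"], [ta]
    for column, value in (("source", source), ("TY", ty),
                          ("qualifier", qualifier)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    cursor = conn.execute(
        "delete from TAstatement where " + " and ".join(clauses), params
    )
    return cursor.rowcount
```

Requiring the TA argument while leaving the others optional mirrors the rule that wildcards are allowed for every attribute except TA.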

    The trust broker services requests from mediators for trust metadata. Trust metadata are

    trust statements processed for use by mediators. Through the trust broker's interface to mediators,

    the trust broker services the following requests:

    1 What are the available trust types for which there exist trust statements?

    2 For a given trust type, what are the available trust authorities that have issued

    trust statements using the trust type?

    3 Given a particular trust type and a source, what trust statements apply to the

    source, if any?


    4 Given a trust requirement, if more than one trust statement applies to the

    source, which trust statement or statements satisfy the trust requirement?

    5 If two sources provide conflicting data then given the available trust

    statements, which source's data should be chosen?

    The trust broker does not need to add special functionality to answer question number 1

    and question number 2. All the data necessary to answer questions 1 and 2 are in relation

    TAstatement. The mediator must simply formulate an appropriate query against TAstatement

    to answer questions 1 and 2. In Section 3.5, we will provide details of these query formulations

    which will be presented as integration mapping rules from the mediator to the trust broker.
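Questions 1 and 2 reduce to simple selections over TAstatement. The following sketch shows one possible formulation using Python's sqlite3 module; the sample rows, column types, and function names are hypothetical and serve only to make the queries executable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table TAstatement (source text, TA text, TY text, qualifier text)"
)
# Hypothetical sample statements.
conn.executemany(
    "insert into TAstatement values (?, ?, ?, ?)",
    [
        ("w1", "TA1", "Approved", "/a"),
        ("w1", "TA2", "Approved", "/a"),
        ("w2", "TA1", "Audited", "/b"),
    ],
)


def available_trust_types(conn):
    """Question 1: trust types for which trust statements exist."""
    rows = conn.execute("select distinct TY from TAstatement order by TY")
    return [row[0] for row in rows]


def authorities_using(conn, ty):
    """Question 2: trust authorities that issued statements of type `ty`."""
    rows = conn.execute(
        "select distinct TA from TAstatement where TY = ? order by TA",
        (ty,),
    )
    return [row[0] for row in rows]
```

Neither query needs anything beyond projection and selection, which is why the trust broker requires no special functionality for these two questions.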

    Through the trust broker's interface for mediators, the mediator may submit data

    integration problems dealing with trust similar to the problems posed by questions 3, 4 and 5.

    However, the answers to questions 3, 4, and 5 require algorithmic processing by the trust broker.

    Answering those questions will require additional input considerations from the

    client and therefore will be discussed in Chapters 4 and 5.

    3.5 Mediator

    The mediator has schema extensions for trust metadata. Like a normal mediator schema,

    the trust schema is also not materialized. Instead, rules provide a mapping from the mediator's

    trust metadata schema directly to the trust broker's schema. We express our rules in an SQL-like

    language. The mediator supports two virtual relations in its trust metadata schema. The

    structure, integration rules, and semantics of the virtual relations are as follows:

    Structure: MAvailableTY (TY, TY-URI)

    Integration rule: select TY, TY-URI from AvailableTY

    Semantics: MAvailableTY simply provides a set of all the trust types available to the client for querying. AvailableTY is the relation from the trust broker. A

    sample query against MAvailableTY from a client would be "select TY from MAvailableTY".


    Structure: MAvailableTA (TA, TA-URI)

    Integration rule: select TA, TA-URI from AvailableTA

    Semantics: MAvailableTA provides a set of all trust authorities. A sample query against MAvailableTA from the client would be "select TA from MAvailableTA".

    Since the mediator only uses rules to map from its trust metadata schema to the trust

    broker schema, the mediator does not materialize any trust metadata on its site (except

    temporarily during query processing). Thus, there is no maintenance of trust metadata required at

    the mediator. The mediator also uses the trust broker's services to assist in data integration.

    Chapter 5 will be devoted to showing how mediators use trust metadata from the trust broker.
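The non-materialized mapping can be approximated with SQL views. The sketch below is a simplification under stated assumptions: the mediator and the trust broker share one SQLite database, whereas in the framework they are separate components, and the column names TY_URI and TA_URI replace TY-URI and TA-URI because a hyphen is not valid in an unquoted SQL identifier.

```python
import sqlite3


def build_mediator_views():
    """Create broker relations plus the mediator's virtual relations."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table AvailableTY (TY text, TY_URI text);
        create table AvailableTA (TA text, TA_URI text);
        -- Views store no rows; they are evaluated at query time.
        create view MAvailableTY as select TY, TY_URI from AvailableTY;
        create view MAvailableTA as select TA, TA_URI from AvailableTA;
    """)
    return conn
```

Because the views hold no data of their own, a trust type added to AvailableTY is visible through MAvailableTY immediately, with no maintenance step at the mediator.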

    3.6 Client

    The abstract notion of a client includes any individual or organization that requests the

    services of the mediator. When we specifically refer to an individual we will use the term end-

    user client, otherwise we make no distinction between the individual or organization and simply

    refer to any entity that uses the services of the mediator as a client.

    In the fully closed trusted world, one need not be concerned about the qualifications of

    sources providing data. But in the open, dynamic web, clients of information systems need

    reassurance that answers to queries can be trusted. In our framework, clients specify a trust

    requirement T plus a query to the MQS. Clients know how to query the DTD of the mediator, but

    at what level of granularity are the trust requirements? Should there be one or many trust

    requirements?

    In some applications it may be desirable to specify different trust requirements for

    different portions of the same DTD. For example, the section of a DTD that deals with software

    licensing issues may have a different trust requirement than the rest of the DTD. Therefore, we

    allow for associating trust requirements with particular substructures of the DTD.

    Just as trust authorities specify trust statements at the granularity of elements, clients can

    also specify trust requirements at the granularity of elements. XPath expressions can again be

    used to point to a set of elements. Overall, clients should be able to specify multiple trust


    requirements for the same query. The question now is: what constraints are there on the trust

    requirements with the XPath expressions?

    In general, the subdocuments specified by XPath expressions may be intersecting. Figure

    3.3A and Figure 3.3B display two ways XPath expressions normally may specify overlapping

    document regions. Similar to the case for trust statements, we do not need to consider Figure

    3.3A in our scenarios because we only consider XML document trees and not graphs. For

    intersections of the type shown in Figure 3.3B, the semantics are that the trust requirement T2 for

    the subtree associated with XPath expression 2 overrides T1. T1 only applies recursively down

    the XML tree until another trust requirement replaces T1. These semantics effectively partition the

    XML document tree into disjoint sections according to the trust requirements.

    To specify that T2 overrides T1 is the most flexible semantics. Any other semantics that

    involves meshing T1 with T2 would subject T2 to unnecessary constraints, e.g. T2 and T1 must be

    mutually consistent. Our semantics allows the client the flexibility to specify T2 in any way

    desired, regardless of T1. We make no limiting assumptions and therefore clients may restrict or

    free T2 at will.
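The override semantics amounts to each element being governed by the requirement attached to its nearest ancestor (or to itself). The sketch below illustrates this with Python's xml.etree.ElementTree; the sample document, the requirement labels, and the function name are hypothetical, and paths use ElementTree's XPath subset relative to the root element.

```python
import xml.etree.ElementTree as ET

# Hypothetical document conforming to a mediator DTD.
DOC = """
<Software>
  <Properties><Name>Editor</Name><Version>1.0</Version></Properties>
  <Licensing><License>GPL</License></Licensing>
</Software>
"""


def assign_requirements(root, requirements):
    """Return (tag, requirement) pairs in document order.

    `requirements` maps an ElementTree path to a requirement label.
    A requirement applies down the tree until a requirement attached
    to a descendant overrides it.
    """
    attached = {}
    for path, label in requirements.items():
        for element in root.findall(path):
            attached[element] = label

    result = []

    def walk(element, inherited):
        # A requirement attached here replaces the inherited one.
        current = attached.get(element, inherited)
        result.append((element.tag, current))
        for child in element:
            walk(child, current)

    walk(root, None)
    return result


root = ET.fromstring(DOC)
```

With T1 attached to the root (path ".") and T2 attached to "./Licensing", the Licensing subtree is governed by T2 and everything else by T1, so the tree is split into disjoint sections.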

    Figure 3.3 Overlap of specifiers for mediator DTD

    Figure A: XPath expression 1, XPath expression 2. Figure B: XPath expression 1 (T1), XPath expression 2 (T2).

    3.7 Individual component knowledge

    We present the knowledge that components have of other components. Components

    include mediators, wrappers, sources, clients, trust authorities, and trust brokers. We focus on the

    knowledge of properties of components. These properties include trustworthiness, schemas,


    interfaces, etc. Although trustworthiness is more abstract than concrete, it is still useful to know

    that, e.g., clients attribute a measure of trustworthiness to trust authorities. The knowledge

    discussed here may be acquired during design-time or after deployment of the MQS

    infrastructure.

    In keeping with the original MQS framework, sources know nothing about the MQS that

    they participate in. They also know nothing of trust or of the trust extensions to the MQS

    framework. Wrappers know only of the source they cover. They understand how to access a

    source but know nothing of the rest of the MQS, of trust, or of the trust extensions to the MQS.

    Mediators know of trust and use trust metadata to assist in data integration. Mediators access

    sources through wrappers so mediators know the wrapper's DTD. Mediators also need to access

    trust metadata so they know the trust broker's schema and interface. Clients are aware of trust.

    They have their own personal trust level for trust authorities. They are also aware of the mediator's data

    schema and its schema extensions for trust. Clients know that their own personal

    preferences for trusting trust authorities will affect the query outcome from mediators.

    The trust broker knows of trust authorities since the trust broker is the trust authority's

    point of contact. The trust broker has no knowledge of mediators; it simply

    provides its services to any entity that requests them.

    Trust authorities know of trust but are oblivious to its uses for data integration. Trust

    authorities do not know of mediators, but they do know the interface (not the schema) of the trust

    broker. Trust authorities also know the trustworthiness of sources based on their own standard of

    trust (e.g. quality). Trust authorities also know the wrapper DTD associated with sources since they

    must specify qualifiers in terms of the DTD.

    Figure 3.4 presents a diagram of the knowledge each component in our architecture has of

    other components. An arrow from component x to component y indicates component x needs and

    has knowledge of the component properties listed next to the arrow.


    Figure 3.4 Properties of components known to other components

    Non-italicized properties indicate knowledge acquired before deployment. Italicized properties

    are discovered by the individual components after the infrastructure has been deployed.

    [Diagram components: Client, Mediator, Trust Broker, Trust Authority, Wrapper, Source. Arrow labels: trust broker schema; trust broker interface; mediator DTD; trust authorities; trust types; source schema, data model, and query language; DTD; trust authority interface; trust authority trustworthiness; source qualification.]


    4 Formulation of trust in queries

    Chapter 4 details the conceptual model for trust requirements and the language-based

    representation of the model. In Section 4.1, we give an overview of and motivate the conceptual

    model. Section 4.2 explains the model and its semantics. Section 4.3 details the query language

    extensions that allow for expressing instances of the model. Finally, Section 4.4 lists pragmatic issues that affect an implementation of the query language extensions.

    4.1 Overview

    Mediated query systems allow clients to submit queries against mediators. The

    execution of a query produces data that comprise the query result. We seek to allow clients

    to specify required properties of query result data. The properties are based on notions of trust

    for data. Client specification of trust properties for data requires formulation of conditions on

    trust. Our goal is to allow clients to express their trust requirements within queries against

    mediators and have the MQS deliver data that is either all trusted or partially trusted as

    determined by the trust requirement.

    The trust requirement formulated by clients may apply to a portion or all of the query

    result. An issue is the level of granularity clients should be allowed to specify via trust

    requirements. If the mediator supports an XML schema or DTD, should trust be applied to

    document substructures or only complete documents? We seek to motivate and determine an

    appropriate level of granularity in the following sections.

    In particular, we seek to provide a model for clients to abstract their trust requirements.

    This conceptual model should allow for clients to specify at a transparent level what they trust and whom they trust more and whom they trust less. At the same time, clients should not have to

    know anything about sources. Clients with knowledge of the conceptual model may solidify an

    instance of the model that represents their trust requirements with respect to applications at the


    client side. The model instance is interpreted by MQS components in interaction with the trust

    broker in order to construct appropriately trusted query results.

    In order to communicate the trust requirement, clients need to use a language. Since

    clients utilize query languages it would be natural to integrate an extension into a query language.

    The extension would allow for declaring the trust requirements. When the mediator receives the

    query plus trust requirement extensions, the mediator may simply parse, split and pass on the trust

    requirements to the trust broker.

    4.2 Conceptual model

    Since trust types and trust authorities are factors that determine trust, trust requirements

    should specify the effect on trust of each such factor. In fact, the most basic domains in our

    conceptual model are the set of all trust authorities (TA) and the set of all trust types (TY).

    Clients query the mediator's extended schema for trust to get the list of TA and TY. This has

    been detailed in Section 3.5. These two sets provide the basis for a higher abstraction,

    <TA, TY> pairs. <TA, TY> pairs, in turn, are one of the basic components in the trust preference. We give

    the definition of trust preference as follows:

    Definition 4.1 (Trust Preference) Given a set of trust authorities (TA) and a set of trust

    types (TY), a trust preference is a partial ordering of <TA, TY> pairs. <TA, TY> pairs selected to

    participate in the partial ordering are trusted. The relation > indicates that one <TA, TY> pair

    is trusted more than the other. A sequence of <TA, TY> pairs connected by > is a trust

    expression. Trust expressions can be connected together by ; (AND) to form a trust preference.

    The partial ordering corresponds to a set of disconnected graphs (Hasse Diagrams), and

    the consistency of the partial ordering must be maintained by having no cycles. Each graph in the

    set of disconnected graphs corresponds to a trust expression. The nodes of the graph correspond

    to <TA, TY> pairs and a directed arrow in the graph represents that one <TA, TY> pair is trusted

    more than the other.
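
    The consistency condition can be illustrated with a minimal sketch. It assumes, purely for illustration, that a pair is a Python tuple (authority, type) and that a trust preference has already been reduced to a list of edges (more, less). The function name is_consistent is hypothetical and not part of the framework.

```python
from collections import defaultdict

def is_consistent(edges):
    """Return True if the 'trusted more than' edges contain no cycle.

    edges: iterable of (more, less), where each side is an
    (authority, type) tuple. This encoding is an assumption.
    """
    graph = defaultdict(list)
    for more, less in edges:
        graph[more].append(less)

    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)  # every node starts WHITE

    def visit(node):
        colour[node] = GREY
        for nxt in graph[node]:
            if colour[nxt] == GREY:      # back edge: a cycle exists
                return False
            if colour[nxt] == WHITE and not visit(nxt):
                return False
        colour[node] = BLACK
        return True

    # Iterate over a snapshot, since visiting may add leaf nodes to graph.
    return all(visit(n) for n in list(graph) if colour[n] == WHITE)
```

    A depth-first search is used here only because it detects cycles in time linear in the number of edges; any other acyclicity test would serve equally well.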

    We introduce the usage of trust preferences by the examples here. With the exception of

    the most primitive trust preference (our first example below), there are many ways of expressing

    semantically equivalent trust preferences. The most succinct representation is presented first,

    along with its meaning, then less succinct representations are given. In the following examples,

    we assume the set of all trust authorities is {x, y, z} and the set of all trust types is {a, b}. First,

    we demonstrate simple preferences and then work our way to more difficult ones.

    Example 1 <x, a> This trust preference pair means that sources qualified by trust authority x for

    trust type a are trusted. There is no other representation for this trust preference.

    Example 2 <x, *> This pair indicates that sources qualified by trust authority x for all trust types

    are trusted. An alternate expression with the same meaning is <x, a> ; <x, b>. That is, the * is

    expanded to include all trust types. The ; symbol means that <x, a> and <x, b> are trusted.

    Hence sources qualified by either <x, a> or <x, b> are trusted.

    Example 3 <*, a> Sources qualified by any trust authority for trust type a are trusted. An

    alternate expression is <x, a> ; <y, a> ; <z, a>.

    Example 4 <*, *> We expand <*, *> by taking the cross product of TA and TY. Hence, sources

    that are qualified by any TA for any TY are trusted.

    Example 5 <x, a> > <y, a> Sources qualified by either trust authority x or y for trust type a

    are trusted. Furthermore, the operator > indicates that <x, a> is trusted more than <y, a>.

    This implicitly imposes a partial ordering on sources qualified by either x or y for a. Effectively,

    sources qualified by x for a are trusted more than sources qualified by y for a.

    Example 6 <*, a> > <y, a> Sources qualified by any trust authority for trust type a are trusted.

    Furthermore, with regards to trust type a, a partial ordering is imposed on trust authorities where

    all trust authorities are trusted more than trust authority y. The ordering interpretation is more

    easily understood when we expand the example by substituting all available trust authorities into

    the wild-card "*":

    <x, a> > <y, a> ; <y, a> > <y, a> ; <z, a> > <y, a>

    The implied trust preference <y, a> > <y, a> is simply a reflexive expression, which

    we eliminate. The result is that all trust authorities (except for y) for trust type a are trusted more

    than y for trust type a. Implicitly, all sources qualified by any trust authorities (except for y) for

    trust type a are trusted more than sources qualified by y for trust type a. The binary operator ;

    corresponds to logical AND semantics for two operand expressions, and so no order is implied

    for the two operands.
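
    The wild-card expansion described above can be sketched as follows. This is only an illustration under stated assumptions: pairs are encoded as Python tuples, TA and TY are the example sets {x, y, z} and {a, b}, and the function names expand_pair and expand_ordering are hypothetical. When both sides contain a wild-card, the sketch takes every combination of the two expansions, which is our reading of the examples rather than a stated rule.

```python
from itertools import product

TA = ["x", "y", "z"]   # trust authorities assumed in the examples
TY = ["a", "b"]        # trust types assumed in the examples

def expand_pair(pair):
    """Expand '*' in an (authority, type) pair into all concrete pairs."""
    ta, ty = pair
    authorities = TA if ta == "*" else [ta]
    types = TY if ty == "*" else [ty]
    return list(product(authorities, types))

def expand_ordering(more, less):
    """Expand 'more > less' into concrete orderings, dropping reflexive ones."""
    return [
        (m, l)
        for m in expand_pair(more)
        for l in expand_pair(less)
        if m != l   # eliminate reflexive expressions such as y,a > y,a
    ]

# Example 6: any authority for type a is trusted more than y for type a.
print(expand_ordering(("*", "a"), ("y", "a")))
```

    For Example 6, the expansion leaves two orderings, one for x and one for z, since the reflexive ordering on y is discarded.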

    For the similar example <x, a> > <*, a>, the same reasoning of expanding "*" to

    include all trust authorities means sources qualified by x for a are trusted more than sources

    qualified by any other trust authority for trust type a. Both types of sources are trusted when

    qualified either way.

    Example 7 <x, *> > <y, a> Sources qualified by trust authority x for any trust type or qualified

    by trust authority y for trust type a are trusted. Furthermore, sources qualified by x for any trust

    type are trusted more than sources qualified by y for a.

    Similarly, <x, a> > <y, *> means sources qualified by x for a are trusted more than

    sources qualified by y for any trust type. Of course, both types of sources are trusted as long as

    they are qualified by x for a or y for any trust type.

    Example 8 In this case, expanding <*, *> involves taking the cross product of

    TA and TY. Expanding we get:

    After eliminating the reflexive expression, the expansion of the example leaves us with five

    orderings of <TA, TY> pairs, and the meaning of each one of those orderings is easy to

    understand as simple variations of Example 4. Also, just the mere existence of the <*, *> pair in

    the expression indicates that sources qualified by any <TA, TY> pair are trusted.

    Example 9 This example is very similar to the previous Example 8. However, in

    this case, sources qualified by the single concrete pair in the expression are more trusted than

    sources qualified by any other <TA, TY> pair. The <*, *> indicates that all sources qualified by

    any <TA, TY> pair are trusted. The expansion of <*, *> gives us six expressions. One of the

    expressions is reflexive, so we eliminate it. The other five expressions indicate that the

    concrete pair is trusted more than each one of the remaining five <TA, TY> pairs.

    Example 10 <x, a> > <y, *> > <z, b> Sources qualified by <x, a> or <y, *> or <z, b> are

    trusted. Furthermore, sources qualified by x for a are trusted more than sources qualified by y for

    any trust type. Sources qualified by y for any trust type are trusted more than sources qualified by

    z for b. By the property of transitivity, sources qualified by <x, a> are also more trusted than

    sources qualified by <z, b>.
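
    The transitivity argument of Example 10 can be checked mechanically. The sketch below assumes the trust preference has already been expanded into concrete (more, less) edges over tuple-encoded pairs. The function name trusted_more is hypothetical.

```python
def trusted_more(edges, p, q):
    """Return True if p is trusted more than q, directly or by transitivity."""
    stack, seen = [p], set()
    while stack:
        node = stack.pop()
        for more, less in edges:
            if more == node and less not in seen:
                if less == q:
                    return True
                seen.add(less)
                stack.append(less)
    return False

# Example 10 after expanding the wild-card in the middle pair.
edges = [
    (("x", "a"), ("y", "a")),
    (("x", "a"), ("y", "b")),
    (("y", "a"), ("z", "b")),
    (("y", "b"), ("z", "b")),
]
print(trusted_more(edges, ("x", "a"), ("z", "b")))
```

    The search follows chains of "trusted more than" edges, so the ordering between the first and last pair of the chain is derived rather than stated.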

    Example 11 ; . First we expand :

    ; ;

    Our final trust preference is then

    ; ; ; .

    The first four expressions in the trust preference are simply variations of Example 5.

    Each of their individual meanings is clear. The last expression is also clear from Example 1. The

    meaning of the entire expanded trust preference is that sources qualified by trust authority x for

    trust type a are trusted more than sources qualified by any other trust authority for trust type b.

    All sources qualified by , , , or are trusted. Although

    sources qualified by z for a are trusted, they are not ordered with the other trusted sources (unless

    the sources happen to be qualified by , , or also, which places the

    sources in an ordering).

    Example 12 ; This example is almost exactly like Example 11.

    Expansion of is covered in Example 11. We also expand to obtain:

    ;

    is redundant in the trust preference since the expansion of

    already includes that is trusted. ( is an expression expanded from

    ). Therefore, after expanding , we have a trust preference equal to Example 11.

    Example 13 This example is semantically illegal, because we can expand the

    example to include ; which is an inconsistency. As a matter

    of fact, any expression (with the wild-card "*") that can expand to new expressions must be given

    careful consideration, since inconsistencies may be formed inadvertently.

    Trust annotations

    Up until this point, every one of the examples describes conditions on data, and

    implicitly on sources that possibly provide data. However, in many circumstances, the query

    result may be incomplete if data must always satisfy the trust requirement. Thus, it may not

    always be desirable to leave out data that does not satisfy the trust requirement. When we need to

    include such data, we need to let the client know the data is "untrusted".

    In consideration of data that does not satisfy the trust requirement, and to provide an

    added advantage to clients, we propose to annotate result objects. Clients want to know the

    trust properties of objects that are part of the query result. Result objects include

    substructures of documents and even entire documents. Annotating query results gives clients

    feedback on how the result objects are trusted, including whether the objects are untrusted. We give

    the definition of trust annotation below:

    Definition 4.2 (Trust annotation) Given an XML element E, a trust annotation is an attribute-value

    pair for the XML element. The attribute label is trust and the value is a set of <TA, TY>

    pairs. Each <TA, TY> pair in the set qualifies the source using a qualifier which designates an

    element that is E or an ancestor of the XML element E, according to the DTD of the wrapper that

    the XML element E is obtained from. The <TA, TY> pairs in the set are

    separated by the symbol ;. The symbol ; indicates that the two adjacent <TA, TY> pairs are

    not ordered, but both qualify the same source the element is obtained from, and both have

    qualifiers which designate E or an ancestor of E.

    XML constructs provide more than one opportunity to place annotations. For example,

    we may place annotations as comments, processing instructions, CDATA, attributes of element

    tags, or element tags themselves. We annotate every element object, and only attributes are

    associated with element tags. Therefore, attributes are the best method to represent annotation

    information. For example, . Here, SoftwareVersion is

    the element tag, and trust is the attribute that contains the annotation.
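
    As an illustration of attribute-based annotation, the following sketch attaches a trust attribute to an element using Python's standard xml.etree.ElementTree module. The helper name annotate, the element content, and the serialization of pairs as "(x, a)" inside the attribute value are assumptions made for this example. The exact attribute syntax is not prescribed here.

```python
import xml.etree.ElementTree as ET

def annotate(element, pairs):
    """Set the trust attribute to ';'-separated pairs, or 'untrusted' if none.

    pairs: set of (authority, type) tuples (assumed encoding).
    """
    if pairs:
        value = " ; ".join(f"({ta}, {ty})" for ta, ty in sorted(pairs))
    else:
        value = "untrusted"
    element.set("trust", value)
    return element

elem = ET.Element("SoftwareVersion")
elem.text = "2.1"  # hypothetical element content
annotate(elem, {("x", "a"), ("y", "a")})
print(ET.tostring(elem, encoding="unicode"))
```

    Because the annotation lives in an attribute, application software that does not understand trust can ignore it without any change to the element content.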

    We annotate an object using the <TA, TY> pairs that are closest to the top of the trust

    preference expressed by the client. The imprecise notion of being closest to the top of the trust

    preference is made precise by defining the following three functions and a rule. Let s be a

    variable representing a source and TP be a variable representing a trust preference. Let x, y be

    variables representing trust authorities and a, b be variables representing trust types. Let f, g be

    variables representing qualifiers and E be a variable representing an element from source s. Let

    DE be the DTD for the wrapper schema containing element E.

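    We define the functions ancestor-self(E, f, DE), bucket(s), and LMax as follows:

    ancestor-self(E, f, DE) means that f is a qualifier designating an element

    that is E or an ancestor of E in the XML tree that corresponds to the DTD DE.

    \[ \mathit{bucket}(s) = \{ \langle x, a, f \rangle \mid \text{a trust statement by } x \text{ for } a \text{ on source } s \text{ with qualifier } f \text{ has been published} \} \]

    \[ \mathit{LMax}(\mathit{bucket}(s), TP, E) = \{ \langle x, a \rangle \mid \langle x, a, f \rangle \in \mathit{bucket}(s) \wedge \text{ancestor-self}(E, f, D_E) \wedge \neg\exists \langle y, b, g \rangle \in \mathit{bucket}(s) : \text{ancestor-self}(E, g, D_E) \wedge (\langle y, b \rangle > \langle x, a \rangle) \in TP \} \]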

    We give a natural language explanation of LMax to clarify its meaning. Given a bucket

    for a source s, a trust preference TP and an element E, LMax returns the set of <TA, TY> pairs

    that 1) are mentioned in the trust preference and 2) have trust statements issued by the TA trust

    authorities for the TY trust type where 3) the source s is qualified by the qualifier associated with

    the trust statement and the qualifier references element E or an ancestor of E and 4) according to

    the trust preference, no other <TA, TY> pair that also satisfies requirements 1, 2, and 3 just

    listed is trusted more than the <TA, TY> pairs returned by LMax.
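
    The four conditions in this explanation translate directly into a small set computation. The sketch below is an illustration only. It assumes that a bucket is a set of (authority, type, qualifier) triples, that a qualifier is a tuple of element names forming a path from the DTD root, and that the orderings of the trust preference are given as a transitively closed set of (more, less) pairs. The name lmax and all parameter names are hypothetical.

```python
def lmax(bucket, preference_pairs, orderings, element_path):
    """Sketch of LMax: maximal (authority, type) pairs applicable to an element.

    bucket: set of (authority, type, qualifier) triples for one source
    preference_pairs: pairs mentioned in the (expanded) trust preference
    orderings: set of (more, less) pairs, assumed transitively closed
    element_path: tuple of element names from the DTD root down to E
    """
    def ancestor_self(qualifier):
        # Assumption: a qualifier designates E or an ancestor of E
        # exactly when it is a prefix of E's path.
        return element_path[:len(qualifier)] == qualifier

    # Conditions 1 to 3: mentioned in TP, published, and applicable to E.
    candidates = {
        (ta, ty) for ta, ty, q in bucket
        if (ta, ty) in preference_pairs and ancestor_self(q)
    }
    # Condition 4: no other candidate is trusted more.
    return {
        p for p in candidates
        if not any((other, p) in orderings for other in candidates)
    }
```

    Note that the result may contain several pairs when the candidates are not ordered with respect to each other, which is why the annotation is a set.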

    Finally, the precise rule for annotating an XML element is:

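    Annotate element E with

    \[ \{ \langle x, a \rangle \mid \exists s_k : E \text{ is obtained from source } s_k \wedge \langle x, a \rangle \in \mathit{LMax}(\mathit{bucket}(s_k), TP, E) \} \]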

    In other words, the set of <TA, TY> pairs used for annotating the element is the very

    reason for the element's inclusion in the query result. We delimit (without order) multiple <TA, TY> pairs in

    the annotation by the symbol ;.

    End-users normally do not directly view XML documents. Rather, the application

    software offers a user-friendly presentation interface. Thus, we may annotate all elements and let

    the application software interpret or filter the annotations. At the system level, we are not

    concerned with the possibility of too many annotations. For this reason, we will annotate each

    and every query result.

    Properties of trusted and untrusted data

    In some security applications, clients may also want the option to either retrieve only

    trusted data, or to retrieve trusted and untrusted data. For example, the military only wants

    software configurations from trusted sources, but in contrast, a user at home may only care to get

    all possible information on configurations (subject to the user's trust preferences).

    Here we discuss the properties of query results that contain only trusted data and query

    results that contain a mix of trusted and untrusted data. When only trusted data is requested, the

    data must satisfy certain properties. Given that the client has specified some kind of a trust

    preference, the properties are as follows:

    1. The data must come from only those sources that are qualified by some <TA, TY> pair in the

    trust preference, and the qualifier must specifically designate the element (or its ancestor) that

    the data is from.

    2. If two sources provide conflicting data, then if both sources are qualified by different

    <TA, TY> pairs for their respective data, and the <TA, TY> pairs are ordered in the trust

    preference, then the data must be selected from the source that is qualified by the <TA, TY>

    pair that is trusted more than the other <TA, TY> pair. This required property of data

    enables the use of trust preferences in conflict resolution. Although we do not resolve all data

    conflicts, at least this may assist in some cases.
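
    The second property suggests a simple decision procedure for two conflicting values. The sketch below is a hedged illustration: it assumes each source is summarized by the set of pairs that qualify it for the element in question, and that the trust preference is available as a set of (more, less) orderings. The function name resolve_conflict is hypothetical, and returning None for undecided cases is a design choice of this sketch.

```python
def resolve_conflict(value1, pairs1, value2, pairs2, orderings):
    """Pick between two conflicting values using the trust preference.

    pairs1/pairs2: pairs qualifying each source for the element in question
    orderings: set of (more, less) pairs from the expanded trust preference
    Returns the preferred value, or None when the preference does not decide.
    """
    first_wins = any((p, q) in orderings for p in pairs1 for q in pairs2)
    second_wins = any((q, p) in orderings for p in pairs1 for q in pairs2)
    if first_wins and not second_wins:
        return value1
    if second_wins and not first_wins:
        return value2
    return None  # unordered or contradictory evidence: conflict remains
```

    Leaving the conflict unresolved when the pairs are unordered matches the observation that trust preferences assist in some cases but do not resolve all data conflicts.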

    When untrusted data is also added to the query result, the trusted portion of the query

    result must satisfy the same conditions that query results containing only trusted data must

    satisfy. The untrusted data must satisfy the following properties:

    1. The untrusted data is not available from a source qualified by any of the <TA, TY> pairs in

    the trust preference, or the qualifier did not designate the element containing the untrusted

    data or its ancestors. Only the other sources have the data.

    2. The untrusted data must be annotated trust="untrusted".

    In Section 4.3 we will detail query language constructs that allow for specifying whether

    only trusted data is desired or a mix of trusted and untrusted data is desired.

    4.3 Query language extensions

    Here we focus on embedding the model in a query language. We assume the client uses

    some query language for XML data to query the mediator's schema, which is represented in the form

    of a DTD. The DTD is just a schema for data, and has nothing regarding trust or trust

    requirements. The client gathers the prerequisite information to formulate trust requirements by

    querying the mediator's schema extensions, which have already been discussed in Section 3.5. We

    assume the query language supports a condition clause that specifies the pattern of the data to

    select. XML-QL [DFF98] is an example of such a query language.

    In the case of query languages for the XML data model, in general user queries to

    mediators take the prototypical form

    where <pattern>

    construct <template>

    <pattern> may contain variables which bind to attribute or text values. The attribute or text

    values may belong to some objects or elements. The variables may then be used in <template> in

    order to construct new data (perhaps according to some schema). To the prototypical query

    language for XML data we add a constraint for trust.

    Our modifications to the query language introduce new keywords related to specifying

    trust requirements. Each combination of keywords has its own semantics and influences the

    integration of data using trust metadata.

    In order to include trust requirements, the minimum keyword that must be added to a

    query is trust <criteria>. Thus, the following is a query that expresses some trust criteria on the

    data:

    where <pattern>

    construct <template>

    [trust <criteria>]

    The optional keyword trust indicates that the specified trust requirements should be

    respected in integrating the query result. The parameter <criteria> is a language-based

    representation of multiple trust preferences. The language used to specify criteria is given as a

    context-free BNF grammar:

    criteria :- condition

              | specifier for condition

              | criteria and specifier for condition

    condition :- [ONLY] [OPT | PES] statement

    statement :- clause | statement ; clause

    clause :- pair | clause > pair

    pair :- (TA,TY)

    specifier :- pointer to Element

    specifier is an XPath expression that points to a set of XML elements in the DTD. TA

    and TY are names corresponding to trust authorities and trust types. pair simply corresponds to

    a <TA, TY> pair in the conceptual model. clause allows for specifying the partial ordering of

    <TA, TY> pairs. The > symbol is essentially the "trusted more than" operator in the conceptual model. statement

    allows for multiple clauses. Each clause is separated by the ";" delimiter.
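
    To make the statement, clause, and pair productions concrete, the following sketch parses the statement portion of a trust requirement into nested Python lists. It is a minimal illustration, not a full parser for the grammar: the keywords, the specifier, and the and connective are not handled, and the function name parse_statement and the regular expression are assumptions of this example.

```python
import re

# A pair is "(TA, TY)" where each component is a name or the wild-card "*".
PAIR = re.compile(r"\(\s*([\w*]+)\s*,\s*([\w*]+)\s*\)")

def parse_statement(text):
    """Parse 'clause ; clause ...' where a clause is 'pair > pair > ...'.

    Returns a list of clauses; each clause is a list of (TA, TY) tuples
    ordered from most to least trusted.
    """
    clauses = []
    for clause_text in text.split(";"):
        parts = [part.strip() for part in clause_text.split(">")]
        clause = []
        for part in parts:
            match = PAIR.fullmatch(part)
            if match is None:
                raise ValueError(f"not a pair: {part!r}")
            clause.append((match.group(1), match.group(2)))
        clauses.append(clause)
    return clauses

print(parse_statement("(x, a) > (y, *) > (z, b) ; (z, a)"))
```

    Splitting first on ";" and then on ">" mirrors the grammar, in which ";" separates unordered clauses and ">" orders pairs within a clause.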

    The following optional keywords provide some additional modifications for utilizing

    trust metadata. Adding the optional keyword only indicates that only trusted data are to be

    integrated into the query result. Thus, any data returned will be from a source qualified by a

    <TA, TY> pair listed in statement. Omitting only means both trusted and untrusted data will be

    included in the query result.

    Often, there is a need to combine data from two or more sources into a single object. For

    example, XML elements from two or more sources taken to form a single XML document is a

    merge operation. When the object is constructed from two or more sources, it is not immediately

    obvious how the new, composed object should be annotated. Should we randomly choose one of

    the sources, then annotate the object as if it is from that source? Or, should we perform some

    computation on the respective sources' trust metadata to derive some annotation?

    In addressing the issues of annotating merged objects, we want to provide some

    flexibility to accommodate applications where the importance of trust may be either critical or

    simply informational. Hence, we do not pre-declare rigid rules for annotating merged objects.

    Instead, we allow the client to provide some input into the annotating process. We include the

    option for controlling annotations into the query language.

    The optional use of either of the keywords OPT or PES corresponds to, respectively, an

    optimistic (function OPTAnnotate) or pessimistic (function PESAnnotate) annotating of merged

    objects. OPT specifies to annotate optimistically so that data merged from multiple sources is

    annotated the same as the most trusted of the unmerged data. PES specifies to annotate

    pessimistically so that data merged from multiple sources is annotated the same as the least

    trusted of unmerged data. If neither is specified then PES is assumed by default. Below, we give

    the exact semantics of the two keywords in terms of set-oriented operations.

    First we must review some functions already defined in Section 4.2.

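    ancestor-self(E, f, DE) means that f is a qualifier designating an element

    that is E or an ancestor of E in the XML tree that corresponds to the DTD DE.

    \[ \mathit{bucket}(s) = \{ \langle x, a, f \rangle \mid \text{a trust statement by } x \text{ for } a \text{ on source } s \text{ with qualifier } f \text{ has been published} \} \]

    \[ \mathit{LMax}(\mathit{bucket}(s), TP, E) = \{ \langle x, a \rangle \mid \langle x, a, f \rangle \in \mathit{bucket}(s) \wedge \text{ancestor-self}(E, f, D_E) \wedge \neg\exists \langle y, b, g \rangle \in \mathit{bucket}(s) : \text{ancestor-self}(E, g, D_E) \wedge (\langle y, b \rangle > \langle x, a \rangle) \in TP \} \]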

    We may now define OPT annotating (OPTAnnotate) and the PES annotating

    (PESAnnotate). Let s1 and s2 be variables representing sources.

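    \[ \mathit{OPTAnnotate}(s_1, s_2) = \mathit{LMax}\{ \mathit{bucket}(s_1) \cup \mathit{bucket}(s_2) \} \]

    \[ \mathit{PESAnnotate}(s_1, s_2) = \mathit{LMax}\{ \mathit{bucket}(s_1) \cap \mathit{bucket}(s_2) \} \]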

    If the result of an OPTAnnotate or PESAnnotate operation is the empty set, then we

    annotate the merged element as being untrusted.
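
    The two annotating modes can be sketched with ordinary set operations. This sketch reads the optimistic mode as taking the union of the two buckets and the pessimistic mode as taking their intersection, which is an interpretation of "set-oriented operations" rather than a confirmed definition. Buckets are simplified to sets of (authority, type) tuples with qualifiers omitted, and the names maximal, opt_annotate, and pes_annotate are hypothetical.

```python
def maximal(pairs, orderings):
    """Keep the pairs that no other pair in the set is trusted more than."""
    return {p for p in pairs if not any((q, p) in orderings for q in pairs)}

def opt_annotate(bucket1, bucket2, orderings):
    """Optimistic: consider pairs qualifying either source (set union)."""
    return maximal(bucket1 | bucket2, orderings) or "untrusted"

def pes_annotate(bucket1, bucket2, orderings):
    """Pessimistic: consider only pairs qualifying both sources (intersection)."""
    return maximal(bucket1 & bucket2, orderings) or "untrusted"

orderings = {(("x", "a"), ("y", "a"))}
b1 = {("x", "a"), ("y", "a")}
b2 = {("y", "a")}
print(opt_annotate(b1, b2, orderings))
print(pes_annotate(b1, b2, orderings))
```

    Under this reading, the pessimistic mode can yield the empty set when the sources share no qualifying pair, in which case the merged element is annotated as untrusted.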

    The following examples illustrate expressing trust requirements using the grammar. Just

    like the conceptual model examples in Section 4.2, we assume the set of all trust authorities is {x,

    y, z} and the set of all trust types is {a, b}. We do not show the where <pattern>

    construct <template> portion of the query because it is irrelevant to the trust requirement

    portion. Therefore we only show the trust requirement portion of the example queries. The trust

    preference in some of these examples is selected from the conceptual model examples shown

    earlier in Section 4.2.

    Example 14

    trust (x, a)

    This example corresponds to the <x, a> formulation in the conceptual model. The trust

    requirement applies to the entire result document, including all its elements. The meaning of

    <x, a> is that sources qualified by trust authority x for trust type a are trusted. According to this

    trust requirement, data from such sources are trusted. Since only is left out of the condition,

    untrusted data are annotated trust="untrusted" and integrated into the query result also. Any

    necessary annotating of data merged from multiple sources will be done pessimistically by

    default.

    Example 15

    trust only (*, a)

    The meaning of (*, a) corresponds to the meaning of <*, a>, which is that sources

    qualified by any trust authority for trust type a are trusted. Additionally, only data from trusted

    sources will be integrated into the query result. Data from any other source will not be in the

    query result and pessimistic annotating is the default.

    Example 16

    trust OPT (x, *)

    The trust preference for this example comes from Example 2. The meaning is that

    sources qualified by trust authority x for all trust types are trusted. OPT specifies to use optimistic

    annotations. Omitting only means that both trusted and untrusted data are in the query result.

    Example 17

    trust only PES (*, *)

    The trust preference for this example comes from Example 4. The (*, *) indicates that

    sources by any trust authority for any trust type are trusted. only indicates only trusted data are in

    the query result, and PES indicates pessimistic annotating of data merged