CoBase: Scalable and Extensible Cooperative Information System


1

CoBase: Scalable and Extensible Cooperative Information System

Wesley W. Chu
Computer Science Department

University of California, Los Angeles

http://www.cobase.cs.ucla.edu

2

Conventional Query Answering

Need to know the detailed database schema
Cannot get approximate answers
Cannot answer conceptual queries

Cooperative Query Answering
Derive approximate answers
Answer conceptual queries

3

Cooperative Queries
Find a seaport with railway facilities in Los Angeles
Find a nearby friendly airport that can land an F-15
Find hospitals with facilities similar to St. John's near LAX

(Diagram components: Cooperative Queries, CoBase Servers, Domain Knowledge, Heterogeneous Information Sources.)

CoBase provides: relaxation, approximation, association, explanation

4

Generalization and Specialization

(Diagram: generalization moves from a specific query to a conceptual query and then to a more conceptual query; specialization moves in the reverse direction, from a conceptual query back to specific queries.)

5

Type Abstraction Hierarchy (TAH)

Chemical-Suit Size TAH (a non-numerical TAH)
All_Sizes
  Small_Size: Very_Small, Small_to_Medium
  Large_Size: Large_to_Extra_Large, Very_Large
Leaf values: XL, XXL, L, M, S, XXS, XXXS

Provide multi-level knowledge representations

6

Type Abstraction Hierarchy (TAH)

CA
  N. CA, C. CA, S. CA
Cities (leaf values): San Jose, Palo Alto, Sacramento, Davis, San Diego, Long Beach, LA, SF

(Location Example)
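A TAH can be sketched as a tree whose leaves are attribute values and whose inner nodes are concepts; relaxing a value means generalizing to an ancestor and then specializing back to that ancestor's leaves. The Python sketch below assumes a city-to-region grouping for the location TAH (the slide does not show which city sits under which region), and the names `TAHNode` and `relax` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TAHNode:
    """One node of a type abstraction hierarchy: a concept or a leaf value."""
    label: str
    children: list[TAHNode] = field(default_factory=list)
    # Back-pointer for generalization; excluded from repr/eq to avoid cycles.
    parent: TAHNode | None = field(default=None, repr=False, compare=False)

    def add(self, *children: TAHNode) -> TAHNode:
        for child in children:
            child.parent = self
            self.children.append(child)
        return self

    def find(self, label: str) -> TAHNode | None:
        if self.label == label:
            return self
        for child in self.children:
            hit = child.find(label)
            if hit is not None:
                return hit
        return None

    def leaves(self) -> list[str]:
        if not self.children:
            return [self.label]
        return [value for child in self.children for value in child.leaves()]


def relax(tah: TAHNode, value: str, levels: int = 1) -> list[str]:
    """Generalize `value` up `levels` steps, then specialize to the leaves below."""
    node = tah.find(value)
    if node is None:
        return [value]  # value not in the TAH: nothing to relax
    for _ in range(levels):
        if node.parent is not None:
            node = node.parent
    return node.leaves()


# Hypothetical grouping of the cities into regions.
CA = TAHNode("CA").add(
    TAHNode("N. CA").add(TAHNode("SF"), TAHNode("Sacramento"), TAHNode("Davis")),
    TAHNode("C. CA").add(TAHNode("San Jose"), TAHNode("Palo Alto")),
    TAHNode("S. CA").add(TAHNode("LA"), TAHNode("Long Beach"), TAHNode("San Diego")),
)

print(relax(CA, "LA"))                 # ['LA', 'Long Beach', 'San Diego']
print(len(relax(CA, "LA", levels=2)))  # 8
```

Relaxing one level returns the other cities of the same (assumed) region; two levels returns every city in the CA hierarchy.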

7

Relaxation Agent

Uses a knowledge-based approach (generalization and specialization via Type Abstraction Hierarchies) to relax the following for matching:
  query conditions
  constraints

8

Query Relaxation

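(Flowchart: the query is run against the database. If answers are found (Yes), they are displayed. If not (No), an attribute is relaxed through query modification using the TAHs, and the modified query is run again.)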
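The query-relaxation loop can be sketched in a few lines: run the query, and on an empty result widen one condition to the next TAH level and retry. This is a minimal sketch, assuming the TAH for the `city` attribute has been flattened into a ladder of ever-wider value sets (`CITY_LADDER`, with an assumed city grouping); the function names and sample rows are hypothetical.

```python
from __future__ import annotations

Row = dict[str, str]

# Hypothetical relaxation ladder for the "city" condition, loosely following
# the location TAH: the value itself, then its (assumed) region, then the state.
CITY_LADDER = [
    {"LA"},
    {"LA", "Long Beach", "San Diego"},
    {"LA", "Long Beach", "San Diego", "San Jose", "Palo Alto", "SF", "Sacramento", "Davis"},
]


def run_query(db: list[Row], conditions: dict[str, set[str]]) -> list[Row]:
    """Rows whose value for every conditioned attribute lies in the allowed set."""
    return [row for row in db if all(row.get(a) in allowed for a, allowed in conditions.items())]


def widen(allowed: set[str], ladder: list[set[str]]) -> set[str]:
    """Next ladder level that strictly contains `allowed` (or `allowed` if none)."""
    for level in ladder:
        if allowed < level:
            return set(level)
    return allowed


def cooperative_query(db, conditions, attr, ladder):
    """Query; on an empty result, relax `attr` one TAH level and query again."""
    conditions = dict(conditions)
    while True:
        answers = run_query(db, conditions)
        if answers:                                # "Yes": display the answers
            return answers, conditions
        relaxed = widen(conditions[attr], ladder)  # "No": modify the query
        if relaxed == conditions[attr]:
            return [], conditions                  # TAH exhausted
        conditions[attr] = relaxed


seaports = [  # made-up sample rows
    {"name": "port_1", "city": "Long Beach", "railway": "yes"},
    {"name": "port_2", "city": "San Diego", "railway": "no"},
]
answers, final = cooperative_query(
    seaports, {"city": {"LA"}, "railway": {"yes"}}, "city", CITY_LADDER
)
print([row["name"] for row in answers])  # ['port_1']
```

The loop stops either at the first non-empty answer set or when the TAH has no wider level left, so it always terminates.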


10

Visualization of Relaxation Process

Query: Find seaports in the given region.

(Map figure: the given region and the larger relaxed region.)


12

Relaxation Control Primitives

not-relaxable, e.g., not-relaxable runway-length
relaxation-order, e.g., relaxation-order (runway length, location)
preference-list
unacceptable-list
answer-size
relaxation-level
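Three of these primitives (not-relaxable, relaxation-order, answer-size) can be combined into one control loop. This is a sketch under stated assumptions: each condition is a single bound, each attribute's TAH is flattened into a hypothetical ladder of successively looser bounds, and the attribute names are borrowed from the GLAD aircraft example later in the deck (with `pax_capacity_qty > 10` written as `>= 11`). The function names and sample rows are made up.

```python
import operator

OPS = {"<=": operator.le, ">=": operator.ge}


def matches(row, conditions):
    return all(OPS[op](row[attr], bound) for attr, (op, bound) in conditions.items())


def relaxation_plan(attrs, relaxation_order=(), not_relaxable=()):
    """Relax listed attributes first, in order; never touch not-relaxable ones."""
    blocked = set(not_relaxable)
    first = [a for a in relaxation_order if a in attrs and a not in blocked]
    rest = [a for a in attrs if a not in blocked and a not in first]
    return first + rest


def relax_with_control(db, conditions, ladders, answer_size,
                       relaxation_order=(), not_relaxable=()):
    """Loosen one bound at a time until at least `answer_size` rows match."""
    conditions = dict(conditions)
    for attr in relaxation_plan(list(conditions), relaxation_order, not_relaxable):
        for looser_bound in ladders.get(attr, []):
            if sum(matches(r, conditions) for r in db) >= answer_size:
                return [r for r in db if matches(r, conditions)], conditions
            op, _ = conditions[attr]
            conditions[attr] = (op, looser_bound)
    return [r for r in db if matches(r, conditions)], conditions


aircraft = [  # made-up sample rows
    {"nsn": "A", "price": 650_000, "pax_capacity_qty": 12, "capacity_wt_ston": 1.5},
    {"nsn": "B", "price": 800_000, "pax_capacity_qty": 14, "capacity_wt_ston": 1.8},
    {"nsn": "C", "price": 900_000, "pax_capacity_qty": 11, "capacity_wt_ston": 2.5},
    {"nsn": "D", "price": 500_000, "pax_capacity_qty": 6, "capacity_wt_ston": 1.0},
]
query = {"price": ("<=", 700_000), "pax_capacity_qty": (">=", 11), "capacity_wt_ston": ("<=", 2)}
ladders = {"price": [850_000, 1_000_000], "capacity_wt_ston": [3, 5], "pax_capacity_qty": [8, 5]}
rows, final = relax_with_control(
    aircraft, query, ladders, answer_size=3,
    relaxation_order=("price", "capacity_wt_ston"), not_relaxable=("pax_capacity_qty",),
)
print([r["nsn"] for r in rows])  # ['A', 'B', 'C']
```

Row D never qualifies, because its passenger capacity fails a bound that is marked not-relaxable even though a ladder for it exists.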

13

Relaxation Primitives

^ (approximate), e.g., ^9 am
between
near-to (context-sensitive), e.g., Airport near-to LAX; Restaurant near-to UCLA
similar-to, e.g., Airport similar-to LAX based-on (traffic, runway)
within
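The context sensitivity of near-to can be illustrated by letting the kind of object sought choose the search radius: "near" an airport spans far more distance than "near" a restaurant. The radii in `NEAR_RADIUS_KM`, the candidate locations, and the function names are hypothetical; LAX and UCLA coordinates are approximate, and distance is computed with the standard haversine formula.

```python
import math

# Hypothetical context: how far "near" reaches depends on what is being sought.
NEAR_RADIUS_KM = {"airport": 50.0, "restaurant": 2.0}
EARTH_RADIUS_KM = 6371.0


def haversine_km(p, q):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def near_to(kind, anchor, candidates):
    """Candidates within the context radius for `kind`, nearest first."""
    radius = NEAR_RADIUS_KM[kind]
    hits = [(name, haversine_km(anchor, loc)) for name, loc in candidates.items()]
    return [name for name, dist in sorted(hits, key=lambda h: h[1]) if dist <= radius]


LAX = (33.94, -118.41)   # approximate coordinates
UCLA = (34.07, -118.44)  # approximate coordinates
airports = {"airport_1": (34.14, -118.41), "airport_2": (34.94, -118.41)}  # made up
restaurants = {"cafe_1": (34.08, -118.44), "cafe_2": (34.12, -118.44)}     # made up
print(near_to("airport", LAX, airports))         # ['airport_1']
print(near_to("restaurant", UCLA, restaurants))  # ['cafe_1']
```

airport_1 is about 22 km from LAX and counts as near, while cafe_2 is only about 6 km from UCLA yet does not, because the restaurant context uses a much smaller radius.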

14

Similar-to

Find all airports in Tunisia similar to the Bizerte airport based on runway length and (more importantly) runway width.

select aport_name, runway_length, runway_width
from runways, countries
where aport_name similar-to 'Bizerte'
    based-on ((runway_length 1.0) (runway_width 2.0))
  and country_state_name = 'Tunisia'
  and countries.glc_cd = runways.glc_cd

15

Similar-to Result

APROT_NM  LENGTH  WIDTH  RANK
Bezerte   8000    148    0.00
El Borma  7200    144    0.09
Monastir  9700    137    0.20
Jerba     10171   148    0.24
Bjedeida  6000    122    0.27

The similar-to module ranks the returned answers according to mean-squared error.
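The slides do not say how attribute values are normalized before the mean-squared error is taken. This sketch assumes squared errors relative to the target's value, weighted as in the based-on clause (length 1.0, width 2.0). On the five rows of the result table it produces the same ordering as the slide, although not the same rank values; `similar_to` is a hypothetical name.

```python
def similar_to(target, rows, weights):
    """Rank rows by the weighted mean of squared relative errors w.r.t. target."""
    total_weight = sum(weights.values())
    scored = []
    for name, values in rows.items():
        error = sum(
            w * ((values[attr] - target[attr]) / target[attr]) ** 2
            for attr, w in weights.items()
        ) / total_weight
        scored.append((name, error))
    return sorted(scored, key=lambda item: item[1])


runways = {  # rows from the result table
    "Bezerte": {"length": 8000, "width": 148},
    "El Borma": {"length": 7200, "width": 144},
    "Monastir": {"length": 9700, "width": 137},
    "Jerba": {"length": 10171, "width": 148},
    "Bjedeida": {"length": 6000, "width": 122},
}
ranking = similar_to(runways["Bezerte"], runways, {"length": 1.0, "width": 2.0})
print([name for name, _ in ranking])
# ['Bezerte', 'El Borma', 'Monastir', 'Jerba', 'Bjedeida']
```

Using relative errors keeps a 148-foot width and an 8000-foot length on comparable scales; otherwise the length term would swamp the more heavily weighted width term.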

16

Unacceptable List Operator

Type Abstraction Hierarchy:
Tunisia
  NE Tunisia: Bizerte, ...
  NW Tunisia
  Central Tunisia
  SW Tunisia: El Borma, ...

(Diagram: the constraint "Avoid Northern Tunisia!" is given to the CoBase Relaxation Manager, which trims the TAH.)

Trimmed TAH:
Tunisia
  Central Tunisia: Gafsa
  SW Tunisia: El Borma
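Applying an unacceptable list amounts to pruning the named concepts, with everything beneath them, before relaxation starts, so relaxation can never reach a value in an avoided region. A minimal sketch, assuming a TAH represented as nested dictionaries (leaves are empty dictionaries); the leaf placement follows the location TAH for Tunisia, and `trim`/`leaves` are hypothetical names.

```python
def trim(tah, unacceptable):
    """Drop every node (and its subtree) whose label is on the unacceptable list."""
    return {
        label: trim(children, unacceptable)
        for label, children in tah.items()
        if label not in unacceptable
    }


def leaves(tah):
    """Leaf labels in left-to-right order."""
    out = []
    for label, children in tah.items():
        out.extend(leaves(children) if children else [label])
    return out


TUNISIA = {
    "Tunisia": {
        "NE Tunisia": {"Bizerte": {}, "Tunis": {}, "Djedeida": {}},
        "NW Tunisia": {},  # its airports are not shown on the slide
        "Central Tunisia": {"Gafsa": {}},
        "SW Tunisia": {"El Borma": {}},
    }
}
trimmed = trim(TUNISIA, {"NE Tunisia", "NW Tunisia"})  # "Avoid Northern Tunisia!"
print(leaves(trimmed))  # ['Gafsa', 'El Borma']
```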

17

TAH Generation for Numerical Attribute Values

Relaxation error: the difference between the exact value and the returned approximate value. The expected error is weighted by the probability of occurrence of each value.
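Following the description above, one way to write the relaxation error of a cluster C with values x_1, ..., x_n (occurrence probabilities P(x_i) within C) is shown below. It assumes any value of the cluster may be returned for any exact value. The same expression is then weighted by each cluster's probability P(C_m) for a partition into clusters C_1, ..., C_k:

```latex
RE(C) = \sum_{i=1}^{n} \sum_{j=1}^{n} P(x_i)\, P(x_j)\, \lvert x_i - x_j \rvert
\qquad
RE(\{C_1, \dots, C_k\}) = \sum_{m=1}^{k} P(C_m)\, RE(C_m)
```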

DISC (Distribution Sensitive Clustering) is based on the attribute values and frequency distribution of the data
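The relaxation error can drive TAH construction for a numeric attribute: repeatedly split a cluster at the point that minimizes the probability-weighted relaxation error of the parts. The sketch below is a simplified, binary-split illustration in the spirit of DISC, not the published algorithm; the function names, the `max_leaf_size` stopping rule, and the sample frequencies are assumptions.

```python
def relaxation_error(cluster):
    """Expected |exact - returned| when any value of the cluster may be returned.

    cluster: list of (value, frequency) pairs.
    """
    total = sum(freq for _, freq in cluster)
    return sum(
        (fi / total) * (fj / total) * abs(vi - vj)
        for vi, fi in cluster
        for vj, fj in cluster
    )


def best_split(cluster):
    """Split sorted values in two so the probability-weighted RE is minimal."""
    cluster = sorted(cluster)
    total = sum(freq for _, freq in cluster)
    best = None
    for k in range(1, len(cluster)):
        left, right = cluster[:k], cluster[k:]
        p_left = sum(freq for _, freq in left) / total
        score = p_left * relaxation_error(left) + (1 - p_left) * relaxation_error(right)
        if best is None or score < best[0]:
            best = (score, left, right)
    return best[1], best[2]


def build_tah(cluster, max_leaf_size=3):
    """Recursively split until every cluster holds at most `max_leaf_size` values."""
    if len(cluster) <= max_leaf_size:
        return [value for value, _ in sorted(cluster)]
    left, right = best_split(cluster)
    return [build_tah(left, max_leaf_size), build_tah(right, max_leaf_size)]


lengths = [(1, 10), (2, 10), (3, 10), (100, 10), (101, 10)]  # made-up (value, count)
print(build_tah(lengths))  # [[1, 2, 3], [100, 101]]
```

Because the split score weights each part by its probability mass, the hierarchy follows the data distribution rather than splitting the value range into equal-width intervals.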

18

TAH Generation for Non-numerical Attribute Values

Pattern Based Knowledge Induction (PBKI)

Rule-based approach
Clusters attribute values into a TAH based on other attributes in the relation (i.e., inter-attribute relationships)
Provides an attribute correlation value (measures how well the rules apply to the database)

19

Type Abstraction Hierarchy (TAH)

Location Name TAH
Tunisia
  NE Tunisia: Bizerte, Tunis, Djedeida
  Central Tunisia
  SW Tunisia: El Borma
  ...

Runway Length TAH
All
  Short: 0 ... 700
  Medium: 700 ... 1K
  Long: 1K ... 5K

Provide multi-level knowledge representations

20

Associative Query Answering

Provide relevant information not explicitly asked for by the user.

User query: List all airports with runway length between 8500 and approximately 10000 feet.

Query answers:
Airport Name  Runway Length (feet)
Jerba         10171
Monastir      9700
Tunis         10500

Associated attributes and answers (shown for two user types: Pilot and Planner):

Weather  Runway Quality
Sunny    Good
Rain     Good
Foggy    Damaged

Military or Civilian Flag  Refrigerated Storage Capacity (Tons)
CC 0.00C 1000.00

21

CoBase and GLAD Integration

22

CoBase Functionality
Provide approximate matching: find HETs with a capacity of approximately 5 tons
Provide conceptual query answering: find "Earth Moving" equipment
Provide context-sensitive spatial queries: find storage sites near a selected location (integration with MATT map server)
Provide relaxation control: relaxation order, not-relaxable, at-least (answer set, quantity on hand)

23

Cooperative Operations Added to GLAD
Implicit query relaxation
Explicit query relaxation
  Approximate operator
  Similar-to/based-on
  Spatial relaxation
Relaxation control
  Relaxation-order
  Not-relaxable
  At-least (answer-set size, quantity on hand)

24

CoBase Features Added to GLAD
Enhance GLAD queries with cooperative operators (similar-to, relaxation-order, etc.)
Display the query relaxation process:
  modified query conditions (value, spatial)
  type abstraction hierarchies
Rank returned answers with similarity measures, e.g., spatial relaxation ranks answers according to their distance from the selected location

25

CoBase and GLAD TIE

(Architecture diagram. Components shown: Report Collection, Report Query Constructor, Filter, Editor, Object Cache, Display Generator, Query Collection, GLAD, CoBase Query Editor, CoBase Relaxation Manager, Knowledge Base, Data Cache, CoBase, Data Source Manager, Databases. Other labels: NSNs, Spatial Area Selection.)

26

GLAD Query

Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000.

select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
  and upper(cbs_category_nomen) = 'AIRCRAFT'
  and price < 700000
  and pax_capacity_qty > 10
  and upper(combat_type) = 'I'
  and capacity_wt_ston <= 2)

27

CoGLAD Query with Relaxation Control Operators

Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000. The attribute passenger capacity is not relaxable. Relax price first and then capacity weight.

select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
  and upper(cbs_category_nomen) = 'AIRCRAFT'
  and price < 700000
  and pax_capacity_qty > 10
  and upper(combat_type) = 'I'
  and capacity_wt_ston <= 2)
not-relaxable pax_capacity_qty
relaxation-order price capacity_wt_ston

28

CoGLAD Query with Similar-to Operator

Find aircraft similar to NSN = '0000IB0000961' based on the attributes price, passenger capacity and air mileage. Passenger capacity has a weight of 8, and price and air mileage each have a weight of 1.

select nsn
from nsn_description
where upper(nsn) similar-to '0000IB0000961'
    based-on ((price 1.0) (pax_capacity_qty 8.0) (air_mileage 1.0))
at-least 4

* '0000IB0000961' is an answer from the previous query.

29

CoGLAD Query with Approximate Operator

Find DLA stock reports with NSN like '%8340%' (FSC for tents and tarpaulins) and an on-hand quantity of approximately 150.

select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and on_hand_quantity = ~150
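One way an approximate condition such as `on_hand_quantity = ~150` can be processed is query modification: replace the equality with the range of the TAH cluster that contains the value. A minimal sketch, assuming hypothetical half-open leaf clusters for `on_hand_quantity` (the deck does not give this TAH); `rewrite_approximate` is a made-up name.

```python
# Hypothetical leaf clusters of a numeric TAH for on_hand_quantity (half-open ranges).
ON_HAND_CLUSTERS = [(0, 100), (100, 200), (200, 500), (500, 2000)]


def rewrite_approximate(attr, value, clusters):
    """Turn `attr = ~value` into a range condition covering value's TAH cluster."""
    for low, high in clusters:
        if low <= value < high:
            return f"{attr} >= {low} and {attr} < {high}"
    raise ValueError(f"{value} is outside every cluster of the TAH for {attr}")


print(rewrite_approximate("on_hand_quantity", 150, ON_HAND_CLUSTERS))
# on_hand_quantity >= 100 and on_hand_quantity < 200
```

If the rewritten range still returns nothing, relaxation would continue by moving to the parent cluster's wider range.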

30

Adding Constraints to a Query

GLAD query:
select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and nomenclature like '%TARP%'

Query with added constraints:
select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and nomenclature like '%TARP%'
  and on_hand_quantity = ~150
  and size_in_square_feet = 350

31

Example of Spatial Relaxation

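(Flowchart: the NSNs, an area selected on the map, and a constraint on quantity on hand go to the CoBase Relaxation Manager for query processing. If the answers satisfy the constraints (Yes), they are returned; if not (No), the selected area is relaxed based on the context-sensitive TAHs and the query is processed again.)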

32

Spatial Relaxation with Relaxation Control
relaxation-order: size, (latitude, longitude)
not-relaxable: price
at-least:
  value: size of the tarpaulin
  quantity on hand: relax until enough quantity on hand (specified by the user) is obtained
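The at-least control on quantity on hand can be sketched as a loop that grows the selected map area until the sites inside hold enough stock, then ranks the answers by distance from the selected location. This is a sketch under assumptions: a plain latitude/longitude box expanded by a fixed step stands in for the context-sensitive spatial TAHs, and the function name, step size, and site data are hypothetical.

```python
def relax_area(sites, box, needed_qty, step=0.5, max_rounds=10):
    """Grow the selected box until the sites inside hold at least `needed_qty`.

    box: (lat_min, lat_max, lon_min, lon_max) in degrees.
    Returns the sites ranked by distance from the original box center, and the final box.
    """
    center = ((box[0] + box[1]) / 2, (box[2] + box[3]) / 2)
    for round_no in range(max_rounds + 1):
        lat_min, lat_max, lon_min, lon_max = box
        inside = [s for s in sites
                  if lat_min <= s["lat"] <= lat_max and lon_min <= s["lon"] <= lon_max]
        if sum(s["qty"] for s in inside) >= needed_qty or round_no == max_rounds:
            break
        box = (lat_min - step, lat_max + step, lon_min - step, lon_max + step)
    # Rank answers by (squared, planar) distance from the selected location.
    inside.sort(key=lambda s: (s["lat"] - center[0]) ** 2 + (s["lon"] - center[1]) ** 2)
    return inside, box


sites = [  # made-up storage sites
    {"name": "A", "lat": 34.0, "lon": -118.0, "qty": 100},
    {"name": "B", "lat": 34.8, "lon": -118.0, "qty": 200},
    {"name": "C", "lat": 36.0, "lon": -118.0, "qty": 500},
]
found, final_box = relax_area(sites, (33.9, 34.1, -118.1, -117.9), needed_qty=250)
print([s["name"] for s in found])  # ['A', 'B']
```

Site C is never included: the relaxation stops as soon as A and B together reach the requested 250 units.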

33

Scalable and Extensible CoBase Architecture

34

Mediator Inter-Communications via KQML

(Diagram: Mediator A with Module A and Mediator B with Module B each use the CoBase Ontology and the CoBase Content Language, and communicate with each other via KQML. Legend layers: Module Objects, APIs, Content Language (Data, Actions), CoBase Ontology.)


36

Query Answers Without CoBase

Query: find chemical suits


43

Electronic Warfare

Identify and locate sources of radiated electromagnetic energy
Determine emitter type based on the operating parameters of observed signals:
  Radio Frequency (RF)
  Pulse Repetition Frequency (PRF)
  Pulse Duration (PD)
  Scan Period (SP)
  other operating parameters
Determine platform sites near the line of bearing of an emitter

This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ.

44

Performance Improvement by Using CoBase in EW

                Conventional DB       CoBase
                Case 1    Case 2      Case 1    Case 2
identified      90.00%    30.00%      100.00%   85.90%
id/ranking      100.00%   36.00%      100.00%   98.80%
relaxation      0.00%     0.00%       95.90%    99.80%

Conventional DB: parameter ranges from emitter specifications
CoBase:
  DB: peak parameters (RF, PRF) and parameter ranges (PD, SP)
  KB: TAHs based on RF and PRF peak parameters; TAHs based on PD and SP parameter ranges
Case 1: emitter signals without noise
Case 2: added noise: PD and SP (10%), PRF (5%), RF (2.5%)
Sample size: 1000 signals; emitter types: 75


45

Current CoBase Users and Applications

ARPI members (ISI, Unisys)
  Enhance query capabilities in the transportation domain (ARPI TARGET): query relaxation, association, and explanation
UCLA KMeD Project (Medical School)
  Improve search in medical images (X-rays, MRs): approximate matching of image features and contents; explanation of approximate matching quality
Hughes Research Lab
  Integrate schemas in heterogeneous databases: approximate matching of attributes and views
Lockheed/Martin Marietta
  Emitter and platform identification: approximate matching of observed emitter signals; relaxation of regions to identify emitter platforms
BBN
  Enhance DOD Logistic Anchor Desk (GLAD): query relaxation and spatial relaxation


47

XML Query Relaxation

51

XML Overview

XML (eXtensible Markup Language) is a format for specifying structured documents and data. XML is extensible because it allows users to define their own schemas (unlike HTML, which is a predefined markup language).

52

XML (cont.)
XML is a hierarchical data model. An XML document consists of two parts:
1. Schema
2. Data

The schema describes the structure of the data. In the example below, the DOCTYPE declaration is the schema and the <note> element is the data:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Edited with XML Spy v4.2 -->
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

53

XML Query Languages

XML can be represented as an ordered tree with:
  nodes representing elements and attributes
  edges representing inclusion (nesting) relationships

An XML query can similarly be represented as a tree with two types of edges:
  "/" for parent-child relationships
  "//" for ancestor-descendant relationships

54

XML Query Language: Example

The following XML
<a>
  <d><b/></d>
  <c/>
</a>
yields the following tree (each node labeled with its preorder number and tag):
1,a
  2,d
    3,b
  4,c
A possible query is:
$1=a
  $2=b
  $3=c

55

Query Relaxation

XML Query Relaxation can be categorized into two main types:

1. Value Relaxation: values are relaxed to expand the range of values a query condition accepts

2. Structure Relaxation: the structure of the query tree is relaxed to allow for more answers

57

Structure Relaxation

In structure relaxation, nodes and/or edges of the query tree can be relaxed to allow for more answers. There are three types of structure relaxation:
1. Edge Relaxation
2. Node Relaxation
3. Order Relaxation

58

Edge Relaxation

A parent-child edge can be relaxed to an ancestor-descendant edge.

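For example, consider the following document tree (each node labeled with its preorder number and tag):
1,a
  2,b
    3,d
  4,b
    5,d
      6,c
  7,b
    8,c
  9,d
    10,b
      11,c
  12,d
    13,b
      14,d
        15,c

Original query "a/b/c" matches (1,7,8).
Relaxed queries:
  "a//b/c" matches (1,7,8) and (1,10,11)
  "a/b//c" matches (1,7,8) and (1,4,6)
  "a//b//c" matches (1,7,8), (1,10,11), (1,4,6), and (1,13,15)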
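The effect of edge relaxation can be checked with a small path matcher: "/" steps follow child edges and "//" steps follow any descendant. This sketch handles only linear path queries anchored at the document root, and the names `Node`, `match_path`, and the helper `n` are hypothetical. Running it on the example tree (nodes numbered in preorder) reproduces the matches listed for the original and relaxed queries.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Node:
    nid: int
    tag: str
    children: list[Node] = field(default_factory=list)


def n(nid, tag, *children):
    return Node(nid, tag, list(children))


def descendants(node):
    for child in node.children:
        yield child
        yield from descendants(child)


def match_path(root, query):
    """All matches of a linear path query like "a//b/c" starting at the root."""
    steps = re.findall(r"(//|/)?([^/]+)", query)  # [('', 'a'), ('//', 'b'), ('/', 'c')]
    paths = [[root]] if root.tag == steps[0][1] else []
    for axis, tag in steps[1:]:
        extended = []
        for path in paths:
            candidates = path[-1].children if axis == "/" else descendants(path[-1])
            extended.extend(path + [c] for c in candidates if c.tag == tag)
        paths = extended
    return sorted(tuple(node.nid for node in path) for path in paths)


tree = n(1, "a",
         n(2, "b", n(3, "d")),
         n(4, "b", n(5, "d", n(6, "c"))),
         n(7, "b", n(8, "c")),
         n(9, "d", n(10, "b", n(11, "c"))),
         n(12, "d", n(13, "b", n(14, "d", n(15, "c")))))

for q in ["a/b/c", "a//b/c", "a/b//c", "a//b//c"]:
    print(q, match_path(tree, q))
# a/b/c [(1, 7, 8)]
# a//b/c [(1, 7, 8), (1, 10, 11)]
# a/b//c [(1, 4, 6), (1, 7, 8)]
# a//b//c [(1, 4, 6), (1, 7, 8), (1, 10, 11), (1, 13, 15)]
```

Each relaxed query returns a superset of the original query's answers, which is what makes edge relaxation safe to apply step by step.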

59

Node Relaxation

Nodes can be relaxed in several ways:
A node can be relabeled with a similar tag name based on domain knowledge. For example: article/sec → article/section
A node can be replaced with a "don't care" (_) that matches any non-null node. For example: /a/b/c → /a/_/c
A node can be removed while preserving the "superset" property, so that the relaxed query's answers include those of the original query. For example: a/b/c → a/b
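The three node relaxations can be sketched as rewrites of a path query. This is a simplified sketch under assumptions: queries are relative paths joined by "/" only, `SIMILAR_TAGS` stands in for the domain knowledge used in relabeling, and node removal is limited to dropping the last node. All names are hypothetical.

```python
# Hypothetical domain knowledge: tags that may stand in for one another.
SIMILAR_TAGS = {"sec": ["section"]}


def relabel(query, similar=SIMILAR_TAGS):
    """Relax by renaming one node to a similar tag: article/sec -> article/section."""
    steps = query.split("/")
    return ["/".join(steps[:i] + [alt] + steps[i + 1:])
            for i, tag in enumerate(steps) for alt in similar.get(tag, [])]


def dont_care(query):
    """Relax by replacing one inner node with the wildcard "_": a/b/c -> a/_/c."""
    steps = query.split("/")
    return ["/".join(steps[:i] + ["_"] + steps[i + 1:]) for i in range(1, len(steps) - 1)]


def drop_leaf(query):
    """Relax by removing the last node, keeping the superset property: a/b/c -> a/b."""
    steps = query.split("/")
    return ["/".join(steps[:-1])] if len(steps) > 1 else []


print(relabel("article/sec"))  # ['article/section']
print(dont_care("a/b/c"))      # ['a/_/c']
print(drop_leaf("a/b/c"))      # ['a/b']
```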

60

Order Relaxation

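The order in an XML query can be relaxed to allow any ordering of search conditions. For example:

Original query: $1=a with $2=b and $3=c below it, under the order constraint $2=b < $3=c
Relaxed query: $1=a with $2=b and $3=c below it, in any order

Two documents:
D1: <a><d><b/><c/></d></a>
D2: <a><c/><d></d><b/></a>

The original query matches D1 only; the relaxed query matches D1 and D2.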
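Order relaxation simply drops the document-order test. The sketch below checks whether an `<a>` root has b and c elements anywhere below it (ancestor-descendant edges assumed), optionally requiring some b to precede some c in document order. It uses two small documents in which b precedes c and c precedes b respectively (our reading of D1 and D2). `matches` is a hypothetical name; parsing uses Python's standard `xml.etree.ElementTree`, whose `iter()` walks elements in document order.

```python
import xml.etree.ElementTree as ET


def matches(xml_text, ordered=True):
    """Does <a> contain a b and a c below it, optionally with some b before some c?"""
    root = ET.fromstring(xml_text)
    if root.tag != "a":
        return False
    tags = [el.tag for el in root.iter() if el is not root]  # document order
    if "b" not in tags or "c" not in tags:
        return False
    if not ordered:
        return True
    last_c = len(tags) - 1 - tags[::-1].index("c")
    return tags.index("b") < last_c  # first b precedes last c


D1 = "<a><d><b/><c/></d></a>"
D2 = "<a><c/><d></d><b/></a>"
print([matches(d) for d in (D1, D2)])                 # [True, False]
print([matches(d, ordered=False) for d in (D1, D2)])  # [True, True]
```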

66

Conclusions

Provide user- and context-sensitive query relaxation (structured, semi-structured, and unstructured data)
Provide additional information (associative query answering) based on past cases
CoSQL (Cooperative SQL): similar-to, near-to, approximate, and relaxation control operators
CoXML (Cooperative XML): value relaxation; structure relaxation (edge, node, order)


http://www.cobase.cs.ucla.edu
