1 CoBase: Scalable and Extensible Cooperative Information System Wesley W. Chu Computer Science Department University of California, Los Angeles http://www.cobase.cs.ucla.edu


Page 1: CoBase: Scalable and Extensible Cooperative Information System

1

CoBase: Scalable and Extensible Cooperative Information System

Wesley W. Chu

Computer Science Department

University of California, Los Angeles

http://www.cobase.cs.ucla.edu

Page 2: CoBase: Scalable and Extensible Cooperative Information System

2

Conventional Query Answering

  Need to know the detailed database schema
  Cannot get approximate answers
  Cannot answer conceptual queries

Cooperative Query Answering

  Derive approximate answers
  Answer conceptual queries

Page 3: CoBase: Scalable and Extensible Cooperative Information System

3

Find a seaport with railway facility in Los Angeles

CoBase Servers

Heterogeneous Information Sources

CoBase provides: Relaxation Approximation Association Explanation

Find a nearby friendly airport that can land F-15

Domain Knowledge

Find hospitals with facility similar to St. John’s near LAX

Cooperative Queries

Page 4: CoBase: Scalable and Extensible Cooperative Information System

4

Generalization and Specialization

[Figure: generalization moves a specific query up to a conceptual query and then to a more conceptual query; specialization moves back down from a conceptual query to a specific query.]

Page 5: CoBase: Scalable and Extensible Cooperative Information System

5

Type Abstraction Hierarchy (TAH)

Chemical-Suit Size TAH (a non-numerical TAH)

All_Sizes
  Small_Size
    Very_Small
    Small_to_Medium
  Large_Size
    Large_to_Extra_Large
    Very_Large

Leaf values: XXXS, XXS, S, M, L, XL, XXL

Provide multi-level knowledge representations
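A minimal sketch of how a TAH such as the chemical-suit size hierarchy might be stored and used for generalization and specialization. The child-to-parent table, the function names, and the placement of the leaf sizes under the intermediate nodes are assumptions made for illustration (the placement is inferred from the node names); this is not CoBase's actual data structure.

```python
# Child -> parent links of the (assumed) chemical-suit size TAH.
PARENT = {
    "Small_Size": "All_Sizes",
    "Large_Size": "All_Sizes",
    "Very_Small": "Small_Size",
    "Small_to_Medium": "Small_Size",
    "Large_to_Extra_Large": "Large_Size",
    "Very_Large": "Large_Size",
    # Leaf placement below is an assumption inferred from the node names.
    "XXXS": "Very_Small",
    "XXS": "Very_Small",
    "S": "Small_to_Medium",
    "M": "Small_to_Medium",
    "L": "Large_to_Extra_Large",
    "XL": "Large_to_Extra_Large",
    "XXL": "Very_Large",
}


def generalize(value, levels=1):
    """Move up the hierarchy by `levels` steps, stopping at the root."""
    node = value
    for _ in range(levels):
        node = PARENT.get(node, node)
    return node


def specialize(concept):
    """Return every leaf value covered by `concept`."""
    children = [child for child, parent in PARENT.items() if parent == concept]
    if not children:
        return [concept]
    leaves = []
    for child in children:
        leaves.extend(specialize(child))
    return leaves


def relax(value, levels=1):
    """Generalize a value, then specialize the concept back into leaf values."""
    return specialize(generalize(value, levels))


print(relax("M"))      # sizes sharing M's parent concept
print(relax("M", 2))   # one level broader
```

Relaxing by one level keeps the answer set close to the original value; each further level trades precision for more candidate answers.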

Page 6: CoBase: Scalable and Extensible Cooperative Information System

6

Type Abstraction Hierarchy (TAH)

(Location Example)

CA
  N. CA
  C. CA
  S. CA

Leaf values: San Jose, Palo Alto, Sacramento, Davis, San Diego, Long Beach, LA, SF

Page 7: CoBase: Scalable and Extensible Cooperative Information System

7

Relaxation Agent

Use a knowledge-based approach (generalization and specialization via Type Abstraction Hierarchies) to relax the following for matching:

  query conditions
  constraints

Page 8: CoBase: Scalable and Extensible Cooperative Information System

8

Query Relaxation

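[Flowchart: the query is run against the database. If answers are found (Yes), they are displayed. If not (No), an attribute is relaxed using the TAHs, the query is modified, and the modified query is run again.]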
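The relaxation loop in this flowchart can be sketched as follows. This is an illustrative sketch only: the in-memory airport table, the location TAH (with Bizerte, Tunis and Djedeida assumed to sit under NE Tunisia), and the function names are assumptions, and a real system would run queries against a database rather than a Python list.

```python
# Hypothetical location TAH: child -> parent.
TAH_PARENT = {
    "Bizerte": "NE_Tunisia",
    "Tunis": "NE_Tunisia",
    "Djedeida": "NE_Tunisia",
    "El Borma": "SW_Tunisia",
    "NE_Tunisia": "Tunisia",
    "SW_Tunisia": "Tunisia",
}

# Hypothetical table of (airport location, runway length in feet).
AIRPORTS = [("Bizerte", 8000), ("Tunis", 10500), ("Djedeida", 6000), ("El Borma", 7200)]


def leaves_under(concept):
    """All leaf locations covered by a TAH node."""
    children = [c for c, p in TAH_PARENT.items() if p == concept]
    if not children:
        return {concept}
    covered = set()
    for child in children:
        covered |= leaves_under(child)
    return covered


def answer(location, min_runway):
    """Run the query; if it returns nothing, relax the location and retry."""
    concept, steps = location, 0
    while True:
        allowed = leaves_under(concept)
        rows = [(name, length) for name, length in AIRPORTS
                if name in allowed and length >= min_runway]
        if rows:
            return rows, concept, steps      # answers found: display them
        if concept not in TAH_PARENT:
            return [], concept, steps        # at the root: nothing left to relax
        concept = TAH_PARENT[concept]        # generalize one level and retry
        steps += 1


print(answer("Bizerte", 9000))
```

Returning the concept and the number of relaxation steps alongside the rows lets the caller explain to the user how far the query had to be relaxed.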

Page 9: CoBase: Scalable and Extensible Cooperative Information System

9

Page 10: CoBase: Scalable and Extensible Cooperative Information System

10

Visualization of Relaxation Process

Query: Find seaports in the given region.

given region

relaxed region

Page 11: CoBase: Scalable and Extensible Cooperative Information System

11

Page 12: CoBase: Scalable and Extensible Cooperative Information System

12

Relaxation Control Primitives

  not-relaxable
    e.g., not-relaxable runway-length
  relaxation-order
    e.g., relaxation-order (runway length, location)
  preference-list
  unacceptable-list
  answer-size
  relaxation-level

Page 13: CoBase: Scalable and Extensible Cooperative Information System

13

Relaxation Primitives

  ^ (approximate)
    e.g., ^ 9 am
  between
  near-to (context-sensitive)
    e.g., Airport near-to LAX
    e.g., Restaurant near-to UCLA
  similar-to
    e.g., Airport similar-to LAX based-on (traffic, runway)
  within

Page 14: CoBase: Scalable and Extensible Cooperative Information System

14

Similar-to

Find all airports in Tunisia similar to the Bizerte airport based on runway length and (more importantly) runway width.

select aport_name, runway_length, runway_width
from runways, countries
where aport_name similar-to 'Bizerte'
      based-on ((runway_length 1.0) (runway_width 2.0))
  and country_state_name = 'Tunisia'
  and countries.glc_cd = runways.glc_cd

Page 15: CoBase: Scalable and Extensible Cooperative Information System

15

Similar-to Result

APORT_NM    LENGTH   WIDTH   RANK
Bizerte       8000     148   0.00
El Borma      7200     144   0.09
Monastir      9700     137   0.20
Jerba        10171     148   0.24
Djedeida      6000     122   0.27

The similar-to module ranks the returned answers according to mean-squared error.
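A hedged sketch of a weighted mean-squared-error ranking of this kind. The slide does not say how attributes are normalized, so the sketch assumes each difference is taken relative to the target airport's own value; the function name and that normalization are assumptions, and airport names follow the spellings used elsewhere in the deck. Under this assumption the ordering agrees with the result table, but the scores are not expected to reproduce the RANK column.

```python
# (runway_length, runway_width) per airport, taken from the result table.
AIRPORTS = {
    "Bizerte": (8000, 148),
    "El Borma": (7200, 144),
    "Monastir": (9700, 137),
    "Jerba": (10171, 148),
    "Djedeida": (6000, 122),
}

# Weights from the based-on clause: runway_length 1.0, runway_width 2.0.
WEIGHTS = (1.0, 2.0)


def rank_similar(target, rows, weights):
    """Rank rows by weighted mean-squared relative error against `target`."""
    reference = rows[target]
    total_weight = sum(weights)
    scores = {}
    for name, values in rows.items():
        error = sum(
            w * ((v - r) / r) ** 2          # assumed normalization: relative to target
            for w, v, r in zip(weights, values, reference)
        )
        scores[name] = error / total_weight
    return sorted(scores.items(), key=lambda item: item[1])


for name, score in rank_similar("Bizerte", AIRPORTS, WEIGHTS):
    print(f"{name:10s} {score:.4f}")
```

Doubling the weight on runway width makes a width mismatch count twice as much as an equal relative mismatch in length, which is how the "more importantly" in the query is expressed.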

Page 16: CoBase: Scalable and Extensible Cooperative Information System

16

Unacceptable List Operator

Type Abstraction Hierarchy

Tunisia
  NE Tunisia
  NW Tunisia
  Central Tunisia
  SW Tunisia

Leaf values shown: Bizerte, El Borma, ...

Constraint passed to the CoBase Relaxation Manager: Avoid Northern Tunisia!

Trimmed TAH

Tunisia
  Central Tunisia
  SW Tunisia

Leaf values shown: Gafsa, El Borma
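Trimming a TAH with an unacceptable list can be sketched as below. The dictionary layout, the function name, and the placement of the leaf airports under each region are assumptions for illustration; the slide only shows that the northern regions disappear from the trimmed hierarchy.

```python
# Parent -> children. Leaf placement under regions is assumed for illustration.
TAH = {
    "Tunisia": ["NE_Tunisia", "NW_Tunisia", "Central_Tunisia", "SW_Tunisia"],
    "NE_Tunisia": ["Bizerte"],
    "NW_Tunisia": [],
    "Central_Tunisia": ["Gafsa"],
    "SW_Tunisia": ["El Borma"],
}


def trim(tah, root, unacceptable):
    """Copy the subtree under `root`, dropping unacceptable nodes and their subtrees."""
    trimmed = {}

    def visit(node):
        kept = [child for child in tah.get(node, []) if child not in unacceptable]
        trimmed[node] = kept
        for child in kept:
            visit(child)

    if root not in unacceptable:
        visit(root)
    return trimmed


# "Avoid Northern Tunisia!" expressed as an unacceptable list.
trimmed = trim(TAH, "Tunisia", {"NE_Tunisia", "NW_Tunisia"})
print(trimmed["Tunisia"])
print(sorted(trimmed))
```

Because relaxation then walks only the trimmed hierarchy, no amount of generalization can bring an answer from an excluded region back into the result.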

Page 17: CoBase: Scalable and Extensible Cooperative Information System

17

TAH Generation for Numerical Attribute Values

Relaxation Error
  Difference between the exact value and the returned approximate value
  The expected error is weighted by the probability of occurrence of each value

DISC (Distribution Sensitive Clustering) is based on the attribute values and frequency distribution of the data
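One way to make the relaxation error concrete is sketched below. The slide gives no formula, so the sketch assumes the relaxation error of a cluster is the expected absolute difference between two values drawn according to their observed frequencies; the exhaustive search for a single best cut is a simplification for illustration and should not be read as the DISC algorithm itself. Function names are hypothetical.

```python
from collections import Counter


def relaxation_error(values):
    """Expected |x_i - x_j| when both values are drawn by observed frequency."""
    counts = Counter(values)
    n = len(values)
    items = list(counts.items())
    return sum(
        (ci / n) * (cj / n) * abs(xi - xj)
        for xi, ci in items
        for xj, cj in items
    )


def best_binary_cut(values):
    """Split sorted values into two clusters minimizing the weighted relaxation error."""
    ordered = sorted(values)
    n = len(ordered)
    best = None
    for k in range(1, n):
        if ordered[k] == ordered[k - 1]:
            continue  # never separate equal values
        left, right = ordered[:k], ordered[k:]
        cost = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or cost < best[0]:
            best = (cost, left, right)
    return best


cost, left, right = best_binary_cut([1, 2, 10, 11])
print(left, right, cost)
```

Applying the cut recursively to each resulting cluster would yield a multi-level hierarchy whose nodes are value ranges.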

Page 18: CoBase: Scalable and Extensible Cooperative Information System

18

TAH Generation for Non-numerical Attribute Values

Pattern Based Knowledge Induction (PBKI)

  Rule-based approach
  Clusters attribute values into a TAH based on other attributes in the relation (i.e., inter-attribute relationships)
  Provides an attribute correlation value (a measure of how well the rules apply to the database)

Page 19: CoBase: Scalable and Extensible Cooperative Information System

19

Type Abstraction Hierarchy (TAH)

Runway Length

All
  Short: 0 ... 700
  Medium: 700 ... 1K
  Long: 1K ... 5K

Location Name

Tunisia
  NE Tunisia
    Bizerte
    Tunis
    Djedeida
  Central Tunisia
  SW Tunisia
    El Borma
  ...

Provide multi-level knowledge representations

Page 20: CoBase: Scalable and Extensible Cooperative Information System

20

Associative Query Answering

Provide relevant information not explicitly asked for by the user

User Query: List all airports with runway length between 8500 and approximately 10000 feet

Query Answers

Airport Name   Runway Length (feet)
Jerba          10171
Monastir       9700
Tunis          10500

Associated Attributes and Answers (User Type = Pilot)

Weather   Runway Quality
Sunny     Good
Rain      Good
Foggy     Damaged

Associated Attributes and Answers (User Type = Planner)

Military or Civilian Flag   Refrigerated Storage Capacity (Tons)
C
C                           0.00
C                           1000.00

Page 21: CoBase: Scalable and Extensible Cooperative Information System

21

CoBase and GLAD Integration

Page 22: CoBase: Scalable and Extensible Cooperative Information System

22

CoBase Functionality

  Provide approximate matching
    Find HETs with capacity of approximately 5 tons
  Provide conceptual query answering
    Find "Earth Moving" equipment
  Provide content-sensitive spatial queries
    Find storage sites near selected location (integration with MATT map server)
  Provide relaxation control
    Relaxation order
    Not-relaxable
    At-least (answer set, quantity on hand)

Page 23: CoBase: Scalable and Extensible Cooperative Information System

23

Cooperative Operations Added to GLAD

  Implicit Query Relaxation
  Explicit Query Relaxation
    Approximate operator
    Similar-to/based-on
    Spatial relaxation
  Relaxation Control
    Relaxation-order
    Not-relaxable
    At-least (answer-set size, quantity on hand)

Page 24: CoBase: Scalable and Extensible Cooperative Information System

24

CoBase Features Added to GLAD

  Enhance GLAD queries with cooperative operators (similar-to, relaxation-order, etc.)
  Display the query relaxation process
    modified query conditions (value, spatial)
    type abstraction hierarchies
  Rank returned answers with similarity measures
    e.g., spatial relaxation ranks answers according to their distance from the selected location

Page 25: CoBase: Scalable and Extensible Cooperative Information System

25

CoBase and GLAD TIE

[Architecture diagram. Components shown: GLAD, Report Collection, Report Query Constructor, Filter, Editor, Object Cache, Display Generator, Query Collection, CoBase Query Editor, CoBase Relaxation Manager, Knowledge Base, Data Cache, CoBase Data Source Manager, Databases. Inputs shown: NSNs, Spatial Area Selection.]

Page 26: CoBase: Scalable and Extensible Cooperative Information System

26

GLAD Query

Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000.

select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
  and upper(cbs_category_nomen) = 'AIRCRAFT'
  and price < 700000
  and pax_capacity_qty > 10
  and upper(combat_type) = 'I'
  and capacity_wt_ston <= 2)

Page 27: CoBase: Scalable and Extensible Cooperative Information System

27

CoGLAD Query with Relaxation Control Operators

Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000. The passenger capacity attribute is not relaxable. Relax price first and then capacity weight.

select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
  and upper(cbs_category_nomen) = 'AIRCRAFT'
  and price < 700000
  and pax_capacity_qty > 10
  and upper(combat_type) = 'I'
  and capacity_wt_ston <= 2)
not-relaxable pax_capacity_qty
relaxation-order price capacity_wt_ston

Page 28: CoBase: Scalable and Extensible Cooperative Information System

28

CoGLAD Query with Similar-to Operator

Find aircraft similar to NSN = '0000IB0000961' based on the attributes price, passenger capacity and air mileage. Passenger capacity has a weight of 8; price and air mileage each have a weight of 1.

select nsn
from nsn_description
where upper(nsn) similar-to '0000IB0000961'
      based-on ((price 1.0) (pax_capacity_qty 8.0) (air_mileage 1.0))
at-least 4

* '0000IB0000961' is an answer from the previous query

Page 29: CoBase: Scalable and Extensible Cooperative Information System

29

CoGLAD Query with Approximate Operator

Find DLA stock reports with NSN like '%8340%' (FSC for tents and tarpaulins) and an on-hand quantity of approximately 150.

select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and on_hand_quantity = ~150
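A rough sketch of what an approximate condition such as `on_hand_quantity = ~150` could do. CoBase relaxes values through TAHs; this sketch substitutes a simple percentage band that widens step by step until enough rows qualify. The rows, the `ric` codes, the step size, and the function name are all hypothetical.

```python
# Hypothetical stock rows; nsn and ric values are made up for illustration.
ROWS = [
    {"nsn": "8340-A", "ric": "R1", "on_hand_quantity": 120},
    {"nsn": "8340-B", "ric": "R2", "on_hand_quantity": 145},
    {"nsn": "8340-C", "ric": "R3", "on_hand_quantity": 160},
    {"nsn": "8340-D", "ric": "R4", "on_hand_quantity": 300},
]


def approximately(rows, key, target, at_least=1, step=0.05, max_tolerance=0.5):
    """Widen a band around `target` until `at_least` rows match or the limit is hit."""
    tolerance = 0.0
    while True:
        low = target * (1 - tolerance)
        high = target * (1 + tolerance)
        hits = [row for row in rows if low <= row[key] <= high]
        if len(hits) >= at_least or tolerance >= max_tolerance:
            return hits, tolerance
        tolerance = round(tolerance + step, 10)  # avoid float drift in the band width


hits, tolerance = approximately(ROWS, "on_hand_quantity", 150, at_least=2)
print([row["nsn"] for row in hits], tolerance)
```

The `at_least` parameter plays the role of the at-least control: relaxation stops as soon as the answer set is large enough, rather than widening to a fixed band.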

Page 30: CoBase: Scalable and Extensible Cooperative Information System

30

Adding Constraints to a Query

GLAD query

select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and nomenclature like '%TARP%'

Query with added constraints

select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and nomenclature like '%TARP%'
  and on_hand_quantity = ~150
  and size_in_square_feet = 350

Page 31: CoBase: Scalable and Extensible Cooperative Information System

31

Example of Spatial Relaxation

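[Flowchart: the inputs are NSNs, a selected area on the map, and a constraint (quantity on hand). Query processing runs the query. If the answers satisfy the constraints (Yes), they are returned. If not (No), the CoBase Relaxation Manager relaxes the selected area based on the context-sensitive TAHs and the query is processed again.]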

Page 32: CoBase: Scalable and Extensible Cooperative Information System

32

Spatial Relaxation with Relaxation Control

  relaxation-order: size, (latitude, longitude)
  not-relaxable: price
  at-least:
    value: size of the tarpaulin
    quantity on hand: relax until enough quantity on hand (specified by the user) is obtained

Page 33: CoBase: Scalable and Extensible Cooperative Information System

33

Scalable and Extensible CoBase Architecture

Page 34: CoBase: Scalable and Extensible Cooperative Information System

34

Mediator Inter-Communications via KQML

[Figure: Mediator A and Mediator B each wrap a module (Module A, Module B; module objects and APIs) on top of the CoBase Ontology, the CoBase Content Language (data, actions), and KQML. The two mediators communicate with each other via KQML.]

Page 35: CoBase: Scalable and Extensible Cooperative Information System

35

Page 36: CoBase: Scalable and Extensible Cooperative Information System

36

Query Answers Without CoBase

Query: find chemical suits

Page 37: CoBase: Scalable and Extensible Cooperative Information System

37

Page 38: CoBase: Scalable and Extensible Cooperative Information System

38

Page 39: CoBase: Scalable and Extensible Cooperative Information System

39

Page 40: CoBase: Scalable and Extensible Cooperative Information System

40

Page 41: CoBase: Scalable and Extensible Cooperative Information System

41

Page 42: CoBase: Scalable and Extensible Cooperative Information System

42

Page 43: CoBase: Scalable and Extensible Cooperative Information System

43

Electronic Warfare

  Identify and locate sources of radiated electromagnetic energy
  Determine emitter type based on the operating parameters of observed signals:
    Radio Frequency (RF)
    Pulse Repetition Frequency (PRF)
    Pulse Duration (PD)
    Scan Period (SP)
    other operating parameters
  Determine platform sites near the line of the bearing of an emitter

This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ

Page 44: CoBase: Scalable and Extensible Cooperative Information System

44

Performance Improvement by Using CoBase in EW

              Conventional DB        CoBase
              Case 1     Case 2      Case 1     Case 2
identified    90.00%     30.00%      100.00%    85.90%
id/ranking    100.00%    36.00%      100.00%    98.80%
relaxation    0.00%      0.00%       95.90%     99.80%

Conventional DB: parameter ranges from emitter specifications
CoBase:
  DB: peak parameters (RF, PRF) and parameter ranges (PD, SP)
  KB: TAHs based on RF and PRF peak parameters; TAHs based on PD and SP parameter ranges
Case 1: emitter signals without noise
Case 2: add noise - PD & SP (10%), PRF (5%), RF (2.5%)
Sample Size: 1000 signals
Emitter Types: 75

This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ

Page 45: CoBase: Scalable and Extensible Cooperative Information System

45

Current CoBase Users and Applications

ARPI members (ISI, Unisys)
  Enhance Query Capabilities in Transportation Domain (ARPI TARGET)
    query relaxation, association, and explanation

UCLA KMeD Project, Medical School
  Improve Search in Medical Images (X-rays, MRs)
    approximate matching of image features and contents
    explanation of approximate matching quality

Hughes Research Lab
  Integrate Schema in Heterogeneous Databases
    approximate matching of attributes and views

Lockheed/Martin Marietta
  Emitter and Platform Identification
    approximate matching of observed emitter signals
    relaxation of regions to identify emitter platforms

BBN
  Enhance DOD Logistic Anchor Desk (GLAD)
    query relaxation and spatial relaxation

Page 46: CoBase: Scalable and Extensible Cooperative Information System

46

Page 47: CoBase: Scalable and Extensible Cooperative Information System

47

XML Query Relaxation

Page 48: CoBase: Scalable and Extensible Cooperative Information System

51

XML Overview

XML (eXtensible Markup Language) is a format for specifying structured documents and data. XML is extensible since it allows users to define their own schema (unlike HTML which is a pre-defined markup language).

Page 49: CoBase: Scalable and Extensible Cooperative Information System

52

XML (cont.)

XML is a hierarchical data model.
An XML document consists of two parts:
  1. Schema
  2. Data
The schema describes the structure of the data.

Example:

Schema:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Edited with XML Spy v4.2 -->
<!DOCTYPE note [
  <!ELEMENT note (to, from, heading, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT from (#PCDATA)>
  <!ELEMENT heading (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>

Data:

<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>

Page 50: CoBase: Scalable and Extensible Cooperative Information System

53

XML Query Languages

XML can be represented as an ordered tree with:
  Nodes representing elements and attributes
  Edges representing inclusion relationships

An XML query can similarly be represented as a tree with edges of two types:
  "/" for parent-child relationships
  "//" for ancestor-descendant relationships

Page 51: CoBase: Scalable and Extensible Cooperative Information System

54

XML Query Language: Example

The following XML

<a>
  <d>
    <b/>
  </d>
  <c/>
</a>

yields the following tree (nodes numbered in document order):

1,a
  2,d
    3,b
  4,c

A possible query is:

$1=a
  $2=b
  $3=c
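The node numbering in this example corresponds to a document-order (preorder) walk of the element tree. A small sketch using Python's standard `xml.etree.ElementTree`; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def number_nodes(xml_text):
    """Number elements in document order, starting from 1."""
    root = ET.fromstring(xml_text)
    # Element.iter() walks the tree in document (preorder) order.
    return [(index, element.tag) for index, element in enumerate(root.iter(), start=1)]


print(number_nodes("<a><d><b/></d><c/></a>"))
# [(1, 'a'), (2, 'd'), (3, 'b'), (4, 'c')]
```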

Page 52: CoBase: Scalable and Extensible Cooperative Information System

55

Query Relaxation

XML Query Relaxation can be categorized into two main types:

1. Value Relaxation: values in the query conditions are relaxed to widen the range of values accepted

2. Structure Relaxation: the structure of the query tree is relaxed to allow for more answers

Page 53: CoBase: Scalable and Extensible Cooperative Information System

57

Structure Relaxation

In structure relaxation, nodes and/or edges of the query tree can be relaxed to allow for more answers.

There are three types of structural relaxation:
  1. Edge Relaxation
  2. Node Relaxation
  3. Order Relaxation

Page 54: CoBase: Scalable and Extensible Cooperative Information System

58

Edge Relaxation

A parent-child edge can be relaxed to an ancestor-descendant edge.

For example:

1,a
  2,b
    3,d
  4,b
    5,d
      6,c
  7,b
    8,c
  9,d
    10,b
      11,c
  12,d
    13,b
      14,d
        15,c

Original query "a/b/c": 1,7,8

Relaxed queries:
  "a//b/c": 1,7,8 and 1,10,11
  "a/b//c": 1,7,8 and 1,4,6
  "a//b//c": 1,7,8; 1,10,11; 1,4,6; 1,13,15
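The edge-relaxation results above can be checked with a small path matcher. This is an illustrative sketch, not the CoXML engine: the tree is hard-coded as a dictionary of node id to (label, children), read off the example tree, and the function names are hypothetical.

```python
import re

# Node id -> (label, child ids), transcribed from the example tree.
TREE = {
    1: ("a", [2, 4, 7, 9, 12]),
    2: ("b", [3]), 3: ("d", []),
    4: ("b", [5]), 5: ("d", [6]), 6: ("c", []),
    7: ("b", [8]), 8: ("c", []),
    9: ("d", [10]), 10: ("b", [11]), 11: ("c", []),
    12: ("d", [13]), 13: ("b", [14]), 14: ("d", [15]), 15: ("c", []),
}


def descendants(node):
    """Yield every node strictly below `node`."""
    for child in TREE[node][1]:
        yield child
        yield from descendants(child)


def match(path, root=1):
    """Return node-id bindings for a path using '/' (child) and '//' (descendant)."""
    steps = re.findall(r"(//|/)?(\w+)", path)
    bindings = [(root,)] if TREE[root][0] == steps[0][1] else []
    for axis, label in steps[1:]:
        extended = []
        for binding in bindings:
            last = binding[-1]
            candidates = TREE[last][1] if axis == "/" else descendants(last)
            extended.extend(binding + (n,) for n in candidates if TREE[n][0] == label)
        bindings = extended
    return sorted(bindings)


for query in ("a/b/c", "a//b/c", "a/b//c", "a//b//c"):
    print(query, match(query))
```

Every relaxed query returns a superset of the original answer (1,7,8), which is the property that makes edge relaxation safe to apply automatically.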

Page 55: CoBase: Scalable and Extensible Cooperative Information System

59

Node Relaxation

Nodes can be relaxed in several ways:
  A node can be relabeled with a similar tag name based on the domain knowledge.
    For example: article/sec → article/section
  A node can be replaced with a "don't care" such that it will match any non-null answer.
    For example: a/b/c → a/_/c
  A node can be removed while ensuring the "superset" property.
    For example: a/b/c → a/b
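The three node relaxations can be sketched as a function that enumerates the one-step relaxations of a simple path query. The function name, the `similar_tags` table standing in for domain knowledge, and the choices to restrict removal to the last node and to never wildcard the root are assumptions made for illustration.

```python
def node_relaxations(path, similar_tags):
    """Return all queries reachable from `path` by one node relaxation."""
    labels = path.split("/")
    relaxed = set()
    for i, label in enumerate(labels):
        # 1. Relabel a node with a similar tag name from the domain knowledge.
        for alternative in similar_tags.get(label, []):
            relaxed.add("/".join(labels[:i] + [alternative] + labels[i + 1:]))
        # 2. Replace a non-root node with the "don't care" symbol "_".
        if i > 0 and label != "_":
            relaxed.add("/".join(labels[:i] + ["_"] + labels[i + 1:]))
    # 3. Remove the last node, so the relaxed query matches a superset.
    if len(labels) > 1:
        relaxed.add("/".join(labels[:-1]))
    return sorted(relaxed)


print(node_relaxations("article/sec", {"sec": ["section"]}))
print(node_relaxations("a/b/c", {}))
```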

Page 56: CoBase: Scalable and Extensible Cooperative Information System

60

Order Relaxation

The order in an XML query can be relaxed to allow any ordering of search conditions.

For example:

Original query: $1=a with $2=b < $3=c (b must precede c)
Relaxed query: $1=a with $2=b and $3=c (any order)

Two documents:

D1:

<a>
  <d>
    <b/>
  </d>
  <c/>
</a>

D2:

<a>
  <c/>
  <d>
    <b/>
  </d>
</a>

Original query matches D1 only
Relaxed query matches D1 and D2
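The difference between the ordered and the order-relaxed query can be checked with a few lines of standard-library Python. The sketch assumes both query edges are ancestor-descendant edges (b sits under d in both documents) and that "b < c" means some b precedes some c in document order; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

D1 = "<a><d><b/></d><c/></a>"
D2 = "<a><c/><d><b/></d></a>"


def matches(xml_text, ordered):
    """Does the document have root a containing b and c (b before c if `ordered`)?"""
    root = ET.fromstring(xml_text)
    if root.tag != "a":
        return False
    nodes = list(root.iter())  # document order
    b_positions = [i for i, node in enumerate(nodes) if node.tag == "b"]
    c_positions = [i for i, node in enumerate(nodes) if node.tag == "c"]
    if not b_positions or not c_positions:
        return False
    if not ordered:
        return True
    # Ordered query: some b must appear before some c.
    return min(b_positions) < max(c_positions)


print(matches(D1, ordered=True), matches(D2, ordered=True))    # True False
print(matches(D1, ordered=False), matches(D2, ordered=False))  # True True
```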

Page 57: CoBase: Scalable and Extensible Cooperative Information System

66

Conclusions

  Provide user- and context-sensitive query relaxations (structured, semi-structured, and unstructured data)
  Provide additional information (associative query answering) based on past cases
  CoSQL (Cooperative SQL)
    similar-to, near-to, approximate
    relaxation control operators
  CoXML (Cooperative XML)
    Value relaxation
    Structure relaxation (edge, node, order)

Page 58: CoBase: Scalable and Extensible Cooperative Information System

67
