1 CoBase: Scalable and Extensible Cooperative Information System Wesley W. Chu Computer Science Department University of California, Los Angeles http://www.cobase.cs.ucla.edu


CoBase: Scalable and Extensible Cooperative Information System

Wesley W. Chu

Computer Science Department

University of California, Los Angeles

http://www.cobase.cs.ucla.edu


Conventional Query Answering
  Need to know the detailed database schema
  Cannot get approximate answers
  Cannot answer conceptual queries

Cooperative Query Answering
  Derive approximate answers
  Answer conceptual queries


Cooperative Queries

Example queries:
  Find a seaport with railway facility in Los Angeles
  Find a nearby friendly airport that can land F-15
  Find hospitals with facility similar to St. John’s near LAX

[Diagram: cooperative queries go to CoBase Servers, which draw on Domain Knowledge and on Heterogeneous Information Sources.]

CoBase provides: Relaxation, Approximation, Association, Explanation


Generalization and Specialization

[Diagram: Generalization moves a Specific Query up to a Conceptual Query and then to a More Conceptual Query; Specialization moves a Conceptual Query back down to a Specific Query.]


Type Abstraction Hierarchy (TAH)

Chemical-Suit Size TAH (a non-numerical TAH), listed by level:
  All_Sizes
  Small_Size, Large_Size
  Very_Small, Small_to_Medium, Large_to_Extra_Large, Very_Large
  XXXS, XXS, S, M, L, XL, XXL

Provide multi-level knowledge representations


Type Abstraction Hierarchy (TAH)

Location example, listed by level:
  CA
  N. CA, C. CA, S. CA
  SanJose, PaloAlto, Sacramento, Davis, SanDiego, LongBeach, LA, SF


Relaxation Agent

Use a knowledge-based approach (generalization and specialization via Type Abstraction Hierarchy) to relax the following for matching:
  query conditions
  constraints


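Query Relaxation

[Flowchart: the Query is run against the Database. If there are Answers (Yes), Display them. If not (No), Relax Attribute using the TAHs, apply Query Modification, and run the modified Query again.]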
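
The relaxation loop in the flowchart above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the child-to-parent table `PARENT`, the grouping of cities into regions, the `SEAPORTS` rows, and the helper names are all hypothetical and are not CoBase's actual data structures or API.

```python
# Hypothetical toy TAH stored as child -> parent. The grouping of cities
# into regions is an assumption made for this sketch.
PARENT = {
    "SanDiego": "S. CA", "LongBeach": "S. CA", "LA": "S. CA",
    "SanJose": "N. CA", "PaloAlto": "N. CA", "SF": "N. CA",
    "S. CA": "CA", "N. CA": "CA",
}


def leaves_under(node):
    """Specialization: all leaf values that fall under `node` in the TAH."""
    children = [c for c, p in PARENT.items() if p == node]
    if not children:
        return {node}
    result = set()
    for child in children:
        result |= leaves_under(child)
    return result


def relaxed_query(rows, attribute, value, min_answers=1):
    """Generalize `value` one TAH level at a time until enough rows match."""
    node = value
    while True:
        accepted = leaves_under(node)
        answers = [r for r in rows if r[attribute] in accepted]
        # Stop when there are enough answers or the TAH root is reached.
        if len(answers) >= min_answers or node not in PARENT:
            return node, answers
        node = PARENT[node]  # generalization step


# Hypothetical data: there is no seaport row for LA itself.
SEAPORTS = [
    {"name": "Port A", "city": "LongBeach"},
    {"name": "Port B", "city": "SF"},
]
level, answers = relaxed_query(SEAPORTS, "city", "LA")
print(level, [a["name"] for a in answers])
```

The query for LA finds nothing, so the condition is generalized to the parent region and then specialized back into the set of cities under that region.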


Visualization of Relaxation Process

Query: Find seaports in the given region.

[Map figure: the given region and the relaxed region.]


Relaxation Control Primitives

not-relaxable (e.g., not-relaxable runway-length)
relaxation-order (e.g., relaxation-order (runway length, location))
preference-list
unacceptable-list
answer-size
relaxation-level


Relaxation Primitives

^ (approximate): ^ 9 am
between
near-to (context-sensitive): Airport near-to LAX; Restaurant near-to UCLA
similar-to: Airport similar-to LAX based-on (traffic, runway)
within


Similar-to

Find all airports in Tunisia similar to the Bizerte airport based on runway length and (more importantly) runway width.

select aport_name, runway_length, runway_width
from runways, countries
where aport_name similar-to 'Bizerte'
      based-on ((runway_length 1.0) (runway_width 2.0))
  and country_state_name = 'Tunisia'
  and countries.glc_cd = runways.glc_cd


Similar-to Result

APORT_NM    LENGTH   WIDTH   RANK
Bizerte       8000     148   0.00
El Borma      7200     144   0.09
Monastir      9700     137   0.20
Jerba        10171     148   0.24
Djedeida      6000     122   0.27

The similar-to module ranks the returned answers according to mean-squared error.
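
A weighted mean-squared-error ranking of this kind can be sketched as follows. The slides do not say how attribute differences are scaled, so dividing each difference by the attribute's range is an assumption of this sketch, and its scores need not match the RANK column above. The function name `similar_to` and the dictionary keys are hypothetical; the runway figures are the ones in the table.

```python
def similar_to(rows, target_name, weights):
    """Rank rows by weighted mean-squared error against the target row.

    Each difference is divided by the attribute's range over `rows`
    (an assumed normalization) before squaring and weighting.
    """
    target = next(r for r in rows if r["name"] == target_name)
    spans = {}
    for attr in weights:
        values = [r[attr] for r in rows]
        spans[attr] = (max(values) - min(values)) or 1.0
    total_weight = sum(weights.values())

    def error(row):
        return sum(
            w * ((row[attr] - target[attr]) / spans[attr]) ** 2
            for attr, w in weights.items()
        ) / total_weight

    # Smallest error first, so the target itself comes out on top.
    return sorted((error(r), r["name"]) for r in rows)


RUNWAYS = [
    {"name": "Bizerte", "length": 8000, "width": 148},
    {"name": "El Borma", "length": 7200, "width": 144},
    {"name": "Monastir", "length": 9700, "width": 137},
    {"name": "Jerba", "length": 10171, "width": 148},
    {"name": "Djedeida", "length": 6000, "width": 122},
]
# Width counts twice as much as length, as in the based-on clause.
ranking = similar_to(RUNWAYS, "Bizerte", {"length": 1.0, "width": 2.0})
for score, name in ranking:
    print(f"{name:10s} {score:.3f}")
```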


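Unacceptable List Operator

[Diagram: the constraint "Avoid Northern Tunisia!" is passed to the CoBase Relaxation Manager, which trims the Type Abstraction Hierarchy.]

Type Abstraction Hierarchy, listed by level:
  Tunisia
  NE Tunisia, NW Tunisia, Central Tunisia, SW Tunisia
  Bizerte, Gafsa, El Borma, ...

Trimmed TAH, listed by level:
  Tunisia
  Central Tunisia, SW Tunisia
  Gafsa, El Borma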
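
Trimming a TAH with an unacceptable list can be sketched as a recursive filter. The nested-dictionary representation, the function name `trim_tah`, and the placement of the leaf airports under particular regions are assumptions made for illustration only.

```python
def trim_tah(tah, unacceptable):
    """Return a copy of a nested-dict TAH without the unacceptable nodes
    and without anything below them."""
    trimmed = {}
    for node, children in tah.items():
        if node in unacceptable:
            continue
        trimmed[node] = trim_tah(children, unacceptable)
    return trimmed


# Hypothetical TAH; which airports sit under which region is assumed.
TUNISIA_TAH = {
    "Tunisia": {
        "NE Tunisia": {"Bizerte": {}},
        "NW Tunisia": {},
        "Central Tunisia": {},
        "SW Tunisia": {"Gafsa": {}, "El Borma": {}},
    }
}

# "Avoid Northern Tunisia!" expressed as an unacceptable list.
trimmed = trim_tah(TUNISIA_TAH, {"NE Tunisia", "NW Tunisia"})
print(trimmed)
```

Relaxation over the trimmed hierarchy can then never generalize into the excluded regions.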


TAH Generation for Numerical Attribute Values

Relaxation Error
  Difference between the exact value and the returned approximate value
  The expected error is weighted by the probability of occurrence of each value

DISC (Distribution Sensitive Clustering) is based on the attribute values and frequency distribution of the data
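
One way to make the relaxation error concrete is to take the expected absolute difference between a requested value and a returned value, with both weighted by how often they occur in the cluster. The sketch below computes that quantity and picks the binary cut with the smallest size-weighted error. It is an illustration of the idea under those assumptions, not the DISC algorithm as implemented in CoBase; the function names are hypothetical.

```python
from collections import Counter


def relaxation_error(values):
    """Expected |x_i - x_j| when x_i is requested and x_j is returned,
    both weighted by their relative frequency in `values`."""
    n = len(values)
    if n == 0:
        return 0.0
    freq = Counter(values)
    return sum(
        (ci / n) * (cj / n) * abs(xi - xj)
        for xi, ci in freq.items()
        for xj, cj in freq.items()
    )


def best_binary_cut(values):
    """Return (cut, error): split into v < cut and v >= cut so that the
    size-weighted relaxation error of the two parts is smallest."""
    ordered = sorted(values)
    n = len(ordered)
    best = None
    for i in range(1, n):
        if ordered[i] == ordered[i - 1]:
            continue  # equal values cannot be separated by a cut
        left, right = ordered[:i], ordered[i:]
        err = (len(left) / n) * relaxation_error(left) \
            + (len(right) / n) * relaxation_error(right)
        if best is None or err < best[1]:
            best = (ordered[i], err)
    return best


cut, err = best_binary_cut([1, 2, 3, 10, 11, 12])
print(cut, round(err, 3))
```

Applying the cut recursively to each part would yield a multi-level hierarchy of value ranges.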


TAH Generation for Non-numerical Attribute Values

Pattern Based Knowledge Induction (PBKI)
  Rule-based approach
  Clusters attribute values into TAH based on other attributes in the relation (i.e., inter-attribute relationships)
  Provides an attribute correlation value (measures how well the rules apply to the database)


Type Abstraction Hierarchy (TAH)

Runway Length
  All
    Short: 0 ... 700
    Medium: 700 ... 1K
    Long: 1K ... 5K

Location Name
  Tunisia
    NE Tunisia: Bizerte, Tunis, Djedeida
    Central Tunisia
    SW Tunisia: El Borma
    ...

Provide multi-level knowledge representations


Associative Query Answering

Provide relevant information not explicitly asked by the user

User Query: List all airports with runway length between 8500 and approximately 10000 feet

Query Answers
  Airport Name   Runway Length (feet)
  Jerba          10171
  Monastir        9700
  Tunis          10500

Associated Attributes and Answers (User Type = Pilot)
  Weather   Runway Quality
  Sunny     Good
  Rain      Good
  Foggy     Damaged

Associated Attributes and Answers (User Type = Planner)
  Military or Civilian Flag / Refrigerated Storage Capacity (Tons): C; C 0.00; C 1000.00


CoBase and GLAD Integration

Wesley W. Chu


CoBase Functionality

Provide approximate matching
  Find HETs with capacity of approximately 5 tons
Provide conceptual query answering
  Find “Earth Moving” Equipment
Provide content-sensitive spatial queries
  Find storage sites near selected location (Integration with MATT map server)
Provide relaxation control
  Relaxation order
  Not-relaxable
  At-least (answer set, quantity on hand)


Cooperative Operations Added to GLAD

Implicit Query Relaxation
Explicit Query Relaxation
  Approximate operator
  Similar-to/based-on
  Spatial relaxation
Relaxation Control
  Relaxation-order
  Not-relaxable
  At-least (answer-set size, quantity on hand)


CoBase Features Added to GLAD

Enhance GLAD queries with cooperative operators (similar-to, relaxation-order, etc.)
Display the query relaxation process
  modified query conditions (value, spatial)
  type abstraction hierarchies
Rank returned answers with similarity measures
  e.g., spatial relaxation ranks answers according to their distance from the selected location


CoBase and GLAD TIE

[Architecture diagram. GLAD side: Report Collection, Report, Query Constructor, Filter, Editor, Object Cache, Display Generator, Query Collection. CoBase side: CoBase Query Editor, CoBase Relaxation Manager, Knowledge Base, Data Cache, CoBase Data Source Manager, Databases. Data exchanged: NSNs, Spatial Area Selection.]


GLAD Query

Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000.

select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
   and upper(cbs_category_nomen) = 'AIRCRAFT'
   and price < 700000
   and pax_capacity_qty > 10
   and upper(combat_type) = 'I'
   and capacity_wt_ston <= 2)


CoGLAD Query with Relaxation Control Operators

Find NSNs of aircraft with passenger capacity > 10, combat type = 'I', capacity weight <= 2 tons and price < 700,000. Attribute passenger capacity is not relaxable. Relax price first and then capacity weight.

select nsn, price, pax_capacity_qty, capacity_wt_ston
from nsn_description
where (upper(class) = '7'
   and upper(cbs_category_nomen) = 'AIRCRAFT'
   and price < 700000
   and pax_capacity_qty > 10
   and upper(combat_type) = 'I'
   and capacity_wt_ston <= 2)
not-relaxable pax_capacity_qty
relaxation-order price capacity_wt_ston


CoGLAD Query with Similar-to Operator

Find aircraft similar to NSN = '0000IB0000961' based on the attributes price, passenger capacity and air mileage. Passenger capacity has a weight of 8, and price and air mileage each have a weight of 1.

select nsn
from nsn_description
where upper(nsn) similar-to '0000IB0000961'
      based-on ((price 1.0) (pax_capacity_qty 8.0) (air_mileage 1.0))
at-least 4

* '0000IB0000961' is an answer from the previous query


CoGLAD Query with Approximate Operator

Find DLA stock report with NSN like '%8340%' (FSC for tents and tarpaulin) and on-hand quantity approximately 150.

select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and on_hand_quantity = ~150


Adding Constraints to a Query

GLAD query

select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and nomenclature like '%TARP%'

Query with added constraints

select nsn, ric
from dla_stock_report
where nsn like '%8340%'
  and nomenclature like '%TARP%'
  and on_hand_quantity = ~150
  and size_in_square_feet = 350


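Example of Spatial Relaxation

[Flowchart: NSNs, a selected area on the map, and a constraint (quantity on hand) are passed to the CoBase Relaxation Manager and on to Query Processing. If the answers satisfy the constraints (Yes), return the answers. If not (No), relax the selected area based on the context-sensitive TAHs and process the query again.]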


Spatial Relaxation with Relaxation Control

relaxation-order: size, (latitude, longitude)
not-relaxable: price
at-least:
  value: size of the tarpaulin
  quantity on hand: relax until enough quantity on hand (specified by the user) is obtained
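
The "relax until enough quantity on hand is obtained" behavior can be sketched as a loop that widens the search area. This sketch grows a circle by a fixed step as a stand-in for the context-sensitive TAHs the slides describe, and it uses planar coordinates in kilometers; the function name, parameters, and site data are hypothetical.

```python
import math


def relax_until_enough(sites, center, required_qty,
                       radius_km=10.0, step_km=10.0, max_radius_km=100.0):
    """Grow the search radius until the summed quantity on hand is enough
    or the maximum radius is reached."""
    while True:
        hits = [s for s in sites if math.dist(center, s["xy_km"]) <= radius_km]
        enough = sum(s["on_hand"] for s in hits) >= required_qty
        if enough or radius_km >= max_radius_km:
            # Rank answers by distance from the selected location.
            hits.sort(key=lambda s: math.dist(center, s["xy_km"]))
            return radius_km, hits
        radius_km += step_km


# Hypothetical storage sites with planar coordinates in kilometers.
SITES = [
    {"name": "A", "xy_km": (3, 4), "on_hand": 100},
    {"name": "B", "xy_km": (0, 25), "on_hand": 80},
    {"name": "C", "xy_km": (60, 0), "on_hand": 500},
]
radius, found = relax_until_enough(SITES, (0, 0), required_qty=150)
print(radius, [s["name"] for s in found])
```

Site A alone does not cover the request, so the area is relaxed until site B is included; site C is never reached.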


Scalable and Extensible CoBase Architecture


Mediator Inter-Communications via KQML

[Diagram: Mediator A and Mediator B each stack a Module (A or B), the CoBase Ontology, the CoBase Content Language, and KQML, and they communicate through the KQML layer. Module: objects, APIs. Content Language: data, actions.]


Query Answers Without CoBase

Query: find chemical suits


Electronic Warfare

Identify and locate sources of radiated electromagnetic energy
Determine emitter type based on the operating parameters of observed signals:
  Radio Frequency (RF)
  Pulse Repetition Frequency (PRF)
  Pulse Duration (PD)
  Scan Period (SP)
  other operating parameters
Determine platform sites near the line of the bearing of an emitter

This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ


Performance Improvement by Using CoBase in EW

               Conventional DB        CoBase
               Case 1    Case 2       Case 1    Case 2
identified     90.00%    30.00%       100.00%   85.90%
id/ranking     100.00%   36.00%       100.00%   98.80%
relaxation     0.00%     0.00%        95.90%    99.80%

Conventional DB: parameter ranges from emitter specifications
CoBase:
  DB: peak parameters (RF, PRF) and parameter ranges (PD, SP)
  KB: TAHs based on RF and PRF peak parameters; TAHs based on PD and SP parameter ranges
Case 1: emitter signals without noise
Case 2: add noise - PD & SP (10%), PRF (5%), RF (2.5%)
Sample Size: 1000 signals
Emitter Types: 75

This research is a joint effort between CoBase and Lockheed Martin Communication Systems (Russ Frew, et al.), Camden, NJ


Current CoBase Users and Applications

ARPI members (ISI, Unisys)
  Enhance Query Capabilities in Transportation Domain (ARPI TARGET): query relaxation, association, and explanation

UCLA KMeD Project, Medical School
  Improve Search in Medical Images (X-rays, MRs): approximate matching of image features and contents; explanation of approximate matching quality

Hughes Research Lab
  Integrate Schema in Heterogeneous Databases: approximate matching of attributes and views

Lockheed/Martin Marietta
  Emitter and Platform Identification: approximate matching of observed emitter signals; relaxation of regions to identify emitter platforms

BBN
  Enhance DOD Logistic Anchor Desk (GLAD): query relaxation and spatial relaxation


Conclusions

Provide user and context sensitive query relaxations (structured and unstructured data)
Provide additional information (associative query answering) based on past cases
CoSQL (Cooperative SQL)
  similar-to, near-to, approximate
  relaxation control operators
GUI
  map server, high-level query formation


CoSent: An Active Database Technology

Natural language-like rules support conceptual and approximate terms
Decompose natural language-like rules into low-level rules via the knowledge base (TAH)
Mimic the human cognitive process, which eases rule specification
Ease rule maintenance


CoSent: An Active Database Technology

Trigger with high-level rules containing conceptual terms (e.g., bad, heavy) and approximate operators (e.g., similar-to, near-to, approximate)
Allow trigger conditions to be specified with fuzzy and conceptual terms
Mimic human cognitive expression

CoSent monitors temporal composite events and executes rules with conceptual and approximate terms.


Key Features of CoSent

User-defined rules are transformed into low-level range values via the knowledge base, i.e., Type Abstraction Hierarchies (TAHs)
TAHs are typically generated from data sources automatically
Leverages conventional DBMS (e.g., Oracle, Sybase, Teradata) triggering systems
Rule definitions are either specified by a domain expert or derived by data mining technologies


Example of Rule Definitions with Data Mining Technology

Find attributes that frequently appear together for a given target attribute.
  If bad road condition and also bad weather, then cause traffic congestion.
  If a person wrote many bad checks and also has past eviction, then this person is a poor credit risk.

Based on the frequency of occurrence, the derived rules can be ranked according to a certain information measure.
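
Deriving and ranking such rules from co-occurrence counts can be sketched as below. The slides do not name the information measure, so this sketch uses rule confidence (how often the target appears when both conditions do) as a stand-in; the function name, the `min_support` parameter, and the records are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mine_rules(records, target, min_support=2):
    """Rank 'if A and B then target' rules by confidence, then support.

    `records` is a list of sets of attribute flags. Returns a list of
    (confidence, support, (A, B)) tuples, best rule first.
    """
    pair_count = Counter()
    pair_and_target = Counter()
    for rec in records:
        conditions = sorted(rec - {target})
        for pair in combinations(conditions, 2):
            pair_count[pair] += 1
            if target in rec:
                pair_and_target[pair] += 1
    rules = [
        (pair_and_target[p] / pair_count[p], pair_and_target[p], p)
        for p in pair_count
        if pair_and_target[p] >= min_support
    ]
    return sorted(rules, reverse=True)


# Hypothetical observations.
RECORDS = [
    {"bad_road", "bad_weather", "congestion"},
    {"bad_road", "bad_weather", "congestion"},
    {"bad_road", "bad_weather"},
    {"bad_road", "rush_hour", "congestion"},
    {"bad_weather", "rush_hour"},
]
rules = mine_rules(RECORDS, "congestion")
for confidence, support, pair in rules:
    print(f"if {pair[0]} and {pair[1]} then congestion "
          f"(confidence {confidence:.2f}, support {support})")
```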


Conventional vs. Natural Language-Like Rules

Natural Language-Like Rule
  If the weather turns bad, then notify all affected units in that region and all those that are near to that region.

Conventional Rule
  If wind_speed > MAX_WIND_SPEED and wave_height > MAX_WAVE_HEIGHT then notify affected units in regions.


Natural Language-Like Rule Specifications

Example 1
  If the number of departures of large cargo carriers (e.g., C-5, C-141) becomes significantly low in the past seven days, notify the Air Mobility Command.

Example 2
  If the aircraft has a fuel contamination problem and the aircraft type is similar-to ‘C-5’ based on the fuel type and fueling method, then notify the authority.


Example

DoD Transportation Planning Weather Report Table

Wind Speed (meters/second)   Wave Height (meters)
14.9                         3.3
13.5                         3.1
12.2                         3.1
12                           2.6
11.8                         2.8
10.6                         2.3
10.5                         2.7
10                           2.5
10                           2.5
8.3                          2.3
7.9                          2.2
8.1                          2
7.7                          2
7.1                          1.8
7.4                          1.9
7.7                          1.7
7                            1.6
6.5                          1.5
6.6                          1.6
6.5                          1.4
6.6                          1.4
6.4                          1.5
5.9                          1.5
5.7                          1.4
6                            1.6
4.5                          1.4
4                            1.3
3.7                          1.2

Wind Speed is the hourly average over an eight-minute period for buoys and a two-minute period for land stations

Wave height is sampled in a 20-minute period


TAH Example

Wave Height [0.6, 7.2]
  VERY LOW [0.6, 1.25]
  LOW [1.25, 1.75]
  HIGH [1.75, 2.45]
  VERY HIGH [2.45, 7.2]
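
Translating a conceptual term into a range condition through this TAH can be sketched as a simple lookup. The dictionary layout and the function names are hypothetical; the ranges are the ones listed above and are treated as closed intervals, so adjacent terms share their boundary value.

```python
# Ranges copied from the Wave Height TAH above.
WAVE_HEIGHT_TAH = {
    "VERY LOW": (0.6, 1.25),
    "LOW": (1.25, 1.75),
    "HIGH": (1.75, 2.45),
    "VERY HIGH": (2.45, 7.2),
}


def to_range_condition(column, term, tah):
    """Rewrite `column = term` as a low-level range condition."""
    low, high = tah[term]
    return f"{column} >= {low} and {column} <= {high}"


def matches(value, term, tah):
    """Check one observed value against a conceptual term."""
    low, high = tah[term]
    return low <= value <= high


condition = to_range_condition("wave_height", "VERY HIGH", WAVE_HEIGHT_TAH)
print(condition)
```

The resulting condition is the kind of low-level rule a conventional triggering system can evaluate directly.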


A Portion of Wave Height TAH


Triggering Based on Temporal Composite Events

Notify the commander if within the past seven days, the total departure of C-5 is significantly low and the filter problem on C-5 is extremely high.

C-5 Departure
  Low: 9-134.5
    Significantly Low: 9-53
    Very Low: 53-134.5
  High: 134.5-208
    Very High: 134.5-162
    Significantly High: 162-208

C-5 Filter Problem
  Low: 0-53
    Extremely Low: 0-36
    Very Low: 36-53
  High: 53-79
    Very High: 53-60
    Extremely High: 60-79
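
Evaluating this rule over a seven-day window can be sketched as follows. The sketch assumes that the TAH ranges apply to seven-day totals and that counts are stored per calendar day; neither is stated on the slide. The function names, dates, and counts are hypothetical.

```python
from datetime import date, timedelta

# Ranges taken from the two TAHs above.
DEPARTURE_RANGES = {"significantly low": (9, 53)}
FILTER_PROBLEM_RANGES = {"extremely high": (60, 79)}


def window_total(daily_counts, today, days=7):
    """Sum the counts for the `days` most recent days, ending at `today`."""
    start = today - timedelta(days=days - 1)
    return sum(n for day, n in daily_counts.items() if start <= day <= today)


def should_notify(departures, filter_problems, today):
    """True when both conceptual conditions hold over the past seven days."""
    dep_lo, dep_hi = DEPARTURE_RANGES["significantly low"]
    flt_lo, flt_hi = FILTER_PROBLEM_RANGES["extremely high"]
    dep_total = window_total(departures, today)
    flt_total = window_total(filter_problems, today)
    return dep_lo <= dep_total <= dep_hi and flt_lo <= flt_total <= flt_hi


# Hypothetical daily counts: 5 departures and 9 filter problems per day.
today = date(2000, 1, 10)
departures = {today - timedelta(days=i): 5 for i in range(10)}
filter_problems = {today - timedelta(days=i): 9 for i in range(10)}
print(should_notify(departures, filter_problems, today))
```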


Natural Language-Like Rule Translations

[Diagram: a Rule Definition made of Natural Language-Like Rules passes through the Rule Parser, Rule Rep, Rule Decomposer, and Rule Translator (Rule Translation/Relaxation, using the TAH) to produce low-level rules for a conventional triggering system (e.g., Oracle, Sybase, Teradata).]


CoSent Architecture

[Diagram: the CoSent Server takes a Natural Language-Like Rule (input) and a Composite Event Specification and Notification (input/output), and produces a Trigger Action (output). Its components are the Rule Parser, Relaxation Engine, TAHs, Rule Base, Rule Manager, Event Manager, and Action Manager. Rule Translation/Relaxation connects it to commercial relational database systems (e.g., Oracle, Sybase, Teradata, etc.).]


CoSent Demo

Natural language-like rule with conceptual terms: “very high wave height” and “very strong wind speed”
Natural language-like rule with approximate term “nearby” and conceptual term “bad weather”
Install trigger by drag-and-drop on the desired location on the map


Natural Language-Like Rule

A natural language-like rule containing conceptual terms, such as wave_height = “very-high” and wind_speed = “very-strong”, can be translated to range values by domain knowledge, for instance a type abstraction hierarchy.
Natural language-like rules reduce the number of rules, thus easing rule maintenance.


Rules With Approximate Terms

Rules can contain approximate terms, such as near-by and approximate, which eases rule specification
The trigger can be installed on the desired location on a map by drag-and-drop
The near-by region affected by the bad weather condition is specified by the trigger condition and shown by a red circle


Map Server Architecture


Current Capabilities of Map Server

Visualization of Query Answers
  Icons
  Paths
Enter Query Constraints Graphically
Visualization of Query Relaxation Process


Visualization of Relaxation Process

Query: Find seaports in the given region.

[Map figure: the given region and the relaxed region.]


Explanation Agent

Based on process traces and invocation rules, generate English-like explanation of:
  Relaxation process
  Quality of approximate matching
  Further explanation on definitions and terms in explanation


Explanation of Relaxation Process


Relaxation Primitive: within


Extend near-to Primitive from Points to Regions


Dynamic Nearness

Uses transaction history to identify nearness between tuples and values

If two tuples (or attribute values) appear together in a query answer, then that is a piece of evidence that they should be clustered together.

Gather evidence over time

Evolve the hierarchy
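
Gathering co-occurrence evidence from query answers can be sketched with a pair counter. The class name `DynamicLinks`, its methods, and the tuple identifiers are hypothetical; the threshold parameter plays the role of a minimum link count before two tuples are treated as near.

```python
from collections import Counter
from itertools import combinations


class DynamicLinks:
    """Counts how often two tuples appear together in a query answer."""

    def __init__(self):
        self.counts = Counter()

    def record_answer(self, tuple_ids):
        """Each pair in one answer set is one piece of evidence."""
        for a, b in combinations(sorted(set(tuple_ids)), 2):
            self.counts[(a, b)] += 1

    def neighbors(self, tuple_id, threshold=1):
        """Tuples linked to `tuple_id` at least `threshold` times,
        strongest link first."""
        near = []
        for (a, b), n in self.counts.items():
            if n >= threshold and tuple_id in (a, b):
                near.append((n, b if a == tuple_id else a))
        near.sort(key=lambda item: (-item[0], item[1]))
        return [other for _, other in near]


links = DynamicLinks()
links.record_answer(["b1", "b2", "b3"])  # answer set of a first query
links.record_answer(["b1", "b2"])        # answer set of a second query
print(links.neighbors("b1"), links.neighbors("b1", threshold=2))
```

Raising the threshold keeps only links backed by repeated evidence, which limits how many neighbors are offered for any one tuple.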


The BOOKS Relation


Schematic of a Browsing System


Schematic of a Query Modification System


The Links Between Tuples in BOOKS


Dynamic Links After Two Queries


Links with Counts


Number of Links with Threshold Value


Number of Links is determined by Maximum Answer Set Size


Query Formation From High-Level Concepts for Relational Databases

Guogen Zhang
Wesley Chu
Frank Meng
Gladys Kong


Outline

Overview
Semantic Graph Model
High-Level Query Formation for SPJ queries
Incremental Query Formation for Complex Queries
Conclusions


Overview: Query Formation

Based on semantic graph model, including user-defined relationships
User specifies requests and constraints
Formulate simple query by graph search technique
  Candidates ranked by information measure
  English-like query description
A complex query can be formulated by a series of simple queries


Related Work

Query formulation as Steiner tree problem (Wald and Sorenson, 1984): limited to partial 2-tree graphs
Formulate simple Select-Project-Join (SPJ) queries via Universal Relation Model: no need to specify natural joins (Ullman, 1988; Vardi, 1988)
Object-oriented query path expression completion: partial order relationship between different paths for ranking (Ioannidis and Lashkari, 1994)
Query-by-Icon (QBI) (Massari and Chrysanthis, 1995)
Natural language interfaces (text/voice): logical form to query


Semantic Graph Model

Weighted graph G=(V,E):
Nodes: entities -- strong, weak, user-defined
Links: relationships -- ISA, HAS, simple, complex, user-defined
For relational databases:
  Nodes: relations
  Links: natural and user-defined joins
Weight: information measure of a node or link


Query Feature

Query expression in a semantic graph

Query Topic, T: A set of joins represented by links
Query Constraints, C: Query conditions
Query Aspect, A: Attribute list


A query topic for “aircraft can land on airports at geographical locations of countries”

[Figure: query topic subgraph over the nodes airports, runways, airfield_chars, geoloc and country, connected by links labeled can land, have, is a and located]


Semi-Automatic Generation of Semantic Model

Find natural joins through key and foreign key between nodes.
User-defined links can be added into the graph model.
Designers need to specify link types and assign names to all the elements in the graph.


Example of Semantic Model Generation

AIRPORT: APORT_NM, GEOLOC_TYPE, GLC_CD, ELEV_FT, …; key: APORT_NM.
RUNWAY: APORT_NM, RUNWAY_NM, GLC_CD, RUNWAY_LENGTH_FT, RUNWAY_WIDTH_FT, …; key: RUNWAY_NM.
GEOLOC: GLC_CD, GLC_NM, CY_CD, LATITUDE, LONGITUDE, …; key: GLC_CD.
COUNTRY: CY_CD, CY_NM, …; key: CY_CD.

Links:
AIRPORT--RUNWAY: APORT_NM;
AIRPORT--GEOLOC: GLC_CD;
RUNWAY--GEOLOC: GLC_CD;
GEOLOC--COUNTRY: CY_CD;
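The link discovery for the four relations above could be sketched roughly as follows. This is an illustration only, not the CoBase implementation: the rule used here (two relations are linked when the key of one also appears as an attribute of the other) and the names `RELATIONS` and `find_natural_joins` are assumptions, and the attributes elided by "…" on the slide are left out.

```python
from itertools import combinations

# Attributes as listed on the slide; elided ("…") attributes are omitted.
RELATIONS = {
    "AIRPORT": {"attrs": ["APORT_NM", "GEOLOC_TYPE", "GLC_CD", "ELEV_FT"],
                "key": "APORT_NM"},
    "RUNWAY": {"attrs": ["APORT_NM", "RUNWAY_NM", "GLC_CD",
                         "RUNWAY_LENGTH_FT", "RUNWAY_WIDTH_FT"],
               "key": "RUNWAY_NM"},
    "GEOLOC": {"attrs": ["GLC_CD", "GLC_NM", "CY_CD", "LATITUDE", "LONGITUDE"],
               "key": "GLC_CD"},
    "COUNTRY": {"attrs": ["CY_CD", "CY_NM"], "key": "CY_CD"},
}


def find_natural_joins(relations):
    """Return {(rel_a, rel_b): [join attributes]} for every pair of relations
    where the key of one relation appears as an attribute of the other
    (an assumed, simplified key/foreign-key rule)."""
    links = {}
    for a, b in combinations(relations, 2):
        shared = set()
        for owner, other in ((a, b), (b, a)):
            key = relations[owner]["key"]
            if key in relations[other]["attrs"]:
                shared.add(key)
        if shared:
            links[(a, b)] = sorted(shared)
    return links


links = find_natural_joins(RELATIONS)
for (a, b), attrs in links.items():
    print(f"{a}--{b}: {', '.join(attrs)}")
```

Under this simplified rule the sketch yields the same four links listed on the slide; link types and names would still have to be supplied by the designer.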


Information Measure

Information measure of a node or link, a:

$I(a) = -\log P(a)$

where P(a) is the probability of a being used in queries.

Assume nodes and links are independent. For a subgraph with a set of elements $A = \{a_i \mid i = 1, \ldots, n\}$, the information measure is additive:

$I(A) = \sum_{i=1}^{n} I(a_i)$
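A small numerical sketch of the two formulas above. The slide does not state the base of the logarithm; base 2 is assumed here so that the values read as bits, and the function names are hypothetical.

```python
import math


def information(p):
    """I(a) = -log P(a); base 2 is an assumption, the slide only says 'log'."""
    return -math.log2(p)


def subgraph_information(probabilities):
    """I(A) = sum of I(a_i), valid under the independence assumption."""
    return sum(information(p) for p in probabilities)


# Hypothetical usage probabilities for two elements of a subgraph.
probs = [0.5, 0.25]
total = subgraph_information(probs)
print(total)
```

Because the elements are assumed independent, the total equals the information of the joint probability, here `information(0.5 * 0.25)`.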


Information Measure (cont.)

Initial information measure:
  all the nodes = 1
  different nodes have a different value
Information measure is normalized and converted into counts
Probability of a node or a link is P(ai) = ci/c
Update information measure
Ranking based on information measure, thus adapt to user feedback
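One possible reading of the count-based scheme, sketched in Python. The slide does not define c or the update rule; this sketch assumes c is the sum of all counts and that feedback increments the count of every element used in the query the user accepts. The class and method names are hypothetical.

```python
import math


class UsageCounts:
    """Usage counts c_i per graph element, with P(a_i) = c_i / c."""

    def __init__(self, elements, initial=1):
        # Every element starts with the same count (the "all nodes = 1" option).
        self.counts = {e: initial for e in elements}

    def probability(self, element):
        # Assumption: c is the total of all counts.
        return self.counts[element] / sum(self.counts.values())

    def information(self, element):
        # Base-2 logarithm is an assumption.
        return -math.log2(self.probability(element))

    def record_use(self, elements):
        """Assumed feedback rule: bump the count of each element used."""
        for e in elements:
            self.counts[e] += 1


counts = UsageCounts(["airports", "runways", "weather", "geoloc"])
before = counts.information("airports")
counts.record_use(["airports", "runways"])
after = counts.information("airports")
print(before, after)
```

Elements that appear in accepted queries become more probable and so carry less information, which moves candidates containing them up in a minimum-information ranking.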


Query Formulation

To formulate (simple) queries without knowledge of query language or database schema

Example: Find airports in Tunisia that can land a C-5 cargo plane

User input:
Query aspect: AIRPORTS.APORT_NM
Constraints: AIRCRAFT_AIRFIELD_CHARS.AC_TYPE_NAME = ‘C-5’
             COUNTRY_STATE.CY_NM = ‘Tunisia’
Links: CAN LAND


Formulated Query

SELECT R3.APORT_NM
FROM AIRCRAFT_AIRFIELD_CHARS R0, AIRPORTS R3, COUNTRY_STATE R11, GEOLOC R12, RUNWAYS R16
WHERE R0.AC_TYPE_NM = 'C-5'
AND R11.CY_NM = 'Tunisia'
AND R0.WT_MIN_AVG_LAND_DIST_FT <= R16.RUNWAY_LENGTH_FT
AND R0.WT_MIN_RUNWAY_WIDTH_FT <= R16.RUNWAY_WIDTH_FT
AND R12.GLC_CD = R3.GLC_CD
AND R3.APORT_NM = R16.APORT_NM
AND R11.CY_CD = R12.CY_CD


Query Completion as Graph Search Problem

Given: An incomplete input query topic Ti

Find a set of links to complete the topic (to make Ti connected)

Minimum Missing Information principle: The query completion candidate Tc (the missing links and nodes) for an incomplete input topic Ti contains the minimum information


Query Formulation Algorithm

Input: subgraph T of the semantic graph G
Find candidates with the minimum information measure

Two methods used to limit the search scope:
L-step-bound paths: paths that connect two components with at most L links, to limit search within the neighborhood of the input subgraph
k-minimum completion candidates: only at most k candidates with minimum information measure are kept (alpha-beta pruning)
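The two bounds above can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the CoBase algorithm: the graph, the link weights and all function names are hypothetical, only link weights are summed (node weights are ignored), only two components are connected, and all bounded paths are enumerated before keeping the k cheapest, without the alpha-beta style pruning mentioned on the slide.

```python
import heapq

# Hypothetical toy graph: (node, node, link name, information measure).
LINKS = [
    ("airfield_chars", "runways", "can land", 2.0),
    ("airports", "runways", "have", 1.0),
    ("airports", "geoloc", "is at", 1.0),
    ("geoloc", "country", "located", 1.0),
    ("airports", "weather", "at", 1.0),
    ("runways", "geoloc", "located", 3.0),
]


def build_adjacency(links):
    """Build an undirected adjacency list from the link table."""
    adj = {}
    for u, v, name, weight in links:
        adj.setdefault(u, []).append((v, name, weight))
        adj.setdefault(v, []).append((u, name, weight))
    return adj


def l_step_bound_paths(adj, sources, targets, max_links):
    """Enumerate simple paths with at most max_links links that lead from
    the source component to the target component (assumed disjoint)."""
    results = []

    def extend(node, visited, path, cost):
        if node in targets:
            results.append((cost, tuple(path)))
            return
        if len(path) == max_links:
            return
        for nxt, name, weight in adj.get(node, []):
            if nxt in visited or nxt in sources:
                continue
            path.append((node, name, nxt))
            extend(nxt, visited | {nxt}, path, cost + weight)
            path.pop()

    for start in sources:
        extend(start, {start}, [], 0.0)
    return results


def k_minimum_candidates(adj, sources, targets, max_links, k):
    """Keep at most k completion candidates with the smallest total weight."""
    paths = l_step_bound_paths(adj, sources, targets, max_links)
    return heapq.nsmallest(k, paths, key=lambda item: item[0])


adj = build_adjacency(LINKS)
component = {"airfield_chars", "runways"}  # already joined by "can land"
best = k_minimum_candidates(adj, component, {"country"}, max_links=3, k=1)
for cost, path in best:
    print(cost, " -> ".join([path[0][0]] + [step[2] for step in path]))
```

With these made-up weights, a bound of 3 links admits the route through airports and geoloc, while a bound of 2 links leaves only the direct but more expensive runways to geoloc link.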


Initial Components and 2-Step-Bound Paths for the “CAN LAND” Query

[Figure: (a) initial components and (b) 2-step-bound paths, numbered (1) to (6), over the nodes aircrafts, airfield_chars, airports, runways, geoloc and country, with weighted links such as repair, have, authorize, can land, is a and located]


The Semantic Graph for the Transportation Domain

[Figure: semantic graph of relation nodes airports, runways, airfield_chars, weather, geoloc and country, connected by weighted links labeled can land, at, have, is a and located]


Incremental Query Formulation

To assist the user in reaching a complex query goal with a series of simple queries
The subsequent queries may depend on results of preceding queries (derived relations)

Issues
Incorporate derived relations into the semantic graph
Suggest missing attributes to link isolated derived nodes to the graph


Incremental Query Examples

Find airports in Tunisia.
Which of these airports can land a C-5?
What is the weather at these airports?


Incorporating Derived Relations

Source relation: contributes attributes to the derived relations
Derived relation: inherits properties of attributes from their source relations
Deriving link: links to the source relations through inherited keys
Inherited link: inherits links from the source relations


Extended semantic graph showing derived nodes, derived links and inherited links

[Figure: the transportation-domain semantic graph (airports, runways, airfield_chars, weather, geoloc, country) extended with the derived nodes airporttunisia, airporttunisiacanland and airporttunisiacanlandweather; legend: Relation Node, Derived Node, Derived Link, Inherited Link]


Suggesting Key Attributes for a Query

Find source relations for the isolated derived relation.
Suggest the keys of the source relations as attributes to include.
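The two steps above might look like this in Python. This is a hypothetical sketch: the key table reuses the keys from the semantic model example earlier in the deck, the function name and data layout are invented for illustration, and a derived relation is treated as isolated with respect to a source relation when it lacks that relation's key.

```python
# Keys taken from the semantic model example (AIRPORT, GEOLOC, COUNTRY).
KEYS = {"AIRPORT": "APORT_NM", "GEOLOC": "GLC_CD", "COUNTRY": "CY_CD"}


def suggest_key_attributes(derived_attrs, source_relations, keys):
    """For each source relation whose key is missing from the derived
    relation, suggest that key so a deriving link can be created."""
    return [(rel, keys[rel]) for rel in source_relations
            if keys[rel] not in derived_attrs]


# A derived relation that kept only ELEV_FT from AIRPORT is isolated.
suggestions = suggest_key_attributes(["ELEV_FT"], ["AIRPORT"], KEYS)
print(suggestions)
```

If the derived relation already carries the key (for example APORT_NM), nothing is suggested and the deriving link can be built directly.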


Concept and Attribute Specification Interface


Query Constraint Specification


Action Specification


English-Like Query Description and the Formulated Query


Conclusions

Semantic graph model provides a basis for query formulation search
Ranking of query candidates by information measure in formulation provides adaptive behavior
Incremental query formulation is effective for complex queries
GUI and voice interface can be built for query formulation from high-level concepts
