
Query Formation From High-Level Concepts for Relational Databases

Guogen Zhang, Wesley Chu, Frank Meng, Gladys Kong

Computer Science Department
University of California
Los Angeles, CA
http://www.cobase.cs.ucla.edu


Outline

• Overview
• Semantic Graph Model
• High-Level Query Formation for SPJ Queries
• Incremental Query Formation for Complex Queries
• Conclusions


Overview: Query Formation

• Based on a semantic graph model, including user-defined relationships
• User specifies requests and constraints
• Formulate a simple query by a graph search technique
  – Candidates ranked by information measure
  – English-like query description
• A complex query can be formulated by a series of simple queries


Related Work

• Query formulation as a Steiner tree problem (Wald and Sorenson, 1984)
  – limited to partial 2-tree graphs

• Formulate simple Select-Project-Join (SPJ) queries via the Universal Relation Model: no need to specify natural joins (Ullman, 1988; Vardi, 1988)

• Object-oriented query path expression completion: partial order relationship between different paths for ranking (Ioannidis and Lashkari, 1994)

• Query-by-Icon (QBI) (Massari and Chrysanthis, 1995)

• Natural language interfaces (text/voice): logical form to query


Semantic Graph Model

• Weighted graph G = (V, E):
• Nodes: entities -- strong, weak, user-defined
• Links: relationships -- ISA, HAS, simple, complex, user-defined
  – For relational databases:
    • Nodes: relations
    • Links: natural and user-defined joins
    • Weight: information measure of a node or link
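The model above could be represented along the following lines. This is only a sketch: the class names `Node`, `Link`, and `SemanticGraph`, their fields, and the example weights are assumptions made for illustration, not part of the original system.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    name: str               # relation name, e.g. "airports"
    kind: str = "strong"    # "strong", "weak", or "user-defined"
    weight: float = 1.0     # information measure of the node


@dataclass(frozen=True)
class Link:
    name: str               # e.g. "have", "can land"
    source: str
    target: str
    kind: str = "simple"    # "ISA", "HAS", "simple", "complex", "user-defined"
    weight: float = 1.0     # information measure of the link


@dataclass
class SemanticGraph:
    nodes: dict = field(default_factory=dict)
    links: list = field(default_factory=list)

    def add_node(self, node):
        self.nodes[node.name] = node

    def add_link(self, link):
        # A link may only connect nodes that are already in the graph.
        for end in (link.source, link.target):
            if end not in self.nodes:
                raise KeyError(f"unknown node: {end}")
        self.links.append(link)

    def links_of(self, name):
        # All links incident to the named node.
        return [l for l in self.links if name in (l.source, l.target)]


# Illustrative fragment; the weights here are made up.
graph = SemanticGraph()
for n in ("airports", "runways", "airfield_chars"):
    graph.add_node(Node(n))
graph.add_link(Link("have", "airports", "runways", kind="HAS"))
graph.add_link(Link("can land", "airfield_chars", "runways",
                    kind="user-defined", weight=2.0))
```

Storing the weight on both nodes and links keeps the information measure of a subgraph a simple sum over its elements.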


Semantic Graph Example

[Figure: example semantic graph (image not preserved)]


Query Feature

• Query expression in a semantic graph

  – Query Topic, T: a set of joins represented by links
  – Query Constraints, C: query conditions
  – Query Aspect, A: attribute list


A query topic for “aircraft can land on airports at geographical locations of countries”

[Figure: query topic subgraph. Nodes: airfield_chars, airports, runways, geoloc, country. Links: can land, have, is a, located.]


Semi-Automatic Generation of Semantic Model

• Find natural joins between nodes through key and foreign-key attributes.

• User-defined links can be added into the graph model.

• Designers need to specify link types and assign names to all the elements in the graph.


Example of Semantic Model Generation

AIRPORT: APORT_NM, GEOLOC_TYPE, GLC_CD, ELEV_FT, …; key: APORT_NM.
RUNWAY: APORT_NM, RUNWAY_NM, GLC_CD, RUNWAY_LENGTH_FT, RUNWAY_WIDTH_FT, …; key: RUNWAY_NM.
GEOLOC: GLC_CD, GLC_NM, CY_CD, LATITUDE, LONGITUDE, …; key: GLC_CD.
COUNTRY: CY_CD, CY_NM, …; key: CY_CD.

Links:
AIRPORT--RUNWAY: APORT_NM;
AIRPORT--GEOLOC: GLC_CD;
RUNWAY--GEOLOC: GLC_CD;
GEOLOC--COUNTRY: CY_CD;
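A minimal sketch of how links like these could be found automatically, assuming a link exists wherever one relation's key appears among another relation's attributes. The function name `find_natural_joins` and the dictionary layout are hypothetical; only the attributes listed on the slide are used (those elided by "…" are left out).

```python
from itertools import permutations

# Attributes and keys as listed in the example; elided attributes are omitted.
SCHEMA = {
    "AIRPORT": {"attrs": ["APORT_NM", "GEOLOC_TYPE", "GLC_CD", "ELEV_FT"],
                "key": "APORT_NM"},
    "RUNWAY": {"attrs": ["APORT_NM", "RUNWAY_NM", "GLC_CD",
                         "RUNWAY_LENGTH_FT", "RUNWAY_WIDTH_FT"],
               "key": "RUNWAY_NM"},
    "GEOLOC": {"attrs": ["GLC_CD", "GLC_NM", "CY_CD", "LATITUDE", "LONGITUDE"],
               "key": "GLC_CD"},
    "COUNTRY": {"attrs": ["CY_CD", "CY_NM"], "key": "CY_CD"},
}


def find_natural_joins(schema):
    """Map (referencing relation, referenced relation) to the join attribute.

    A pair is linked when the referenced relation's key also appears
    as an attribute (foreign key) of the referencing relation.
    """
    links = {}
    for owner, other in permutations(schema, 2):
        key = schema[owner]["key"]
        if key in schema[other]["attrs"]:
            links[(other, owner)] = key
    return links


links = find_natural_joins(SCHEMA)
for (a, b), attr in links.items():
    print(f"{a}--{b}: {attr}")
```

On this schema the sketch yields the same four links as the slide; user-defined links such as "can land" would still have to be added by the designer.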


Information Measure

Information measure of a node or link $a$:

$$I(a) = -\log P(a)$$

where $P(a)$ is the probability of $a$ being used in queries.

Assuming nodes and links are independent, the information measure of a subgraph with a set of elements $A = \{a_i \mid i = 1, \dots, n\}$ is additive:

$$I(A) = \sum_{i=1}^{n} I(a_i)$$
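A small numeric illustration of the two formulas. The slide does not state the base of the logarithm; base 2 is assumed here, and the function names and probabilities are made up for the example.

```python
import math


def information(p):
    """I(a) = -log P(a), with log base 2 assumed."""
    return -math.log2(p)


def subgraph_information(probabilities):
    """I(A) = sum of I(a_i), valid when the elements are independent."""
    return sum(information(p) for p in probabilities)


# Two elements used in 1/2 and 1/4 of queries respectively.
total = subgraph_information([0.5, 0.25])
print(total)
```

Because the elements are assumed independent, the sum equals the information of the joint probability, here `information(0.5 * 0.25)`.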


Information Measure (cont.)

Initial information measure:
  – all the nodes = 1
  – different nodes have a different value

Information measure is normalized and converted into counts

Probability of a node or a link is $P(a_i) = c_i / c$

• Update the information measure
• Ranking is based on the information measure and thus adapts to user feedback
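One way the count-based update might work, sketched under two assumptions the slide does not spell out: that c is the sum of all counts, and that feedback increments the count of every element used in an accepted query. `UsageModel` and its methods are hypothetical names.

```python
import math
from collections import Counter


class UsageModel:
    def __init__(self, elements, initial_count=1):
        # Every node or link starts with the same count.
        self.counts = Counter({e: initial_count for e in elements})

    def probability(self, element):
        # P(a_i) = c_i / c, with c assumed to be the total count.
        return self.counts[element] / sum(self.counts.values())

    def information(self, element):
        return -math.log2(self.probability(element))

    def record_query(self, used_elements):
        # Feedback: elements of an accepted query become more probable.
        for e in used_elements:
            self.counts[e] += 1


model = UsageModel(["airports", "runways", "weather", "geoloc"])
before = model.information("airports")
model.record_query(["airports", "runways"])
after = model.information("airports")
```

After the update, frequently used elements carry less information, so candidates that contain them rank higher under a minimum-information ordering.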


Query Formulation

To formulate (simple) queries without knowledge of the query language or database schema

Example: Find airports in Tunisia that can land a C-5 cargo plane

User input:
Query aspect: AIRPORTS.APORT_NM
Constraints: AIRCRAFT_AIRFIELD_CHARS.AC_TYPE_NAME = ‘C-5’
             COUNTRY_STATE.CY_NM = ‘Tunisia’
Links: CAN LAND


Formulated Query

SELECT R3.APORT_NM
FROM AIRCRAFT_AIRFIELD_CHARS R0, AIRPORTS R3, COUNTRY_STATE R11,
     GEOLOC R12, RUNWAYS R16
WHERE R0.AC_TYPE_NM = 'C-5'
  AND R11.CY_NM = 'Tunisia'
  AND R0.WT_MIN_AVG_LAND_DIST_FT <= R16.RUNWAY_LENGTH_FT
  AND R0.WT_MIN_RUNWAY_WIDTH_FT <= R16.RUNWAY_WIDTH_FT
  AND R12.GLC_CD = R3.GLC_CD
  AND R3.APORT_NM = R16.APORT_NM
  AND R11.CY_CD = R12.CY_CD
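A rough sketch of how the three parts of a query (aspect, constraints, and the join conditions of the completed topic) might be assembled into SQL text. `build_spj_query` and its parameters are hypothetical, and the join conditions below assume that GEOLOC (R12) connects AIRPORTS (R3) to COUNTRY_STATE (R11).

```python
def build_spj_query(aspect, relations, constraints, joins):
    """Assemble a Select-Project-Join statement as a string.

    aspect:      list of qualified attributes to project
    relations:   dict mapping alias -> relation name
    constraints: list of selection conditions supplied by the user
    joins:       list of join conditions taken from the links of the topic
    """
    select_part = ", ".join(aspect)
    from_part = ", ".join(f"{rel} {alias}" for alias, rel in relations.items())
    where_part = " AND ".join(list(constraints) + list(joins))
    return f"SELECT {select_part} FROM {from_part} WHERE {where_part}"


sql = build_spj_query(
    aspect=["R3.APORT_NM"],
    relations={
        "R0": "AIRCRAFT_AIRFIELD_CHARS",
        "R3": "AIRPORTS",
        "R11": "COUNTRY_STATE",
        "R12": "GEOLOC",
        "R16": "RUNWAYS",
    },
    constraints=["R0.AC_TYPE_NM = 'C-5'", "R11.CY_NM = 'Tunisia'"],
    joins=[
        "R0.WT_MIN_AVG_LAND_DIST_FT <= R16.RUNWAY_LENGTH_FT",
        "R0.WT_MIN_RUNWAY_WIDTH_FT <= R16.RUNWAY_WIDTH_FT",
        "R12.GLC_CD = R3.GLC_CD",
        "R3.APORT_NM = R16.APORT_NM",
        "R11.CY_CD = R12.CY_CD",
    ],
)
print(sql)
```

The user supplies only the aspect, the constraints, and the "CAN LAND" link; the remaining relations and join conditions come from completing the topic in the semantic graph.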


Query Completion as Graph Search Problem

Given: An incomplete input query topic Ti

Find a set of links to complete the topic (to make Ti connected)

Minimum Missing Information principle: The query completion candidate Tc (the missing links and nodes) for an incomplete input topic Ti contains the minimum information


Query Formulation Algorithm

• Input: subgraph T of the semantic graph G
  – Find candidates with the minimum information measure
• Two methods are used to limit the search scope:
  – L-step-bound paths: paths that connect two components with at most L links, to limit the search to the neighborhood of the input subgraph
  – k-minimum completion candidates: at most k candidates with minimum information measure are kept (alpha-beta pruning)
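The two bounds can be illustrated with a simplified sketch that connects just two components. It enumerates paths exhaustively rather than pruning during the search, so it is not the authors' algorithm; the toy graph, its weights, the "runway location" link, and the function names are all assumptions.

```python
import heapq

# Hypothetical toy graph: (node, node, link name, information measure).
EDGES = [
    ("airfield_chars", "runways", "can land", 2.0),
    ("airports", "runways", "have", 1.0),
    ("runways", "geoloc", "runway location", 2.5),
    ("airports", "geoloc", "is a", 1.0),
    ("geoloc", "country", "located", 1.0),
]


def neighbors(node):
    for u, v, name, info in EDGES:
        if u == node:
            yield v, name, info
        elif v == node:
            yield u, name, info


def l_step_bound_paths(sources, targets, max_links):
    """Simple paths with at most max_links links from one component to another.

    Returns a list of (total information, [link names]).
    """
    results = []

    def extend(node, visited, links, total):
        if node in targets and links:
            results.append((total, links))
            return
        if len(links) == max_links:
            return
        for nxt, name, info in neighbors(node):
            if nxt not in visited:
                extend(nxt, visited | {nxt}, links + [name], total + info)

    for start in sources:
        # Nodes of the source component are never re-entered.
        extend(start, set(sources), [], 0.0)
    return results


def k_minimum_candidates(paths, k):
    """Keep only the k candidates with the smallest information measure."""
    return heapq.nsmallest(k, paths, key=lambda p: p[0])


component = {"airfield_chars", "runways"}   # the "can land" link
paths = l_step_bound_paths(component, {"country"}, max_links=3)
best = k_minimum_candidates(paths, k=1)
```

With `max_links=3` two candidates connect the component to `country`, and the cheaper one (via `airports` and `geoloc`) is kept; with `max_links=2` only the direct `runways` to `geoloc` route qualifies, which shows how L trades completeness for a smaller search neighborhood.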


Initial Components and 2-Step-Bound Paths for the “CAN LAND” Query

[Figure: (a) initial components; (b) 2-step-bound paths (1)–(6). Nodes: aircrafts, airfield_chars, airports, runways, geoloc, country. Links: repair, have, authorize, can land, at, is a, located. Link weights are 1 or 2.]


The Semantic Graph for the Transportation Domain

[Figure: semantic graph. Relation nodes: airports, runways, weather, airfield_chars, geoloc, country. Links: can land, at, have, is a, located. Link weights are 1 or 2.]


Incremental Query Formulation

• Incremental Query Formulation
  – To assist the user in reaching a complex query goal with a series of simple queries
  – Subsequent queries may depend on the results of preceding queries (derived relations)
• Issues
  – Incorporate derived relations into the semantic graph
  – Suggest missing attributes to link isolated derived nodes to the graph


Incremental Query Examples

• Find airports in Tunisia.
• Which of these airports can land a C-5?
• What is the weather at these airports?


Incorporating Derived Relations

• Source relation: contributes attributes to the derived relation

• Derived relation: inherits the properties of its attributes from its source relations

• Deriving link: links the derived relation to its source relations through inherited keys

• Inherited link: a link the derived relation inherits from its source relations
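A sketch of how a derived relation might be wired into the graph. It assumes that a deriving link is created when the derived relation carries a source relation's key, and that a source's link is inherited when the derived relation carries that link's join attribute. `incorporate_derived` and the data layout are hypothetical; the keys and links reuse the AIRPORT/RUNWAY/GEOLOC/COUNTRY example.

```python
BASE_KEYS = {
    "AIRPORT": "APORT_NM",
    "RUNWAY": "RUNWAY_NM",
    "GEOLOC": "GLC_CD",
    "COUNTRY": "CY_CD",
}

# (relation, relation, join attribute)
BASE_LINKS = [
    ("AIRPORT", "RUNWAY", "APORT_NM"),
    ("AIRPORT", "GEOLOC", "GLC_CD"),
    ("RUNWAY", "GEOLOC", "GLC_CD"),
    ("GEOLOC", "COUNTRY", "CY_CD"),
]


def incorporate_derived(name, attrs, sources):
    """Compute deriving and inherited links for a derived relation.

    attrs maps each attribute of the derived relation to the source
    relation that contributed it.
    """
    deriving, inherited = [], []
    for src in sources:
        key = BASE_KEYS[src]
        if attrs.get(key) == src:
            # The derived relation kept the source's key.
            deriving.append((name, src, key))
        for a, b, attr in BASE_LINKS:
            if src in (a, b) and attrs.get(attr) == src:
                other = b if a == src else a
                inherited.append((name, other, attr))
    return deriving, inherited


# Result of "Find airports in Tunisia", keeping only the airport name.
deriving, inherited = incorporate_derived(
    "airporttunisia", {"APORT_NM": "AIRPORT"}, ["AIRPORT"]
)
```

Here the derived node gains a deriving link back to AIRPORT and inherits the AIRPORT--RUNWAY link, so a follow-up query about runways can reach it without repeating the Tunisia constraint.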


Extended semantic graph showing derived nodes, derived links and inherited links

[Figure: extended semantic graph. Relation nodes: airports, runways, airfield_chars, weather, geoloc, country. Derived nodes: airporttunisia, airporttunisiacanland, airporttunisiacanlandweather. Links: can land, at, have, is a, located. Legend: Relation Node, Derived Node, Derived Link, Inherited Link.]


Suggesting Key Attributes for a Query

• Find the source relations of the isolated derived relation.

• Suggest the keys of those source relations as attributes to include.
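The two steps can be sketched as a lookup, assuming a derived relation is isolated when it carries none of its sources' keys. `suggest_key_attributes` is a hypothetical helper, and the key table reuses the earlier AIRPORT/RUNWAY example.

```python
SOURCE_KEYS = {"AIRPORT": "APORT_NM", "RUNWAY": "RUNWAY_NM"}


def suggest_key_attributes(derived_attrs, sources, keys=SOURCE_KEYS):
    """Return {source relation: key} for every source key that is missing
    from the derived relation's attribute list."""
    return {src: keys[src] for src in sources if keys[src] not in derived_attrs}


# A query that projected only the elevation leaves the result isolated.
suggestion = suggest_key_attributes({"ELEV_FT"}, ["AIRPORT"])
print(suggestion)
```

Adding the suggested key to the query aspect gives the derived relation an attribute through which it can be linked back to the graph.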


Concept and Attribute Specification Interface


Query Constraint Specification


Action Specification


English-Like Query Description and the Formulated Query


Conclusions

• Semantic graph model provides a basis for query formulation search

• Ranking of query candidates by information measure in formulation provides adaptive behavior

• Incremental query formulation is effective for complex queries

• GUI and voice interface can be built for query formulation from high-level concepts