Upload
kathleen-meghan-daniel
View
226
Download
0
Tags:
Embed Size (px)
Citation preview
Query Formation From High-LevelConcepts for Relational Databases
Guogen ZhangWesley ChuFrank MengGladys Kong
Computer Science DepartmentUniversity of California
Los Angeles, CA http://www.cobase.cs.ucla.edu
Outlines
• Overview• Semantic Graph Model• High-Level Query Formation for SPJ
queries• Incremental Query Formation for
Complex Queries• Conclusions
Overview: Query Formation
• Based on semantic graph model, including user-defined relationships
• User specifies requests and constraints• Formulate simple query by graph search
technique– Candidates ranked by information
measure– English-like query description
• A complex query can be formulated by a series of simple queries
Related Work
• Query formulation as Steiner tree problem (Wald and Sorenson, 1984)– limited to partial 2-tree graphs
• Formulate simple Select-Project-Join (SPJ) queries via Universal Relation Model: no need to specify natural joins (Ullman 1988, Vardi, 1988)
• Object-oriented query path expression completion: partial order relationship between different path for ranking (Ioannidis and Lashkari, 1994)
• Query-by-Icon (QBI) [Massari and Chrysanthis, 1995]
• Natural language interfaces (text/voice): logical form to query
Semantic Graph Model
• Weighted graph G=(V,E):• Nodes: entities -- strong, weak, user-defined• Links: relationships -- ISA, HAS, simple,
complex, user-defined– For relational databases:
•nodes: relations•links: natural and user-defined joins•Weight: information measure of a node or
link
Title:D:\WINNT\Profiles\fmeng\Personal\CoBase Documents\graph.epsCreator:ImageMark Software LabsPreview:This EPS picture was not savedwith a preview included in it.Comment:This EPS picture will print to aPostScript printer, but not toother types of printers.
Semantic Graph Example
Query Feature
• Query expression in a semantic graph
– Query Topic, T: A set of Joins represented by links
– Query Constraints, C: Query Conditions
– Query Aspect, A: Attribute list
A query topic for “aircraft can land on airports at geographical locations of
countries”
airports
runwayscan land
have
is a located
airfield_chars
geoloc country
Semi-Automatic Generation of Semantic Model
• Find natural joins through key and foreign key between nodes.
• User-defined links can be added into the graph model.
• Designers need to specify link types and assign names to all the elements in the graph.
Example of SemanticModel Generation
AIRPORT: APORT_NM, GEOLOC_TYPE, GLC_CD, ELEV_FT, …;key: APORT_NM.
RUNWAY: APORT_NM, RUNWAY_NM, GLC_CD, RUNWAY_LENGTH_FT,RUNWAY_WIDTH_FT, …; key: RUNWAY_NM.
GEOLOC: GLC_CD, GLC_NM, CY_CD, LATITUDE, LONGITUDE, …;key: GLC_CD.
COUNTRY: CY_CD, CY_NM, …; key: CY_CD.Links:
AIRPORT--RUNWAY: APORT_NM;AIRPORT--GEOLOC: GLC_CD;RUNWAY--GEOLOC: GLC_CD;GEOLOC--COUNTRY: CY_CD;
Information Measure
Information measure of a node or link, aI(a) = - log P(a)
where P(a) is the probability of a being used in queries.
Assume nodes and links are independent, for a subgraph with a set of elements A={ai | i = 1, …, n}, information measure is additive:
n
I(A) = SUM I(ai)
i = 1
Information Measure(cont.)
Initial Information Measure:all the nodes = 1different nodes have a different value
Information measure is normalized and converted into counts
Probability of a node or a link is P(ai) = ci/c• Update Information measure• Ranking based on Information measure, thus
adapt to user feedback
Query Formulation
To formulate (simple) queries without knowledge of query language or database schema
Example:Find airports in Tunisia that can land a C-5 cargo plane
User input:Query aspect: AIRPORTS.APORT_NMConstraints: AIRCRAFT_AIRFIELD_CHARS.AC_TYPE_NAME = ‘C-5’
COUNTRY_STATE.CY_NM = ‘Tunisia’Links: CAN LAND
Formulated Query
SELECT R3.APORT_NMFROM AIRCRAFT_AIRFIELD_CHARS R0
AIRPORTS R3, COUNTRY_STATE R11GEOLOC R12, RUNWAYS R16
WHERE R0.AC_TYPE_NM = ‘C-5’AND R11.CY_NM = ‘Tunisia’AND R0.WT_MIN_AVG_LAND_DIST_FT <= R16.RUNWAY_LENGTH-FTAND R0.WT_MIN_RUNWAY_WIDTH_FT <= R16.RUNWAY_WIDTH_FTAND R11.GLC_CD = R3. GLC_CDAND R3.APORT_NM = R16.APORT_NMAND R11.CY_CD = R11.CY_CD
Query Completion as GraphSearch Problem
Given: An incomplete input query topic Ti
Find a set of links to complete the topic (to make Ti connected)
Minimum Missing Information principle:The query completion candidate Tc (the missing links and nodes) for an incomplete input topic Ti contains the minimum information
Query Formulation Algorithm
• Input: subgraph T of the semantic graph G– Find candidates with the minimum Information
measure• Two methods used to limit the search scope:
– L-step-bound paths: paths that connect two components with at most L links, to limit search within the neighborhood of the input subgraph
– k-minimum completion candidates: only at most k candidates with minimum Information measure are kept (alpha-beta pruning)
Initial Components and 2-Step-BoundPaths For the “CAN LAND” Query
airportsrepair(1)
2
aircrafts airportshave authorize
1 2(2)runways
can land
airports
country
geolocat is a
1 1
geolocat located
1 1
geolocis a located
1 1
airportshave
1(3)
(4)
(5)
(6)
(a) Initial components (b) 2-step-bound paths
airfield_chars
airports
runways
runways
runways
airfield_chars
airfield_chars
country
country
airports
The Semantic Graph For theTransportation Domain
airports
runwayscan land
Relation Node
at
have
is a located
2
1
1 1
1
weather
airfield_chars
geoloc country
• Incremental Query Formulation– To assist user reach a complex query
goal with a series of simple queries– The subsequent queries may depend on
results of preceding queries (derived relations)
• Issues– Incorporate derived relations into the
semantic graph– Suggest missing attributes to link
isolated derived nodes to the graph
Incremental Query Formulation
Incremental Query Examples
• Find airports in Tunisia.• Which of these airports can land a C-5?• What is the weather at these airports?
Incorporating Derived Relations
• Source relation: contributes attributes to the derived relations
• Derived relation: inherits properties of attributes from their source relations
• Deriving link: links to the source relations through inherited keys
• Inherited link: inherits links from the source relations
Extended semantic graph showing derived nodes, derived links and
inherited links
airports
runwayscan land
Relation Node
at
have
is a located
2
1
1 1
1
Derived Node
Derived Link
Inherited Link
airfield_chars
weather
geoloc country
airporttunisiacanland airporttunisiacanlandweather
airporttunisia
Suggesting KeyAttributes for a Query
• Find source relations for the isolated derived relation.
• Suggest key of the source relations as attributes to include.
Concept and AttributeSpecification Interface
Query Constraint Specification
Action Specification
English-Like Query Descriptionand the Formulated Query
Conclusions
• Semantic graph model provides a basis for query formulation search
• Ranking of query candidates by information measure in formulation provides adaptive behavior
• Incremental query formulation is effective for complex queries
• GUI and voice interface can be built for query formulation from high-level concepts