KEYWORD SEARCH OVER RELATIONAL TABLES AND STREAMS

ALEXANDER MARKOWETZ

University of Bonn

YIN YANG and DIMITRIS PAPADIAS

Hong Kong University of Science and Technology

Doklea Meci (A.M 2152)

May 2012

University of Crete

Department of Computer Science

THE CHALLENGES OF ACCESSING STRUCTURED DATA

Query languages: users must write numerous complex SQL statements.

Schemas: users must understand a complex, nontrivial schema.

R-KWS queries:

replace numerous complex SQL statements;

liberate users from studying a database schema;

allow querying for terms in unknown locations (tables/attributes).

INTRODUCTION

Keyword Search (KWS): each document or Web page constitutes one unit of information, and it is a result if it contains a subset of the query's keywords. KWS has also been applied to relational DBMSs, where it allows data retrieval without SQL.

Relational Keyword Search (R-KWS): the basic unit of information is a record (tuple). Queries cannot be answered by inspecting records individually; results have to be constructed by joining tuples.
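
A minimal sketch of why joins are needed, assuming two hypothetical tables Customer(id, name) and Orders(id, cust_id, clerk) linked by a foreign key: no single tuple contains both keywords, but a pair of joining tuples does. The sketch only checks keyword coverage and ignores the minimality condition defined for MTJNTs.

```python
# Hypothetical toy tables linked by a foreign key: Orders.cust_id -> Customer.id
customers = [{"id": 1, "name": "Alice Smith"}, {"id": 2, "name": "Bob Jones"}]
orders = [
    {"id": 10, "cust_id": 1, "clerk": "Clerk Brown"},
    {"id": 11, "cust_id": 2, "clerk": "Clerk Green"},
]


def contains(tup, keyword):
    """True if any attribute value of the tuple contains the keyword as a word."""
    return any(keyword in str(v).lower().split() for v in tup.values())


def single_tuple_results(query):
    """KWS-style answers: individual tuples that contain every keyword."""
    return [t for t in customers + orders if all(contains(t, k) for k in query)]


def joined_results(query):
    """Pairs of joining tuples that together contain every keyword."""
    results = []
    for c in customers:
        for o in orders:
            if o["cust_id"] == c["id"] and all(
                contains(c, k) or contains(o, k) for k in query
            ):
                results.append((c["id"], o["id"]))
    return results


query = ["smith", "brown"]
print(single_tuple_results(query))  # []
print(joined_results(query))        # [(1, 10)]
```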

OUTLINE

Introduction

Relational Keyword Search on Tables: Graph-Based Processing, Operator-Based Processing

Optimizations for Continuous GB: Predecessor-KL, Time-KL

Optimizations for Continuous OB: Operator Mesh, Demand-Driven Operator Execution, Partial-Mesh

Experimental Evaluation: Snapshot R-KWS Queries over Tables, Continuous R-KWS Queries over Streams, Summary of Experimental Evaluation

Conclusion

RELATIONAL KEYWORD SEARCH ON TABLES

Goal: methods for graph-based (GB) and operator-based (OB) processing that avoid the shortcomings of prior systems and improve the performance of R-KWS in conventional databases.

GRAPH-BASED PROCESSING

Basic idea: given an inverted index I (on disk), GB traverses an undirected data graph G (in memory), searching for results that are MTJNTs (Minimal Total Join Networks of Tuples).

Join Networks of Tuples (JNTs) are connected acyclic subgraphs (trees) of G. A JNT is total if it contains every keyword of the query.

A total JNT is called a Minimal Total JNT (MTJNT) iff no node can be removed such that the remainder is still a total JNT.
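
The definitions above can be checked mechanically. This is an illustrative sketch under assumptions: a JNT is given as a node set plus an undirected edge set in which each edge is listed once, and `keywords_of` (a hypothetical mapping) gives the query terms each tuple contains. Removing an internal node disconnects a tree, so only removals that leave a connected remainder can violate minimality; the code simply tests every node.

```python
from typing import Dict, Iterable, Set, Tuple

Edge = Tuple[str, str]


def is_tree(nodes: Set[str], edges: Set[Edge]) -> bool:
    """Connected and acyclic: |E| = |V| - 1 and every node reachable from one start."""
    if not nodes or len(edges) != len(nodes) - 1:
        return False
    adj: Dict[str, Set[str]] = {n: set() for n in nodes}
    for a, b in edges:
        if a not in adj or b not in adj:
            return False
        adj[a].add(b)
        adj[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for m in adj[stack.pop()]:
            if m not in seen:
                seen.add(m)
                stack.append(m)
    return seen == nodes


def is_total(nodes: Set[str], keywords_of: Dict[str, Set[str]], query: Iterable[str]) -> bool:
    """A JNT is total if its tuples jointly contain every query keyword."""
    covered = set().union(*(keywords_of.get(n, set()) for n in nodes))
    return set(query) <= covered


def is_mtjnt(nodes, edges, keywords_of, query) -> bool:
    """Total JNT from which no node can be removed while remaining a total JNT."""
    if not is_tree(nodes, edges) or not is_total(nodes, keywords_of, query):
        return False
    for n in nodes:
        rest = nodes - {n}
        rest_edges = {e for e in edges if n not in e}
        if rest and is_tree(rest, rest_edges) and is_total(rest, keywords_of, query):
            return False
    return True


kw = {"c1": {"smith"}, "o10": {"brown"}, "x": set()}
print(is_mtjnt({"c1", "o10"}, {("c1", "o10")}, kw, ["smith", "brown"]))  # True
print(is_mtjnt({"c1", "o10", "x"}, {("c1", "o10"), ("o10", "x")}, kw, ["smith", "brown"]))  # False
```

In the second call, the leaf x carries no keyword, so dropping it leaves a total JNT and the tree is not minimal.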

GSEARCH ALGORITHM

Basic idea: for a node sn, the algorithm enumerates all possible trees in G rooted at sn.

Result: every tree that corresponds to an MTJNT.

GSEARCH ALGORITHM

GSearch maintains a queue Q of trees, each constituting a fraction of a potential MTJNT.

Every tree is dequeued and expanded by adding one new node, resulting in a new tree.

The new tree falls into one of three categories: (1) it forms an MTJNT and is included in the result set; (2) it has the potential to become an MTJNT and is inserted in Q to be expanded later; (3) neither of the above, in which case the tree can be safely discarded.

The algorithm terminates when Q becomes empty.
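
A simplified, breadth-first rendering of this loop, assuming an adjacency-list data graph and a bound `t_max` on the number of nodes per tree (both hypothetical parameters here). It differs from GSearch in one respect: duplicates are avoided with a `seen` set of generated trees rather than by the paper's construction. A total tree is never expanded, since any node added to it would be a removable leaf.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Tree = Tuple[FrozenSet[str], FrozenSet[Tuple[str, str]]]  # (nodes, undirected edges)


def gsearch_sketch(adj: Dict[str, List[str]], keywords_of: Dict[str, Set[str]],
                   query: Iterable[str], sn: str, t_max: int) -> List[Tree]:
    """Return the MTJNTs with at most t_max nodes that contain sn (simplified)."""
    q = set(query)

    def covered(nodes) -> Set[str]:
        return set().union(*(keywords_of.get(n, set()) for n in nodes))

    def is_minimal(nodes, edges) -> bool:
        # Removing an internal node disconnects a tree, so only leaves can be dropped.
        if len(nodes) == 1:
            return True
        degree = {n: 0 for n in nodes}
        for a, b in edges:
            degree[a] += 1
            degree[b] += 1
        return all(not q <= covered(nodes - {leaf})
                   for leaf in nodes if degree[leaf] == 1)

    start: Tree = (frozenset([sn]), frozenset())
    queue, seen, results = deque([start]), {start}, []
    while queue:
        nodes, edges = queue.popleft()           # de-queue a partial tree
        if q <= covered(nodes):                  # total: an MTJNT or discarded, never expanded
            if is_minimal(nodes, edges):
                results.append((nodes, edges))
            continue
        if len(nodes) >= t_max:                  # cannot grow any further: discard
            continue
        for n in nodes:                          # expand by one adjacent node
            for m in adj.get(n, []):
                if m in nodes:
                    continue
                edge = (n, m) if n < m else (m, n)
                new_tree: Tree = (nodes | {m}, edges | {edge})
                if new_tree not in seen:         # simplification: deduplicate via a set
                    seen.add(new_tree)
                    queue.append(new_tree)
    return results


adj = {"c1": ["o10"], "o10": ["c1", "o11"], "o11": ["o10"]}
kw = {"c1": {"smith"}, "o11": {"brown"}}
res = gsearch_sketch(adj, kw, ["smith", "brown"], "c1", t_max=3)
print(sorted(res[0][0]))                                          # ['c1', 'o10', 'o11']
print(gsearch_sketch(adj, kw, ["smith", "brown"], "c1", t_max=2))  # []
```

Expanding only non-total trees loses no results: if a proper subtree of an MTJNT were total, one of the MTJNT's leaves outside that subtree could be removed, contradicting minimality.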

GSEARCH ALGORITHM

GSearch computes the set of MTJNTs containing node sn; consequently, GB answers an R-KWS query q correctly, completely, and without duplicates.

OPERATOR-BASED PROCESSING

Basic idea: query processing relies on Candidate Networks (CNs).

Candidate Networks are projections of MTJNTs onto the expanded schema EG(q): a tuple s of relation S maps to node S{K} ∈ EG(q) iff s contains all keywords in K but does not contain any other term in q \ K.

An MTJNT projects to a unique CN.
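
The tuple-to-node mapping follows directly from this definition: K is exactly the set of query keywords that occur in s. A minimal sketch, assuming tuples are plain text tokenized on whitespace and that the expanded schema contains one node per relation and per subset of the query keywords (including the empty set, written S{}); the relation names and helper functions are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, List, Tuple

Node = Tuple[str, FrozenSet[str]]  # (relation name, keyword subset K) for S{K}


def expanded_schema_node(relation: str, tuple_text: str, query: Iterable[str]) -> Node:
    """Map tuple s of relation S to S{K}, where K = the query keywords contained in s."""
    terms = set(tuple_text.lower().split())
    return relation, frozenset(k for k in query if k in terms)


def expanded_schema(relations: Iterable[str], query: List[str]) -> List[Node]:
    """One node per relation and per subset of the query keywords (including {})."""
    return [(r, frozenset(c))
            for r in relations
            for size in range(len(query) + 1)
            for c in combinations(query, size)]


def show(node: Node) -> str:
    relation, k = node
    return "%s{%s}" % (relation, ", ".join(sorted(k)))


query = ["smith", "brown"]
print(show(expanded_schema_node("Customer", "Alice Smith", query)))  # Customer{smith}
print(show(expanded_schema_node("Part", "ivory lace", query)))       # Part{}
print(len(expanded_schema(["Customer", "Orders", "Part"], query)))   # 12
```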

EXAMPLE

EXAMPLE

OPTIMIZATIONS FOR CONTINUOUS GB

Basic idea: keyword labeling is a simple and effective method to summarize the keywords reachable from a given node.

It improves performance by avoiding unnecessary calls to GSearch and by constraining graph traversals.

A keyword label (KL) of the form [k, h], stored at node n, indicates a path of h edges in the data graph connecting n to an occurrence of keyword k.
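
One way to compute such labels on a static graph is a multi-source breadth-first search per keyword, keeping for each node only the shortest label [k, h]. The sketch assumes a symmetric adjacency dict that lists every node as a key and a hypothetical hop bound `max_hops`; it also shows the test that gates calls to GSearch (a node qualifies only if it stores a KL for every query keyword). Streaming insertions and expirations, which the predecessor-KL and time-KL variants address, are not handled.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def keyword_labels(adj: Dict[str, List[str]], keywords_of: Dict[str, Set[str]],
                   query: Iterable[str], max_hops: int) -> Dict[str, Dict[str, int]]:
    """labels[n][k] = h: a shortest path of h edges from n to an occurrence of k (h <= max_hops)."""
    labels: Dict[str, Dict[str, int]] = {n: {} for n in adj}
    for k in query:
        sources = [n for n in adj if k in keywords_of.get(n, set())]
        dist = {n: 0 for n in sources}           # occurrences of k get [k, 0]
        frontier = deque(sources)
        while frontier:                          # multi-source BFS from all occurrences of k
            n = frontier.popleft()
            if dist[n] == max_hops:
                continue
            for m in adj[n]:
                if m not in dist:
                    dist[m] = dist[n] + 1
                    frontier.append(m)
        for n, h in dist.items():
            labels[n][k] = h
    return labels


def may_call_gsearch(labels: Dict[str, Dict[str, int]], n: str, query: Iterable[str]) -> bool:
    """GSearch(G, q, n) is worth calling only if n stores a KL for every query keyword."""
    return all(k in labels[n] for k in query)


adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
kw = {"a": {"x"}, "c": {"y"}}
labels = keyword_labels(adj, kw, ["x", "y"], max_hops=3)
print(labels["a"])  # {'x': 0, 'y': 2}
print(may_call_gsearch(keyword_labels(adj, kw, ["x", "y"], max_hops=1), "a", ["x", "y"]))  # False
```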

EXAMPLE

A label s:[k, 2] corresponds to the path connecting s to an occurrence of keyword k via 2 edges.

BENEFITS OF A MIN-COMPLETE LABELING

GSearch(G, q, s) is called only if node s can reach all query terms, i.e., only if s stores a KL for every k ∈ q. In any other case, s is guaranteed not to participate in an MTJNT.

KL-aware GSearch algorithm: a new tree is inserted into Q iff there exists a set NL of labels satisfying the following criteria:

The KLs in NL can reach all missing keywords.

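EXAMPLE - INTERMEDIATE TREES ABANDONED BY KL-AWARE GSEARCH (Tmax = 9)

The tree lacks one keyword, and new nodes can only be added to the node that reaches this keyword in four hops (the shortest path to it). Since the tree already has size 6 and 6 + 4 = 10 > Tmax = 9, the second criterion is not satisfied and the tree is abandoned.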
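
The size test in this example can be sketched as follows. This is not the paper's exact second criterion, which is only partly legible here, but a conservative stand-in under stated assumptions: tree size counts nodes, `labels[n][k]` holds the shortest hop count from n to keyword k, and a tree is kept only if every missing keyword has a label at some tree node and the tree size plus the largest of these shortest distances does not exceed `t_max`. Using the maximum rather than a sum keeps the test safe when paths to different keywords share nodes.

```python
from typing import Dict, Iterable, Set


def keep_tree(tree_nodes: Set[str], labels: Dict[str, Dict[str, int]],
              keywords_of: Dict[str, Set[str]], query: Iterable[str], t_max: int) -> bool:
    """Conservative KL-aware test: can this partial tree still grow into a result?"""
    covered = set().union(*(keywords_of.get(n, set()) for n in tree_nodes))
    missing = [k for k in query if k not in covered]
    lower_bound = 0  # at least this many nodes must still be added
    for k in missing:
        hops = [labels[n][k] for n in tree_nodes if k in labels.get(n, {})]
        if not hops:
            return False             # no label in the tree reaches k
        lower_bound = max(lower_bound, min(hops))
    return len(tree_nodes) + lower_bound <= t_max


# The slide's situation: 6 nodes, one missing keyword reachable in 4 hops.
tree = {"t%d" % i for i in range(1, 7)}
kw = {"t1": {"k1"}}
labels = {"t6": {"k2": 4}}
print(keep_tree(tree, labels, kw, ["k1", "k2"], t_max=9))   # False (6 + 4 > 9)
print(keep_tree(tree, labels, kw, ["k1", "k2"], t_max=10))  # True
```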

PREDECESSOR-KL IMPLEMENTATION

Basic idea: a predecessor-KL is a triplet of the form [k, h, p], stored at node n, indicating a path of length h connecting n to an occurrence of keyword k, where p is n's predecessor on that path.

Every node n must contain a predecessor-KL [k, h, p] for the shortest path leading from n through p to the occurrence of k.

An arriving tuple s can itself contain a keyword, or create new paths between keywords and nodes; both cases require KL insertions and updates.

each path contains at most edges

PREDECESSOR-KL EXAMPLE

A node must keep both KLs when they represent the shortest paths via two different predecessors.

When both paths (to two occurrences of the keyword) share the same predecessor, it suffices to keep the KL of the shorter path through that node.
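
The storage rule above (one label per keyword and predecessor, keeping the shortest) can be sketched as a small per-node store. Keeping one label per predecessor presumably lets a node fall back to another path when a predecessor expires, without recomputation; the sketch shows that fallback. The class and method names are hypothetical, a node's own keyword occurrences would use predecessor `None` with h = 0, and label propagation on tuple arrival is not attempted.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class PredecessorKLStore:
    """Hypothetical per-node store of predecessor-KLs [k, h, p]:
    for each (keyword k, predecessor p) only the shortest h is kept."""

    def __init__(self) -> None:
        self.labels: Dict[str, Dict[Tuple[str, Optional[str]], int]] = defaultdict(dict)

    def offer(self, node: str, k: str, h: int, p: Optional[str]) -> bool:
        """Record [k, h, p] at node unless a label via the same p is at least as short."""
        old = self.labels[node].get((k, p))
        if old is None or h < old:
            self.labels[node][(k, p)] = h
            return True
        return False

    def shortest(self, node: str, k: str) -> Optional[int]:
        """Length of the shortest known path from node to k, over all predecessors."""
        hs = [h for (kw, _), h in self.labels[node].items() if kw == k]
        return min(hs) if hs else None

    def drop_predecessor(self, node: str, p: str) -> None:
        """Forget the labels whose path runs through p (e.g., p's tuple expired)."""
        for key in [key for key in self.labels[node] if key[1] == p]:
            del self.labels[node][key]


store = PredecessorKLStore()
store.offer("n", "k", 3, "p1")
store.offer("n", "k", 5, "p1")   # rejected: a shorter path via p1 is already known
store.offer("n", "k", 4, "p2")   # different predecessor: kept as well
print(store.shortest("n", "k"))  # 3
store.drop_predecessor("n", "p1")
print(store.shortest("n", "k"))  # 4
```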

TIME-KL

Basic idea: a more efficient labeling that does not require explicit removal of labels.

A time-KL is a triplet [k, h, τ] indicating a path of length h to an occurrence of keyword k, which exists until time τ.

A KL [k, h1, τ1] dominates another KL [k, h2, τ2] iff h1 ≤ h2 and τ1 ≥ τ2.

Result: the graph keeps all KLs that are not dominated by others.

TIME-KL EXAMPLE

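(1) The node is connected to an occurrence of the keyword via 2 hops.

(2) The node is connected to an occurrence of the keyword via 1 hop.

(3) The node is connected to an occurrence of the keyword via 3 hops, and a node on this path expires at time 21.

Result:

(1) and (2) must be stored, as each indicates the shortest path for some period of time.

(3) is not recorded: it is longer than the other two and expires sooner than both, so it is dominated.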
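
The dominance test and the resulting label set can be sketched as follows. The hop counts of labels (1) to (3) and the expiry time 21 come from the example; the expiry times 40 and 30 for labels (1) and (2) are invented for illustration, chosen so that each is the shortest live path for some period. A label counts as dominated only if another label is at least as good on both hops and expiry and strictly better on one of them.

```python
from typing import List, NamedTuple, Optional


class TimeKL(NamedTuple):
    keyword: str
    hops: int        # path length h
    expires: float   # time until which the path exists


def dominates(a: TimeKL, b: TimeKL) -> bool:
    """a is at least as short and lives at least as long as b, and is strictly better once."""
    return (a.keyword == b.keyword and a.hops <= b.hops and a.expires >= b.expires
            and (a.hops < b.hops or a.expires > b.expires))


def non_dominated(labels: List[TimeKL]) -> List[TimeKL]:
    return [lab for lab in labels if not any(dominates(o, lab) for o in labels)]


def shortest_live(labels: List[TimeKL], k: str, now: float) -> Optional[int]:
    """Shortest path to k that still exists at time `now`; expired labels are just ignored."""
    live = [lab.hops for lab in labels if lab.keyword == k and lab.expires > now]
    return min(live) if live else None


kl1 = TimeKL("k", 2, 40)   # label (1); expiry 40 is invented
kl2 = TimeKL("k", 1, 30)   # label (2); expiry 30 is invented
kl3 = TimeKL("k", 3, 21)   # label (3)
kept = non_dominated([kl1, kl2, kl3])
print(kept == [kl1, kl2])            # True
print(shortest_live(kept, "k", 25))  # 1
print(shortest_live(kept, "k", 35))  # 2
```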

OPTIMIZATIONS FOR CONTINUOUS OB

Basic idea: if a selection on a table (e.g., T{}) returns no tuples, all operator trees using this input can be discarded immediately. For data streams, this is not permissible: even though the selection T{} does not currently produce tuples, it may do so in the future, so all operator trees must be maintained.

Solution: optimizations that enable efficient OB R-KWS over data streams.

OPERATOR MESH (1/3)

Basic idea: by sharing common subexpressions, all operator trees are integrated into an operator mesh, reducing CPU cost (for evaluating joins) as well as memory overhead (for intermediate results).

The mesh has |SR|·2^|K| clusters, where |SR| is the number of streaming relations and |K| the number of query keywords.

Each cluster contains the operator trees for all CNs discovered from a certain root node.

The entire operator mesh has |SR|·2^|K| leaves (sources), one for each node of the expanded schema.

The maximum depth of the mesh is +1, and the number of edges depends on the schema complexity.

Different clusters are interconnected only through their source operators; joins from different clusters do not connect directly.

OPERATOR MESH EXAMPLE

The figure shows the shared execution of four operator trees.

OPERATOR MESH EXAMPLE

Algorithm: the first node in a cluster corresponds to the root node, from which CNGen starts.

Whenever the algorithm generates a new tree t' from a tree t (by adding a new child to a parent node of t), a join t'.op is added to the mesh:

its left child is t.op (the operator that was inserted when t was created);

its right child is the source of the newly added node.

For each tree t in CNGen, a pointer to the corresponding operator t.op is maintained, to decide where to place subsequent joins when t is expanded.

The algorithm is initialized with t_first.op pointing to the source of the root node.
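
A sketch of this construction, assuming a tree is encoded as the sequence of its expanded-schema nodes together with each node's parent index, and that CNGen's expansion steps are supplied by the caller. `OperatorMesh`, `Op`, and node names such as `S{k1}` are hypothetical. Sources are created once and shared, which is how clusters end up connected only through their source operators; trees that extend the same tree t share t.op as their left input.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# A CNGen tree, encoded as (expanded-schema node, index of its parent) in insertion order.
Tree = Tuple[Tuple[str, int], ...]


@dataclass(eq=False)
class Op:
    name: str
    left: Optional["Op"] = None          # left input: operator of the tree being expanded
    right: Optional["Op"] = None         # right input: source of the newly added node
    parents: List["Op"] = field(default_factory=list)


class OperatorMesh:
    def __init__(self) -> None:
        self.sources: Dict[str, Op] = {}   # shared by all clusters
        self.op_of: Dict[Tree, Op] = {}    # the pointer t.op kept for every tree t

    def source(self, node: str) -> Op:
        if node not in self.sources:
            self.sources[node] = Op(name="src " + node)
        return self.sources[node]

    def start_cluster(self, root: str) -> Tree:
        t_first: Tree = ((root, -1),)
        self.op_of[t_first] = self.source(root)     # t_first.op -> source of the root
        return t_first

    def expand(self, t: Tree, parent_idx: int, child: str) -> Tree:
        """CNGen adds `child` under node t[parent_idx]; place the matching join."""
        t_new: Tree = t + ((child, parent_idx),)
        if t_new not in self.op_of:
            left, right = self.op_of[t], self.source(child)
            join = Op(name="join %d" % len(self.op_of), left=left, right=right)
            left.parents.append(join)
            right.parents.append(join)
            self.op_of[t_new] = join
        return t_new


mesh = OperatorMesh()
t1 = mesh.start_cluster("S{k1}")
t2 = mesh.expand(t1, 0, "T{}")
t3a = mesh.expand(t2, 1, "U{k2}")
t3b = mesh.expand(t2, 1, "V{k2}")
print(mesh.op_of[t3a].left is mesh.op_of[t3b].left)  # True: shared subexpression
u_root = mesh.start_cluster("U{k2}")                   # a second cluster
print(mesh.op_of[u_root] is mesh.op_of[t3a].right)     # True: clusters meet at sources
```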

PROBLEMS WITH OPERATOR MESH APPROACH

Example: assume that S{} and T{} produce tuples, while V{}, U{, } and V{, } are empty. Then none of the joins fed by these empty sources requires the output of their common left-input join, because they do not receive right input.

Worst case: that join's results expire before the arrival of any tuples from V{}, U{, } or V{, }. The join has then wasted CPU and memory without any contribution to the query.

DEMAND-DRIVEN OPERATOR EXECUTION (2/3)

The mesh is maintained in main memory throughout the lifespan of the query. A join is considered to be either:

running: the operator processes input;

sleeping: the operator ignores input.

A join operator is sent to sleep if it has no input from its right child (a source), or if all its parents are sleeping.

Sending operators to sleep does not affect the result's correctness or completeness, because either the operator cannot produce output, or its output would not be consumed.

DEMAND-DRIVEN OPERATOR EXECUTION - EXAMPLE

The figure shows the state diagram for a join operator.

DEMAND-DRIVEN OPERATOR EXECUTION - EXAMPLE

States are characterized by two binary flags: d, indicating that at least one parent operator is running, and r, specifying that the operator's right input is not empty. An operator only runs in the topmost state (d/r); when it leaves this state (transition 2 or 3), it goes to sleep (or halts), to wake up (or restart) later (transitions 9 and 10).

Operators exchange messages regarding their state to ensure that all d and r flags are up to date: a join operator communicates changes (running/sleeping) to its left child, which adjusts its d flag.
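
The flag logic can be sketched as a small class. Assumptions, all hypothetical: the left child keeps a counter of running parents and d is true when that counter is positive; top-level joins, whose output goes directly to the query, are treated as always having demand; and a join notifies its left child only when its own running status changes. The numbered transitions of the state diagram are not modeled individually.

```python
from typing import Optional


class JoinOp:
    """Demand-driven join: runs only when d (some parent runs) and r (right input non-empty)."""

    def __init__(self, name: str, left: Optional["JoinOp"] = None, is_top: bool = False):
        self.name = name
        self.left = left               # left child join (None if the left input is a source)
        self.is_top = is_top           # assumption: a top-level join always has demand
        self.running_parents = 0       # counter adjusted by messages from parents
        self.r = False                 # right input (a source) currently non-empty
        self.running = False

    @property
    def d(self) -> bool:
        return self.is_top or self.running_parents > 0

    def set_right_input(self, non_empty: bool) -> None:
        """Called when the right source starts or stops producing tuples."""
        self.r = non_empty
        self._refresh()

    def parent_changed(self, parent_running: bool) -> None:
        """Message from a parent that has just started or stopped running."""
        self.running_parents += 1 if parent_running else -1
        self._refresh()

    def _refresh(self) -> None:
        should_run = self.d and self.r
        if should_run != self.running:         # go to sleep or wake up
            self.running = should_run
            if self.left is not None:          # tell the left child, which adjusts its d flag
                self.left.parent_changed(should_run)


child = JoinOp("child")
a = JoinOp("a", left=child, is_top=True)
b = JoinOp("b", left=child, is_top=True)
child.set_right_input(True)    # no running parent yet: child stays asleep
a.set_right_input(True)
b.set_right_input(True)
a.set_right_input(False)       # a sleeps; child still has one running parent
print(child.running, child.running_parents)  # True 1
b.set_right_input(False)       # last running parent sleeps: child sleeps too
print(child.running)           # False
```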

DEMAND-DRIVEN OPERATOR EXECUTION - EXAMPLE

Assume U{, } stops producing output.

Result: the join fed by U{, } turns off its r flag and goes to sleep (transition 2).

It then calls its left child, which decreases its counter of running parents; no further actions are needed for the left child, as it still has other running parents.

DEMAND-DRIVEN OPERATOR EXECUTION - EXAMPLE

If T{},V{, } dries up too, the join it feeds goes to sleep as well.

When the common left child decreases its counter of running parents to zero (rParents = 0), it goes to sleep through transition 3.

EXAMPLE - CONSIDERING THAT ONLY TWO JOIN OPERATORS ARE RUNNING

One of these joins does not generate results, due to lack of left input.

When T{} begins producing output, it causes the join it feeds to adjust its r flag, wake up (transition 9), and call Pstart on its left child; that operator restarts and informs its own left child.

EXAMPLE - ALL JOINS RUN AGAIN EXCEPT TWO

Note: this method is not restricted to keyword search; it can equally benefit other data stream applications.

PARTIAL-MESH (3/3)

Basic idea: a Partial-Mesh (PM) is built at runtime and breaks the distinction between operator initialization and tuple processing.

The method maintains relatively few active operators in memory:

each operator is responsible for creating its parents before it can produce output;

it destroys its parents (and other operators up the tree) if it cannot supply them with input.

In large meshes, many operators are idle. Their absence does not affect the result's completeness, but dramatically reduces memory consumption.
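
A minimal sketch of the create-on-output, destroy-on-dry-up rule. Assumption: the parent relation of every operator is supplied up front as a hypothetical `blueprint` dict, whereas in the slides this information is recovered on demand by TreeGen. Because a join is created only by the operator that feeds it, an inactive operator cannot have active parents, so destruction stops there.

```python
from typing import Dict, List, Set


class PartialMesh:
    """Keeps only operators that can currently receive input (hypothetical sketch)."""

    def __init__(self, blueprint: Dict[str, List[str]]):
        self.blueprint = blueprint      # operator -> parent joins fed by it
        self.active: Set[str] = set()

    def produce(self, op: str) -> None:
        """op emits output: it creates its parents before passing tuples up."""
        self.active.add(op)
        self.active.update(self.blueprint.get(op, []))

    def dry_up(self, op: str) -> None:
        """op can no longer supply input: destroy its parents and operators above them."""
        stack = list(self.blueprint.get(op, []))
        while stack:
            p = stack.pop()
            if p in self.active:        # inactive operators have no active parents
                self.active.discard(p)
                stack.extend(self.blueprint.get(p, []))


pm = PartialMesh({"S{}": ["J1", "J2"], "J1": ["J3"]})
pm.produce("S{}")
print(sorted(pm.active))   # ['J1', 'J2', 'S{}']
pm.produce("J1")
print(sorted(pm.active))   # ['J1', 'J2', 'J3', 'S{}']
pm.dry_up("S{}")
print(sorted(pm.active))   # ['S{}']
```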

PARTIAL-MESH EXAMPLE

When the leftmost source S{} first produces output, it creates its direct parents.

When one of these joins generates results, it creates its own parents.

PARTIAL-MESH EXAMPLE

When a join outputs its first tuple t and instantiates its parent, this parent immediately probes t against T{}.

PARTIAL-MESH ALGORITHM

Basic idea: TreeGen is an algorithm for reconstructing the tree of an operator; it decides which parents to create.

The algorithm checks the join condition of t.op: if a source is joined with a node of the tree built for the operator's left input, then t is generated by adding the source's node as the rightmost child of that node.

PARTIAL-MESH - EXAMPLES OF TREEGEN

TreeGen(S{}) returns a tree that contains a single node, S{}.

A parent join is inserted in the mesh and connected to its left and right inputs.

Calling TreeGen on that join returns its tree, and the expansion of this tree reveals the join's parents.

SNAPSHOT R-KWS QUERIES OVER TABLES (1/3)

Comparing the GB and OB implementations: the experiments are performed on the tables Part (0.2M entries), Supplier (10K), PartSupp (0.8M), Customer (150K), Orders (1.5M), and LineItem (6M).

Two tables can be joined if and only if there is a foreign-key to primary-key relationship between them.

The length of join sequences is restricted to Tmax, which ranges between 4 and 6.

EXAMPLE

EXAMPLE - SEVEN SETS OF R-KWS QUERIES QS1-QS7

QS1, QS2: people's or companies' names (denoted as PeopleName), which appear in the columns Customer.Name, Supplier.Name, and Orders.Clerk; these queries retrieve connections between multiple people.

QS3/QS4: terms from the name of a part, for example "ivory", from the Part.Name attribute.

EXAMPLE - SEVEN SETS OF R-KWS QUERIES QS1-QS7

QS5, QS6: years, which are present in LineItem.ShipDate, LineItem.CommitDate, LineItem.ReceiptDate, and Orders.OrderDate.

QS7: terms from Part.Brand, Part.Mfgr, Part.Size, and Part.Container.

EXAMPLE - PROCESSING TIME FOR QUERIES QS1-QS7

The figure depicts the total runtime (y-axis) of GB and OB, together with the result set cardinality |R| (below the x-axis), for the seven query sets. The reported values are the medians obtained after setting Tmax to 4, 5, and 6.

SNAPSHOT R-KWS QUERIES OVER TABLES – CONCLUSION

(+) For conventional tables, GB is more efficient than OB.

(+) GSearch avoids duplicate results, which reduces the total cost.

(+) GB is preferable for datasets with frequent updates.

(-) Not efficient for queries involving numerous keywords and/or a large value of Tmax.

(-) GB consumes a large amount of main memory to store the data graph.

Conclusion: on servers dedicated to R-KWS queries, GB is the best choice due to its high performance.

(+) OB utilizes the functionality provided by a DBMS and can thus answer R-KWS queries using much less memory than GB.

Conclusion: on servers running multiple applications and answering R-KWS queries only infrequently, OB might be preferable due to its low memory footprint.

CONTINUOUS R-KWS QUERIES OVER STREAMS (2/2)

CONTINUOUS R-KWS QUERIES OVER STREAMS - CONCLUSION

Full Mesh (FM) is usually the most CPU-efficient method for a single query.

GB and Partial Mesh (PM) are more economical in terms of memory consumption.

CONCLUSION – ADVANTAGES OF R-KWS

R-KWS handles broad query tasks whose complexity does not permit hand-coded structured queries.

It presents considerable algorithmic challenges, because query processing has to explore a vast search space.

The authors address these challenges through a series of contributions:

they provide R-KWS semantics that are well defined and easily extensible to streaming environments;

they develop GB and OB processing techniques that match these semantics and remedy problems encountered in previous systems;

they adapt their framework to relational streams and propose a wide range of optimizations;

they support their claims through an extensive set of experiments.

CONCLUSION – FUTURE WORK

They plan to further improve R-KWS performance by means of indexing.

They intend to integrate ranking into continuous R-KWS query processing. Example: if there is a sudden burst of results, it may be desirable to report only the top-k answers for the affected period.