Upload
sadicsfaction-si-atat
View
223
Download
0
Embed Size (px)
Citation preview
8/8/2019 Thesis Sadri
1/108
University of California
Los Angeles
Optimization of Sequence Queries in Database
Systems
A dissertation submitted in partial satisfaction
of the requirements for the degree
Doctor of Philosophy in Computer Science
by
Mohammad Reza Sadri
2001
8/8/2019 Thesis Sadri
2/108
c Copyright by
Mohammad Reza Sadri
2001
8/8/2019 Thesis Sadri
3/108
The dissertation of Mohammad Reza Sadri is approved.
D. Stott Parker
Richard Muntz
Dominique Hanssens
Carlo Zaniolo, Committee Chair
University of California, Los Angeles
2001
ii
8/8/2019 Thesis Sadri
4/108
To the Memory of My Father, Who Inspired Me to Start This Path
and
To Katerina, Who Stood by Me to Finish It
iii
8/8/2019 Thesis Sadri
5/108
Table of Contents
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 State of the Art . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Procedural Extensions . . . . . . . . . . . . . . . . . . . . 2
1.1.2 SEQUIN and SRQL . . . . . . . . . . . . . . . . . . . . . 3
1.1.3 Tangram . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41.1.4 SQL/LPP . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1.5 Composite Events in Active Databases . . . . . . . . . . . 5
1.2 Optimization of Sequence Searches . . . . . . . . . . . . . . . . . 6
2 The SQL-TS Language . . . . . . . . . . . . . . . . . . . . . . . . . 8
3 Search Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1 Searching Simple Text Strings . . . . . . . . . . . . . . . . . . . . 13
3.2 General Predicates . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.3 Optimized Pattern Search . . . . . . . . . . . . . . . . . . . . . . 17
3.4 Implications between elements . . . . . . . . . . . . . . . . . . . . 19
4 Patterns with Stars and Disjunctions . . . . . . . . . . . . . . . . 26
4.1 Finding next and shift for the Star Case . . . . . . . . . . . . . . 29
4.2 Complexity of Calculating next and shift . . . . . . . . . . . . . 36
4.3 Disjunctive Patterns . . . . . . . . . . . . . . . . . . . . . . . . . 37
5 Calculating and . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
iv
8/8/2019 Thesis Sadri
6/108
5.1 GSW Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.1.1 Satisfiability . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.1.2 Implication . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2 Calculating and for Disjunctive Pattern Elements . . . . . . . 49
6 Experimental Results . . . . . . . . . . . . . . . . . . . . . . . . . . 53
7 Improvements and Generalizations . . . . . . . . . . . . . . . . . 57
7.1 Mapping between Set Theoretic and Logical Domains . . . . . . . 57
7.2 Interval Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
7.2.1 One Dimensional Intervals . . . . . . . . . . . . . . . . . . 61
7.2.2 Multi-dimensional Intervals . . . . . . . . . . . . . . . . . 63
7.3 Multi Dimensional Patterns . . . . . . . . . . . . . . . . . . . . . 64
7.3.1 Patterns on Multi Dimensional Ordered Sets . . . . . . . . 65
7.3.2 Vector Time Series . . . . . . . . . . . . . . . . . . . . . . 66
7.3.3 Multi-dimensional SQL-TS Queries . . . . . . . . . . . . . 69
8 Syntax and Formal Semantics of SQL-TS . . . . . . . . . . . . . 74
8.1 SQL-TS Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
8.2 Formal Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
8.3 All Answers Semantics . . . . . . . . . . . . . . . . . . . . . . . . 78
8.4 Left Maximality Semantics . . . . . . . . . . . . . . . . . . . . . . 80
8.5 Aggregates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
9 Further Work & Conclusion . . . . . . . . . . . . . . . . . . . . . . 84
v
8/8/2019 Thesis Sadri
7/108
A SQL-TS Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
vi
8/8/2019 Thesis Sadri
8/108
List of Figures
2.1 Effects of SEQUENCE BY and CLUSTER BY on data . . . . . . . . . . . . 9
3.1 The meaning of next(j) . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Shifting the pattern k positions to the right . . . . . . . . . . . . . 21
3.3 Next and Shift definitions for OPS . . . . . . . . . . . . . . . . . . 22
3.4 Comparison between path curve of the naive search (top chart) and
OPS (bottom chart) . . . . . . . . . . . . . . . . . . . . . . . . . . 25
5.1 Directed weighted graph for determining the satisfiability of a set
of inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
6.1 The relaxed double bottom pattern. . . . . . . . . . . . . . . . . 54
6.2 Doublebottoms found in the DJIA data are shown by boxes. The
bottom picture is zoomed for the area pointed by arrow in the top
picture and shows one of the matches. . . . . . . . . . . . . . . . 56
7.1 Table for the hight of the surface water . . . . . . . . . . . . . . . 65
7.2 Illustration of example 14 over a two-dimensional ordered set. . . . 67
7.3 Illustration of the two-dimensional vector series in example 15 . . . 69
7.4 Table for the trajectory of molecules . . . . . . . . . . . . . . . . . 70
7.5 Illustration of the two-dimensional vector series in example 16 . . . 72
vii
8/8/2019 Thesis Sadri
9/108
List of Tables
5.1 Logic matrix elements for disjunction of pattern elements. . . . . 50
7.1 Mapping between set theoretic logic relations among pattern ele-
ments and corresponding OPS speedups for scanning left to right. 59
7.2 Detailed mapping between set theoretic logic inclusions among pat-
tern elements and the logic matrices elements. . . . . . . . . . . . 60
viii
8/8/2019 Thesis Sadri
10/108
Acknowledgments
Writing this thesis marks the conclusion of a journey that I started many
years ago. Throughout this journey I enjoyed and benefited from the support
and companionship of many people.
First I would like to thank Carlo Zaniolo, my advisor, whose invaluable guid-
ance in both academic and personal issues provided me with much needed direc-
tion and support. I am as much indebted to him for his understanding of my
personal circumstances as I am in his intellectual contribution to my academic
growth. I also would like to thank Dominique Henssens, Dick Muntz, and Stott
Parker for accepting to be in my committee.
I am indebted to my dear friends Jafar (Iman) Adibi and Amir Masoud
Zarkesh for their help in prototyping this work and utilizing it in a real ap-
plication. Their support along with numerous discussions that cleared up manyambiguities was instrumental in the advancement of my work to this stage. I
also would like to thank my friend Bahram Fazeli for reviewing parts of this
manuscript and for sharing many happy moments at UCLA.
During the seven years that I was at UCLA, My family went through hard
times. Most difficult of all was the illness and loss of my father. In spite of
all that, their uninterrupted love and support has enabled me to reach my goal.
The memory of my last conversation with my father when he asked me to finish
my PhD is always fresh in my mind. My Mother endured unbelievable amount
of pressure in all this, and her love and prayers have given me the strength to
persevere.
My wife, Katerina, has been my main support during these years. Dealing
ix
8/8/2019 Thesis Sadri
11/108
with all the ups and downs of this journey every day and giving me her love and
support has kept me on my feet. I am also grateful to my brother, Mohsen, my
sister, Mojgan, their families, my younger brothers Majid and Saeid, my nephew
Nima, and my niece Dimitra, who bring joy to our life.
There are many that I am indebted to for their help: Henderson Yeung and
Shigako Seki In CSU Fresno who helped me through my first experience with
academic life in the US; my uncle Ahmad Tadayon and his family, Alipour family,
Peters Family, Medy Sanadidi, my supervisor and my good friend in NCR; Reza
Ahmadi,Hamid Jafarkhani, Rasool Jalili, Hadi Moradi, Reza Sadaghiani, Vali
Tadayon, Homayoun Yousefi-Zadeh, who shared their friendships with me; Alex
Razmjoo, Aktay Aydin, Frank Alaghband, Dan Cossack, and many other friends
and coworkers in Procom who showed great understanding and flexibility allowing
me to continue my PhD while working full time.
x
8/8/2019 Thesis Sadri
12/108
Vita
1967 Born, Eghlid, Iran.
1988 B.S., Electrical Engineering, Tehran University, Tehran, Iran.
1988-91 Software Engineer, Sanyo Electric, Tehran, Iran
1991-93 Teaching Assistant, Computer Science, California State Univer-
sity, Fresno, California
1993 M.S., Computer Science, California State University, Fresno,
California
1993-95 Research Assistant, Medical Imaging Department, UCLA, Los
Angeles, California
1995-96 Intern, AT&T GIS (Now NCR), El Segundo, California
1996-2000 Principal Software Engineer, Procom Technology, Irvine, Cali-
fornia
2000-Present Chief Technologist, Procom Technology, Irvine, California
Publications
Reza Sadri, Carlo Zaniolo, Amir M. Zarkesh, Jafar Adibi, Optimization of Se-
quence Queries in Database Systems, Accepted for publication in PODS 2001.
xi
8/8/2019 Thesis Sadri
13/108
Amir M. Zarkesh, Jafar Adibi, Cyrus Shahabi, Reza Sadri, Vishal Shah, Analysis
and Design of Server Informative WWW-Sites, in CIKM 1997 pages 254-261.
Bruce K.T. Ho, Zheng Chen, Ramesh Panwar, Reza Sadri, Pongskorn Saipetch,
Marco Ma, Medical Imaging Supercomputing in a PACS infrastructure, in Pro-
ceedings of SPIE, PACS Design and Evaluation 2435: 536-546, 1995.
Bruce K.T.Ho, Reza Sadri, Woodrew Chao, Lu Huang, Ricky Taira, Henry Shih,
Data clustering and other archive retrieval strategies for teleradiology and picture
archiving communication systems, Journal of Digital Imaging, 8(4): 180-190,
1995.
Woodrew Chao, Bruce K. T. Ho, John T. Chao. Reza Sadri, Lu Huang, Ricky
Taira, Implementation of System Intelligence in a 3-tier Tele-Medicine PACS
Hierarchical Storage Management System, Proceedings of SPIE, PACS Designand Evaluation 2435: 423-429, 1995.
Bruce Ho, Woodrew Chao, Johnny Chao, Reza Sadri, Richard Steckel, Hooshang
Kangarloo, A Multi-Client Global Teleradiology System Proceedings of SPIE,
PACS Design and Evaluation 2435: 479-490, 1995.
Carlo Zaniolo, Reza Sadri, A Simple Model for Active Rules and their Behavior
in Deductive Databases, in Workshop on Deductive Databases and Logic Pro-
gramming, pages 13-25, 1994.
xii
8/8/2019 Thesis Sadri
14/108
Abstract of the Dissertation
Optimization of Sequence Queries in Database
Systems
by
Mohammad Reza Sadri
Doctor of Philosophy in Computer Science
University of California, Los Angeles, 2001
Professor Carlo Zaniolo, Chair
The need to search for complex and recurring patterns in database sequences is
shared by many applications. In this work, we discuss how to express and support
efficiently sophisticated sequential pattern queries in relational database systems.
Thus, we first introduce SQL-TS, an extension of SQL, to express these patterns,
and then we study how to optimize search queries for this language. We take
the optimal text search algorithm of Knuth, Morris and Pratt, and generalize
it to handle complex queries on sequences. Our algorithm exploits the inter-
dependencies between the elements of a sequential pattern to minimize repeated
passes over the same data. We then present extensions of the algorithm for
detecting repeated patterns and disjunctive patterns. We also provide methods
for finding the inter-dependencies between the pattern elements for important
domains including intervals and vector time-series. In addition, a logic based
semantics for SQL-TS is given. Experimental results on typical sequence queries,
such as double bottom queries, confirm that substantial speedups are achieved
by our new optimization techniques.
xiii
8/8/2019 Thesis Sadri
15/108
CHAPTER 1
Introduction
Many applications require processing and analyzing sequential data. Examples
include the analysis of stock market prices [3], meteorological events [13], and the
identification of patterns of purchases by customers over time [1, 12]. These ap-
plications focus on finding patterns and trends in sequential data. The patterns
of interest in actual applications range from very simple ones, such as finding
three consecutive sunny days, to the more complex patterns used in datamining
applications [1, 4, 8]. These applications have motivated researchers to extend
database query languages with the ability of searching for and manipulating se-
quential patterns.
The time-series datablades [8] introduced by Informix provide a library of
functions that can be called from an SQL query, and most commercial DBMSs
support similar extensions. But datablades lack in expressive power, flexibility
and integration with DB query languages; thus, DB researchers have been seeking
time-series tools that are more powerful, more flexible, and more integrated with
DB query languages. In particular, the PREDATOR system proposed an SQL
extension called SEQUIN [22, 23, 21] for querying sequences. Then, SRQL [19]
extended the relational algebra with sequence operators for sorted relations, and
added constructs for querying sequences to SQL.
In this thesis, we view sorted relations as sequences as in SRQL, but propose
a new and more powerful SQL-like language for pattern searching, and advanced
1
8/8/2019 Thesis Sadri
16/108
techniques for optimizing queries in such a language.
1.1 State of the Art
SQL is not a good candidate for expressing sequential queries since it is based on
a set oriented model with a limited expressive power. There has been much effort
for developing database systems that can handle sequential data more efficiently
than SQL. Previous research efforts are discussed below.
1.1.1 Procedural Extensions
Informix time-series datablade [8] is a set of functions written for time-series ap-
plications, which can be called from SQL queries. These functions are mostly
statistical functions particularly useful for business applications. Therefore, In-
formix time-series datablade is meant for a known set of business applications
and problems. Other commercial systems support similar extensions: for in-
stance Red Brick system RISQL [26] is similar in nature to Informix datablades.
It provides a set of functions that can be used in time-series analysis, and suf-
fers from the same problems as Informix time-series datablades: It has weak
expressive power and it is useful only for specific applications. Oracle cartridges
and IBM DB2 universal database extenders are other examples of procedural
extensions to relational databases.
Besides limited expressive power, the datablades approach also suffers from
problems with respect to optimization. Indeed, while the individual functions
that exist in these systems are highly optimized for their specific tasks, there is
no optimization between these functions and the rest of the query.
2
8/8/2019 Thesis Sadri
17/108
1.1.2 SEQUIN and SRQL
SEQ and PREDATOR database systems [23, 21] use different data models and
different database languages for different EADTs (Enhanced Abstract Data Type).
It uses SQL for querying sets and SEQUIN [22, 23, 21] for querying sequences.
Query blocks from different languages can be nested inside each other with the
help of directives that convert data between these data types. SEQUIN has its
own algebra that makes the optimization of sequence queries possible. There are
two major drawbacks to this approach. One is that it does not support optimiza-tion between blocks of sequence and set queries, and the other is that it lacks the
expressive power needed for many important applications.
SEQ provides three different techniques for query optimization:
Using meta-information about the input sequence (range of sequence, den-
sity of sequence, distribution of values and empty positions). This infor-
mation is used to calculate the range of the output sequence, and theneliminate parts of input sequences that cant affect the output.
Heuristic query transformations like pushing the selections inside joins.
These can be used to create different query plans.
Using cost-based decisions to choose between different plans.
While Some of SEQ methods could also be used in the language proposed in
this thesis, we will provide a more general and extensible approach.
SRQL [19] introduces a solution that augments relational algebra with a se-
quential model using sorted relations. This approach is better than SEQ since
sequences are expressed in the same framework as sets, thus enabling more effi-
cient optimization of queries that involve both. SRQL also extends SQL with a
3
8/8/2019 Thesis Sadri
18/108
few constructs for querying sequences.
An overview of the optimization techniques used in SRQL, including well-
known heuristics of pushing selections inside the joins, is given in [19]
1.1.3 Tangram
Tangram is a stream processing system from UCLA [17]. The basic constructs in
Tangram are transducers. Transducers are operators that transform input data
streams to output data streams. Transducers can be combined to make more
complex operators for answering ad-hoc queries on streams. Tangram is a Prolog
based system, implemented on a functional language called Log(F) that can be
translated to Prolog. The major issue remaining in terms of databases is the
impedance mismatch between Prolog that returns the results one by one and
RDBMSs that return all the results at once as a set.
1.1.4 SQL/LPP
Another system that adds time-series extensions to SQL is discussed in [18].
It models time-series as attributed queues (queues augmented with attributes
that are used to hold aggregate values and are updated upon modifications to
the queue). Segments of time series data are ordered and there are directives
for the search engines that specify the next segment that should be checked
against the query. For searching desired time-series, the criteria for the segmentsthat qualify are first specified and then they are used in the main SQL query.
The language for specifying the desired patterns is somehow decoupled from the
main query; this is similar to the separation between the sequential part and
relational part in SEQ. This separation also introduces problems similar to those
of SEQ ( such as, complexity and added difficulty to learn and use, and lack of
4
8/8/2019 Thesis Sadri
19/108
optimization between sequential and relational parts). Also, SQL/LPP doesnt
report recursive patterns and this limits the expressive power. It also has a
limited number of basic aggregate functions. It is possible to build more complex
aggregate functions using the basic aggregate functions but new functions can
not be introduced from scratch. Also SQL/LPP can not be applied directly to
existing tables because it requires a specific structure in the underlying table.
SQL/LPP optimization uses pattern length analysis to prune the search space
and to deduce properties of composite patterns based on properties of the simple
patterns. These properties are used at runtime to reduce the search time.
1.1.5 Composite Events in Active Databases
There has been some work in active databases for implementing complex events.
Examples are ODE [6], SAMOS [5], and TREPL [15]. These systems are for
tracking complex events. Here we use similar ideas to detect complex patterns in
sequential data. Our methods has the following similarities to these languages:
They both utilize a good marriage of regular expressions and database query
languages.
They both have good support for temporal aggregations.
They have strong formal bases. ODE uses finite state machines, SAMOS
uses petri-nets and TREPL and our extensions use logic based semantics.
Many of language constructs of TREPL are relevant to the design of SQL-TS;
however the implementation and optimization techniques are very different since
event-based languages for active databases work on the live real time data and
need to generate the answers as soon as the corresponding events happen.
5
8/8/2019 Thesis Sadri
20/108
1.2 Optimization of Sequence Searches
Current database systems that support some sort of sequential queries follow one
of the following approaches for optimization:
A fixed set of optimized functions that work on sequences. This approach
is taken in ADT based systems. This functions dont fit in the traditional
model of relational databases.
Modeling some time-series queries based on an extended relational algebra
and applying traditional optimization approaches (like pushing joins into
the selections). SEQ and SRQL use this approach.
In this thesis we explore a new approach based on the observation that find-
ing sequential patterns in databases is somehow similar to finding a phrase in a
text. However, instead of searching for a sequence of letters usually from a finite
alphabet, we search for a sequence of tuples with rich structure and with infinite
possibilities. Since our approach is motivated by the text searching problem, we
give a short overview of some of existing text searching algorithms, their perfor-
mance, and their limitations. These algorithms are used to find the occurrences
of a given pattern string pattern within a given text string. The brute force case
for searching the pattern has a time complexity of O(mn). The Karp-Rabin [9]
algorithm has a worst time complexity of O(nm) and an expected running time
ofO(n + m). It works based on creating the hash values for possible substrings of
size m and its efficiency depends to the alphabet size. The Boyer-Moore pattern
matcher [14] (also found independently by Gosper) works best where the pattern
is long and the alphabet is large. It has been used in some versions of Unix grep
string searching function. Its worst case performance is O(nm), and its best case
6
8/8/2019 Thesis Sadri
21/108
performance is O(n/m). An undesirable characteristics of these algorithms is
that they assume a finite alphabet size and their performance is dependent on
the size of the alphabet. The Knuth-Morris-Pratt (KMP) algorithm [11] creates a
prefix function from the pattern that allows the building of a transition function
to expedite the searching. The prefix function is built in O(m) time, and the
algorithm has a worst case time complexity of O(n + m), independent from the
alphabet size. Performance comparison by Wright, Cumberland, and Fang [29]
shows that in general KMP has the best performance. This along with the fact
that KMP is independent of the alphabet size makes it the best candidate for
generalization to handle sequential database pattern searches.
7
8/8/2019 Thesis Sadri
22/108
CHAPTER 2
The SQL-TS Language
Our Simple Query Language for Time Series (SQL-TS) adds to SQL simple con-
structs for specifying complex sequential patterns. For instance, say that we have
the following table of closing prices for stocks:
CREATE TABLE quote ( name Varchar(8),
date Date,
price Integer )
Now, to find stocks that went up by 15% or more one day, and then down by20% or more the next day, we can write the SQL-TS query of Example 1:
Example 1 Using the FROM clause to define patterns
SELECT X.name, Z.price - X.price, Z.date
FROM quote
CLUSTER BY name
SEQUENCE BY date
AS (X, Y, Z)
WHERE Y.price > 1.15 * X.price
AND Z.price < 0.80 * Y.price
8
8/8/2019 Thesis Sadri
23/108
name price date
... ... ...
INTC $60 1/25/99
INTC $63.5 1/26/99
INTC $62 1/27/99
... ... ...
IBM $81 1/25/99
IBM $80.50 1/26/99
IBM $84 1/27/99
... ... ...
Figure 2.1: Effects of SEQUENCE BY and CLUSTER BY on data
Thus, SQL-TS is basically identical to SQL, but for the following additions to
the FROM clause:
A CLUSTER BY clause specifying that data for each stock is processed sep-
arately (i.e., as it were a separate stream.)
A SEQUENCE BY clause specifying that the data must be traversed by as-
cending date. Figure 2.1 shows how SEQUENCE BY and CLUSTER BY affect
the input. Rows are grouped by their CLUSTER BY attribute(s) (not nec-
essarily ordered), and data in each group are sorted by their SEQUENCE BY
attributes(s). This is similar to SRQL, where we have GROUP BY and
SEQUENCE BY attributes [19].
The AS clause, which in SQL is mostly used to assign aliases to the table
names, is here used to specify a sequence of tuple variables from the specified
table. By (X, Y, Z) we mean three tuples that immediately follow each
9
8/8/2019 Thesis Sadri
24/108
other. Tuple variables from this sequence can be used in the WHERE clause
to specify the conditions and in the SELECT clause to specify the output.
Expressing the same query using SQL would require three joins and would be
more complex, less intuitive, and much harder to optimize.
A key feature of SQL-TS is its ability to express recurring patterns by using
a star operator. Take the following example:
Example 2 Find the maximal periods in which the price of a stock fell more
than 50%, and return the stock name and these periods
SELECT X.name, X.date AS start_date,
Z.previous.date AS end_date
FROM quote
CLUSTER BY name
SEQUENCE BY dateAS (X, *Y, Z)
WHERE Y.price < Y.previous.price
AND Z.previous.price < 0.5 * X.price
In SQL-TS, each tuple is viewed as containing two additional fields that refer
to the previous and the next tuple in the sequence within the same cluster.
Thus, for instance Z.previous (X.next) delivers the last tuple (the first tuple) in
the Y sequence, and Z.previous.date is the date of this last tuple (the SQL3
syntax Z.previous date is also supported). Here the star construct Y is used
to specify a sequence of one or more Ys of increasing price, as per the condi-
tion the condition Y.price < Y.previous.price. In general, a star denotes a
10
8/8/2019 Thesis Sadri
25/108
sequence of one or more (not zero or more!) tuples that satisfy all applicable con-
ditions in the where clause. Thus, Z here is the first tuple where the price of the
stock is no longer smaller than the previous one. Constructs similar to the star
have been proposed previously in several query languages, and their semantics, is
easily formalized using recursive Datalog programs [15]. Also observe that a left
maximality condition in implicit in the SQL-TS semantics, meaning that when
two overlapping sequences satisfy the query, we return only the one that starts
first.
11
8/8/2019 Thesis Sadri
26/108
CHAPTER 3
Search Optimization
Since SQL-TS is a superset of SQL, all the well-known techniques for query
optimization remain available, but in addition to those we find new query op-
timization opportunities using techniques akin to those used for text searching.
For instance, take the following example:
Example 3 Find companies whose closing stock price in three consecutive days
was 10, 11, and 15.
SELECT X.name
FROM quote
CLUSTER BY name
SEQUENCE BY date
AS (X, Y, Z)
WHERE X.price =10 AND Y.price=11
AND Z.price=15
The text searching algorithms by Knuth, Morris and Pratt (KMP), discussed
below, provides a solution of proven optimality for this query [11, 29]. Unfor-
tunately, the KMP algorithm is only applicable when the qualifications in the
query are equalities with constants as those of Example 3
12
8/8/2019 Thesis Sadri
27/108
Therefore, in this paper, we extend the KMP algorithm to handle the condi-
tions that are found in general queriesin particular inequalities between terms
involving variables such as those in the next example.
Example 4 For IBM stock prices, find all instances where the pattern of two
drops followed by two increases, and the drops take the price to a value between
40 and 50, and the first increase doesnt move the price beyond 52.
SELECT X.date AS start_date, X.price
U.date AS end_date, U.price
FROM quote
CLUSTER BY name
SEQUENCE BY date
AS (X, Y, Z, T, U)
WHERE X.name=IBM
AND Y.price < X.priceAND Z.price < Y.price
AND 40 < Z.price < 50
AND Z.price < T.price
AND T.price < 52
AND T.price < U.price
3.1 Searching Simple Text Strings
The KMP algorithm takes a sequence pattern of length m, P = p1 . . . pm, and
a text sequence of length n, T = t1 . . . tn, and finds all occurrences of P in
T. Using an example from [11], let abcabcacab be our search pattern, and
babcbabcabcaabcabcabcacabc be our text sequence. The algorithm starts from
13
8/8/2019 Thesis Sadri
28/108
the left and compares successive characters until the first mismatch occurs. At
each step, the ith element in the text is being compared with the jth element in
the pattern (i.e., ti is compared with pj). We keep increasing i and j until a
mismatch occurs.
j,i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
ti a b c b a b c a b c a a b c a b c
pj a b c a b c a c a b
For the example at hand, the arrow denotes the point where the first mismatch
occurs. At this point, a naive algorithm would reset j to 1 and i to 2, and restart
the search by comparing p1 to t2, and then proceed with the next input charac-
ter. But instead, the KMP algorithm avoids backtracking by using the knowledge
acquired from the fact that the first three characters in the text have been suc-
cessfully matched with those in the pattern. Indeed, since p1 = p2, p1 = p3, andp1p2p3 = t1t2t3 we can conclude that t2 and t3 cant be equal to p1, and we can
thus jump to t4. Then, the KMP algorithm resumes by comparing p1 with t4;
since the comparison fails, we increment i and compare t5 with p1:
i 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
ti a b c b a b c a b c a a b c a b c
j 1 2 3 4 5 6 7 8 9 10
pj a b c a b c a c a b
Now we have the mismatch where j = 8 and i = 12. Here since we know that
p1 . . . p4 = p4 . . . p7 and p4 . . . p7 = t8 . . . t11, p1 = p2, and p1 = p3, we conclude
14
8/8/2019 Thesis Sadri
29/108
1 j
1 next[j]
pattern
pattern
1 j
1 next[j]
pattern
pattern
Figure 3.1: The meaning of next(j)
that we can move pj four characters to the right, and resume by comparing p5 to
t12. Therefore, by exploiting the relationship between elements of the pattern, we
can continue our search without without moving back in the text (i.e., without
changing the value of i). As shown in [11], the KMP algorithm never requires
backtracking on the text. Moreover, the index on the pattern can be reset to a
new value next(j), where next(j) only depends on the current value, and is inde-
pendent from the text. For a pattern of size m, next(j) can be stored on an array
of size m. (Thus this array can be computed once as part the query compilation,
and then used repeatedly to search the database, and its time-varying content.)
The array next(j) can be defined as follows:
1. Find all integers k, 0 < k < j, for which pk = pj and such that for every
positive integer s < k, ps = pjk+s (i.e.,
p1 = pjk+1 . . . pk1 = pj1).
2. If no such k exists, then next(j) = 0 else next(j) is the least of these ks.
This definition is clarified by Figure 3.1. The upper line shows the pattern,
and the lower line shows the pattern shifted by k; the thick segments show where
the two are identical. When no shift exists for which the shifted pattern can
match the original one, we have next(j) = 0, and the pattern is shifted to the
right till its first element is at position j + 1, i.e., one after the current position
in the text. In the KMP algorithm, this is the only situation in which the search
15
8/8/2019 Thesis Sadri
30/108
pattern is advanced following a failure. (Of course, the search pattern is always
advanced after success.)
The KMP Algorithm:
j = 1; i = 1;
while j m i n do {
while j > 0 ti = pj do
j = next[j];i = i + 1; j = j + 1; }
if i > n thenfailure
else success;
The KMP algorithm is shown above. An efficient algorithm for computing the
array next is given in [11]. The complexity of the complete algorithm, including
both the calculation of the next for the pattern and the search of pattern over
text, is O(m + n), where m is the size of the pattern and n is the size of the text
[11]. When success occurs, the input text tim+1 . . . ti matches the pattern.
3.2 General Predicates
The original KMP algorithm can be used to optimize simple queries, such as that
of Example 3, in which conditions in the WHERE clause are equality predicates and
t is a tuple variable:
p1(t) = (t.price = 10)
p2(t) = (t.price = 11)
p3(t) = (t.price = 15)
16
8/8/2019 Thesis Sadri
31/108
However, for the powerful sequence queries of SQL-TS we need to support:
1. General Predicates: In particular we need to support systems of equali-
ties and inequalities such as those of Example 4 where we have the following
predicates:
p1(t) = (t.price < t.previous.price)
p2(t) = (t.price < t.previous.price)
(40 < t.price < 50)
p3(t) = (t.price > t.previous.price)
(52 < t.price)
p4(t) = (t.price > t.previous.price)
2. Recursive pattern expressions: The KMP algorithm assumes that the
pattern consists of a fixed number of elements. To support queries such
as that of Example 2, we need to optimize searches involving recurring
patterns expressed by the star.
3. More general objects: In modern database systems we store many dif-
ferent types of objects, such as images, text, and XML objects, along with
user-defined methods and predicates on these objects.
3.3 Optimized Pattern Search
In this section we provide the Optimized Pattern Search (OPS) algorithm which
is an extension the KMP algorithm. The OPS algorithm is directly applicable
to the optimization of SQL-TS queries, since it handles the much more general
17
8/8/2019 Thesis Sadri
32/108
conditions that occur in time series applications, including repeating patterns
that can be expressed by the star construct.
Say that we are searching the input stream for a sequential pattern, and
a mismatch occurs at the j-th position of the pattern. Then, we can use the
following two sources of information to optimize our next steps in the search:
Conditions for elements 1 through j 1 in the search pattern were satisfied
by the corresponding items in the input sequence, and
The condition for the jth element in the search pattern was not satisfied by
its corresponding input element.
Therefore, much as in the KMP algorithm, we can capture the logical rela-
tionship between the elements of the pattern, and then infer which shifts in the
pattern can possibly succeed; also, for a given shift, we can decide which condi-
tions need not be checked (since their validity can be inferred from the two kinds
of information described above).
Therefore, we assume that the pattern has been satisfied for all positions
before j and failed at position j, and we want to compute the following two
items,
shift(j): this determines how far the pattern should be advanced in the
input, and
next(j): this determines from which element in the pattern the checking of
conditions should be resumed after the shift.
Observe that the KMP algorithm only used the next(j) information. Indeed,
for KMP, the search pattern was never shifted in the text (except for the case
18
8/8/2019 Thesis Sadri
33/108
where next(j) = 0 and the pattern was shifted by j). The richer set of possibilities
that can occur in OPS demand the use of explicit shift(j) information. Further-
more, the computation for next and shift is now significantly more complex and
requires the derivation of several three-valued logic matrices.
3.4 Implications between elements
The OPS algorithm begins by capturing all the logical relations among pairs of
the pattern elements using a positive precondition logic matrix , and a negative
precondition logic matrix . These matrices are of size m, where m is the length
of the search pattern. The ij and ij elements of these matrices are only defined
for i j; thus we have lower-triangular matrices of size m. We define:
jk =
1 if pj pk pj F
0 if pj pk
U else
jk =
1 if pj pk
0 if pj pk pj T
U else
We have added the terms pj F in definition of, and pj T in definition of,
to make sure that the left side of the implication relationships are not equivalent
to false, because in that case the value of the corresponding element in the matrixcould be both 0 and 1. By excluding those cases, we have removed the ambiguity.
Logic matrices and contain all the possible pairwise logical relations between
pattern elements. For instance, for Example 4 we have:
Example 5 Computing the matrices and for Example 4
19
8/8/2019 Thesis Sadri
34/108
p2 p1 therefore 21 = 1
p3 p1 therefore 31 = 0
p3 p2 therefore 32 = 0
p4 p2 therefore 42 = 0
p4 p1 therefore 41 = 0
p4 p3 therefore 43 = 0
Therefore we have
=
11 1
0 0 1
0 0 U 1
=
0
U 0
U U 0
U U 0 0
From matrices and , we can now derive another triangular matrix S that
describes the logical relationships between whole patterns. The Sjk entries in the
matrix, which are only defined for j > k, are computed as follows:
Sjk = k+1,1 k+2,2 j1,jk1 j,jk
Thus, say that the pattern was satisfied up to, and excluding, element j;then, Sjk = 0 means that the pattern cannot be satisfied if shifted k positions.
Moreover, Sjk = 1 (Sjk = U) means that the pattern is certainly (possibly)
satisfied after a shift of k. Figure 3.2 illustrates the situation. In calculating
matrix S, we use standard 3-valued logic, where U = U, U 1 = U, and
U 0 = 0. For the example at hand we have:
20
8/8/2019 Thesis Sadri
35/108
k
1 k + 1 j
1
j -1
j -kj -k -1
k
k
1 k + 1 j
1
j -1
j -kj -k -1
k
Figure 3.2: Shifting the pattern k positions to the right
Example 6 Computing the matrix S for Example 5
S2,1 = 2,1 = U
S3,1 = 2,1 3,2 = 1 U = U
S3,2 = 3,1 = U
S4,1 = 2,1 3,2 4,3 = 0
S4,2 = 3,1 4,2 = 0S4,3 = 4,1 = U
S =
U
U U
0 0 U
We can now compute shift(j) that is the least shift to the right for which
the overlapping sub-patterns not to contradict each other (Figure 3.3). Thus,
shift(j) is the column number for the leftmost non-zero entry in row j of S.
When all these entries are equal to zero, then a failure will occur for any shift up
21
8/8/2019 Thesis Sadri
36/108
Shifted Pattern
i j + 1
1
1
ij + shift(j) + 1 i -j + shift(j) + next(j)
shift(j) + 1 shift(j) + next(j)
i
j
next(j) j -shift(j)
Input
Pattern
shift(j)
Shifted Pattern
i j + 1
1
1
ij + shift(j) + 1 i -j + shift(j) + next(j)
shift(j) + 1 shift(j) + next(j)
i
j
next(j) j -shift(j)
Input
Pattern
shift(j)
Figure 3.3: Next and Shift definitions for OPS
to j. In this case, we set shift(j) = j; thus, the pattern is shifted to the right
till its first position coincides with the current position of the cursor in the text.
More formally:
shift(j) =
j if k < j, jk = 0
min({k | Sjk = 0}) otherwise
Thus, shift(j) tells us how much the pattern can be shifted to the right
before we have to start testing the input. We can now compute next(j) which
denotes the element in the pattern from which checking against the input should
be resumed. The are basically three case. The first case is when shift(j) = j, and
thus the first element in the pattern must be checked next against the current
element in the input. The second case is when shift(j) < j and Sj,shift(j) =
1; In this case we only need to begin our checking from the element in the
pattern that is aligned with the first input element after current input position
thus, next(j) = j shift(j) + 1. The third case occurs, when neither of the
previous cases hold; then the first pattern element should be applied to the inputelement i j + shift(j) + 1; but ifshift(j)+1,1 = 1, then the comparison becomes
unnecessary (and similar conditions might hold for the elements that follow).
Thus, we set next(j) to the leftmost element in the pattern that must be tested
against the input. Figure 3.3 shows how this works. Now we can formally define
next as follows:
22
8/8/2019 Thesis Sadri
37/108
1. if shift(j) = j then next(j) = 0, else
2. if Sj,shift(j) = 1 then
next(j) = j shift(j) + 1, else
3. if neither condition is true, then
next(j) = min(
{t | 1 t < j shift(j) shift(j)+t,t = 1}
{j shift(j)|j,jshift(j) = 1})
For the example at hand we have:
Example 7 Calculate the arrays next and shift for Example 5
shift(1) = 1
shift(2) = 1 since S21 = 0
shift(3) = 1 since S31 = 0
shift(4) = 3 since S41 = 0 S42 = 0
S43 = 0
next(1) = 0 since shift(1) = 1
next(2) = 1 since 12 = 1
next(3) = 2 since 21 = 1 23 = 1
next(4) = 1 since 41 = 1
The calculation of arrays next and shift is done as part of the query compi-
lation. This is discussed in Section 5.
The main algorithm. We can use the values stored in arrays next and shift
to optimize the pattern search at run time. Consider a predicate pattern P =
23
8/8/2019 Thesis Sadri
38/108
p1p2 . . . pm. Now, pj(ti) is equal to one, when the i-th element in sequence T
satisfies a pattern element pj ; otherwise, it is zero.
The OPS Algorithm
j = 1; i = 1;
while j m i n do {
while j > 0 pj (ti) do {
i = i j + shift(j) + next(j);
j = next(j); }
i = i + 1; j = j + 1; }
if i > n then failure
else success;
Here, as in the KMP algorithm, success denotes that tim+1 . . . ti satisfies the
the pattern. However, we see the following generalizations with respect to KMP:
The equality predicate ti = pj is replaced by pj (ti) that tests if pj holds for
the i-th element in the input( i.e., the j-th tuple of the sorted cluster).
When there is a mismatch, we modify both j and i, which, respectively,
index the input and the pattern. The new value for j is next(j) and the
new value for i is i j + shift(j) + next(j).
For instance, we used the pattern in the query of Example 4 to search the following
sequence:
5 5 5 0 4 5 5 7 5 4 5 0 4 7 4 9 4 5 4 2 5 5 5 7 5 9 6 0 5 7.
Figure 3.4 shows the values of j and i as the algorithm progresses for the naive
approach versus the OPS approach. Clearly, the backtracking episodes are less
24
8/8/2019 Thesis Sadri
39/108
2 4 6 8 10 12 141
1.5
2
2.5
3
3.5
4
Naive Search Path
i
j
2 4 6 8 10 12 141
1.5
2
2.5
3
3.5
4
OPS Search Path
i
j
Figure 3.4: Comparison between path curve of the naive search (top chart) andOPS (bottom chart)
frequent and less deep, and therefore the length of the search path is significantly
shorter for the OPS algorithm.
25
8/8/2019 Thesis Sadri
40/108
CHAPTER 4
Patterns with Stars and Disjunctions
An important advantage of the OPS algorithm is that it can be easily generalized
to handle recurrent input patterns which, in SQL-TS, are expressed using the star.
For example if pj is
ti.price < ti1.price
then pj matches sequences of records with decreasing prices.
The calculation of logic matrices and remains unchanged in the presence
of star patterns; thus, the formulas given in Section 3.4 will still be used. How-
ever, the calculation of the arrays next and shift must be generalization for star
patterns as described next.
At the runtime we maintain an array of counters (one per pattern element)
to keep track of the cumulative number of input objects that have matched the
pattern sequence so far. Take the following SQL-TS example:
Example 8 Find patterns consisting of a period of rising prices, followed by a
period of falling prices, followed another period of rising prices.
SELECT X.name, FIRST(X).date AS sdate,
LAST(Z).date AS edate
FROM quote
26
8/8/2019 Thesis Sadri
41/108
CLUSTER BY name
SEQUENCE BY date
AS ( *X, *Y, *Z)
WHERE X.price > X.previous.price
AND Y.price < Y.previous.price
AND Z.price > Z.previous.price
Therefore our three star predicates that must be satisfied on their input tuples:
p1(X) = (X.price > X.previous.price)
p2(Y) = (Y.price < Y.previous.price)
p3(Z) = (Z.price > Z.previous.price)
These will be called starpredicates because they are prefixed with a star in the
from clause of the query, which searches for the pattern: p1(X), p2(Y), p3(Z).
Assume our input data for t.price is:
2 0 2 1 2 3 2 4 2 2 2 0 1 8 1 5 1 4 1 8 2 1
Let us represent the counter for the j-th element of the pattern by count(j).
After matching the pattern with the text we have:
count(1) = 4
count(2) = 9 since 5 elements satisfy p2
count(3) = 11 since 2 elements satisfy p3
We update and use these counters at the run time. These are the modifications
that we need to support the star. When the current input element satisfies the
pattern then move to the next input, and
27
8/8/2019 Thesis Sadri
42/108
1. if the current pattern element is not a star element then move to the next
one, otherwise
2. update the current count.
When the current input element does not satisfy the current pattern element
then:
1. It this is a star element, whose predicate has already been satisfied by the
previous input element, move to the next pattern element and the next
input.
2. If this is not a star element, or is a star predicate tested for the first time,
then:
reset j (the index in the pattern) to next(j), and
reset i (the index in the input) to i count(j 1) + count(shift(j) +
next(j) 1).
In the presence of stars, the computation shift(j) and next(j) is more
complex and it is discussed next.
28
8/8/2019 Thesis Sadri
43/108
4.1 Finding next and shift for the Star Case
Consider the following graph based on the matrix (excluding the main diagonal)
21
31 32
41 42 43
The entry jk in our matrix correlates pattern predicates pj with pk, k < j,
when these are evaluated on the same input element. Therefore, we can picture
the simultaneous processing of the input on the original pattern, and on the same
pattern shifted back by j k. Thus the arcs between nodes in our matrix aboveshow the combined transitions in the original pattern and in the shifted pattern.
In particular, consider kj where neither pk nor pj are star predicates; then after
success in pj and pk, we transition to pj+1 in the original pattern, and to pk+1
in the shifted pattern: this transition is represented by an arc kj k+1,j+1.
However, if pj is not as star predicate, while pk is, then the success of both
will move pk to pk+1, but leave pj unchanged: this is represented by the arc
kj k+1,j. In general, it is clear that only a subset of the arcs listed in theprevious matrix represent valid transitions and should be considered and this set
is also limited by the values of . In particular, since all the predicates in the
pattern must be satisfied by the shifted input, every kj = 0 entry must removed
with all its incoming and departing arcs: we only retain entries that are either 1
or U.
29
8/8/2019 Thesis Sadri
44/108
Considering all possible situations, and assuming that all the neighbors are
non-zero entries, we conclude that only the following transitions are needed when
building the graph
1. If both elements j and k of the pattern sequence are star predicates and
jk = U, then we have three outgoing arcs from jk: one to j+1,k, one to
j+1,k+1 and one to j,k+1. Pictorially,
U j,k+1
j+1,k j+1,k+1
2. If both element j and element k of the pattern are stars and jk = 1, we
have two outgoing arcs from jk: one to j+1,k+1 and the other to j,k+1.
Pictorially,
1 j,k+1
j+1,k j+1,k+1
There is no arc to j,k+1, because j,k = 1; thus all input tuples that satisfy
pj must also satisfy pk.
3. If both elements j and k of the pattern are non-star predicates, then we
have only one arc from jk to j+1,k+1. Pictorially,
jk j,k+1
j+1,k j+1,k+1
4. If element j of the pattern is a star predicate, but element k is not, then
we have two arcs from jk : one to j+1,k+1 and the other to j,k+1,
30
8/8/2019 Thesis Sadri
45/108
jk j,k+1
j+1,k j+1,k+1
5. If element k of the pattern is a star predicate but element j is not, then we
have two arcs from jk: one to j+1,k+1 and the other to j+1,k. Thus we
have:
jk j,k+1
j+1,k j+1,k+1
These rules assume that the end nodes of the arcs are either U or 1; but when
such nodes are 0 the incoming arcs will be dropped.
The directed graph produced by this construction will be called the Implica-
tion Graph for pattern sequence P, and is denoted as GP. For each value of j
this graph must be further modified with entries from to account for the fact
that jth element of the pattern failed on the input.
Therefore, we replace the jth row of this graph (the row that starts with j,1)
with the jth row of matrix and remove rows greater than j. In addition we
update the arcs between elements in row j 1 and row j according to the new
values of elements in row j. We use the same rules that we used for arcs between
elements. If element k is star, there are up to two arcs from j1,k to row j:
one to jk and one to j,k+1. This is in addition to the possible existing arc fromj1,k to j1,k+1. If element k is not an star, then there will be only an arc from
j1,k to row j that goes to jk . Again in addition to possible existing arc from
j1,k to j1,k+1. Again we assume that the e nd nodes of the arcs are either U
or 1; but when such nodes are 0 the incoming arcs will be dropped. The resulting
graph will be called the Implication Graph for pattern element j, denoted GjP.
31
8/8/2019 Thesis Sadri
46/108
Take the following example where we want to find occurrences of this pattern
in IBMs stock price: increasing price to between 30 and 40 followed by a period of
decreasing price followed by another period of increasing price to a value between
35 and 40, followed by a decreasing period to a value below 30. The query written
in SQL-TS is:
SELECT X.NEXT.date, X.NEXT.price,
S.previous.date, S.previous.price
FROM quote
CLUSTER BY name,
SEQUENCE BY date
AS (*X, Y, *Z, *T, U, *V, S)
WHERE
X.name=IBM
AND X.price > X.previous.price
AND 30 < Y.price
AND Y.price < 40
AND Z.price < Z.previous.price
AND T.price > T.previous.price
AND 35 < U.price
AND U.price < 40
AND V.price < V.previous.price
AND S.price < 30
Therefore our pattern predicates (on an input tuple t) are:
32
8/8/2019 Thesis Sadri
47/108
p1(t) = (t.price > t.previous.price)
p2(t) = (30 < t.price < 40)
p3(t) = (t.price < t.previous.price)
p4(t) = (t.price > t.previous.price)
p5(t) = (35 < t.price < 40)
p6(t) = (t.price < t.previous.price)
p7(t) = (t.price < 30)
Observe that p1, p3, p4, and p6 are star predicates, and the others are not.
Our matrices and are:
=
1
U 1
0 U 1
1 U 0 1
U 1 U U 1
0 U 1 0 U 1
U 0 U U 0 U 1
=
0
U 0
U U 0
0 U U 0
U U U U 0
U U 0 U U 0
U U U U U U 0
33
8/8/2019 Thesis Sadri
48/108
Since p1, p3, p4, and p6 are star predicates, and p2 and p5 are not, we can connect
the elements of (after excluding the main diagonal) as follows:
GP =
U
0 U
1 U 0
U 1 U U
0 U 1 0 U
U 0 U U 0 U
Say now that we want to build G6P. We replace row 6 of GP with row 6 of
and update the paths from the 5th row to the 6th row according to new value.
Then, we have the following graph:
34
8/8/2019 Thesis Sadri
49/108
G6P =
U
0 U
1 U 0
U 1 U U
U U 0 U U
Consider now the node 41 in this graph. Observe that there are several paths
consisting of either 1 nodes or U nodes that take us to nodes in the last row of
the matrix. Therefore, the input shifted by 4 can succeeds along any of thesepaths. However, there is no path to the last row starting from node 31: thus, 3
is not a possible shift. Also there is not path to the last row starting from 21
and 11; thus shifts of size 2 and 1 can never succeed. Therefore, we conclude
that shift(6) = 3.
In general, we define shift(j) as follows:
Definition 1 For a pattern P
shift(j)=min{s| t where there is a path from s+1,1 to j,t in GjP}
For the case that the above set is empty, we have:
if j1 = 0 then shift(j) = j 1 else shift(j) = j
Now we can define next. Note that there might be more than one path found
in the definition of shift, but next must return a unique value to be used in
35
8/8/2019 Thesis Sadri
50/108
restarting the search. Therefore let us say that a node in our GjP graph is deter-
ministic if there is exactly one arc leaving this node, and the end-node of this arc
has value 1 (thus a deterministic node cannot take us to an U node or to several
1 nodes). Thus, we start from shift(j)+1,1, and if this is not deterministic, we set
next(j) = 1. Otherwise we move to the unique successor of this deterministic
node and repeat the test. When the first non-deterministic node is found in this
process, next(j) is set to the value of its column. If the search takes us to the
last row in GjP, that means that none of the input elements previously visited
needs to be tested again: thus next(j) = j shift(j).
For the example at hand, there is a non-zero path from node 41 to 61, thus
shift(6) = 3. We now consider 41 = 1 and see that this is not a deterministic
node, since there more than one arc leaving the node. Thus, we conclude that
next(6) = 1.
4.2 Complexity of Calculating next and shift
The GP graph built in the last section has at most m(m 1)/2 node and each
node has a constant number of (maximum 3) outgoing edges. Thus at the worst
case, traversing the graph in depth first search manner for finding shift(j) will
have the complexity of O(j2), and the worst case complexity for finding all the
shift values will be O(m3). Note that the first j 2 levels of graph Gj1P are
the same as the first j 2 levels of graph Gj
P, therefore we can use the results ofprevious traversal for j 1 in the current traversal for j. If we store the values of
indexes of the paths from the first column to j 2 row in GjP, then for calculating
shift(j) we can only take the starting points from the (j 2)nd row. This way
the complexity will be reduced to O(m2) because the complexity of going from
the (j 2)nd row to the jth row is constant for each element (since the branching
36
8/8/2019 Thesis Sadri
51/108
factor is less than or equal to 3).
4.3 Disjunctive Patterns
We will next consider queries that search for the disjunction of two patterns,
meaning that the input sequence should satisfy either one of the two pattern
sequences, or both. In effect it is equivalent to two independent queries, but we
can execute them in one scan of the database. Take the following example:
Example 9 Find 4 consecutive rise in the stock price or 4 consecutive closing
price between 55 and 57 for IBM
SELECT X.NEXT.date, X.NEXT.price,
S.previous.date, S.previous.price
FROM quote
SEQUENCE BY dateAS (X, Y, Z, T)
WHERE X.name=IBM
AND (
( X.previous.price < X.price
X.price < Y.price
AND Y.price < Z.price
AND Z.price < T.price
)
OR
( X.price > 55
AND X.price < 57
AND Y.price > 55
37
8/8/2019 Thesis Sadri
52/108
AND Y.price < 57
AND Z.price > 55
AND Z.price < 57
AND T.price > 55
AND T.price < 57
)
)
A naive approach in processing this query would consists in making a firstpass through data to satisfy the first pattern followed by a second pass to satisfy
the second pattern. This approach will not be considered since it is likely to
require each page in the secondary store to be retrieved twice. We next consider
two other approaches that do not suffer from this drawback. These are:
Multiple Stream model, and
Single Stream model.
Multiple Data Stream In this model, the starting i and j values for each
pattern are kept in a queue. When the pattern being tested fails, its new i
and j values are computed and replace the old values in the queue. Then, the
scheduler looks at the values in the queue and select a pattern for processing
according to some optimization criteria. The queue can be prioritized based on
different criteria for optimization. For instance, by selecting the pattern which
has the least value of i, we can ensure that all patterns are served fairly and no
pattern stays behind. This in turn minimizes the size of data that needs to be
kept in temporary memory.
An advantage of this method is its simplicity and amenability to both serial
and parallel processing. It can be implemented as a client-server model where
38
8/8/2019 Thesis Sadri
53/108
the server provides the next values for i and j to each client process. Thus, this
method is amenable to parallel execution based on multiple data streams.
Single Data Stream This model assumes that all the patterns are tested in
parallel against the current element in the input being scanned. Only patterns
that are known to be false or true from the matrices are excused. Take the
query in Example 9. That query is equivalent with two queries, one that finds
occurrences with 4 consecutive closing prices between 55 and 57 and the other
that finds 4 consecutive rise in the price. We calculate , , shift and next
independently for each pattern, but at the run time we handle both queries
simultaneously. The run time algorithm can be revised as follows.
39
8/8/2019 Thesis Sadri
54/108
The OPS Algorithm For Concurrent Disjunctive Patterns
j1 = 1; j2 = 1; i = 1;
while j1 m1 j2 m2 i n do {
while pj1(ti) qj2(ti) do {
j1 = j1 + 1; j2 = j2 + 1; i = i + 1;
}
iO = i;
if pj1(ti) then {while i iO do {
while j > 0 pj1(ti) do {
i = i j1 + next1(j1) + shift1(j1);
j1 = next1(j1);
}
j1 = j1 + 1; i = i + 1;
}else { / pj2(ti) /
while i iO do {
while j > 0 qj2(ti) do {
i = i j2 + next2(j2) + shift2(j2);
j2 = next2(j2);
}
j2 = j2 + 1; i = i + 1;
}
}
}
if i > n then failure
else success;
40
8/8/2019 Thesis Sadri
55/108
As the algorithm shows, we keep proceeding until one of the concurrent patterns
fails. At this point we save the current value of i in iO and reset i and j for the
failed pattern and keep searching only for the fail pattern until i becomes greater
than iO (since we know the other pattern doesnt have to checked against input
up to the point iO). In this way, we only scan the pattern once and we need only
one buffer to keep recent values of the input.
Disjunctive Normal Form Consider now the following query:
Example 10 Find patterns in IBM stock price where there is a sequence of de-
creases of more than 2% in price until it goes below 50 or it starts increasing,
followed by a period of price increases of more than 2 percent until it goes above
100 or it starts decreasing:
SELECT X.NEXT.date, X.NEXT.price,
S.previous.date, S.previous.price
FROM quote
CLUSTER BY name
SEQUENCE BY date
AS (X,*Y, Z, *T,U )
WHERE
X.name=IBM
X.price > 0.98 * X.previous.price
AND Y.price < 0.98 * Y.previous.price
AND (Z.price > Z.previous.price OR Z.price < 50)
AND T.price > 1.02 * T.previous.price
AND (U.price < U.previous.price OR U.price > 100)
41
8/8/2019 Thesis Sadri
56/108
The pattern in the where clause is:
p1 p2 (p3a p3b) p4 (p5a p5b)
where
p1(t) = t.price > 0.98 t.previous.price
p2(t) = t.price < 0.98 t.previous.price
p3a(t) = t.price > t.previous.price
p3b(t) = t.price < 50
p4(t) = t.price > 1.02 t.previous.price
p5a(t) = t.price < t.previous.price
p5b(t) = t.price > 100
This can be computed by expanding the where clause into its normal form con-
sisting of the following four patterns and processed accordingly:
p1 p2 p3a p4 p5a
p1 p2 p3a p4 p5b
p1 p2 p3b p4 p5a
p1 p2 p3b p4 p5b
This approach is the one that is preferable as long as it does not lead to too many
alternative patterns.
A second approach consists in evaluating this query as it contained only one
conjunctive pattern. In this case, the and matrices will have to be constructed
42
8/8/2019 Thesis Sadri
57/108
for implications between disjunctive clauses. This problem is discussed in Section
5.2.
For very complex disjunctive conditions the two approaches can be combined.
43
8/8/2019 Thesis Sadri
58/108
CHAPTER 5
Calculating and
Elements of and are calculated according to the semantics of the pattern
elements. Satisfiability and implication results in databases [7, 27, 10, 20, 24, 25]
are relevant to the computation of and for a class of patterns that involve
inequalities in a totally ordered domain (such as real numbers). Ullman [27] has
given an algorithm for solving the implication problem between two queries Sand
T. His algorithm works for queries which are conjunctions of terms of the form
X op Y, where op {}, and has complexity ofO(|S|3 +|T|). Klug
[10] has studied the implication problem in a broader range of queries that are
conjunction of terms of the form X op C and X op Y. Rosenkrantz and Hunt [20]
provided an algorithm for satisfiability problem with complexity |S|3, where S is
the query that is tested for satisfiability and is a conjunction of terms of the form
X op C, X op Y, and X op Y+C. In our implementation, we compute matrices
and first using the algorithms by Guo, Sun and Weiss (GSW) [7] for computing
implication and satisfiability of conjunctions of inequalities as explained below.
5.1 GSW Algorithm
The GSW algorithm deals with inequalities of the form X op C, X op Y, and
X op Y + C where X and Y are variables, C is constant, and op {=, =, ,
,< ,>}. Complexity of their algorithm is O(|S| n2 + |T|) for testing implication
44
8/8/2019 Thesis Sadri
59/108
(for the 1 entries in our matrices) and O(|S| + n3) for testing satisfiability (for
the 0 entries); n is the number of variables in S and |S| and |T| are the number
of inequalities in S and T. Given the limited number of variables and inequalities
used in queries, these compilation costs are quite reasonable. GSW starts with
applying the following transformations:
1. (X Y + C) (Y X C)
2. (X < Y + C) (X Y + C) (X = Y + C)
3. (X > Y + C) (Y X C) (X = Y + C)
4. (X = Y + C) (Y X C) (X Y + C)
5. (X < C) (X C) (X = C)
6. (X > C) (X C) (X = C)
7. (X = C) (X C) (X C)
After these transformations, for all the inequalities of the form X op Y + C, we
have op {, =}, and for all the inequalities of the form X op C, op {, =, }
(X op Y is a special form of X op Y + C where C = 0).
5.1.1 Satisfiability
For determining the satisfiability of a conjunctive query S, a directed weighted
graph Gs = (Vs, Es) is built where Vs is the set of variables in S, and there is a
directed edge from X to Y with weight C in Es, if and only if (X < Y + C) S.
Inequalities of the form (X < C) are transformed to the form (X < V0 + C) by
introducing dummy variable V0.Thus the following results are proven in [7]: If
45
8/8/2019 Thesis Sadri
60/108
there is a negative weighted cycle a cycle that sum of the weights of its edges
is negative, then S is unsatisfiable. If all the cycles are positive weighted, then S
is satisfiable. For the case that there are zero weighted cycles, the necessary and
sufficient condition for satisfiability is that for any two variables X and Y on the
same cycle, if the path from X to Y has a cost C, then (X = Y + C) S. As
shown in [7], this algorithm has the time complexity of O(|S| + n3) where |S| is
the number of inequalities in S and n is the number of variables (size ofVs). The
following example clarifies how the algorithm works:
Example 11 Assume that we want to find out if jk is zero or not where the two
pattern elements pj and pk are as follows:
pj = X < Y + 4 Y < Z
pk = Z < X+ 2 X < 6 Z > 7
To see ifpj pk is satisfiable or not, we first build a graph for pj pk as in Figure5.1. There are two cycles in the graph. Cycle X Y Z X , has weight of 6 and cycle
XV0ZX has weight of 1. Since there are no negative weighted cycles, pj pk is
satisfiable and value of jk is not zero.
5.1.2 Implication
The implication problem takes to queries S and T and determines ifS implies T.
S and T are assumed to be conjunction of inequalities of the form X op Y + C.
For the inequalities of type X op C, a dummy variable V0 is defined that can
take only value of zero and the inequality is transformed to X op V0 + C. It
can be proven [7] that under this transformation the implication problem doesnt
change. The algorithm starts by introducing the closure of S, i.e., a complete
46
8/8/2019 Thesis Sadri
61/108
X
ZY
V0
4
6
-7
0
2
X
ZY
V0
4
6
-7
0
2
Figure 5.1: Directed weighted graph for determining the satisfiability of a set of
inequalities
47
8/8/2019 Thesis Sadri
62/108
set that contains all the inequalities implied by S. Then, T is implied by S Iff
T is a subset of closure of S. Since the number of inequalities implied by S
is boundless, a set called modulo closure of S is defined that contains only non
redundant inequalities that belong to closure of S. For example if Y < X + C1
is in closure of S, for any C2 > C1, inequality Y < X + C2 would be a redundant
member of closure of S. The modulo closure of S can be computed by applying
the following set of axioms to S:
A1. X X+ 0;
A2. X = Y + C implies Y = X C where Y and X are distinct variables;
A3. X Y + C and Y V + C implies X V + C+ C;
A4. X W + C1, W Y + C2, X Z+ C3, Z Y + C4, W = C + Z, and
C = C3 C1 = C2 C4 imply X = Y + C1 + C2 where X and Y are distinct
variables. Also Z and W are distinct variables.
GSW proves that the size of the modulo closure called Sclosure is finite and
calculating it has the time complexity of O(|S| n2). After calculating Sclosure,
S implies T if and only if S is unsatisfiable or:
for every (X Y + C) T, there exist (X Y + C0) Sclosure such that
C0 < C; and
for any (X = Y + C) T,
1. (X = Y + C) Sclosure or
2. there exist (X Y + C1) Sclosure such that C1 < C or
3. there exist (Y X+ C2) Sclosure such that C2 < C.
48
8/8/2019 Thesis Sadri
63/108
This step takes O(|T|) [7]; therefore, the whole algorithm complexity is O(|S|
n2 + |T|) While the GSW algorithm is sufficient to handle examples listed so far,
a minor extension is needed to handle the next query Example 13. In this query
inequalities have the form say X op C Y. Then, we introduce a new variable
Z = X/Y and use Z op C, given that the domain ofY is positive numbers (stock
prices). Even though the GSW algorithm covers a broad range of queries, it is
still limited since it only covers conjunctive queries of very limited forms. In the
next section we consider disjunctive patterns.
5.2 Calculating and for Disjunctive Pattern Elements
We next discuss the problem of how to determine the and matrices for con-
junctive patterns that include disjunctive terms. Assume we have the pattern
P = p1, p2, p3, . . . , pm where pj = pja pjb for some 1 j m. We can first cal-
culate implication and satisfiability relations between pja and pjb and every other
element of the pattern P using the methods described in the previous sections.
For every k where 1 k < j, we need to calculate jk , i.e., we must calculate
the logical value of pj pk. If pja pk and pjb pk are true, pj pk is true
and the value of jk is 1. In a similar way if pja pk and pjb pk, then jk
is 0. In other cases, the information available is not enough and we need to set
the value of jk to U. For each value k, where j < k m, we need to calculate
the truth value of pk pj . In this case, if either pk pja or pk pjb always
have a truth value of 1, then we have that pk pj is true and the value of kj is
1. Also, since pj = (pja pjb) = pja pjb, if pk pja and pk pjb are
both true, then we can conclude that pk pj is true and kj is 0.
The same arguments can be used for calculating . For calculating jk when
k < j, we want to see if value of pj pk is always true or not. Since pj =
49
8/8/2019 Thesis Sadri
64/108
k < j ja,k = 1 jb,k = 1 jk = 1
ja,k = 0 jb,k = 0 jk = 0
else jk = U
ja,k = 1 jb,k = 1 jk = 1
ja,k = 0 jb,k = 0 jk = 0
ja,k = U jb,k = U jk = U
k > j ja,k = 1 jb,k = 1 jk = 1
ja,k
= 0 jb,k
= 0 jk
= 0
else jk = U
ja,k = 1 jb,k = 1 jk = 1
ja,k = 0 jb,k = 0 jk = 0
else jk = U
Table 5.1: Logic matrix elements for disjunction of pattern elements.
(pja pjb) = pja pjb , if pja pk or pjb pk, then we can conclude
that jk is one.
Table 5.2 summarizes different possibilities for and . In this table, by
ja,k we mean value of jk if we would replace pj with pja. The same notational
conventions are used for ja,k, and k,ja. When we are calculate jk, and both j
and k are disjunctive patterns, we can first decompose one of the elements and
use the technique just described for calculating , and then combine the results.
Obviously we can use the same technique for calculating . Furthermore, eventhough we only used two terms in the disjunctive elements, the number of terms
in the disjunctions doesnt have to be limited to two. As an example of disjunctive
elements, lets consider the following SQL-TS query:
Example 12 Calculate and for the following query
50
8/8/2019 Thesis Sadri
65/108
SELECT X.NEXT.date, X.NEXT.price,
S.previous.date, S.previous.price
FROM djia
SEQUENCE BY date
AS (X,*Y, Z, *T,U )
WHERE X.price > 0.98 * X.previous.price
AND Y.price < 0.98 * Y.previous.price
AND (0.98*Z.previous.price < Z.price OR Z.price > 50)
AND T.price > 1.02 * T.previous.price
AND (0.98*U.previous.price < U.price OR U.price > 50)
This can be represented as a pattern sequence P:
P=p1, p2, p3, p4, p5, where:
p1(t) = t.price > 0.98 t.previous.price
p2(t) = t.price < 0.98 t.previous.price
p3(t) = t.price > 0.98 t.previous.price t.price > 50
p4(t) = t.price < 0.98 t.previous.price
p5(t) = t.price > 0.98 t.previous.price t.price > 50
We can break p3 and p5 to p3 = p3a p3b and p5 = p5a p5b where:
p3a(t) = p5a(t) = t.price > 0.98 t.previous.price and
p3b(t) = p5b(t)t.price > 50
51
8/8/2019 Thesis Sadri
66/108
As an example, in order to calculate 32, notice that p3a(t) p2(t), p3b(t) p2(t),
and p3b(t) p2(t) are not equivalent to true; however, p3a(t) p2(t) is equiv-
alent to true. In other words, 3a,2 = 0 and 3b,2 is U; therefore, 32 = U. On the
other hand, to calculate 43, we have 4,3a = 0 and 4,3b = U; therefore 4,3 = 0.
In conclusion we obtain the following matrix:
P =
1
0 1
0 0 10 0 0 1
0 0 1 0 1
52
8/8/2019 Thesis Sadri
67/108
CHAPTER 6
Experimental Results
In order to measure performance, we count the number of times that input el-
ements are tested against pattern elements. Thus, in our experiments we mea-
sured speedups ranging from the modest accelerations obtained for simple search
patterns, such as that of Figure 3.4, to speedups of more than two orders of mag-
nitude for the more complex patterns used in practical applications. For instance,
a common search in stock market data analysis is for a double-bottom pattern,
where the stock price has two consecutive local minima. We searched the closing
price of DJIA (Dow Jones Industrial Average) in the last 25 years, for relaxed
double-bottoms. By a relaxed double bottom we mean a local maximum sur-
rounded by two local minima, where we only consider the increases or decreases
of more than 2%. In other words, if the price moves less than 2%, we consider it
as if it hasnt changed (Figure 6.1).
Example 13 expresses the relaxed double bottom pattern in SQL-TS; Z, U,
and W represent the areas where changes are less than 2% and the curve is con-
sidered approximately flat (Figure 6.1). For the relaxed double-bottom query,
the speedup gained using OPS on the DJIAs dataset for last 24 years was more
than 93. Figure 6.1 shows there are 12 matches found in the input. The graph
in the bottom of figure 6.1 shows one of this patterns that has occurred around
June 1990. We ran different pattern search queries, compared their performance
with the naive search, and obtained 800 times speedups.
53
8/8/2019 Thesis Sadri
68/108
*Z
(less than 2% change)
*U
(less than 2% change)
*W
(less than 2% change)
*Y *R
*V*T
*Z
(less than 2% change)
*U
(less than 2% change)
*W
(less than 2% change)
*Y *R
*V*T
Figure 6.1: The relaxed double bottom pattern.
Example 13 Relaxed double bottom
SELECT X.NEXT.date, X.NEXT.price,
S.previous.date, S.previous.priceFROM djia
SEQUENCE BY date
AS (X,*Y, *Z, *T, *U, *V, *W, *R, S)
WHERE X.price >= 0.98 * X.previous.price
AND Y.price < 0.98 * Y.previous.price
AND 0.98*Z.previous.price < Z.price
AND Z.price < 1.02*Z.previous.price
AND T.price > 1.02 * T.previous.price
AND 0.98*U.previous.price < U.price
AND U.price < 1.02*U.previous.price
AND V.price < 0.98 * V.previous.price
AND 0.98*W.previous.price < W.price
54
8/8/2019 Thesis Sadri
69/108
AND W.price < 1.02*W.previous.price
AND R.price > 1.02*R.previous.price
AND S.price
8/8/2019 Thesis Sadri
70/108
Oct76 Jul79 Mar82 Dec84 Sep87 Jun90 Mar93 Dec95 Sep98 May010
2000
4000
6000
8000
10000
12000
Jun90
2400
2500
2600
2700
2800
2900
3000
Figure 6.2: Doublebottoms found in the DJIA data are shown by boxes. The
bottom picture is zoomed for the area pointed by arrow in the top picture and
shows one of the matches.
56
8/8/2019 Thesis Sadri
71/108
CHAPTER 7
Improvements and Generalizations
The definition of the logic matrices and given in Section 3.4 are based on both
implication and satisfiability of the patterns elements. In Section 5, we provided
methods for calculating these matrices. Here we present an alternative approach
for finding the relationship between the pattern elements. This is set-theoretic
approach, where implication between patterns is expressed as inclusion among the
sets of objects satisfying pattern elements. Therefore, the logical matrices and
are expressed only in terms of set theoretic inclusions or equivalent implications.
This formulation is useful since it is generally easier to work with sets than 3-
valued logic constructs. Instead of solving both implication and satisfiability
problems in a specific domain, the problem is reduced to solving the set inclusion
problem.
7.1 Mapping between Set Theoretic and Logical Domains
Lets denote the set of all possible text elements that satisfy a pattern element
pj as j. Therefore,
j = {x|pj (x) = 1} (7.1)
There are three possible situations:
1. pj pk is a tautology. This is equivalent to j k.
57
8/8/2019 Thesis Sadri
72/108
2. pj pk is always false. This is equivalent to j k = .
3. pj pk is sometimes false and sometimes true. This is equivalent to
j k = and j k.
Using the above mentioned three basic cases we can derive the detailed re-
lationship between all the set theoretic cases and the corresponding and
elements. This set theoretic/logic relation is general and holds for any pattern
structure. In the next section we use this relation for calculation of and for
specific pattern structures. Let denote the universe, containing all the possible
values for the text elements. All disjoint possibilities of the set-theoretic relations
between two sets are enumerated in Table 7.1. These distinct set theoretic possi-
bilities can be defined either by set confinement and intersection relations between
j and k, or by set containment relations among j, k and their complements
j and k. The former version is shown in Table 7.1. The set-theoretic cases can
be categorized as either typical and non-typical. By non-typical we mean that
either (i) at least one of the patterns is empty or equal to the universe, or (ii) the
two patterns, or their complements are equal. In a set of uniformly random gen-
erated patterns for an specific pattern structure, the relative number non-typical
patterns is negligible. In the classical Knuth algorithm for string matching only
two of the set-theoretic cases can happen: case III for a match and case I for
a mismatch. For generalized predicate patterns all the set theoretic cases, are
possible.
Next consider the logic relation among the pattern elements. Each entry in
the and matrices has three possible values: 0,1 or U. Therefore, for each pair
of j and k elements in a pattern there are nine possibilities for the values for the
pair jk and jk. Table 7.1 provides the systematic mapping between possible set
theoretic cases and the nine logical cases.
58
8/8/2019 Thesis Sadri
73/108
I II III IV V V I V II V III IX
Set theoretic description
Typical j k j k j k =
j , k = - - or j k k j - j k -
{, } j = k k j k j
Non- j = j = k j = k j = - - j = - j =
typical k {, } j {, } k {, } k = k =
Logical description
jk 0 0 1 U 1 U 1 U 0
jk U 1 0 1 U 0 1 U 0
Relative speed up of OPS algorithm respect to brute force search
Speedup High (A) Medium (B) None (C) -
Table 7.1: Mapping between set theoretic logic relations among pattern elements
and corresponding OPS speedups for scanning left to right.
59
8/8/2019 Thesis Sadri
74/108
I II III IV V V I V II V III I V
Set theoretic description
i j 0 1 0 0 1 1 0 0 1 0 1 1 0 0 1
i j 0 0 1 1 1 0 1 1 0 0 1 1 0 0 0
i j 1 1 1 1 1 0 0 0 0 0 0 0 0 1 1
i j 0 0 0 1 0 1 0 1 0 1 0 1 0 1 1
Logical description
jk 0 0 0 0 0 1 U U 1 U 1 1 U 0 0jk U U 1 1 1 0 1 1 U 0 1 1 U 0 0
kj 0 0 0 0 0 1 U 1 U 1 U 1 U 0 0
kj U 0 1 1 1 0 1 1 0 U 1 1 U 0 0
Table 7.2: Detailed mapping between set theoretic logic inclusions among pattern
elements and the logic matrices elements.
60
8/8/2019 Thesis Sadri
75/108
The OPS algorithm leverages the pairwise logic relation encoded in and
for optimizing the pattern search. These pairwise relations have different effects
on the speedup. The last row in Table 7.1 shows the relative speedup for different
cases. Values of 0 for have the most effect since they can cause a whole shift
(they can change the value of shift(j)). Values of 1 have less effect since they
only change the value of next(j). The details of mapping between the inclusion
(or implication) relations and the logic matrices elements are provided in Table
7.2.
In different domains, we can use the structure of the elements and relations
in that domain to parameterize the set inclusion problem. In the next section we
demonstrate how this can be done for single and multi dimensional intervals.
7.2 Interval Patterns
One the most widely used pattern sequences are the interval patterns. In these
queries, we look for a sequence in the input that fit a sequence of intervals. For
example we might seek sequences where a number between 5 and 6 is followed
by a number between 2 and 3. With proper transformations, all the examples
presented so far can be described using interval patterns. The main advantage
of these patterns is that their logic matrix elements can be calculated easily and
in constant time, as we will see next.
7.2.1 One Dimensional Intervals
Intervals can be defined using the {
8/8/2019 Thesis Sadri
76/108
Thus, j is represented as [j, j, cj , dj], where cj and dj are th