8/9/2019 CQL
MSc project:
Stream Query Processing with CQL
Tore Risch
2010-03-05
A Data Stream Management System (DSMS) is similar to a database management system
(DBMS), with the difference that a DBMS allows searching only stored data, while a
DSMS in addition provides query facilities to search directly in data streaming from
some source. DSMS queries differ from conventional database queries in, e.g., SQL,
where a query requests data from tables stored in the database. The result of a
DSMS query can be not only a set of tuples, as in SQL, but also a potentially infinite
stream of tuples. Furthermore, stream queries are continuous queries (CQs) in that they
run all the time until they are terminated, while conventional queries are executed on
demand and run until all requested data is delivered.
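The contrast between an on-demand query and a continuous query can be illustrated with a minimal Python sketch. It is not SCSQ or CQL code; the function names, the `sensor_stream` source, and its (timestamp, speed) tuples are hypothetical and chosen only for illustration.

```python
from itertools import count, islice

def one_shot_query(table, predicate):
    """Conventional query: scan stored data once and return a finite result."""
    return [row for row in table if predicate(row)]

def continuous_query(stream, predicate):
    """Continuous query: lazily emit matching tuples for as long as the stream runs."""
    for tup in stream:
        if predicate(tup):
            yield tup

def sensor_stream():
    """Hypothetical unbounded source of (timestamp, speed) tuples."""
    for t in count():
        yield (t, 50 + (t * 7) % 40)

def is_fast(row):
    """Select tuples whose speed exceeds 80."""
    return row[1] > 80

# The stored-data query terminates by itself.
stored = [(0, 50), (1, 90), (2, 85)]
print(one_shot_query(stored, is_fast))

# The continuous query never terminates on its own; the consumer stops it,
# here by taking only the first three results.
print(list(islice(continuous_query(sensor_stream(), is_fast), 3)))
```

The generator form mirrors the point made above: the result of a continuous query is itself a stream, and the query ends only when the consumer stops reading it.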
Several DSMS research prototypes have been developed, such as STREAM (Stanford),
Aurora (MIT), Gigascope (Bell Labs), and WaveScope (MIT). StreamBase
(http://www.streambase.com/) provides the first DSMS product. In UDBL we are
developing the DSMS SCSQ (SuperComputer Stream Query processor) based on the
main-memory DBMS Amos II (http://user.it.uu.se/~udbl/amos). Amos II is a functional
DBMS where data and information are represented as typed functions. In SCSQ, database
queries over streams are expressed in SCSQL, a query language similar to the OO parts
of SQL:99 but extended with parallel stream query facilities. SCSQL is an extension of
the functional query language AmosQL
(http://user.it.uu.se/~udbl/amos/doc/amos_users_guide.html). Thus, in SCSQ queries and
views are expressed as functions, i.e., SCSQ has a functional data model.
Several query languages have been developed for DSMSs, including CQL (Stanford),
StreamSQL (StreamBase), WaveScript (MIT), and SCSQL (UU). The present project aims at
providing CQL support integrated with SCSQ.
The performance of DSMSs is evaluated using the Linear Road Benchmark (LRB) for DSMSs
(http://www.cs.brandeis.edu/~linearroad/). Linear Road simulates an expressway system
with dynamically varying toll rates, producing data streams to be processed by a DSMS. It
is endorsed by several universities, including Brandeis, Brown, MIT, and Stanford. The
traffic events in Linear Road (LR) are generated by a traffic simulator from MIT. The
performance of a DSMS is measured by how many expressways it can handle
simultaneously, called the L-rating. LRB was originally developed by Stanford in terms
of CQL. SCSQ is currently the fastest DSMS in the world according to LRB. This
performance is achieved by optimized parallelization of CQs on a cluster.
The project is divided into two phases:
1. The first phase, called evaluation of the stream query language CQL, is to
   investigate what the main properties of CQL are and to what extent they are
   implemented by the Stanford STREAM project. The result should be a description
   of how CQL differs from SQL, what the implementation status is, and what APIs
   are provided. A state-of-the-art overview of related query languages should be
   included. This phase counts as a 15 p Xjob on C level.
2. The second phase, called processing continuous CQL queries over a functional
   DSMS, is to design and implement a CQL processor in Amos II/SCSQ. It counts
   as a 30 p Xjob on E level.
   a. First it should be investigated what existing functionality in SCSQ and
      Amos II can be utilized to support CQL. This investigation is based on the
      result from the evaluation of the stream query language CQL.
   b. An extension of the current Amos II SQL parser to support CQL should be
      implemented. The parser is mainly a preprocessor that translates SQL to
      AmosQL, but the parser is also integrated with the Amos II kernel and
      may call the kernel during parsing. The SQL parser translates SQL to
      Lisp, so knowledge of Lisp is needed to extend it to support CQL.
   c. Cases where functionality needed to support CQL is missing in Amos II/SCSQ
      should be identified. If the missing functionality is minor, proper
      extensions to SCSQ should be made; more problematic extensions should
      be documented and possible solutions outlined. In some cases AmosQL
      may need to be extended using its foreign function facilities
      (http://user.it.uu.se/~torer/publ/external.pdf).
   d. For testing the CQL parser it should be investigated whether there are
      test scripts available for verifying CQL. If that is the case, they
      should be used for verifying that the CQL parser over SCSQ behaves
      properly. Another possibility, in case test scripts for CQL cannot be
      obtained, is to use a subset of LRB as a test script. Even if a full
      implementation of LRB may not be possible in the present project, it
      should be investigated whether parts are implementable with the new CQL
      implementation.
   e. The result of the second phase should be a fully functional system with an
      implemented demonstration script that illustrates the functionality. The
      implementation should be made on the Windows platform.
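As background for both phases, one well-known way CQL differs from SQL is that it adds window operators that turn a stream into a time-varying relation (for example `S [Rows 2]`) and relation-to-stream operators such as Istream and Dstream that turn the result back into a stream. The Python sketch below only illustrates those semantics under simplifying assumptions: each arriving tuple is treated as its own time instant, relations are modelled as bags with `collections.Counter`, and all function names are hypothetical. It is not part of STREAM, Amos II, or SCSQ.

```python
from collections import Counter, deque

def rows_window(stream, n):
    """Stream-to-relation: after each arrival, yield the bag of the last n tuples."""
    window = deque(maxlen=n)
    for tup in stream:
        window.append(tup)
        yield Counter(window)

def istream(relations):
    """Relation-to-stream: per instant, the tuples inserted since the previous instant."""
    previous = Counter()
    for current in relations:
        yield sorted((current - previous).elements())
        previous = current

def dstream(relations):
    """Relation-to-stream: per instant, the tuples deleted since the previous instant."""
    previous = Counter()
    for current in relations:
        yield sorted((previous - current).elements())
        previous = current

readings = [10, 20, 30, 40]
print(list(istream(rows_window(readings, 2))))
print(list(dstream(rows_window(readings, 2))))
```

A test script for a CQL processor could compare the processor's output against such a reference model on small inputs, although whether that is practical depends on the CQL subset actually supported.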
The result of this work should be a report describing both phases of the project. The
implemented system should be satisfactorily documented, and the report should include an
overview of related state-of-the-art implementations.