62
CQL: A Language for CQL: A Language for Continuous Queries over Continuous Queries over Streams and Relations Streams and Relations Jennifer Widom Stanford University Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

Embed Size (px)

Citation preview

Page 1: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

CQL: A Language for Continuous CQL: A Language for Continuous Queries over Streams and RelationsQueries over Streams and Relations

Jennifer Widom

Stanford UniversityStanford University

Joint work with Arvind Arasu & Shivnath Babu

stanfordstreamdatamanager

Page 2: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 2

Data StreamsData Streams• Continuous, unbounded, rapid, time-varying

streams of data elements

• Occur in a variety of modern applications– Network monitoring and traffic engineering– Sensor networks, RFID tags– Telecom call records– Financial applications– Web logs and click-streams– Manufacturing processes

• DSMSDSMS = Data Stream Management System

Page 3: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 3

DBMS versus DSMSDBMS versus DSMS• Persistent dataPersistent data

• One-time queriesOne-time queries

• Random access

• Access plan determined by query processor and physical DB design

• Transient data Transient data (and persistent data)(and persistent data)

• Continuous queriesContinuous queries

• Sequential access

• Unpredictable data characteristics and arrival patterns

Page 4: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 4

DSMS

Scratch Store

The (Simplified) Big PictureThe (Simplified) Big Picture

Input streams

RegisterQuery

StreamedResult

StoredResult

Archive

StoredData

Page 5: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 5

(Simplified) Network Monitoring(Simplified) Network Monitoring

RegisterMonitoring

Queries

DSMS

Scratch Store

Network measurements,Packet traces

IntrusionWarnings

OnlinePerformance

Metrics

Archive

LookupTables

Page 6: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 6

STREAMSTREAMThe Stanford Stream Data ManagerThe Stanford Stream Data Manager

• General-purpose DSMS for streams and stored data

• Relational (unlikely to change)

• Centralized server model (likely to change)

– Single-threaded and parallel versions

• Declarative language for registering Declarative language for registering continuous queriescontinuous queries

Page 7: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 7

The STREAM System:The STREAM System:Some Implementation IssuesSome Implementation Issues

• Designed to cope with:– Stream ratesStream rates that may be high,variable, bursty

– Continuous query loadsContinuous query loads that may be high, volatile

• Primary coping techniques– Continuous self-monitoringself-monitoring and reoptimizationreoptimization

– Graceful approximationapproximation as necessary

– Careful resource allocation and useresource allocation and use

These issues do not affect the query language directly

Page 8: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 8

The STREAM System:The STREAM System:Current StatusCurrent Status

• Version 1.0 up and running

• Implements most of query language

• Now incorporating the fun stuff

• Currently approx. 25,000 lines of C++ code

• Release plans

1) Web-accessible running server

2) Downloadable software

Page 9: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 9

The Stream Systems LandscapeThe Stream Systems Landscape

• (At least) three general-purpose DSMS prototypes underway– STREAMSTREAM (Stanford)

– AuroraAurora (Brown, Brandeis, MIT)

– TelegraphCQ TelegraphCQ (Berkeley)

• Cooperating to develop stream system stream system benchmarkbenchmark One goal: demonstrate that conventional systems

are far inferior for data stream applications

Page 10: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 10

Declarative Language for Declarative Language for Continuous QueriesContinuous Queries

• A distinction between Aurora and the other two systems (STREAM and TelegraphCQ)

– Aurora users directly manipulate a single large execution plan through a “boxes-and-arrows” interface

– STREAM compiles declarative queries into individual plans, system may merge plans

– STREAM also supports direct entry of plans

Page 11: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 11

Declarative Language (cont’d)Declarative Language (cont’d)

• STREAM’s declarative language is based on SQLSQL syntax + some aspects of semantics– TelegraphCQ’s also based on SQL but more

loosely

• Modifications to standard SQL semantics for:– Streams plus relationsStreams plus relations

– Continuous query resultsContinuous query results

• Additional constructs for sliding windowssliding windows and samplingsampling

Page 12: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 12

Aside: Query Language SemanticsAside: Query Language Semantics

• Around 1980

People were inventing, understanding, and implementing relational query languages

• Around 1990 People were inventing and implementing trigger

(active rule) systems without completely defining or understanding rule behavior

• Around 2000 Ditto for continuous queries

Page 13: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 13

Aside on Semantics (cont’d)Aside on Semantics (cont’d)

• The semantics of SQL queries is (relatively) easy to understand– Even lots of SQL queries running together

• The semantics of a single trigger is (relatively) easy to understand– But lots of triggers together can be complex

• The semantics of even a single continuous query may not be obvious– But lots running together is no harder

Page 14: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 14

A Nonobvious Continuous QueryA Nonobvious Continuous Query

• Stream of stock quotes: Stocks(ticker,price)

• Monitor last 10 minutes of quotes:

Select From Stocks [Range 10 minutes]

• Is result a relation, a stream, or something else?

• If a relation, what exactly does it contain?

• If a stream, how does query differ from:Select From Stocks [Range 1 minute]or Select From Stocks []

Page 15: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 15

Another Nonobvious CQAnother Nonobvious CQ

• Stream of ordered items, table of item prices

• Prices for five most recent ordered items:Select P.priceFrom Items [Rows 5] I, PriceTable PWhere I.itemID = P.itemID

• Is result a stream or a relation?

• What if item price changes?

Page 16: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 16

Our Original Working SemanticsOur Original Working Semantics

“The result of a continuous query at time T is the result of treating the streams up to T as relations and evaluating the query using standard relational semantics.”

+ Relies on existing well-understood semantics

– Asymmetric: streams-to-relations but not relations-to-streams

– Can break down for complex queries (subqueries, aggregation)

Page 17: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 17

Our Final Semantics and LanguageOur Final Semantics and Language

• Abstract: Abstract: interpretation for continuous queries based on certain “black boxes”

• Concrete: Concrete: SQL-based instantiation for our system; includes syntactic shortcuts, defaults, equivalences

Page 18: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 18

Goals in Language DesignGoals in Language Design

1) Support continuous queries over multiple streams and relations

2) Exploit relational semantics to the extent possible

3) Easy queries should be easy to write

4) Simple queries should do what you expect

Page 19: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 19

Example Query 1Example Query 1Two streams, contrived for ease of examples:

Orders (orderID, customer, cost) Fulfillments (orderID, clerk)

Total cost of orders fulfilled over the last day by clerk “Sue” for customer “Joe”

Select Sum(O.cost)From Orders O, Fulfillments F [Range 1 Day]Where O.orderID = F.orderID And F.clerk = “Sue” And O.customer = “Joe”

Page 20: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 20

Example Query 2Example Query 2

Using a 10% sample of the Fulfillments stream, take the 5 most recent fulfillments for each clerk and return the maximum cost

Select F.clerk, Max(O.cost)From Orders O, Fulfillments F [Partition By clerk Rows 5] 10% SampleWhere O.orderID = F.orderIDGroup By F.clerk

Page 21: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 21

Formalization (and Rest of Talk)Formalization (and Rest of Talk)

• Formal definitions for relations and streams

• Formal conversions between them

• Abstract semantics

• Concrete language: CQL

• Syntactic defaults and shortcuts

• Equivalence-based transformations

• Benchmark, system screenshots

Page 22: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 22

Relations and StreamsRelations and Streams

• Assume global, discrete, ordered time domain (more on this later)

• Relation– Maps time T to set-of-tuples R

• Stream– Set of (tuple,timestamp) elements

Page 23: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 23

ConversionsConversions

Streams Relations

Window specification

Special operators:Istream, Dstream, Rstream

Any relational query language

Page 24: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 24

Conversion DefinitionsConversion Definitions

• Stream-to-relation– S [W ] is a relation — at time T it contains all tuples

in window W applied to stream S up to T

– When W = , contains all tuples in stream S up to T

• Relation-to-stream– Istream(R) contains all (r,T ) where rR at time T

but rR at time T–1

– Dstream(R) contains all (r,T ) where rR at time T–1 but rR at time T

– Rstream(R) contains all (r,T ) where rR at time T

Page 25: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 25

Abstract SemanticsAbstract Semantics

• Take any relational query language

• Can reference streams in place of relations– But must convert to relations using any window

specification language ( default window = [] )

• Can convert relations to streams– For streamed results

– For windows over relations (note: converts back to relation)

Page 26: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 26

Query Result at Time Query Result at Time TT

• Use all relations at time T

• Use all streams up to T, converted to relations

• Compute relational result

• Convert result to streams if desired

Page 27: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 27

Abstract Semantics – Example 1Abstract Semantics – Example 1Select F.clerk, Max(O.cost)From O [], F [Rows 1000]Where O.orderID = F.orderIDGroup By F.clerk

• Maximum-cost order fulfilled by each clerk in last 1000 fulfillments

Page 28: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 28

Abstract Semantics – Example 1Abstract Semantics – Example 1Select F.clerk, Max(O.cost)From O [], F [Rows 1000]Where O.orderID = F.orderIDGroup By F.clerk

• At time T: entire stream O and last 1000 tuples of F as relations

• Evaluate query, update result relation at T

Page 29: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 29

Abstract Semantics – Example 1Abstract Semantics – Example 1Select IstreamIstream(F.clerk, Max(O.cost))From O [], F [Rows 1000]Where O.orderID = F.orderIDGroup By F.clerk

• At time T: entire stream O and last 1000 tuples of F as relations

• Evaluate query, update result relation at T

• Streamed result:Streamed result: New element (<clerk,max>,T) whenever <clerk,max> changes from T–1

Page 30: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 30

Abstract Semantics – Example 2Abstract Semantics – Example 2

Relation CurPrice(stock, price)

Select stock, Avg(price) From Istream(CurPrice) [Range 1 Day] Group By stock

• Average price over last day for each stock

Page 31: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 31

Abstract Semantics – Example 2Abstract Semantics – Example 2

Relation CurPrice(stock, price)

Select stock, Avg(price) From Istream(CurPrice) [Range 1 Day] Group By stock

• Istream provides history of CurPrice

• Window on history, back to relation, group and aggregate

Page 32: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 32

Digression: TimeDigression: Time• Relation updates carry timestamps too

• “Well-behaved” streams and relation updates:• Arrive in order

• No “delayed activity” – no long pauses then resume with very old timestamps

• Semantics of query language tacitly assumes well-behaved streams and updates• Poor behavior handled separately, not within

language

• Distinction from Aurora, which does not separate

Page 33: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 33

Time Digression (cont’d)Time Digression (cont’d)

• Simplest scenario: global system clock– Stream elements and relation updates

timestamped on entry to system

– Guaranteed ordering, no delayed activity

– Query results also based on system time

Page 34: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 34

Flexible Application-Defined TimeFlexible Application-Defined Time• Streams and relation updates contain

application timestamps– May arrive out of order

– May pause and resume

• Query results in application time

• Application generates “heartbeats”– Or deduce heartbeat from parameters: scrambling,

skew, latency, clock progress

– Reminder: separate from query language semantics

Page 35: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 35

Concrete Language – CQLConcrete Language – CQL

• Relational query language: SQL

• Window specification language derived from SQL-99– Tuple-based windows

– Time-based windows

– Partitioned windows

• Simple “X% Sample” construct– May incorporate emerging SQL standard

Page 36: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 36

CQL (cont’d)CQL (cont’d)

• Syntactic shortcuts and defaults– So easy queries are easy to write and simple

queries do what you expect

• Equivalences– Basis for query-rewrite optimizations

– Includes all relational equivalences, plus new stream-based ones

• Examples: already seen some, more later

Page 37: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 37

Shortcuts and DefaultsShortcuts and Defaults

• Prevalent stream-relation conversions can make some queries cumbersome– Easy queries should be easy to write

• Two defaults:– Omitted window:Omitted window: Default []

– Omitted relation-to-stream operator:Omitted relation-to-stream operator: Default Istream operator on:

• Monotonic outermost queries• Monotonic subqueries with windows

Page 38: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 38

The Simplest CQL QueryThe Simplest CQL QuerySelect From Strm

• Had better return Strm (It does)– Default [] window for Strm

– Default Istream for result

Page 39: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 39

Simple Join QuerySimple Join QuerySelect From Strm, Rel Where Strm.A = Rel.B

• Default [] window on Strm, but often want Now window for stream-relation joins

Select Istream(O.orderID, A.City)From Orders O, AddressRel AWhere O.custID = A.custID

Page 40: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 40

Simple Join QuerySimple Join QuerySelect From Strm, Rel Where Strm.A = Rel.B

• Default [] window on Strm, but often want Now window for stream-relation joins

Select Istream(O.orderID, A.City)From Orders O [[NowNow]], AddressRel AWhere O.custID = A.custID

• We decided against a separate default

Page 41: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 41

Equivalences and TransformationsEquivalences and Transformations

• All relational equivalences apply to all relational constructs directly– Queries are highly relational

• Many low-level transformations in implementation

• Two new language-based transformations, more to come– Window reductionWindow reduction

– Filter-window commutativityFilter-window commutativity

Page 42: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 42

Window ReductionWindow Reduction

Select Istream(L) From S [] Where C

is equivalent to

Select Rstream(L) from S [Now] Where C

• Question for audience – time to wake up!time to wake up!Why Rstream and not Istream in second query?

• Answer: Consider stream <5>, <5>, <5>, <5>, …

Page 43: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 43

Window Reduction (cont’d)Window Reduction (cont’d)

Select Istream(L) From S [] Where C

is equivalent to

Select Rstream(L) from S [Now] Where C

• First query form is very common due to defaults

• In a naïve implementation second form is much more efficient

Page 44: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 44

Some Subtleties (or Dirty Laundry):Some Subtleties (or Dirty Laundry):IstreamIstream, , RstreamRstream, and Windows, and Windows

• Example: Emit 5-second moving average on every timestep

Select Istream(Avg(A)) From S [Range 5 seconds]

Only emits a result when average changes

• To emit a result on every timestep

Select RstreamRstream(Avg(A)) From S [Range 5 seconds]

• To emit a result every second

Select Rstream(Avg(A)) From S [Range 5 seconds Slide 1 secondSlide 1 second]

Page 45: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 45

More onMore on Istream Istream, , RstreamRstream, and Windows, and Windows

• Example: windowed join, emit new result tuples whenever there’s a new input tuple

Select RstreamRstream() From S1 [w1], S2 [w2]

• Istream may not emit results in presence of duplicates

• Rstream emits entire join result at each timestep or slide interval

• Correct solution requires union of multiple joins, somewhat messy

Page 46: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 46

Some Windowing Rules of ThumbSome Windowing Rules of Thumb• Dstream used infrequently but is needed

• Istream and Dstream can be expressed using Rstream and other constructs — but don’t try it at home

• Istream always intuitive in presence of keys

• Rstream sensible with Now or Slide windows

• Row-based windows nondeterministic in presence of duplicate timestamps

• No need for partitioned time-based windows

Page 47: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

(end of long aside)(end of long aside)

Page 48: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 48

Filter-Window CommutativityFilter-Window Commutativity

• Another question for the audienceAnother question for the audience

When is

Select L From S [window] Where C

equivalent to

Select L From (Select L From S Where C) [window]

• Is this transformation always advantageous?

Page 49: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 49

Constraint-Based TransformationsConstraint-Based Transformations• Recall first example query (simplified)

Select Sum(O.cost)From Orders O, Fulfillments F [Range 1 Day]Where O.orderID = F.orderID

• If orders always fulfilled within one week

Select Sum(O.cost)From Orders O [Range 8 Days][Range 8 Days], Fulfillments F [Range 1 Day]Where O.orderID = F.orderID

• Useful constraints: keyskeys, stream referential stream referential integrityintegrity, clusteringclustering, orderingordering

Page 50: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 50

Evaluating a Query LanguageEvaluating a Query Language

• Is it expressive enough?

• Is it user-friendly?

• Can it be implemented?

• Can it be implemented efficiently?

Page 51: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 51

Evaluating a Query LanguageEvaluating a Query Language

• Is it expressive enough? YesYes, see our Stream Query RepositoryStream Query Repository

Online auctions, network traffic management, habitat monitoring, military logistics, immersive environments, road traffic monitoring

• Is it user-friendly? To be determinedTo be determined

• Can it be implemented? Yes, see our demoYes, see our demo

• Can it be implemented efficiently? We think so, with lots of fun researchWe think so, with lots of fun research

Page 52: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 52

Stream System Benchmark: “Linear Road”Stream System Benchmark: “Linear Road”

100 segments of 1 mile each

Main Input Stream: Car Locations (CarLocStr)

Linear City10 Expressways

Reports every 30 seconds

Reports every 30 seconds

car_id speed exp_way lane x_pos

1000 55 5 3 (Right) 12762

1035 30 1 0 (Ramp) 4539

… … … … …

Page 53: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 53

Linear Road BenchmarkLinear Road Benchmark

• Primarily developed by Aurora group– With some help from us

• Suite of continuous queries based on real traffic management proposals. Example CQs:– Stream car segments based on x-positions (easy)(easy)

– Identify probable accidents (medium)(medium)

– Compute toll whenever car enters segment (hard)(hard)

• Metric:Metric: Scale to as many expressways as possible without falling behind

Page 54: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 54

Easy ExampleEasy ExampleMonitor speed and segments of cars 1-100 Select car_id, speed, x_pos/5280 as segment From CarLocStr Where car_id >= 1 and car_id <= 100

Page 55: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 55

Hard ExampleHard Example Whenever a car enters a segment, issue it the

current toll for that segment

Page 56: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 56

Hard Example in CQLHard Example in CQL

CarSegEntryStr: Select Istream() From CurCarSeg

CurCarSeg: Select car_id, x_pos/5280 as seg, Location(expr_way, dir, x_pos/5280) as loc From CarLocStr [Partition By car_id Rows 1]

Select Rstream(E.car_id, E.seg, T.toll)From CarSegEntryStr [NOW] as E, SegToll as TWhere E.loc = T.loc

Page 57: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 57

Hard Example (cont’d)Hard Example (cont’d)SegToll: Select S.loc, BaseToll (V.volume – 150)2

From SegAvgSpeed as S, SegVolume as V Where S.loc = V.loc and S.avg_speed < 40.0

SegVolume: Select loc, Count() as volume From CurCarSeg Group By loc

SegAvgSpeed: Select loc, Avg(speed) as avg_speed From CarLocStr [Range 5 minutes] Group By location(expr_way, dir, x_pos/5280) as loc

Page 58: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 58

STREAM Query InterfaceSTREAM Query Interface

Page 59: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 59

STREAM Query InterfaceSTREAM Query Interface

Page 60: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 60

““Introspection” QueriesIntrospection” Queries• Real-time monitoring of query execution and

general system properties, for example:– Tuple-flow rate in a queue– Selectivity of a join– Ordering of a stream– Overall memory usage– Etc.

• Monitoring performed using regular CQL queries over extensible system-generated property streamproperty stream

Page 61: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 61

Query Plan MonitoringQuery Plan Monitoring

Page 62: CQL: A Language for Continuous Queries over Streams and Relations Jennifer Widom Stanford University Joint work with Arvind Arasu & Shivnath Babu stanfordstreamdatamanager

stanfordstreamdatamanager 62

For more information:– Talk to me this afternoon

– Visit our web site: http://www-db.stanford.edu/streamhttp://www-db.stanford.edu/stream

The Stanford STREAM Team: Arvind Arasu, Brian Babcock, Shivnath Babu, John Cieslewicz, Mayur Datar, Keith Ito, Rajeev Motwani, Itaru Nishizawa, Utkarsh Srivastava, Dilys Thomas