Telegraph: An Adaptive Global-Scale Query Engine
Joe Hellerstein
Scenarios
• Ubiquitous computing: more than clients!
  – sensors and their data feeds are key
    • smart dust, biomedical (MEMS sensors)
    • each consumer good records (mis)use
      – disposable computing
    • video from surveillance cameras, broadcasts, etc.
• Global Data Federation
  – all the data is online – what are we waiting for?
  – The plumbing is coming
    • XML/HTTP, etc. give lowest-common-denominator (LCD) communication
    • but how do you query robustly over many sites in the wide area?
There’s a Data Flood Coming
• What does it look like?
  – Never ends: interactivity required
  – Big: data reduction/aggregation is key
  – Unpredictable: this scale of devices and nets will not behave nicely
The Telegraph Query Engine
• Key technologies
  – Interactive Control
    • interactivity with early answers
    • online aggregation for data reduction
  – Continuously adaptive flow optimization
    • massively parallel, adaptive dataflow via Rivers and Eddies
CONTROL: Continuous Output, Navigation & Transformation with Refinement On Line
• Data-intensive jobs are long-running. How to give early answers and interactivity?
  – online interactivity over feeds: data “juggle”
  – online query processing algs: ripple joins
  – statistical estimators, and their performance implications
• Appreciate interplay of massive data processing, stats, and UIs
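
The "early answers" idea above can be illustrated with a minimal sketch of online aggregation: a running average that is reported, with an error bar, while rows are still arriving. This is an illustration under stated assumptions, not the CONTROL implementation. The function name `online_mean`, the `report_every` parameter, and the normal-approximation interval (z = 1.96) are all choices made here for the example, and the estimate is only meaningful if rows arrive in random order.

```python
import math
import random

def online_mean(stream, z=1.96, report_every=1000):
    """Yield (rows_seen, running_mean, half_width) while the scan is still running.

    Hypothetical helper: uses Welford's update for the running variance and a
    normal-approximation confidence interval. Assumes rows arrive in random order.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for x in stream:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if n > 1 and n % report_every == 0:
            std_err = math.sqrt(m2 / (n - 1) / n)
            yield n, mean, z * std_err

random.seed(0)
data = [random.gauss(50.0, 10.0) for _ in range(10_000)]
random.shuffle(data)  # stand-in for random-order delivery of rows

estimates = list(online_mean(data))
for rows_seen, estimate, half_width in estimates:
    print(f"{rows_seen:>6} rows: {estimate:.2f} +/- {half_width:.2f}")
```

Each report refines the previous one, so a user can stop the query as soon as the interval is tight enough instead of waiting for the full scan.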
CONTROL: Continuous Output and Navigation Technology with Refinement On Line
River
• We built the world’s fastest sorting machine
  – On the “NOW”: 100 Sun workstations + SAN
  – But it only beat the record under ideal conditions!
• River: performance adaptivity for data flows on clusters
  – simplifies management and programming
  – perfect for sensor-based streams
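
Why "ideal conditions" matter can be seen in a toy, single-process simulation. It compares a static split of records across consumers with a shared queue from which each consumer takes the next record as soon as it is free. This is only a sketch in the spirit of performance adaptivity, not River's mechanism or API. The function names, the per-record costs, and the single perturbed consumer are assumptions made for illustration.

```python
import heapq

def static_partition_time(n_records, costs):
    """Round-robin split: completion time is set by the slowest consumer."""
    k = len(costs)
    per_consumer = [n_records // k + (1 if i < n_records % k else 0) for i in range(k)]
    return max(cost * count for cost, count in zip(costs, per_consumer))

def adaptive_queue_time(n_records, costs):
    """Shared queue: whichever consumer becomes free first takes the next record."""
    free_at = [(0.0, i) for i in range(len(costs))]  # (time consumer is free, consumer id)
    heapq.heapify(free_at)
    finish = 0.0
    for _ in range(n_records):
        t, i = heapq.heappop(free_at)
        done = t + costs[i]
        finish = max(finish, done)
        heapq.heappush(free_at, (done, i))
    return finish

# Hypothetical cluster: seconds per record; the last consumer is perturbed (4x slower).
costs = [1.0, 1.0, 1.0, 4.0]
static_t = static_partition_time(1200, costs)
adaptive_t = adaptive_queue_time(1200, costs)
print(f"static split: {static_t:.0f}s, adaptive queue: {adaptive_t:.0f}s")
```

With the static split, one slow node drags the whole job down to its pace. With the shared queue, the slow node simply ends up processing fewer records.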
Eddy
• How to order and reorder operators over time
  – based on performance, economic/admin feedback
• Vs. River:
  – River optimizes each operator “horizontally”
  – Eddies optimize a pipeline “vertically”
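
Reordering operators over time can be sketched as a small router that sits in front of a set of filters and decides, per tuple, which filter to try first based on the pass rates observed so far. This is a minimal sketch only. The class name `ToyEddy`, the greedy "lowest observed pass rate first" rule, and the two example predicates are assumptions made for illustration, not the routing policy Telegraph uses.

```python
class ToyEddy:
    """Hypothetical per-tuple router over commutative filter operators."""

    def __init__(self, filters):
        self.filters = filters
        self.seen = [1] * len(filters)    # tuples routed to each filter
        self.passed = [1] * len(filters)  # tuples each filter let through
        self.evaluations = 0              # total filter calls, a proxy for work done

    def route(self, tup):
        # Try the filter with the lowest observed pass rate first, so that
        # doomed tuples are dropped as cheaply as possible.
        order = sorted(range(len(self.filters)),
                       key=lambda i: self.passed[i] / self.seen[i])
        for i in order:
            self.evaluations += 1
            self.seen[i] += 1
            if not self.filters[i](tup):
                return False
            self.passed[i] += 1
        return True

# Two example predicates: the first passes about half the tuples, the second about a tenth.
eddy = ToyEddy([lambda x: x % 2 == 0, lambda x: x % 10 == 0])
results = [x for x in range(1000) if eddy.route(x)]
print(len(results), "tuples passed;", eddy.evaluations, "filter evaluations")
```

A fixed plan that always applies the first predicate first would need 1,500 evaluations on this input (1,000 for the first filter plus 500 for the second). The router learns after a few tuples that the second predicate is more selective and does less work, with no optimizer statistics supplied up front.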
Telegraph: Putting it Together
• Scalable, adaptive dataflow infrastructure. Apps include…
  – sensor nets
  – massively parallel and wide-area query engines
  – net appliances: chaining transformation/aggregation/etc. proxies
  – any unpredictable dataflow scenario
• Technology: a marriage of…
  – CONTROL, River & Eddy
    • Many research questions here
    • E.g. how to combine River and Eddy adaptivity
    • E.g. how to tune Eddies for statistical performance goals
  – Combinations of browse/query/mine at UI
  – Storage management to handle new hardware realities
Integration with Endeavour
• Give
  – Be data-intensive backbone to diverse clients
  – Be replication dataflow engine for OceanStore
  – Telegraph Storage Manager provides storage (transactional/otherwise) for OceanStore
  – Provide platform for data-intensive “tacit info mining”
• Take
  – Leverage OceanStore to manage distributed metadata, security
  – Leverage protocols out of TinyOS for sensors
Additional Slides
• For use in questions, etc.
Connectivity & Heterogeneity
• Lots of folks working on data format translation, parsing
  – we will borrow, not build
  – currently using JDBC & Cohera Net Query
    • commercial tool, donated by Cohera Corp.
    • gateways XML/HTML (via HTTP) to ODBC/JDBC
  – we may write “Teletalk” gateways from sensors
• Heterogeneity
  – never a simple problem
  – CONTROL project developed interactive, online data transformation tool: Potter’s Wheel
Potter’s Wheel Anomaly Detection