Microsoft StreamInsight, part of the recent SQL Server 2008 R2 release, is a new platform for building rich applications that can process high volumes of event stream data with near-zero latency. Mark Simms of Microsoft's SQLCAT will demonstrate the core skill sets and technologies needed to deliver StreamInsight-enabled solutions and discuss some of the core scenarios. Mark will provide a detailed walkthrough of the three major components of StreamInsight: input and output adapters, the StreamInsight engine runtime, and the semantics of the continuous standing queries hosted in the StreamInsight engine. This presentation includes hands-on demos, including building out a real-time data processing solution that interacts with SQL Server and SharePoint.
You will learn:
• The new capabilities StreamInsight brings to data processing and analytics, unlocking the ability to extract real-time business intelligence from streaming data.
• How StreamInsight interacts with and complements other components of SQL Server and the rest of the Microsoft technology stack.
• How to ramp up on the skills and technology necessary to build out end-to-end solutions leveraging streaming data sources.
SQL Server 2008 R2
StreamInsight
Speaker: Mark Simms
Microsoft SQLCAT
Silicon Valley SQL Server User Group
May, 2010
Mark Ginnebaugh, User Group Leader,
[Figure: facts/sec. (100 to 100,000) plotted against time of interest (years, months, days, hrs, min, sec, present), contrasting traditional DW analytics with active DW analytics; the gap is the load time in ETL.]
Custom-built solutions carry huge development and customization costs.
Load barrier is dictated by current choices of the solution, e.g., loading into databases, persisting into files. This is intrinsic because in current approaches no processing can be done until the data is loaded.
Analytical results need to reflect important changes in business reality immediately and enable responses to them with minimal latency.
                  | Database Applications            | Event-driven Applications
Query paradigm    | Ad-hoc queries or requests       | Continuous standing queries
Latency           | Seconds, hours, days             | Milliseconds or less
Data rate         | Hundreds of events/sec           | Tens of thousands of events/sec or more
Query semantics   | Declarative relational analytics | Declarative relational and temporal analytics
Diagram labels    | request, response                | event input stream, output stream
StreamInsight Target Scenarios
[Figure: latency (months, days, hours, minutes, seconds, 100 ms, < 1 ms) plotted against aggregate data rate (0 to ~1 million events/sec.). Regions shown: relational database applications; data warehousing applications; operational analytics applications, e.g., logistics, etc.; web analytics applications; manufacturing applications; monitoring applications; financial trading applications.]
Asset Instrumentation for Data Acquisition, Subscriptions to Data Feeds
Power Utilities:
• Energy consumption
• Outages
• Smart grids
• 100,000 events/sec
Web Analytics:
• Click-stream data
• Online customer behavior
• Page layout
• 100,000 events/sec
Manufacturing:
• Sensor on plant floor
• React through device controllers
• Aggregated data
• 10,000 events/sec
Financial Services:
• Stock & news feeds
• Algorithmic trading
• Patterns over time
• Super-low latency
• 100,000 events/sec
[Diagram: data streams feed the following uses.]
• Visual trend-line and KPI monitoring
• Batch & product management
• Automated anomaly detection
• Real-time customer segmentation
[Diagram: data streams flow through the StreamInsight Engine, backed by a Stream Data Store & Archive and a lookup of Asset Specs & Parameters.]
• Threshold queries
• Event correlation from multiple sources
• Pattern queries
• Real-time customer segmentation
• Algorithmic trading
• Proactive condition-based maintenance
Industry trends
• Data acquisition costs are negligible
• Raw storage costs are small and continue to decrease
• Processing costs are non-negligible
• Data loading costs continue to be significant
StreamInsight advantage
• Process data incrementally, i.e., while it is in flight
• Avoid loading while still doing the processing you want
• Seamless querying for monitoring, managing and mining
[Diagram: Monitor KPIs; Record raw data (history); Mine historical data; Devise new KPIs; Manage business via KPI-triggered actions.]
StreamInsight Application Development
StreamInsight Application at Runtime
[Diagram: Event sources -> Input Adapters -> StreamInsight Engine (Standing Queries, Query Logic) -> Output Adapters -> Event targets]
Event sources:
• Devices, Sensors
• Web servers
• Event stores & Databases
• Stock ticker, news feeds
Event targets:
• Pagers & Monitoring devices
• KPI Dashboards, SharePoint UI
• Trading stations
• Event stores & Databases
SELECT COUNT(*) FROM ParkingLot
    WHERE type = 'AUTO'
    AND color = 'RED'
red cars
last hour
Doesn’t seem like a great solution…
This is the streaming data paradigm in a nutshell: ask questions about data in flight.
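The idea of asking a question about data in flight can be sketched in plain C#, without the StreamInsight API. The `CarEvent` type, the `RedCarCounter` class, and the one-hour trailing window are assumptions made for illustration; they are not part of the talk or of StreamInsight. The counter updates its answer as each event arrives instead of loading rows into a table and querying them afterward.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event type; the slides only imply a type and a color.
public sealed class CarEvent
{
    public DateTime Timestamp { get; }
    public string Type { get; }
    public string Color { get; }

    public CarEvent(DateTime timestamp, string type, string color)
    {
        Timestamp = timestamp;
        Type = type;
        Color = color;
    }
}

// Keeps a running answer to "how many red cars in the trailing window?"
// by updating state per event rather than re-querying stored rows.
public sealed class RedCarCounter
{
    private readonly TimeSpan _window;
    private readonly Queue<DateTime> _matches = new Queue<DateTime>();

    public RedCarCounter(TimeSpan window)
    {
        _window = window;
    }

    // Feeds one event (timestamps assumed non-decreasing) and returns the current count.
    public int OnEvent(CarEvent e)
    {
        // Expire matches that have fallen out of the trailing window.
        while (_matches.Count > 0 && _matches.Peek() <= e.Timestamp - _window)
        {
            _matches.Dequeue();
        }

        if (e.Type == "AUTO" && e.Color == "RED")
        {
            _matches.Enqueue(e.Timestamp);
        }

        return _matches.Count;
    }
}
```

Each call to `OnEvent` does work proportional to the number of expired matches, so no load step is needed before the answer is available.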
Adapters
Engine
Queries
Extensions
Host
Visual debugger, API
Tell me just the color of each car that passes.
var result = from car in carStream
select new
{
car.Color
};
Give me only trucks.
var result = from car in carStream
where car.Type == "Truck"
select car;
Tell me the number of cars passed every 10 seconds.
var result = from win in carStream.TumblingWindow(
TimeSpan.FromSeconds(10))
select new
{
count = win.Count()
};
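To make the tumbling-window semantics concrete, here is a minimal sketch in plain LINQ to Objects that emulates the count over a finite list of timestamps. It does not use StreamInsight; `TumblingWindowSketch` and `CountPerWindow` are hypothetical names, and aligning windows to tick 0 is a choice made for this sketch only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TumblingWindowSketch
{
    // Assigns each timestamp to the fixed, non-overlapping window that contains it
    // and counts the events per window. Windows are aligned to tick 0
    // (an alignment assumed for this sketch). Empty windows yield no row.
    public static List<KeyValuePair<DateTime, int>> CountPerWindow(
        IEnumerable<DateTime> timestamps, TimeSpan size)
    {
        return timestamps
            .GroupBy(t => new DateTime(t.Ticks - (t.Ticks % size.Ticks)))
            .OrderBy(g => g.Key)
            .Select(g => new KeyValuePair<DateTime, int>(g.Key, g.Count()))
            .ToList();
    }
}
```

With a 10-second window, events at 1 s, 4 s, 9 s, 10 s, and 25 s past a window boundary fall into three windows with counts 3, 1, and 1. Every event belongs to exactly one window, which is what distinguishes a tumbling window from overlapping window shapes.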
Count the number of cars for each make separately every 10 seconds.
var result = from car in carStream
             group car by car.make into eachGroup
             // Window each group, not the whole stream, so counts are per make.
             from win in eachGroup.TumblingWindow(
                 TimeSpan.FromSeconds(10))
             select new
             {
                 make = eachGroup.Key,
                 count = win.Count()
             };
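The group-then-window pattern can be emulated the same way in plain LINQ to Objects. This is a sketch under assumptions, not the StreamInsight API: `GroupedWindowSketch` and `CountPerMakePerWindow` are hypothetical names, cars are represented as (timestamp, make) tuples, and windows are aligned to tick 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupedWindowSketch
{
    // Returns (windowStart, make, count) rows: one count per make per
    // fixed-size, non-overlapping window. Input items are (timestamp, make).
    public static List<Tuple<DateTime, string, int>> CountPerMakePerWindow(
        IEnumerable<Tuple<DateTime, string>> cars, TimeSpan size)
    {
        return cars
            // The composite key (window start, make) keeps each make's count separate.
            .GroupBy(c => Tuple.Create(
                new DateTime(c.Item1.Ticks - (c.Item1.Ticks % size.Ticks)),
                c.Item2))
            .OrderBy(g => g.Key.Item1)
            .ThenBy(g => g.Key.Item2, StringComparer.Ordinal)
            .Select(g => Tuple.Create(g.Key.Item1, g.Key.Item2, g.Count()))
            .ToList();
    }
}
```

Grouping on the composite key mirrors why the window must be applied to each group rather than to the whole stream: windowing the ungrouped stream would report the same total for every make.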
application time
Current Time Indicators
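// Step 1: create an event, fill it in, and enqueue it.
public void EnqueueEvent(SourceData d)
{
    var ev = CreateInsertEvent();
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    Enqueue(ref ev);
}

// Step 2: check the adapter state before doing any work.
public void EnqueueEvent(SourceData d)
{
    // The condition is elided on the slide; a check for the Stopping state is assumed here.
    if (AdapterState == AdapterState.Stopping)
        return;
    var ev = CreateInsertEvent();
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    Enqueue(ref ev);
}

// Step 3: guard against a failed event allocation.
public void EnqueueEvent(SourceData d)
{
    if (AdapterState == AdapterState.Stopping) // assumed condition, as above
        return;
    var ev = CreateInsertEvent();
    if (ev == null) return;
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    Enqueue(ref ev);
}

// Step 4: handle a full queue by signaling Ready() and backing off.
public void EnqueueEvent(SourceData d)
{
    if (AdapterState == AdapterState.Stopping) // assumed condition, as above
        return;
    var ev = CreateInsertEvent();
    if (ev == null) return;
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    if (Enqueue(ref ev) == EnqueueOperationResult.Full)
    {
        Ready();
        return;
    }
}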
Use them wisely!
public class TimeWeightedAverage : CepTimeSensitiveAggregate<double, double>
{
    public override double GenerateOutput(IEnumerable<IntervalEvent<double>> events,
                                          WindowDescriptor windowDescriptor)
    {
        // Sum each payload weighted by the event's duration in ticks.
        double avg = 0;
        foreach (IntervalEvent<double> ev in events)
        {
            avg += ev.Payload * (ev.EndTime - ev.StartTime).Ticks;
        }
        // Normalize by the duration of the window.
        return avg / (windowDescriptor.EndTime - windowDescriptor.StartTime).Ticks;
    }
}
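The arithmetic of the aggregate above can be checked outside the engine with a small plain C# sketch. `TimeWeightedAverageSketch` and its tuple-based input of (start, end, value) are hypothetical stand-ins for interval events and the window descriptor; the sketch assumes every reading lies inside the window.

```csharp
using System;
using System.Collections.Generic;

public static class TimeWeightedAverageSketch
{
    // Each reading is (start, end, value) and holds its value over [start, end).
    // Readings are assumed to lie inside the window; no clipping is performed.
    public static double Compute(
        IEnumerable<Tuple<DateTime, DateTime, double>> readings,
        DateTime windowStart, DateTime windowEnd)
    {
        double weighted = 0;
        foreach (var r in readings)
        {
            // Weight the value by how long it was in effect.
            weighted += r.Item3 * (r.Item2 - r.Item1).Ticks;
        }
        // Divide by the window length, so uncovered time contributes zero.
        return weighted / (windowEnd - windowStart).Ticks;
    }
}
```

For a 10-second window in which the value is 10 for 2 seconds and 20 for 8 seconds, the result is (10 × 2 + 20 × 8) / 10 = 18. Because the divisor is the window length rather than the covered time, a single reading of 10 that spans only 5 of the 10 seconds yields 5.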
To learn more or inquire about speaking opportunities, please contact:
Mark Ginnebaugh, User Group Leader