Microsoft StreamInsight


DESCRIPTION

Microsoft StreamInsight, part of the recent SQL Server 2008 R2 release, is a new platform for building rich applications that can process high volumes of event stream data with near-zero latency. Mark Simms of Microsoft's SQLCAT will demonstrate the core skill sets and technologies needed to deliver StreamInsight-enabled solutions and discuss some of the core scenarios. Mark will provide a detailed walkthrough of the three major components of StreamInsight: input and output adapters, the StreamInsight engine runtime, and the semantics of the continuous standing queries hosted in the StreamInsight engine. The presentation includes hands-on demos, including building out a real-time data processing solution that interacts with SQL Server and SharePoint.

You will learn:
• The new capabilities StreamInsight brings to data processing and analytics, unlocking the ability to extract real-time business intelligence from streaming data.
• How StreamInsight interacts with and complements other components of SQL Server and the rest of the Microsoft technology stack.
• How to ramp up on the skills and technology necessary to build out end-to-end solutions leveraging streaming data sources.


SQL Server 2008 R2

StreamInsight

Speaker: Mark Simms

Microsoft SQLCAT

Silicon Valley SQL Server User Group

May, 2010

Mark Ginnebaugh, User Group Leader,

mark@designmind.com

masimms@microsoft.com

Custom-built solutions carry huge development and customization costs.

Load barrier: dictated by the current choices of the solution, e.g., loading into databases or persisting into files. This is intrinsic because in current approaches no processing can be done until the data is loaded.

[Figure: data rate (facts/sec., 100 to 100,000) against time of interest (years, months, days, hrs, min, sec, present), showing traditional DW analytics, active DW analytics, and the load time in ETL.]

Analytical results need to reflect important changes in business reality immediately and enable responses to them with minimal latency.

                  Database Applications              Event-driven Applications
Query Paradigm    Ad-hoc queries or requests         Continuous standing queries
Latency           Seconds, hours, days               Milliseconds or less
Data Rate         Hundreds of events/sec             Tens of thousands of events/sec or more
Query Semantics   Declarative relational analytics   Declarative relational and temporal analytics

[Diagram labels: request, response (database applications); input stream, output stream (event-driven applications)]

StreamInsight Target Scenarios

[Figure: application classes plotted by latency (months, days, hours, minutes, seconds, 100 ms, < 1 ms) against aggregate data rate (0 to ~1 million events/sec.): relational database applications; data warehousing applications; operational analytics applications (e.g., logistics); web analytics applications; manufacturing applications; monitoring applications; financial trading applications.]

Power Utilities:
• Energy consumption
• Outages
• Smart grids
• 100,000 events/sec

Web Analytics:
• Click-stream data
• Online customer behavior
• Page layout
• 100,000 events/sec

Manufacturing:
• Sensor on plant floor
• React through device controllers
• Aggregated data
• 10,000 events/sec

Financial Services:
• Stock & news feeds
• Algorithmic trading
• Patterns over time
• Super-low latency
• 100,000 events/sec

[Diagram labels: Data Stream; Asset Instrumentation for Data Acquisition, Subscriptions to Data Feeds; StreamInsight Engine; Stream Data Store & Archive; Asset Specs & Parameters; Lookup]

• Threshold queries
• Event correlation from multiple sources
• Pattern queries

• Visual trend-line and KPI monitoring
• Batch & product management
• Automated anomaly detection
• Real-time customer segmentation
• Algorithmic trading
• Proactive condition-based maintenance

Industry trends
• Data acquisition costs are negligible
• Raw storage costs are small and continue to decrease
• Processing costs are non-negligible
• Data loading costs continue to be significant

[Diagram labels: Monitor KPIs; Manage business via KPI-triggered actions; Record raw data (history); Mine historical data; Devise new KPIs]

StreamInsight advantage
• Process data incrementally, i.e., while it is in flight
• Avoid loading while still doing the processing you want
• Seamless querying for monitoring, managing and mining

StreamInsight Application Development / StreamInsight Application at Runtime

Event sources:
• Devices, sensors
• Web servers
• Event stores & databases
• Stock ticker, news feeds

Event targets:
• Pagers & monitoring devices
• KPI dashboards, SharePoint UI
• Trading stations
• Event stores & databases

[Diagram: event sources, input adapters, StreamInsight engine (standing queries, query logic), output adapters, event targets]

SELECT COUNT(*) FROM ParkingLot
WHERE type = 'AUTO'
AND color = 'RED'

red cars

last hour

Doesn’t seem like a great solution…

This is the streaming data paradigm in a nutshell: ask questions about data in flight.

[Diagram labels: Engine, Adapters, Queries, Extensions, Host, visual debugger, API, expressed, question, data]

Tell me just the color of each car that passes.

var result = from car in carStream
             select new
             {
                 car.Color
             };

Give me only trucks.

var result = from car in carStream
             where car.Type == "Truck"
             select car;

Tell me the number of cars passed every 10 seconds.

var result = from win in carStream.TumblingWindow(
                 TimeSpan.FromSeconds(10))
             select new
             {
                 count = win.Count()
             };

Count the number of cars for each make separately every 10 seconds.

var result = from car in carStream
             group car by car.make into eachGroup
             from win in eachGroup.TumblingWindow(
                 TimeSpan.FromSeconds(10))
             select new
             {
                 make = eachGroup.Key,
                 count = win.Count()
             };
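To make the grouping and windowing semantics above concrete, here is a minimal sketch in plain LINQ to Objects. It does not use the StreamInsight API: `CarEvent`, `TumblingWindowSketch`, and `CountPerMake` are hypothetical names introduced only for illustration, and the sketch works on a finite, already-collected list rather than on a live stream. It assumes C# 9 or later for the `record` syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical event type for illustration; not a StreamInsight type.
public record CarEvent(DateTime Timestamp, string Make);

public static class TumblingWindowSketch
{
    // Assigns each event to the fixed-size window containing its timestamp,
    // then counts events per (window start, make) pair.
    public static List<(DateTime WindowStart, string Make, int Count)> CountPerMake(
        IEnumerable<CarEvent> events, TimeSpan windowSize)
    {
        return events
            .GroupBy(e => (
                // Round the timestamp down to a multiple of the window size.
                WindowStart: new DateTime(e.Timestamp.Ticks - e.Timestamp.Ticks % windowSize.Ticks),
                Make: e.Make))
            .OrderBy(g => g.Key.WindowStart)
            .ThenBy(g => g.Key.Make, StringComparer.Ordinal)
            .Select(g => (WindowStart: g.Key.WindowStart, Make: g.Key.Make, Count: g.Count()))
            .ToList();
    }

    public static void Main()
    {
        var t0 = new DateTime(2010, 5, 1, 12, 0, 0);
        var events = new[]
        {
            new CarEvent(t0.AddSeconds(1), "Ford"),
            new CarEvent(t0.AddSeconds(4), "Toyota"),
            new CarEvent(t0.AddSeconds(9), "Ford"),
            new CarEvent(t0.AddSeconds(12), "Ford"),
        };

        foreach (var row in CountPerMake(events, TimeSpan.FromSeconds(10)))
            Console.WriteLine($"{row.WindowStart:HH:mm:ss} {row.Make} {row.Count}");
    }
}
```

With the sample data, the first 10-second window holds two Ford events and one Toyota event, and the second window holds one Ford event. A streaming engine would produce the same counts incrementally as windows close, instead of waiting for the whole input.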

application time

Current Time Indicators

public void EnqueueEvent(SourceData d)
{
    var ev = CreateInsertEvent();
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    Enqueue(ref ev);
}

public void EnqueueEvent(SourceData d)
{
    if AdapterState    // condition truncated in the source
        return

    var ev = CreateInsertEvent();
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    Enqueue(ref ev);
}

public void EnqueueEvent(SourceData d)
{
    if AdapterState    // condition truncated in the source
        return

    var ev = CreateInsertEvent();
    if (ev == null) return;
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    Enqueue(ref ev);
}

public void EnqueueEvent(SourceData d)
{
    if AdapterState    // condition truncated in the source
        return

    var ev = CreateInsertEvent();
    if (ev == null) return;
    ev.Payload = new MouseEvent { Id = d.id, Value = d.value };
    ev.StartTime = d.timestamp;
    if (Enqueue(ref ev) == EnqueueOperationResult.Full)
    {
        Ready();
        return;
    }
}
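The last version above stops pushing when the queue reports that it is full. The general back-pressure idea can be sketched in plain C# as follows. Everything here is a hypothetical stand-in and not StreamInsight API: `BoundedQueue`, `EnqueueResult`, `Producer`, and its `Resume` method are invented for illustration, and the choice to hold unsent items in a pending list is an assumption of the sketch, not something the slides state.

```csharp
using System;
using System.Collections.Generic;

public enum EnqueueResult { Success, Full }

// Hypothetical bounded input queue; it refuses items once capacity is reached.
public sealed class BoundedQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly int _capacity;

    public BoundedQueue(int capacity) { _capacity = capacity; }

    public int Count => _items.Count;

    public EnqueueResult Enqueue(T item)
    {
        if (_items.Count >= _capacity) return EnqueueResult.Full;
        _items.Enqueue(item);
        return EnqueueResult.Success;
    }

    public T Dequeue() => _items.Dequeue();
}

// Hypothetical producer that backs off when the queue is full.
public sealed class Producer
{
    private readonly BoundedQueue<int> _queue;
    private readonly Queue<int> _pending = new Queue<int>();

    public Producer(BoundedQueue<int> queue) { _queue = queue; }

    public bool Paused { get; private set; }

    // Accepts a new value; forwards it unless the producer is paused.
    public void Push(int value)
    {
        _pending.Enqueue(value);
        if (!Paused) Drain();
    }

    // Called once the consumer has made room in the queue.
    public void Resume()
    {
        Paused = false;
        Drain();
    }

    private void Drain()
    {
        while (_pending.Count > 0)
        {
            if (_queue.Enqueue(_pending.Peek()) == EnqueueResult.Full)
            {
                Paused = true;   // stop pushing until Resume() is called
                return;
            }
            _pending.Dequeue();
        }
    }
}

public static class BackPressureDemo
{
    public static void Main()
    {
        var queue = new BoundedQueue<int>(capacity: 2);
        var producer = new Producer(queue);
        producer.Push(1);
        producer.Push(2);
        producer.Push(3);                       // queue is full, producer pauses
        Console.WriteLine(producer.Paused);
        queue.Dequeue();                        // consumer frees one slot
        producer.Resume();                      // pending item is delivered
        Console.WriteLine(producer.Paused);
    }
}
```

The point of the design is that the producer never blocks or spins on a full queue: it records that it is paused and returns, and delivery continues only when it is told there is room again.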

Use them wisely!

public class TimeWeightedAverage : CepTimeSensitiveAggregate<double, double>
{
    public override double GenerateOutput(IEnumerable<IntervalEvent<double>> events,
                                          WindowDescriptor windowDescriptor)
    {
        double avg = 0;
        // Weight each payload by the duration of its event.
        foreach (IntervalEvent<double> ev in events)
        {
            avg += ev.Payload * (ev.EndTime - ev.StartTime).Ticks;
        }
        // Normalize by the duration of the window.
        return avg / (windowDescriptor.EndTime - windowDescriptor.StartTime).Ticks;
    }
}
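The arithmetic behind the aggregate above can be checked outside the engine. The following is a minimal sketch in plain C#, with a hypothetical `Interval` type and `TimeWeightedAverageSketch` class standing in for the StreamInsight event and window types. It assumes the events have already been clipped to the window, which the sketch does not enforce.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical interval event: a value that holds from Start to End.
public readonly struct Interval
{
    public Interval(DateTime start, DateTime end, double value)
    {
        Start = start;
        End = end;
        Value = value;
    }

    public DateTime Start { get; }
    public DateTime End { get; }
    public double Value { get; }
}

public static class TimeWeightedAverageSketch
{
    // Sums value * duration over all events, then divides by the window duration.
    public static double Compute(IEnumerable<Interval> events,
                                 DateTime windowStart, DateTime windowEnd)
    {
        double weighted = 0;
        foreach (var ev in events)
        {
            weighted += ev.Value * (ev.End - ev.Start).Ticks;
        }
        return weighted / (windowEnd - windowStart).Ticks;
    }

    public static void Main()
    {
        var t0 = new DateTime(2010, 5, 1, 12, 0, 0);
        var events = new[]
        {
            new Interval(t0, t0.AddSeconds(2), 10.0),                // 10 for 2 seconds
            new Interval(t0.AddSeconds(2), t0.AddSeconds(10), 20.0), // 20 for 8 seconds
        };
        Console.WriteLine(Compute(events, t0, t0.AddSeconds(10)));
    }
}
```

For the sample data the result is (10 × 2 + 20 × 8) / 10 = 18, whereas a plain average of the two payloads would give 15. The difference is why the aggregate needs access to event start and end times.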

To learn more or inquire about speaking opportunities, please contact:

Mark Ginnebaugh, User Group Leader

mark@designmind.com
