Microsoft Azure
Stream Analytics
Marcos Roriz and Markus Endler
Laboratory for Advanced Collaboration (LAC)
Departamento de Informática (DI)
Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio)
3
Microsoft Azure
Microsoft’s cloud computing solution
Several Open Source Components
Windows and Linux VMs
6
Microsoft Azure Stream Analytics
Middleware for data stream processing
Offered as a service (Platform as a Service – PaaS)
Provides a SQL-like continuous query language
Developers write “Stream Analytics” Jobs instead of imperative code
Can integrate with visualization frameworks
7
Microsoft Azure Stream Analytics
Stream Analytics only processes data in the cloud.
Thus, how can we send data to Stream Analytics Jobs (in the cloud)?
Event Hubs (streams), BLOB (static)
Event Hubs is a publish/subscribe middleware.
Bindings for multiple languages.
.NET, Java, Python, etc.
8
Stream Analytics Architecture
Architectural Overview
[Diagram: incoming data flows from Event Hubs (streams) and BLOBs (static data) into Stream Analytics, which writes its results back to Event Hubs and BLOBs]
9
Sending Data - Event Hubs
Each Event Hub represents an event stream.
Data needs to be in JSON, CSV, or Avro format.
Example: Temperature data sent to an event hub.
[Diagram: data sent to the IncomingData Event Hub]
String msg = "{\"id\": 123456, \"reading\": 28}";
byte[] payloadBytes = msg.getBytes();
EventData sendEvent = new EventData(payloadBytes);

EventHubClient ehClient =
    EventHubClient.createFromConnectionString("eventConnectionKey");
ehClient.sendSync(sendEvent);
Event Hub Middleware (API)
10
Microsoft Azure Stream Analytics
SQL-like language (similar to StreamInsight)
Input: Event Hubs and/or BLOBs
Output: Event Hubs and/or BLOBs
Example: (Temperature Stream)
SELECT id, reading
INTO HighTemperatureStream
FROM TemperatureStream
WHERE reading > 20
11
Stream Analytics Query Language
Implicit or External Time (timestamp column)
Tumbling, Hopping, Sliding Time Windows
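As a sketch, the three window types might be used as follows in the Stream Analytics query language (the stream name TemperatureStream and the timestamp column readingTime are illustrative assumptions):

```sql
-- Tumbling window: fixed, non-overlapping 10-second buckets
SELECT id, AVG(reading)
FROM TemperatureStream TIMESTAMP BY readingTime
GROUP BY id, TumblingWindow(second, 10)

-- Hopping window: 10-second windows that advance every 5 seconds (overlapping)
SELECT id, AVG(reading)
FROM TemperatureStream TIMESTAMP BY readingTime
GROUP BY id, HoppingWindow(second, 10, 5)

-- Sliding window: for every event, a window covering the preceding 10 seconds
SELECT id, COUNT(*)
FROM TemperatureStream TIMESTAMP BY readingTime
GROUP BY id, SlidingWindow(second, 10)
```

TIMESTAMP BY selects the external (payload) timestamp column; omitting it uses the implicit arrival time.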
12
Stream Analytics Query Language
Standard SQL aggregate functions
AVG, SUM, COUNT, MIN, MAX
Example:
Tumbling Window to compute AVG reading per sensor
SELECT id, AVG(reading)
INTO AVGTemperatureStream
FROM TemperatureStream
GROUP BY id, TumblingWindow(second, 5)
13
Expanding Azure Stream Analytics
Multiple event hub consumers
Each consumer has its own reader (similar to Apache Kafka)
14
Expanding Azure Stream Analytics
Each Stream Analytics Job implements a single continuous query
Network of Event Hubs and Stream Analytics Jobs
16
Example step-by-step
Telecommunications and SIM fraud detection in real-time
Large volume of Call Detail Records (CDR)
Jobs:
Pare this data down to a manageable amount and obtain insights about customer usage over time and geographical regions.
Detect SIM fraud (multiple calls coming from the same identity around the same time but in geographically different locations) in real-time.
We will use an existing simulator to generate the input data stream.
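The fraud-detection job can be expressed as a self-join on the call stream: match pairs of calls from the same SIM identity occurring within a few seconds of each other but handled by different switches. A sketch (the stream name CallStream and field names such as CallingIMSI, SwitchNum, and CallRecTime are assumptions based on typical CDR data):

```sql
SELECT CS1.CallingIMSI,
       CS1.SwitchNum AS Switch1,
       CS2.SwitchNum AS Switch2
FROM CallStream CS1 TIMESTAMP BY CallRecTime
JOIN CallStream CS2 TIMESTAMP BY CallRecTime
  ON CS1.CallingIMSI = CS2.CallingIMSI
 AND DATEDIFF(ss, CS1, CS2) BETWEEN 1 AND 5
WHERE CS1.SwitchNum != CS2.SwitchNum
```

The DATEDIFF bound limits the join to events at most 5 seconds apart, and the WHERE clause keeps only pairs from different switches (different locations).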
17
Step 1: Create an Event Hub
We will create an event hub to receive the input stream.
In the Azure Portal go to:
18
Step 2: Create a Consumer Group (Event Hub)
Create a consumer group to consume data from this hub.
In the Azure Portal go to:
Event hub created
Consumer group
Create new consumer group (bottom of the page)
19
Step 3: Grant access to consume/send events
Create an access policy for the Event Hub.
In the Azure Portal go to:
Event hub created
Configure Tab
Create a policy with management permissions
Save
21
Step 4: Generate Input Data Stream
Simulator (uses the event bus middleware to send messages)
Other applications need to use the middleware API to send/receive data.
Data sent:
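As an illustration, a CDR record sent by the simulator might look like the following (the field names are assumptions based on typical CDR data and may differ from the actual simulator output):

```json
{
  "RecordType": "SMS",
  "SwitchNum": "US",
  "CallingIMSI": "466920403025604",
  "CallingNum": "425-311-1234",
  "CalledNum": "425-311-4321",
  "CallRecTime": "2015-09-24T11:36:04Z"
}
```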
22
Step 4: Generate Input Data Stream
Program: Download Link ( )
You need the Event Hub key to connect to the Azure server
Get the connection info in the event hub panel (at the bottom)
23
Step 4: Generate Input Data Stream
Edit the configuration file with this info (remove the entity part).
Use the connection info and the event hub name (CallEventHub)
<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <!-- Service Bus specific app settings for messaging connections -->
    <add key="EventHubName" value="calleventhub"/>
    <add key="Microsoft.ServiceBus.ConnectionString"
         value="Endpoint=sb://rorizhelloworldhub-ns.servicebus.windows.net/;SharedAccessKeyName=managepolicy;SharedAccessKey=doggvCnbnq56nwNrdeEGaPsGAOfpTpsZV6mcCmghVqo="/>
  </appSettings>
  ...
24
Step 4: Generate Input Data Stream
Generate the data stream
telcodatagen.exe [#NumCDRsPerHour] [SIM Card Fraud Probability] [#DurationHours]
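For example, the following invocation (the parameter values are illustrative) would generate 1000 CDRs per hour with a 20% SIM-fraud probability for 2 hours:

```
telcodatagen.exe 1000 0.2 2
```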
26
Step 6: Link Event Hub to Stream Analytics Job
Click on the Stream Analytics Job
Go to Input tab
Add Input
28
Step 7: Get Sample Data
Before designing the query, it is recommended to test it with sample data
External input or a sample of the data stream
Go to Stream Analytics Input Tab
Then choose Sample Data
30
Step 8: Create Continuous Query
Click on the Query tab:
Write down the query.
Use sample data to see the output.
32
Step 8: Create Continuous Query
The interface now presents the test query output over the sample data.
36
Step 10: Create Output
Remember, one continuous query per Stream Analytics job
Create BLOB Storage. In Azure portal go to:
38
Step 11: Link Output to Stream Analytics Job
Go to Output Tab in Stream Analytics Job
Add Output (BLOB, Event Hub, etc)
Choose Output Name
(we will use this in the query)
Pick the desired output (event hub/blob)
40
Step 12: Change query to refer to Output
Go to Query Tab in Stream Analytics Job
Add INTO StreamOutput in the query
Save
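Assuming the output created in Step 11 was named StreamOutput, a continuous query that writes its results to that output might look like this (the stream and CDR field names are illustrative assumptions):

```sql
SELECT CS1.CallingIMSI,
       CS1.SwitchNum AS Switch1,
       CS2.SwitchNum AS Switch2
INTO StreamOutput
FROM CallStream CS1 TIMESTAMP BY CallRecTime
JOIN CallStream CS2 TIMESTAMP BY CallRecTime
  ON CS1.CallingIMSI = CS2.CallingIMSI
 AND DATEDIFF(ss, CS1, CS2) BETWEEN 1 AND 5
WHERE CS1.SwitchNum != CS2.SwitchNum
```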
43
Conclusion: Pros and Cons
Pros:
Fast and easy to deploy
SQL-like declarative language
Scales the processing units
Cons:
You cannot dynamically change/alter a Stream Analytics Job
• Complex task due to state transfer, losing events, etc.
• However, you can create new Jobs with the new queries
You need several Event Hubs to make an event processing network
44
Pricing
Stream Analytics is priced on two variables:
Volume of data processed
Streaming units required to process the data stream
* A streaming unit is a unit of compute capacity with a maximum throughput of 1 MB/s
45
Example Pricing
Daily Azure Stream Analytics cost for 1 MB/s of average processing
Volume of Data Processed Cost -
$0.001/GB * 84.375 GB = $0.08 per day, streaming at max 1 MB/s non-stop
Streaming Unit Cost -
$0.031/hr * 24 hrs = $0.74 per day, for 1 MB/s max. throughput
Total cost -
$0.74 + $0.08 = $0.82 per day -or- ~$24.60 per month
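The arithmetic above can be sketched in Java. The per-GB and per-streaming-unit-hour rates are assumptions taken from this historical example ($0.001/GB and $0.031/hr); check the current Azure pricing page before relying on them:

```java
public class PricingExample {
    // Assumed historical rates; check current Azure Stream Analytics pricing.
    static final double GB_RATE = 0.001;        // $ per GB of data processed
    static final double SU_HOUR_RATE = 0.031;   // $ per streaming-unit hour

    /** Daily cost of streaming at 1 MB/s non-stop with one streaming unit. */
    static double dailyCost() {
        double gbPerDay = 86400.0 / 1024.0;     // 1 MB/s for a day = 84.375 GB
        double volumeCost = GB_RATE * gbPerDay; // ~ $0.08
        double unitCost = SU_HOUR_RATE * 24;    // ~ $0.74
        // ~ $0.83; the slide rounds each term first, giving $0.82
        return volumeCost + unitCost;
    }

    public static void main(String[] args) {
        System.out.printf("Daily cost: $%.2f%n", dailyCost());
    }
}
```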
46
Conclusions
Thanks!