
Laboratory for Advanced Collaboration

LAC

Microsoft Azure

Stream Analytics

Marcos Roriz and Markus Endler

Laboratory for Advanced Collaboration (LAC)

Departamento de Informática (DI)

Pontifícia Universidade Católica do Rio de Janeiro (PUC-Rio)

2

Topics

Azure Overview

Stream Analytics Programming Model

Step-by-step example

3

Microsoft Azure

Microsoft’s cloud computing solution

Several Open Source Components

Windows and Linux VMs

4

Cloud Computing Overview

5

Microsoft Azure Cloud Solutions

6

Microsoft Azure Stream Analytics

Middleware for data stream processing

Offered as a service (Platform as a Service – PaaS)

Provides a SQL-like continuous query language

Developers write “Stream Analytics” Jobs instead of imperative code

Can integrate with visualization frameworks

7

Microsoft Azure Stream Analytics

Stream Analytics only processes data in the cloud.

Thus, how can we send data to Stream Analytics Jobs (cloud)?

Event Hubs (streams), BLOB (static)

Event Hubs is a publish/subscribe middleware.

Bindings for multiple languages.

.NET, Java, Python, etc.

8

Stream Analytics Architecture

Architectural Overview

Incoming data → Event Hubs / BLOB → Stream Analytics → Event Hubs / BLOB (output)

9

Sending Data - Event Hubs

Each Event Hub represents an event stream.

Events need to be in JSON, CSV, or Avro format.

Example: Temperature data sent to an event hub.


// Build the event payload as JSON
String msg = "{\"id\": 123456, \"reading\": 28}";
byte[] payloadBytes = msg.getBytes();
EventData sendEvent = new EventData(payloadBytes);

// Connect to the Event Hub and send the event
EventHubClient ehClient = EventHubClient.createFromConnectionString("eventConnectionKey");
ehClient.sendSync(sendEvent);

Event Hub Middleware (API)

10

Microsoft Azure Stream Analytics

SQL-like language (similar to StreamInsight)

Input: Event Hubs and/or BLOBs

Output: Event Hubs and/or BLOBs

Example: (Temperature Stream)

SELECT id, reading
INTO HighTemperatureStream
FROM TemperatureStream
WHERE reading > 20

11

Stream Analytics Query Language

Implicit (arrival time) or external time (timestamp column)

Tumbling, Hopping, Sliding Time Windows
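
A minimal sketch of the three window types (illustrative names: a TemperatureStream input with an eventTime timestamp field):

-- Tumbling: fixed, non-overlapping 10-second windows
SELECT id, COUNT(*) FROM TemperatureStream TIMESTAMP BY eventTime
GROUP BY id, TUMBLINGWINDOW(s, 10)

-- Hopping: 10-second windows that advance every 5 seconds
SELECT id, COUNT(*) FROM TemperatureStream TIMESTAMP BY eventTime
GROUP BY id, HOPPINGWINDOW(s, 10, 5)

-- Sliding: recomputed whenever events enter or leave the last 10 seconds
SELECT id, COUNT(*) FROM TemperatureStream TIMESTAMP BY eventTime
GROUP BY id, SLIDINGWINDOW(s, 10)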

12

Stream Analytics Query Language

Standard SQL aggregate functions

AVG, SUM, COUNT, MIN, MAX

Example:

Tumbling Window to compute AVG reading per sensor

SELECT id, AVG(reading)
INTO AVGTemperatureStream
FROM TemperatureStream
GROUP BY id, TUMBLINGWINDOW(s, 5)

13

Expanding Azure Stream Analytics

Multiple event hub consumers

Each consumer has its own reader (similar to Apache Kafka)

14

Expanding Azure Stream Analytics

Each Stream Analytics Job implements a single continuous query

Network of Event Hubs and Stream Analytics Jobs
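
For example (an illustrative sketch with hypothetical hub names), a first job can filter raw readings into an intermediate event hub that a second job then aggregates; chaining jobs through event hubs this way forms the event processing network:

-- Job 1: filter the raw stream into an intermediate event hub
SELECT id, reading
INTO FilteredTemperatureHub
FROM RawTemperatureHub
WHERE reading > 20

-- Job 2 (a separate Stream Analytics Job): aggregate the filtered stream
SELECT id, AVG(reading)
INTO AvgTemperatureHub
FROM FilteredTemperatureHub
GROUP BY id, TUMBLINGWINDOW(s, 5)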

15

Complete Picture

16

Example step-by-step

Telecommunications and SIM fraud detection in real-time

Large volume of Call Detail Records (CDR)

Jobs:

Pare this data down to a manageable amount and obtain insights about customer usage over time and geographical regions.

Detect SIM fraud (multiple calls coming from the same identity around the same time but in geographically different locations) in real-time.

We will use an existing simulator to generate the input data stream.

17

Step 1: Create an Event Hub

We will create an event hub to receive the input stream.

In the Azure Portal go to:

18

Step 2: Create a Consumer Group (Event Hub)

Create a consumer group to consume data from this hub.

In the Azure Portal go to:

Event hub created

Consumer group

Create new consumer group (bottom of the page)


20

Step 3: Grant access to consume/send events

Create an access policy for the Event Hub.

In the Azure Portal go to:

Event hub created

Configure Tab

Create a policy with management permissions

Save

21

Step 4: Generate Input Data Stream

Simulator (uses the Event Hub middleware to send messages)

Other applications need to use the middleware API to send/receive data.

Data sent:

22

Step 4: Generate Input Data Stream

Program: Download Link ( )

Need to use the Event Hub key to connect to the Azure server

Get connection info in event hub panel (at the bottom)

23

Step 4: Generate Input Data Stream

Edit the configuration file with this info (remove the EntityPath part of the connection string).

Use the connection info and the event hub name (CallEventHub)

<?xml version="1.0" encoding="utf-8"?>
<configuration>
  <appSettings>
    <!-- Service Bus specific app settings for messaging connections -->
    <add key="EventHubName" value="calleventhub"/>
    <add key="Microsoft.ServiceBus.ConnectionString" value="Endpoint=sb://rorizhelloworldhub-ns.servicebus.windows.net/;SharedAccessKeyName=managepolicy;SharedAccessKey=doggvCnbnq56nwNrdeEGaPsGAOfpTpsZV6mcCmghVqo="/>
  </appSettings>
...

24

Step 4: Generate Input Data Stream

Generate the data stream

telcodatagen.exe [#NumCDRsPerHour] [SIM Card Fraud Probability] [#DurationHours]
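
For example (an illustrative run), generate roughly 1000 CDRs per hour with a 20% SIM fraud probability for 2 hours:

telcodatagen.exe 1000 0.2 2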

25

Step 5: Create a Stream Analytics Job

In the Azure Portal go to:

26

Step 6: Link Event Hub to Stream Analytics Job

Click on the Stream Analytics Job

Go to Input tab

Add Input

27

Step 6: Link Event Hub to Stream Analytics Job

Options:

Data Stream

Event Hub

Config

JSON

28

Step 7: Get Sample Data

Before we design the query, it is recommended to test it with sample data.

Use an external input or sample the incoming data stream.

Go to Stream Analytics Input Tab

Then choose Sample Data

29

Step 7: Get Sample Data

Specify the sample data length.

Download the data.

30

Step 8: Create Continuous Query

Click on the Query tab:

Write down the query.

Use sample data to see the output.
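
For an initial test, a simple pass-through query over the input is enough (assuming the input was named CallStream in Step 6; adjust to the name you chose):

SELECT *
FROM CallStream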

31

Step 8: Create Continuous Query

Click on the Test button

Choose the sample data for the test query.

32

Step 8: Create Continuous Query

The interface now presents the test query output over the sample data.

33

Step 9: Refine the Query

Number of incoming calls per region in the last two hours:
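
A possible sketch of this query, assuming the input is named CallStream and uses the simulator's SwitchNum (region) and CallRecTime fields (adjust the names to your sample data):

SELECT SwitchNum, COUNT(*) AS CallCount
FROM CallStream TIMESTAMP BY CallRecTime
GROUP BY SwitchNum, TUMBLINGWINDOW(hh, 2)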

34

Step 9: Refine the Query

Number of fraudulent calls (calls from the same identity in different countries, less than 5 seconds apart):
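
A possible sketch using a self-join on the calling IMSI: two calls with the same identity on different switches less than 5 seconds apart are flagged (same assumptions about the CallStream input and field names):

SELECT CS1.CallingIMSI, CS1.SwitchNum AS Switch1, CS2.SwitchNum AS Switch2
FROM CallStream CS1 TIMESTAMP BY CallRecTime
JOIN CallStream CS2 TIMESTAMP BY CallRecTime
  ON CS1.CallingIMSI = CS2.CallingIMSI
  AND DATEDIFF(ss, CS1, CS2) BETWEEN 1 AND 5
WHERE CS1.SwitchNum != CS2.SwitchNum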

35

Step 9: Refine the Query

Save Query

36

Step 10: Create Output

Remember, one continuous query per Stream Analytics job

Create a BLOB Storage account. In the Azure Portal go to:

37

Step 10: Create Output

Create a Container

38

Step 11: Link Output to Stream Analytics Job

Go to Output Tab in Stream Analytics Job

Add Output (BLOB, Event Hub, etc)

39

Step 11: Link Output to Stream Analytics Job

Go to Output Tab in Stream Analytics Job

Add Output (BLOB, Event Hub, etc)

Choose an Output Name (we will use this in the query)

Pick the desired output (event hub/blob)

40

Step 12: Change query to refer to Output

Go to Query Tab in Stream Analytics Job

Add INTO StreamOutput in the query

Save
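
For instance, the fraud query gains an INTO clause that refers to the output name chosen in Step 11 (assumed here to be StreamOutput; field names as in the earlier sketch):

SELECT System.Timestamp AS WindowEnd, COUNT(*) AS FraudulentCalls
INTO StreamOutput
FROM CallStream CS1 TIMESTAMP BY CallRecTime
JOIN CallStream CS2 TIMESTAMP BY CallRecTime
  ON CS1.CallingIMSI = CS2.CallingIMSI
  AND DATEDIFF(ss, CS1, CS2) BETWEEN 1 AND 5
WHERE CS1.SwitchNum != CS2.SwitchNum
GROUP BY TUMBLINGWINDOW(s, 1)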

41

Step 13: Start the Stream Analytics Job

Go to Stream Analytics Job Menu

Click Start

42

Step 14: View Output

View the output with Azure Storage Explorer

43

Conclusion, Pros and Cons

Pros:

Fast and easy to deploy

SQL-like declarative language

Scales by adding processing (streaming) units

Cons:

You cannot dynamically change/alter a running Stream Analytics Job

• Complex task due to state transfer, losing events, etc.

• However, you can create new Jobs with the new queries

You need several “Event Hubs” to build an event processing network

44

Pricing

Stream Analytics is priced on two variables:

Volume of data processed

Streaming units required to process the data stream

* A streaming unit is a unit of compute capacity with a maximum throughput of 1 MB/s

45

Example Pricing

Daily Azure Stream Analytics cost for 1 MB/s of average processing (1 MB/s non-stop is 86,400 MB ≈ 84.375 GB per day):

Volume of data processed cost:

$0.001/GB * 84.375 GB = $0.08 per day, streaming at 1 MB/s non-stop

Streaming unit cost:

$0.031/hr * 24 hrs = $0.74 per day, for 1 MB/s max. throughput

Total cost:

$0.74 + $0.08 = $0.82 per day, or ~$24.60 per month