BASEL BERN BRUGG DÜSSELDORF FRANKFURT A. M. FREIBURG I.BR. GENF HAMBURG KOPENHAGEN LAUSANNE MÜNCHEN STUTTGART WIEN ZÜRICH Streaming Visualization Guido Schmutz DOAG Big Data 2018 20.9.2018 @gschmutz guidoschmutz.wordpress.com




Guido Schmutz

Working at Trivadis for more than 21 years

Oracle ACE Director for Fusion Middleware and SOA

Consultant, Trainer, Software Architect for Java, Oracle, SOA and Big Data / Fast Data

Head of Trivadis Architecture Board

Technology Manager @ Trivadis

More than 30 years of software development experience

Contact: [email protected]

Blog: http://guidoschmutz.wordpress.com

Slideshare: http://www.slideshare.net/gschmutz

Twitter: gschmutz


Agenda

1. Visualization in Big Data Reference Architecture

2. How to implement "Data-in-Motion"?

3. Blueprints for Streaming Visualization

4. Blueprints for Stream Visualization – Implementation


Visualization in Big Data Reference Architecture

Data Value Chain

• Milliseconds: Place Trade, Serve ad, Enrich Stream, Approve Trans
• Hundredths of Seconds: Calculate Risk, Leaderboard, Aggregate, Count
• Second(s): Retrieve Click Stream, Show orders
• Minutes: Backtest algo, BI, Daily Reports
• Hours: Algo discovery, Log analysis, Fraud pattern match

Architectures of Big Data Applications

Traditional BI Infrastructures

[Diagram labels: Bulk Source (DB, Extract, File); ETL / Stored Procedures; Enterprise Data Warehouse; BI Tools; Search / Explore; Enterprise Apps (Logic, API). Annotation: high latency]

Big Data solves Volume and Variety – not Velocity

[Diagram labels: Bulk Source (DB, Extract, File); File Import / SQL Import; Big Data Platform on a Hadoop cluster (Parallel Processing; Storage: Raw, Refined, Results); SQL; Enterprise Data Warehouse; BI Tools; Search / Explore; Enterprise Apps (Logic, API). Annotation: high latency]

Introduction to Stream Processing

Big Data solves Volume and Variety – not Velocity

[Diagram labels: Bulk Source (DB, Extract, File); Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social); Event Stream; File Import / SQL Import; Big Data Platform on a Hadoop cluster (Parallel Processing; Storage: Raw, Refined, Results); SQL; Enterprise Data Warehouse; BI Tools; Search / Explore; Enterprise Apps (Logic, API). Annotation: high latency]

Big Data solves Volume and Variety – not Velocity

[Diagram labels: Bulk Source (DB, Extract, File); Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social); Event Stream; Event Hub; File Import / SQL Import; Big Data Platform on a Hadoop cluster (Parallel Processing: Machine Learning, Graph Algorithms, Natural Language Processing; Storage: Raw, Refined, Results); SQL; Enterprise Data Warehouse; BI Tools; Search / Explore; Enterprise Apps (Logic, API). Annotation: high latency]

"Data at Rest" vs. "Data in Motion"

[Diagram: two columns, Data at Rest and Data in Motion, each with the steps Store, Analyze and Act and the bit string 111010101010110]

Stream Processing Architecture solves Velocity

[Diagram labels: Bulk Source (DB, Extract, File); Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social); Event Stream; Event Hub; Stream Analytics Platform on a Hadoop cluster (Stream Analytics, Reference / Models, Search, Results, Dashboard); Enterprise Data Warehouse; BI Tools; Search / Explore; Enterprise Apps (Logic, API). Annotation: Low(est) latency, no history]

Big Data for all historical data analysis

[Diagram labels: Bulk Source (DB, Extract, File); Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social); Event Stream; Event Hub; Data Flow; File Import / SQL Import; Big Data Platform on a Hadoop cluster (Parallel Processing; Storage: Raw, Refined, Results); Stream Analytics Platform on a Hadoop cluster (Stream Analytics, Reference / Models, Search, Results, Dashboard); Enterprise Data Warehouse; BI Tools; Search / Explore; Enterprise Apps (Logic, API)]

Integrate existing systems through CDC

• Capture changes directly on the database
• Change Data Capture (CDC) => think of it like a global database trigger
• Transform existing systems into event producers

[Diagram labels: Traditional Silo-based System (User Interface, Logic, State); CDC; CDC Connector; Event Stream; Event Hub; Data Store; Data Integration; Consuming Systems (Logic, State)]
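The "global database trigger" idea can be illustrated with a small sketch. It does not show how a log-based CDC connector works internally (such connectors typically read the database's transaction log); it only compares two snapshots of a table and emits one change event per difference. The function name `capture_changes`, the event fields and the sample rows are assumptions made for illustration.

```python
from typing import Any

Row = dict[str, Any]


def capture_changes(before: dict[int, Row], after: dict[int, Row]) -> list[dict[str, Any]]:
    """Compare two table snapshots keyed by primary key and emit change events."""
    events: list[dict[str, Any]] = []
    # Keys that exist only in the new snapshot were inserted.
    for key in sorted(after.keys() - before.keys()):
        events.append({"op": "insert", "key": key, "after": after[key]})
    # Keys in both snapshots were updated if their rows differ.
    for key in sorted(before.keys() & after.keys()):
        if before[key] != after[key]:
            events.append({"op": "update", "key": key,
                           "before": before[key], "after": after[key]})
    # Keys that exist only in the old snapshot were deleted.
    for key in sorted(before.keys() - after.keys()):
        events.append({"op": "delete", "key": key, "before": before[key]})
    return events


# Hypothetical order table before and after some business transactions.
old = {1: {"status": "new"}, 2: {"status": "paid"}}
new = {1: {"status": "shipped"}, 3: {"status": "new"}}
events = capture_changes(old, new)
```

Each resulting event could then be published to the event hub, which is what turns the existing system into an event producer without changing its own code.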

Integrate existing systems with lower latency through CDC

[Diagram labels: Bulk Source (DB, Extract, File); Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social); Event Stream; Event Hub; Data Flow; File Import / SQL Import; Big Data Platform on a Hadoop cluster (Parallel Processing; Storage: Raw, Refined, Results); Stream Analytics Platform on a Hadoop cluster (Stream Analytics, Reference / Models, Search, Results, Dashboard); Enterprise Data Warehouse; BI Tools; Search / Explore; Enterprise Apps (Logic, API)]

Unified Architecture for Modern Data Analytics Solutions

[Diagram labels: Bulk Source (DB, Extract, File); Event Source (Location, Telemetry, IoT Data, Mobile Apps, Social); Event Stream; Edge Node (Rules, Event Hub, Storage); Event Hub; File Import / SQL Import; Big Data on a Hadoop cluster (Parallel Processing; Storage: Raw, Refined, Results); Stream Analytics (Stream Processor with State and API); Microservices (Microservice with State and API); Service; SQL; Search; Enterprise Data Warehouse; BI Tools; Search / Explore; Enterprise Apps (Logic, API)]

Two Types of Stream Processing (from Gartner)

Stream Data Integration

• primarily focuses on the ingestion and processing of data sources targeting real-time extract-transform-load (ETL) and data integration use cases
• filter and enrich the data
• optionally calculate time-windowed aggregations before storing the results in a database or file system

Stream Analytics

• targets analytics use cases
• calculating aggregates and detecting patterns to generate higher-level, more relevant summary information (complex events)
• Complex events may signify threats or opportunities that require a response from the business through real-time dashboards, alerts or decision automation
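As a rough illustration of the stream data integration steps listed above (filter, enrich, time-windowed aggregation), the following sketch processes a small batch of readings with tumbling (fixed, non-overlapping) windows. The reading layout, the `SENSOR_SITE` lookup table and the function name `integrate` are assumptions for this example; a real stream processor would apply the same logic continuously rather than to a list.

```python
from collections import defaultdict

# Hypothetical reference data used for enrichment (sensor id -> site).
SENSOR_SITE = {"s1": "Basel", "s2": "Bern"}


def integrate(readings, window_seconds=10):
    """Filter, enrich and aggregate readings into tumbling windows.

    Each reading is a (timestamp_seconds, sensor_id, value) tuple.
    Returns {(window_start, site): average value}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, sensor_id, value in readings:
        if value is None:  # filter: drop incomplete readings
            continue
        site = SENSOR_SITE.get(sensor_id, "unknown")  # enrich with reference data
        window_start = timestamp - timestamp % window_seconds
        sums[(window_start, site)] += value
        counts[(window_start, site)] += 1
    return {key: sums[key] / counts[key] for key in sums}


readings = [
    (1, "s1", 20.0),
    (4, "s1", 22.0),
    (7, "s2", None),   # dropped by the filter
    (12, "s1", 30.0),
    (15, "s2", 10.0),
]
result = integrate(readings)
```

The returned dictionary holds one average per window and site, which is the kind of result that would then be stored in a database or file system.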

How to implement "Data-in-Motion"?

"Data-in-Motion" Ecosystem

[Diagram: open-source and closed-source products grouped into Edge, Event Hub, Stream Data Integration and Stream Analytics. Source: adapted from Tibco]

Apache Kafka – A Streaming Platform

High-Level Architecture

Distributed Log at the Core

Scale-Out Architecture

Logs do not (necessarily) forget
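The panel titles above ("Distributed Log at the Core", "Logs do not (necessarily) forget") can be made concrete with a toy, single-process model. This is not Kafka's API and it ignores partitioning, replication and retention; the class names `Log` and `Consumer` are assumptions for illustration. It shows the two properties that matter for visualization: records are only appended, and each consumer tracks its own offset, so reading does not remove anything.

```python
class Log:
    """Append-only log: records are never modified, only appended."""

    def __init__(self):
        self._records = []

    def append(self, record):
        """Append a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


class Consumer:
    """Tracks its own read position; reading does not delete records."""

    def __init__(self, log, offset=0):
        self.log = log
        self.offset = offset

    def poll(self):
        records = self.log.read(self.offset)
        self.offset += len(records)
        return records


log = Log()
for event in ["e0", "e1", "e2"]:
    log.append(event)

dashboard = Consumer(log)
first = dashboard.poll()       # everything written so far
log.append("e3")
second = dashboard.poll()      # only the record added since the last poll

late_joiner = Consumer(log)    # a new consumer can replay from offset 0
replayed = late_joiner.poll()
```

Because the log keeps its records, a dashboard that connects late can still rebuild its view from history before switching to live updates.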

Blueprints for Stream Visualization

1) Direct Streaming to the Consumer

[Diagram labels: Data Sources; Data Flow; "Data in Motion" (Integration, Event Hub, Stream Analytics); Channel; Streaming Visualization; Consumer]

2) Use a fast datastore and do regular polling from the consumer

[Diagram labels: Data Sources; Data Flow; "Data in Motion" (Integration, Event Hub, Stream Analytics); Data Store; API; Streaming Visualization; Consumer]
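A minimal sketch of this second blueprint, assuming an in-memory stand-in for the fast data store: the stream side upserts the latest value per key, and the consumer periodically asks only for what changed since its last poll. The class names `LatestValueStore` and `PollingDashboard`, the version counter and the sample keys are hypothetical and not tied to any particular product.

```python
class LatestValueStore:
    """Stand-in for a fast data store; keeps the latest value per key."""

    def __init__(self):
        self._data = {}
        self._version = 0

    def upsert(self, key, value):
        """Called by the stream processing side for every result."""
        self._version += 1
        self._data[key] = (self._version, value)

    def changed_since(self, version):
        """Return (current_version, {key: value}) for entries written after version."""
        changes = {k: v for k, (ver, v) in self._data.items() if ver > version}
        return self._version, changes


class PollingDashboard:
    """Consumer that refreshes its view by polling the store."""

    def __init__(self, store):
        self.store = store
        self.seen_version = 0
        self.view = {}

    def poll_once(self):
        self.seen_version, changes = self.store.changed_since(self.seen_version)
        self.view.update(changes)
        return changes


store = LatestValueStore()
dashboard = PollingDashboard(store)

store.upsert("truck-1", 55)
first = dashboard.poll_once()
second = dashboard.poll_once()   # nothing new since the last poll

store.upsert("truck-1", 60)
store.upsert("truck-2", 40)
third = dashboard.poll_once()
```

In practice `poll_once` would run on a timer, so the polling interval sets a lower bound on how fresh the visualization can be.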

3) Use stateful Stream Analytics and query the store directly

[Diagram labels: Data Sources; "Data in Motion" (Integration, Event Hub, Stream Analytics); API; Streaming Visualization; Consumer]
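The third blueprint removes the separate data store: the stream processor keeps its aggregation state itself and exposes it for queries. The sketch below is only a conceptual model under that assumption; the class `RunningCounts` and its methods are hypothetical, and a real platform would add an API layer, fault tolerance and state distribution.

```python
class RunningCounts:
    """Stream processor that keeps its aggregation state in memory."""

    def __init__(self):
        self._counts = {}

    def process(self, event):
        """Update the state for one incoming event."""
        key = event["key"]
        self._counts[key] = self._counts.get(key, 0) + 1

    def query(self, key):
        """Read the current state directly, without an external data store."""
        return self._counts.get(key, 0)


processor = RunningCounts()
for event in [{"key": "login"}, {"key": "order"}, {"key": "login"}]:
    processor.process(event)

logins = processor.query("login")
refunds = processor.query("refund")   # never seen, so the count is 0
```

The visualization then calls something like `query` through an API whenever it needs the current value.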

Blueprints for Stream Visualization – Implementation

Visualization: many, many options! But do they support Streaming Data?

Oracle Stream Analytics

[Diagram labels: Data Sources; Data Flow; "Data in Motion" (Integration, Event Hub, Stream Analytics); Channel; Streaming Visualization; Consumer]

• Stream Analytics and Visualization in one
• offers real-time actionable business insight on streaming data
• automates action to drive today’s agile businesses (business user)
• Runs on top of Spark Streaming
• Cloud and on-premises
• Data Sources: Kafka, JMS, GoldenGate, File

WebSockets / SSE / Custom JavaScript Application

[Diagram labels: Data Flow; "Data in Motion" (Integration, Event Hub, Stream Analytics); Channel: Server-Sent Events (SSE); Streaming Visualization; Consumer]

Slack / WhatsApp / Twitter / …

[Diagram labels: Data Flow; "Data in Motion" (Integration, Event Hub, Stream Analytics); Channel; Streaming Visualization; Consumer]

WebSockets vs. Server-Sent Events (SSE)

WebSockets

• provide a richer protocol to perform bi-directional, full-duplex communication
• require full-duplex connections and new WebSocket servers to handle the protocol
• Having a two-way channel is more attractive for things like games, messaging apps, and for cases where you need near real-time updates in both directions

SSE

• SSEs are sent over traditional HTTP
• do not require a special protocol or server implementation to get working
• Server-Sent Events, on the other hand, have been designed from the ground up to be efficient if only one direction is necessary
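To show why SSE needs no special protocol, here is a small sketch of the wire format: a message is plain text sent in an HTTP response with content type `text/event-stream`, made of `id:`, `event:` and `data:` lines and terminated by a blank line. The helper name `format_sse` and the sample payload are assumptions for illustration; any HTTP server that can stream a response body could send these strings.

```python
def format_sse(data, event=None, event_id=None):
    """Serialize one message in the text/event-stream format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A payload with line breaks is sent as several consecutive data: lines.
    for part in data.split("\n"):
        lines.append(f"data: {part}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"


message = format_sse('{"truck": "t1", "speed": 55}', event="position", event_id=7)
multi_line = format_sse("first line\nsecond line")
```

On the browser side, the standard `EventSource` API parses this format and reconnects automatically, which is part of what makes the one-directional case simple.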

KSQL / REST API / Custom App

[Diagram labels: Data Sources; "Data in Motion" (Integration, Event Hub, Stream Analytics); API; Streaming Visualization; Consumer]

KSQL & Arcadia Data

[Diagram labels: Data Sources; "Data in Motion" (Integration, Event Hub, Stream Analytics); API; Streaming Visualization; Consumer]

Arcadia Data

• Combines Batch and Streaming Visualization in one
• Streaming Visualizations based on Confluent KSQL (Kafka)
• Arcadia Instant and Arcadia Enterprise

Druid & Superset / Imply

[Diagram labels: Data Sources; Data Flow; "Data in Motion" (Integration, Event Hub, Stream Analytics); Data Store; API; Streaming Visualization; Consumer]

What is Druid?

• Open Source Time Series DB by Metamarkets
• Apache Incubating
• Column-Oriented Storage
• Streaming and Batch Ingest
• Time optimized partitioning
• SQL Support
• Deep Storage can be HDFS / S3

Imply

• Commercial offering of Druid
• Built around Apache Druid
• Analytics, search and intelligence for event-driven data

Superset

• Open source data visualization tool by Airbnb
• Apache incubator
• Superset supports 30 types of visualizations
• easy-to-use interface for exploring and visualizing data
• Create and share dashboards
• Deep integration with Druid
• Integration with most SQL-speaking RDBMS through SQLAlchemy

Elasticsearch / Kibana

[Diagram labels: Data Sources; Data Flow; "Data in Motion" (Integration, Event Hub, Stream Analytics); Data Store; API; Streaming Visualization; Consumer]

Elasticsearch

• NoSQL store
• a distributed, RESTful search and analytics engine
• centrally stores your data so you can discover the expected and uncover the unexpected
• lets you perform and combine many types of searches: structured, unstructured, geo, metric
• aggregations let you zoom out to explore trends and patterns in your data

Kibana

• Window into Elasticsearch
• Enables visual exploration and analysis of data stored in Elasticsearch

InfluxDB / Grafana or Chronograf

[Diagram labels: Data Sources; Data Flow; "Data in Motion" (Integration, Event Hub, Stream Analytics); Data Store; API; Streaming Visualization; Consumer]

InfluxDB

InfluxDB

• Popular Time Series Database

• Open source as well as Commercial offering
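For a feel of what a stream processor writes into such a time series database, here is a simplified sketch of InfluxDB's text-based line protocol (`measurement,tags fields timestamp`). The helper `to_line`, the measurement name and the sample values are assumptions for illustration, and the escaping covers only the common cases; the official line protocol reference remains the authority.

```python
def _escape_key(text):
    """Escape characters that are special in tag keys, tag values and field keys."""
    return text.replace(",", "\\,").replace("=", "\\=").replace(" ", "\\ ")


def _escape_measurement(text):
    """Measurement names need commas and spaces escaped."""
    return text.replace(",", "\\,").replace(" ", "\\ ")


def _field_value(value):
    """Render a field value with the type markers used by the line protocol."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "true" if value else "false"
    if isinstance(value, int):
        return f"{value}i"        # integers carry an 'i' suffix
    if isinstance(value, float):
        return repr(value)
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'         # strings are double-quoted


def to_line(measurement, tags, fields, timestamp_ns):
    """Build one line: measurement[,tag=value...] field=value[,...] timestamp."""
    tag_part = "".join(
        f",{_escape_key(k)}={_escape_key(str(v))}" for k, v in sorted(tags.items())
    )
    field_part = ",".join(f"{_escape_key(k)}={_field_value(v)}" for k, v in fields.items())
    return f"{_escape_measurement(measurement)}{tag_part} {field_part} {timestamp_ns}"


line = to_line(
    "truck_position",
    {"truck": "t1", "site": "Bern"},
    {"speed": 55.5, "stops": 3},
    1537430400000000000,
)
```

Tags are indexed and suit dimensions such as the truck id, while fields hold the measured values that a dashboard would plot over time.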

Chronograf

Grafana

Grafana lets you query, visualize, alert on and understand metrics independent of their storage

Supports various data sources

• Elasticsearch

• InfluxDB

• Prometheus

• OpenTSDB

• MySQL

• …


Technology on its own won't help you. You need to know how to use it properly.