IBM Streams 28 August 2017 Roger Rea, IBM Streams Offering Manager [email protected] The Past, Present and Future of Real-time Analytics Analyze more, store less, and act now Eleventh International Workshop on Real-Time Business Intelligence and Analytics August 28, 2017 - Munich, Germany

The Past, Present and Future of Real-time Analyticsdb.cs.pitt.edu/birte2017/files/Rao-BIRTE2017-invited-Streaming... · 12. 2015: eBay Pulsar 13. 2015: ... Hadoop Data Warehouse Communications


Page 1

IBM Streams

28 August 2017

Roger Rea, IBM Streams Offering Manager

[email protected]

The Past, Present and Future of Real-time Analytics

Analyze more, store less, and act now

Eleventh International Workshop on

Real-Time Business Intelligence and Analytics

August 28, 2017 - Munich, Germany

Page 2

Streaming in the past

1954 – The first supercomputer, IBM SAGE

Semi-Automatic Ground Environment

250 tons and 60,000 vacuum tubes

Designed to coordinate radar stations and direct airplanes to intercept incoming planes

Remained in continuous operation until 1983, over 20 years

Sources: Wikipedia – SAGE and AN/FSQ-7

Page 3

Streaming analytics – A paradigm shift

Traditional Approach

• Historical Fact Finding

• Analyze Persisted Data

• Batch Philosophy

• Pull Approach

• On-Demand

Streaming Analytics

• Analyze the Current Moment “Now”

• Analyze Data Directly “In Motion”

• Analyze Data at the Speed it is Created

• Push Approach

• Continuous Insights

[Figure: flow diagrams. Traditional Approach: Data, Store, Analysis, Insight. Streaming Analytics: Data, Analysis, Insight, with "Aggregate" and "Raw" labels.]

Source: Father, Son & Co.: My Life at IBM and Beyond by Thomas J. Watson, Jr., page 231
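
The pull versus push contrast on this slide can be illustrated with a minimal Python sketch. This is not IBM Streams code: the sensor readings, the function names, and the choice of a running mean as the "insight" are all assumptions made for illustration.

```python
from typing import Iterable, Iterator, List


def batch_mean(store: List[float]) -> float:
    """Traditional (pull) approach: persist everything, then query on demand."""
    return sum(store) / len(store)


def streaming_mean(readings: Iterable[float]) -> Iterator[float]:
    """Streaming (push) approach: update the insight as each reading arrives.

    Only a count and a running mean are kept; raw readings are not stored.
    """
    count = 0
    mean = 0.0
    for value in readings:
        count += 1
        mean += (value - mean) / count  # incremental mean update
        yield mean


# Hypothetical sensor readings, used only for illustration.
readings = [10.0, 12.0, 14.0, 16.0]

# Pull: store first, analyze later.
store = list(readings)
on_demand_insight = batch_mean(store)

# Push: one updated insight per arriving reading.
continuous_insights = list(streaming_mean(readings))
```

Both paths end at the same mean, but the streaming version produces an updated answer after every reading and never needs the full history in storage.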

Page 4

IBM Research papers and Patents related to IBM Streams

IBM Research page related to Streaming Analytics

Tab with list of over 120 publications

Earliest publication, 2004

• Interval query indexing for efficient stream processing

K L Wu, S K Chen, P S Yu

Proceedings of the thirteenth ACM international conference on Information and knowledge management, pp. 88-97, 2004

– 2010 Patent review: Over 200 applications, over 40 approved

– 2017 Patent review: Over 60 new applications, over 140 approved

Source: IBM

Page 5

Streaming in the Present: A very crowded market

CEP Vendors: Proprietary

1. 2000: Software AG Apama (acquisition 2013)

2. 2003: TIBCO StreamBase (acquisition 2013)

3. 2004: IBM ODM (merger of AptSoft and ILOG acquisition) – Decision Management, not CEP

4. 2004: TIBCO BusinessEvents

5. 2005: SAP Event Stream Processing (from Sybase EP, merger of Aleri & Coral8)

6. 2006: Oracle Event Processing

7. 2007: Informatica RulePoint

8. 2009: Microsoft StreamInsight

9. 2012: Fujitsu Big Data CEP Server

CEP Vendors: Open Source

1. 2006: Esper

2. 2008: Red Hat Drools Fusion

3. 2010: WSO2 CEP Server

Streaming Vendors: Proprietary

1. 2003: IBM Streams (commercial v1 2009)

2. 2006: Cisco Prime Analytics (Truviso, acquired 2012)

3. 2010: Hitachi uContinuous Stream Data Platform

4. 2010: Vitria Operational Intelligence

5. 2011: SQLstream

6. 2011: Evam Event and Action Manager

7. 2012: Striim (originally WebAction)

8. 2013: SAS Event Stream Processing

9. 2013: Amazon Kinesis Streams (in-memory store)

10. 2015: Microsoft Trill .NET

11. 2015: Microsoft Azure Stream Analytics

12. 2015: Unscrambl BRAIN

13. 2016: Amazon Kinesis Analytics (SQLstream OEM)

SQL Query Based

Inference Rule Based

Event Condition Action Rule Based

Programmatic Based

Neural Net Based

Streaming Vendors: Open Source

1. 2010: Yahoo S4

2. 2011: Apache Storm

3. 2011: Typesafe Reactive Platform (Akka, Scala)

4. 2013: Spring XD

5. 2013: Apache Samza

6. 2013: Apache Spark Streaming (microbatch)

7. 2014: DataTorrent Real Time Streaming/Apache Apex

8. 2014: Apache Flink Streaming

9. 2014: Google MillWheel Framework

10. 2014: Tigon Cask

11. 2014: Apache NiFi

12. 2015: eBay Pulsar

13. 2015: Google Dataflow/Apache Beam

14. 2016: Apache Edgent

15. 2016: Twitter Heron

16. 2016: Apache Kafka Streaming

17. 2017: Airbnb StreamAlert

Sources: Author experience, Forrester, Bloor Research, Complex Events, Predictive Analytics Today

Page 6

IBM Streams at a glance

Nearly 200 operators with 1300 functions

Hadoop

Data Warehouse

Communications Data Sources

• TCP/IP

• UDP/IP

• HTTP

• FTP

• RSS

• Messaging Toolkit (Kafka, XMS, IBM MQ, Apache ActiveMQ, RabbitMQ, MQTT, MQ Low Latency Messaging)

• IBM DataStage

• IBM Data Replication

Functions:

• Filter

• Enrich

• Normalize

• Windowed Aggregations

• Machine Learning

• Scoring (SPSS, R, SparkML, Python)

• CEP & Pattern Matching

• Geospatial

• Video/Image

• Text Analytics (AQL)

• Speech to Text

• Rules
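
The first four functions in this list (filter, enrich, normalize, windowed aggregation) can be sketched as a chain of plain Python generators. This is only an illustration of the concepts and does not use any IBM Streams operator or API; the event fields, the `SITES` lookup table, and the count-based tumbling window are assumptions.

```python
from typing import Dict, Iterable, Iterator, List

# Hypothetical lookup table used to enrich readings with a site name.
SITES: Dict[str, str] = {"s1": "Munich", "s2": "Berlin"}


def filter_valid(events: Iterable[dict]) -> Iterator[dict]:
    """Filter: drop events without a temperature value."""
    return (e for e in events if e.get("temp_f") is not None)


def enrich(events: Iterable[dict]) -> Iterator[dict]:
    """Enrich: add the site name from the lookup table."""
    for e in events:
        yield {**e, "site": SITES.get(e["sensor"], "unknown")}


def normalize(events: Iterable[dict]) -> Iterator[dict]:
    """Normalize: convert Fahrenheit to Celsius."""
    for e in events:
        yield {**e, "temp_c": (e["temp_f"] - 32.0) * 5.0 / 9.0}


def tumbling_average(events: Iterable[dict], size: int) -> Iterator[float]:
    """Windowed aggregation: average temp_c over each full window of `size` events."""
    window: List[float] = []
    for e in events:
        window.append(e["temp_c"])
        if len(window) == size:
            yield sum(window) / size
            window.clear()


# Hypothetical input events; the second one is dropped by the filter.
events = [
    {"sensor": "s1", "temp_f": 32.0},
    {"sensor": "s2", "temp_f": None},
    {"sensor": "s1", "temp_f": 50.0},
    {"sensor": "s2", "temp_f": 68.0},
    {"sensor": "s3", "temp_f": 86.0},
]

pipeline = tumbling_average(normalize(enrich(filter_valid(events))), size=2)
averages = list(pipeline)
```

Because every stage is a generator, each event flows through the whole chain as it arrives, and only the current window is held in memory.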

IBM Streams

Scale-out Runtime

Hadoop: HDFS, GPFS, Hive, HBase, BigSQL, Parquet, Thrift, Avro

RDBMS: IBM DB2, IBM DB2 Parallel writer, IBM Informix, IBM BigInsights BigSQL, IBM Netezza, IBM Netezza NZLoad, solidDB, Oracle, Microsoft SQL Server, MySQL, Teradata, Aster, HP Vertica

NoSQL:

• Key Value Stores (Memcached, Redis, Redis-Cluster, Aerospike)

• Column Oriented Stores (Cassandra, HBase)

• Document Oriented Stores (IBM Cloudant, Mongo, Couchbase)

NoSQL

Application Development

Streams Processing Language

Visual or Text

Java

Scala

Python

Page 7

Machine Learning

“The science of getting computers to act without being explicitly programmed”

“Systems that can learn from data”

Many categories of Machine Learning:

• Supervised, Unsupervised and Reinforcement Learning

• Decision Trees, Regressions, Classification, Clustering, Filtering, Associations

• Univariate, Multivariate

Page 8

Streams Machine Learning

Unsupervised: Learn as you go in Streams

– Time Series toolkit has about 20 algorithms

• Continuous update of model and making of predictions

• Anomaly Detection, Classification, Regressions, Clustering, Filtering
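
As a rough illustration of "continuous update of model and making of predictions", the sketch below scores each value against a running mean and standard deviation and then learns from it. It is not one of the Time Series toolkit algorithms; the class and function names, the z-score rule, and the `threshold` and `warmup` parameters are assumptions chosen for the example.

```python
import math
from typing import Iterable, Iterator, Tuple


class OnlineGaussian:
    """Running mean and variance, updated one value at a time (Welford's method)."""

    def __init__(self) -> None:
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value: float) -> None:
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    def std(self) -> float:
        if self.count < 2:
            return 0.0
        return math.sqrt(self._m2 / (self.count - 1))


def detect_anomalies(
    values: Iterable[float], threshold: float = 3.0, warmup: int = 5
) -> Iterator[Tuple[float, bool]]:
    """Yield (value, is_anomaly), scoring each value before learning from it."""
    model = OnlineGaussian()
    for value in values:
        std = model.std()
        is_anomaly = (
            model.count >= warmup
            and std > 0.0
            and abs(value - model.mean) > threshold * std
        )
        yield value, is_anomaly
        model.update(value)  # the model keeps learning as the stream continues


# Hypothetical readings with one obvious outlier.
stream = [10.0, 10.5, 9.5, 10.2, 9.8, 10.1, 25.0, 10.0]
flagged = [value for value, is_anomaly in detect_anomalies(stream)]
anomalies = [value for value, is_anomaly in detect_anomalies(stream) if is_anomaly]
```

No training phase is needed: the model starts empty, and the `warmup` count only delays flagging until a few values have been seen.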

Supervised: Learn offline and Score models in Streams

– PMML import: Classification, Clustering, Regression, Association

– SPSS import: all SPSS models, including data preparation

– Spark MLlib: Classification, Regression, Trees, Clustering, Filtering

– R scripts: Classification, Regression, Trees, Clustering, Filtering

– Python: Classification, Regression, Trees, Clustering, Filtering

Redeploy updated models without stopping Streams application
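
One generic way to picture redeploying a model without stopping the application is a scorer that holds a replaceable model reference. The sketch below is an assumption about the general pattern, not a description of how IBM Streams implements it; `SwappableScorer`, `model_v1`, and `model_v2` are hypothetical names.

```python
import threading
from typing import Callable, List

Model = Callable[[float], float]


class SwappableScorer:
    """Scores values with the current model; the model can be replaced at any time."""

    def __init__(self, model: Model) -> None:
        self._model = model
        self._lock = threading.Lock()

    def swap(self, new_model: Model) -> None:
        """Install a newly trained model; scoring continues uninterrupted."""
        with self._lock:
            self._model = new_model

    def score(self, value: float) -> float:
        with self._lock:
            model = self._model  # take a consistent reference, then release the lock
        return model(value)


def model_v1(x: float) -> float:
    """Hypothetical model trained offline."""
    return 2.0 * x


def model_v2(x: float) -> float:
    """Hypothetical retrained model."""
    return 2.0 * x + 1.0


scorer = SwappableScorer(model_v1)
scores: List[float] = []
for i, value in enumerate([1.0, 2.0, 3.0, 4.0]):
    if i == 2:
        scorer.swap(model_v2)  # simulate an updated model arriving mid-stream
    scores.append(scorer.score(value))
```

The scoring loop never stops; values that arrive after the swap are simply scored by the new model.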

Page 9

adopts IBM Streams

Personal Weather Stations

World’s largest PWS network: 250k+ worldwide

Doubling annually since 2015

Source: weather.com

Page 11

Verizon uses IBM Streams to deliver Cognitive Customer Care

• Speech to Text: Listens side by side to agent-customer conversation

• Intent Detection: Comprehends the discussion and classifies the intent

• Scoring & Next Best Action: Identifies proactive and reactive relevant content

• Contextual Assist: Delivers cognitive agent assist

Source: ibm.com case studies

Page 12

Areas to consider for value

Functionality required

Reduced hardware footprint

Developer & Admin productivity

Agility to quickly react to new data

Savings sooner via faster development

High Availability/limited downtime

Comparable software prices

New releases of software

Smarter business insights

Vendor Sales Team

Vendor Tech Sales Team

Vendor Quality

Vendor Software Development processes

Vendor Research

Breadth of Vendor offerings

Worldwide or local support

Flexible software

Legal

Governance

Open Source

Patents

Security

Developer availability

Community

Tangible Benefits

Intangible Benefits

Reduced Risk

Page 13

Streaming in the near future

One technology becomes winner take all (2% odds)

Half the current vendors/technologies drop out in 5 years (20% odds)

Half the current vendors/technologies drop out in 10 years (50% odds)

Apache Beam becomes a uniting development API (20% odds)

Sophisticated, cognitive apps with dozens of data sources become pervasive within 5 years (40% odds)

Data Volumes, Varieties and Velocities will continue to grow (100% odds)

Streaming Analytics outpaces traditional Hadoop/Spark market (70% odds)

These opinions are from the author, Roger Rea, and do not necessarily represent IBM

Page 14

Streaming in the Far Future

Foundation by Isaac Asimov

Mathematician Hari Seldon

Mathematics known as psychohistory

Predict the future, at large scale

Source: Wikipedia

Page 15

Thank you

Roger Rea, IBM Streams Offering Manager

[email protected]

Eleventh International Workshop on

Real-Time Business Intelligence and Analytics

August 28, 2017 - Munich, Germany
