Streams GitHub Products Overview for IBM InfoSphere Streams V4.0

© 2015 IBM Corporation

GitHub Projects Overview

IBM InfoSphere Streams 4.0

Samantha Chan

Team Lead, Streams Toolkits Team

For questions about this presentation contact: [email protected]

Important Disclaimer

THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.

WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.

IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.

IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.

NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:

• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR

• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.

IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.

Agenda

IBMStreams Organization

GitHub Projects Overview

GitHub Releases

Support

How to get a toolkit?

Propose a new feature / Report a bug!

Propose a new project

How to participate?

IBMStreams

https://github.com/IBMStreams

Open-source organization established on GitHub in March 2014

Goals:

– Provide a platform and foster a community to extend and share Streams programming resources (toolkits, samples, performance benchmarks, utilities, etc.)

– Allow us to deliver new toolkit functions in a more open, agile, and rapid manner

– Improve visibility of Streams programming resources and make them more easily accessible

IBMStreams

March 2014

– Started with 3 repositories:

• 3 toolkits from the Streams product (HDFS, Messaging, Inet)

April 2015

– 6 Adapters

– 5 Parsers and Formatters

– 5 Processing and Analytics

– 4 Utilities

– 5 Samples and Demos

– 6 Under Construction

– Total: 31 Projects

GitHub Projects Overview - Adapters

These toolkits are included as part of the Streams product.

HBase Toolkit (streamsx.hbase)

– The HBase toolkit provides support for interacting with Apache HBase from InfoSphere Streams, including reading from and writing to HBase.

HDFS Toolkit (streamsx.hdfs)

– The HDFS Toolkit provides operators that can read data from and write data to the Hadoop Distributed File System (HDFS).

Messaging Toolkit (streamsx.messaging)

– The Messaging Toolkit provides operators that allow your Streams application to receive messages from and send messages to popular messaging systems such as Kafka, MQTT, WebSphere MQ, and Apache ActiveMQ.

Inet Toolkit (streamsx.inet)

– The Inet toolkit provides support for common internet protocols. Supported protocols include FTP, WebSocket, and HTTP.

GitHub Projects Overview - Adapters

These toolkits are hosted only on GitHub.

Multi-Connection TCP Server Toolkit (streamsx.tcp)

– This toolkit contains a TCPServer operator, a multi-threaded source operator that supports multiple connections. The operator accepts text or binary data from one or more TCP sockets.

MongoDB Toolkit (streamsx.mongoDB)

– This toolkit provides insert and query support for MongoDB.

GitHub Projects Overview – Parsers and Formatters

These toolkits are hosted only on GitHub.

JSON Toolkit (streamsx.json)

– The JSON toolkit allows you to convert data between JSON and the Streams tuple format.

Document Toolkit (streamsx.document)

– This toolkit extracts text and metadata from documents in binary formats such as PDF, Word, and Office. For this purpose, the toolkit implements a DocumentSource operator. Supported text extractors include Apache Tika, PDFBox, TrueZip, JUnrar, and Plain Text.

Bytes Toolkit (streamsx.bytes)

– This toolkit eases the development and analysis of applications that handle binary data. It provides functions to convert string data (ASCII to HEX, HEX to BIN, etc.) and can also extract raw strings from binary data.
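
The kinds of conversion the Bytes toolkit describes can be sketched in Python. This is only an illustration of ASCII-to-HEX conversion, HEX-to-BIN conversion, and extracting printable strings from binary data. The function names are hypothetical and are not the toolkit's SPL functions.

```python
import re


def ascii_to_hex(text):
    """Encode an ASCII string as hex, two digits per character."""
    return text.encode("ascii").hex()


def hex_to_bin(hex_text):
    """Decode a hex string into raw bytes."""
    return bytes.fromhex(hex_text)


def extract_strings(data, min_len=4):
    """Return runs of printable ASCII that are at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [match.decode("ascii") for match in re.findall(pattern, data)]


if __name__ == "__main__":
    print(ascii_to_hex("Hi"))
    print(hex_to_bin("4869"))
    print(extract_strings(b"\x00\x01hello\xffab\x00world!"))
```

The minimum run length in extract_strings is an assumed default, chosen so that short accidental matches in binary data are ignored.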

GitHub Projects Overview – Parsers and Formatters

Thrift Toolkit (streamsx.thrift)

– This toolkit provides Thrift server and client functionality.

Adaptive Parser (streamsx.adaptiveParser)

– This repository hosts operators that make it easy to parse structured text data. The goal of this project is to ease the parsing, tuple structure definition, and tuple mapping steps in developing a Streams application.

GitHub Projects Overview – Analytics and Processing

These toolkits are hosted only on GitHub.

NgramHashing Toolkit (streamsx.ngrams)

– This toolkit calculates n-grams and is intended to supplement a wide range of algorithms in areas such as NLP, machine learning, speech recognition, and compression.

Regex Toolkit (streamsx.regex)

– This toolkit provides support for the RE2 regular expression library.

Math Toolkit (streamsx.math)

– This repository contains operators and functions for complex mathematics and statistics.
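
For readers unfamiliar with the term used for the NgramHashing toolkit, an n-gram is a run of n consecutive items from a sequence. The sketch below shows the general idea in Python. It is not the toolkit's implementation, and the function names are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token sequence as tuples."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(tokens, n):
    """Count how often each n-gram occurs in the sequence."""
    return Counter(ngrams(tokens, n))


if __name__ == "__main__":
    words = "to be or not to be".split()
    print(ngrams(words, 2))
    print(ngram_counts(words, 2).most_common(1))
```

A sequence shorter than n simply yields no n-grams.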

GitHub Projects Overview – Analytics and Processing

Date Time Toolkit (streamsx.datetime)

– This toolkit contains additional operators and functions to process dates and times in data.

Transportation Toolkit (streamsx.transportation)

– This toolkit is intended to contain adapters for accessing transit feeds as well as generic transportation-based operators and functions. The initial contribution provides access to live bus feeds from NextBus.

GitHub Projects Overview – Utilities

Process Store Toolkit (streamsx.ps)

– The Process Store toolkit provides a simple way for the SPL and C++ operators that are fused inside a single PE to share application-specific state information. It does this via a collection of APIs that can be called from any part of the SPL and C++ operator code.

Distributed Process Store Toolkit (streamsx.dps)

– The Distributed Process Store toolkit provides a simple way for the SPL, C++, and Java operators belonging to one or more applications to share application state information via an external key-value store. Supported key-value stores include Memcached, Redis, Cassandra, IBM Cloudant, HBase, Mongo, Couchbase, and Aerospike.
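
The state-sharing idea behind the Distributed Process Store can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: InMemoryStore is a hypothetical stand-in for an external key-value store, and the two functions play the role of operators that would normally live in separate applications. None of these names come from the toolkit's API.

```python
class InMemoryStore:
    """Hypothetical stand-in for an external key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)


def count_event(store, event_type):
    """Writer side: increment a shared counter for an event type."""
    store.put(event_type, store.get(event_type, 0) + 1)


def report(store, event_type):
    """Reader side, possibly in another application: read the shared counter."""
    return store.get(event_type, 0)


if __name__ == "__main__":
    shared = InMemoryStore()
    count_event(shared, "login")
    count_event(shared, "login")
    print(report(shared, "login"))
```

The point of the design is that the writer and the reader agree only on the key, not on each other's code, which is what lets separate applications share state.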

GitHub Projects Overview – Utilities

Plumbing Toolkit (streamsx.plumbing)

– This toolkit contains an operator that allows your application to dynamically manipulate its tuple flow to achieve the best performance.

Utilities Project (streamsx.utility)

– This repository contains useful utilities for InfoSphere Streams. For example, the repository currently has a utility that displays CPU utilization for PEs in a Streams instance.

GitHub Projects Overview – Samples and Demos

Samples

– This project contains a set of useful sample Streams applications, e.g. consistent region, Geospatial, HDFS, and Timeseries samples.

Benchmark (benchmarks)

– This repository contains performance benchmark applications for Streams. The project contains two email processing applications: one written with InfoSphere Streams and the other with Apache Storm.

– These two benchmarks were used as part of a detailed performance report: https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2014/04/Streams-and-Storm-April-2014-Final.pdf

Accelerator Demo Project (streamsx.demo.accelerator)

– This repository contains a collection of demo streaming applications for analyzing smartphone accelerometer or gyroscope data.

– https://developer.ibm.com/streamsdev/docs/streaming-realtime-smartphone-data-infosphere-streams/

GitHub Projects Overview – Samples and Demos

Resource Manager Project (resourceManagers)

– This repository contains projects for getting Streams to work with other resource managers, such as YARN.

Log Watch Demo (streamsx.demo.logwatch)

– This repository contains a set of applications that demonstrate basic concepts of SPL and Streams while working through some real-world examples. The applications are self-contained, small, and easy to understand, with well-defined problem statements.

Coming Soon…

Approved Project Proposals

– CDC Toolkit (streamsx.cdc)

• It provides support to efficiently read data from and write data to CDC.

– Parquet Toolkit (streamsx.parquet)

• Parquet is a columnar storage format for Hadoop. This repository hosts operators for reading and writing data in Parquet format.

– Location Based Services (streamsx.locationbasedservices)

• It is intended to contain a toolkit that provides generic location-based services.

– Logging Toolkit (streamsx.logging)

• It is intended to contain operators and functions for analyzing log files.

– Social Toolkit (streamsx.social)

• It is intended to contain operators and functions for integrating Streams with social media sites.

– Patterns Repository (streamsx.pattern)

• This repository is intended to host pattern classes and common functionality for Java primitive operators.

GitHub Releases

These projects have official releases:

– HDFS, Messaging, HBase, and Inet

Official releases are thoroughly tested and are recommended for production environments.

Official releases can also be included in the Streams product.

Inet v2.0 – a subset of operators is included in the product.

Inet v2.5 – contains the full set of operators and should be stable and work correctly with Streams v4.0.

Support

Operators / Toolkits that are part of the Streams v4.0 product

– Operators are supported by IBM support channels

– You may report problems by opening PMRs

– You may ask questions on developerWorks

– You may ask questions by opening issues on GitHub

Toolkits / Projects that are only on GitHub

– Projects are supported by committers / contributors on the GitHub project

– You may ask questions by opening issues on GitHub; issues are monitored

– You may ask questions on developerWorks

– Committers will make a best effort to support the project and provide the help that you need

How to get GitHub Toolkits?

Demo

– Download a release and add it as a Streams toolkit in Studio

– Clone a repository and build it at the command line or in Studio
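
The cloning step can be sketched as a small Python helper that builds the git clone command for a repository in the IBMStreams organization. The helper name and its checks are hypothetical. The only facts taken from the slides are the organization URL and the repository names. The build step is left out on purpose because it differs between projects, so check each repository's own instructions.

```python
ORG_URL = "https://github.com/IBMStreams"  # organization URL from the slides


def clone_command(repo, target_dir=None):
    """Build the argument list for cloning a repository from the organization."""
    if not repo or "/" in repo:
        raise ValueError("expected a bare repository name such as 'streamsx.inet'")
    command = ["git", "clone", f"{ORG_URL}/{repo}.git"]
    if target_dir:
        command.append(target_dir)
    return command


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(" ".join(clone_command("streamsx.inet")))
```

To actually clone, the returned list can be passed to subprocess.run(command, check=True) on a machine where git is installed.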

Enhancement Requests / Report a bug

Enter enhancement requests and bug reports in the project that you are working with.

Vote on existing enhancements / defects!

For enhancement requests, describe the use case and how you intend to use the feature.

For bug reports, it is really helpful if you can describe:

– Version of toolkit used

– SPL snippet

– Problems encountered

– Domain / PE logs, if possible
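
One way to make sure a bug report covers the items listed above is to assemble the issue text from them. The Python sketch below is a hypothetical helper, not part of any IBMStreams tooling, and its section titles simply mirror the list on this slide.

```python
def format_bug_report(toolkit_version, spl_snippet, problem, logs=None):
    """Assemble an issue body from the items a bug report should describe."""
    sections = [
        ("Version of toolkit used", toolkit_version),
        ("SPL snippet", spl_snippet),
        ("Problems encountered", problem),
    ]
    if logs:  # domain / PE logs are optional
        sections.append(("Domain / PE logs", logs))
    return "\n\n".join(f"{title}:\n{text.strip()}" for title, text in sections)


if __name__ == "__main__":
    print(format_bug_report(
        toolkit_version="<toolkit name and version>",
        spl_snippet="<the SPL that triggers the problem>",
        problem="<what happened and what you expected>",
    ))
```

The placeholder values in the usage example are meant to be replaced with the real details before the text is pasted into a new issue.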

Propose a new project

Process is documented here:

– https://github.com/IBMStreams/administration/blob/master/process.md

Enter a new issue in the Administration Project:

– https://github.com/IBMStreams/administration

The proposal should include:

– A meaningful title for the issue

– Goal of the project

– Content of the initial contribution

– Enough information for us to evaluate whether the project should be added

Project proposals are evaluated and voted on by the Managing Committee.
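
Before opening the issue, a proposal draft can be checked against the items listed above. The Python sketch below is a hypothetical checklist. The field names are assumptions chosen to match the slide and are not defined by the Administration project.

```python
# Field names are assumed for illustration; they mirror the items on the slide.
REQUIRED_FIELDS = ("title", "goal", "initial_contribution")


def missing_fields(proposal):
    """Return the required proposal fields that are absent or blank."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(proposal.get(field, "")).strip()
    ]


if __name__ == "__main__":
    draft = {"title": "<meaningful title>", "goal": " "}
    print(missing_fields(draft))
```

An empty result only means that each item is present. Whether the proposal gives enough information is still a judgment for the Managing Committee.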

How to Participate?

Many new and cool projects to try!

– E.g. AdaptiveParser, Document

We want your feedback and input!

– Report any issues you have found

– If you think it’s cool, let us know!

– Which toolkit do you use? How do you use it?

– Do you want the toolkit to be included in the product?

Contribute code and samples!

– Got a clever way to do things? Contribute it to the samples project!

– Need a new parameter? Need support for a new type of server / data format? Work with us to try to implement it!

Got an idea?

– Propose a new project / new feature.

Questions?