© 2015 IBM Corporation
Github Projects Overview
IBM InfoSphere Streams 4.0
Samantha Chan
Team Lead, Streams Toolkits Team
For questions about this presentation contact: [email protected]
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Agenda
IBMStreams Organization
Github Projects Overview
Github Releases
Support
How to get a toolkit?
Propose a new feature / Report a bug!
Propose a new project
How to participate?
IBMStreams
https://github.com/IBMStreams
Open-source organization established on Github in March 2014
Goals:
– Provide a platform and foster a community to extend and share Streams programming resources (toolkits, samples, performance benchmarks, utilities, etc.)
– Allow us to deliver new toolkit functions in a more open, agile and rapid manner
– Improve visibility of Streams programming resources and make them more easily accessible
IBMStreams
March 2014
– Started with 3 repositories:
• 3 toolkits from the Streams product (HDFS, Messaging, Inet)
April 2015
– 6 Adapters
– 5 Parsers and Formatters
– 5 Processing and Analytics
– 4 Utilities
– 5 Samples and Demos
– 6 Under Construction
– Total: 31 Projects
Github Projects Overview - Adapters
These toolkits are included as part of the Streams product.
HBase Toolkit (streamsx.hbase)
– The HBase toolkit provides support for interacting with Apache HBase systems from InfoSphere Streams. It supports reading from and writing to HBase.
HDFS Toolkit (streamsx.hdfs)
– The HDFS Toolkit provides operators that read data from and write data to the Hadoop Distributed File System (HDFS).
Messaging Toolkit (streamsx.messaging)
– The Messaging Toolkit provides operators that allow your Streams application to receive messages from and send messages to popular messaging systems, such as Kafka, MQTT, WebSphere MQ and Apache ActiveMQ.
Inet Toolkit (streamsx.inet)
– The Inet toolkit provides support for common internet protocols. Supported protocols include FTP, WebSocket and HTTP.
Github Projects Overview - Adapters
These toolkits are only hosted on Github
Multi-Connection TCP Server Toolkit (streamsx.tcp)
– This toolkit contains a TCPServer operator, a multi-threaded source operator that accepts multiple connections. The operator accepts text or binary data from one or more TCP sockets.
Mongo DB Toolkit (streamsx.mongoDB)
– This toolkit provides insert and query support for MongoDB.
Github Projects Overview – Parsers and Formatters
These toolkits are only hosted on Github
JSON Toolkit (streamsx.json)
– The JSON toolkit allows you to convert data from JSON to Streams tuple format, and vice versa.
Document Toolkit (streamsx.document)
– This toolkit extracts text and metadata from documents in binary formats such as PDF, Word, Office, etc. For this purpose the toolkit implements a DocumentSource operator. Some of the supported text extractors are: Apache Tika, PDFBox, TrueZip, JUnrar, Plain Text.
Bytes Toolkit (streamsx.bytes)
– This toolkit eases the development and analysis of binary data. It provides functions to process string data: ASCII to HEX, HEX to BIN, etc. It can also extract raw strings from binary data.
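The conversions named for the Bytes Toolkit can be illustrated outside Streams. The following is a minimal Python sketch of the same ideas, not the toolkit's API: the function names `ascii_to_hex`, `hex_to_bin` and `extract_printable`, and the minimum run length of 4, are assumptions made for illustration only.

```python
import re
from typing import List


def ascii_to_hex(text: str) -> str:
    """Encode an ASCII string as lowercase hexadecimal digits."""
    return text.encode("ascii").hex()


def hex_to_bin(hex_digits: str) -> bytes:
    """Decode a string of hexadecimal digits into raw bytes."""
    return bytes.fromhex(hex_digits)


def extract_printable(data: bytes, min_len: int = 4) -> List[str]:
    """Return runs of printable ASCII found in binary data.

    Hypothetical helper: a run must be at least min_len characters long,
    similar in spirit to the Unix `strings` utility.
    """
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]
```

For example, `ascii_to_hex("Hi")` gives the hex digits for the two characters, and `hex_to_bin` reverses that step.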
Github Projects Overview – Parsers and Formatters
Thrift Toolkit (streamsx.thrift)
– This toolkit provides Thrift server and client functionality.
Adaptive Parser (streamsx.adaptiveParser)
– This repository hosts operators for parsing structured text data easily. The goal of this project is to ease the parsing, tuple structure definition, and tuple mapping development steps in a Streams application.
Github Projects Overview – Analytics and Processing
These toolkits are only hosted on Github
NgramHashing Toolkit (streamsx.ngrams)
– This toolkit calculates n-grams. It is intended to supplement a wide range of algorithms that rely on them, in areas such as NLP, machine learning, speech recognition, compression, etc.
Regex Toolkit (streamsx.regex)
– This toolkit provides support for the RE2 regular expression library.
Math Toolkit (streamsx.math)
– This repository contains operators and functions for complex mathematics and statistics.
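To make the n-gram idea behind the NgramHashing Toolkit concrete, here is a minimal Python sketch. It is not the toolkit's implementation: the function names, the use of CRC32 as the hash, and the bucket count are all assumptions chosen for illustration.

```python
import zlib
from collections import Counter
from typing import List, Sequence, Tuple


def ngrams(tokens: Sequence[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens, in order."""
    if n <= 0:
        raise ValueError("n must be positive")
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hashed_ngram_counts(tokens: Sequence[str], n: int, buckets: int) -> Counter:
    """Count n-grams by hash bucket instead of by the n-gram itself.

    Assumption: CRC32 modulo `buckets` stands in for whatever hash a real
    implementation would use. Distinct n-grams may share a bucket.
    """
    counts: Counter = Counter()
    for gram in ngrams(tokens, n):
        key = zlib.crc32(" ".join(gram).encode("utf-8")) % buckets
        counts[key] += 1
    return counts
```

Hashing keeps memory bounded by the number of buckets, at the cost of occasional collisions between different n-grams.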
Github Projects Overview – Analytics and Processing
Date Time Toolkit (streamsx.datetime)
– This toolkit contains additional operators and functions to process dates and times in data.
Transportation Toolkit (streamsx.transportation)
– This toolkit is intended to contain adapters for accessing transit feeds, as well as generic transportation-based operators and functions. The initial contribution provides support for accessing live bus feeds from NextBus.
Github Projects Overview – Utilities
Process Store Toolkit (streamsx.ps)
– The Process Store toolkit provides a simple way for the SPL and C++ operators that are fused inside a single PE to share application-specific state information. It does this via a collection of APIs that can be called from any part of the SPL and C++ operator code.
Distributed Process Store Toolkit (streamsx.dps)
– The Distributed Process Store toolkit provides a simple way for the SPL, C++ and Java operators belonging to a single application or to multiple applications to share application state information via an external key-value store. Some of the supported key-value stores are: Memcached, Redis, Cassandra, IBM Cloudant, HBase, Mongo, Couchbase and Aerospike.
Github Projects Overview – Utilities
Plumbing Toolkit (streamsx.plumbing)
– This toolkit contains an operator that allows your application to dynamically manipulate application tuple flow to achieve the best performance.
Utilities Project (streamsx.utility)
– This repository contains useful utilities for InfoSphere Streams. For example, the repository currently has a utility that displays CPU utilization for PEs in a Streams instance.
Github Projects Overview – Samples and Demos
Samples
– This project contains a set of useful sample Streams applications, e.g. consistent region, Geospatial, HDFS and Timeseries samples.
Benchmark (benchmarks)
– This repository contains performance benchmark applications for Streams. The project contains two email processing applications: one written with InfoSphere Streams and the other written with Apache Storm.
– These two benchmarks were used as part of a detailed performance report: https://developer.ibm.com/streamsdev/wp-content/uploads/sites/15/2014/04/Streams-and-Storm-April-2014-Final.pdf
Accelerator Demo Project (streamsx.demo.accelerator)
– This repository contains a collection of demo streaming applications for analyzing smartphone accelerometer or gyroscope data.
– https://developer.ibm.com/streamsdev/docs/streaming-realtime-smartphone-data-infosphere-streams/
Github Projects Overview – Samples and Demos
Resource Manager Project (resourceManagers)
– This repository contains projects on getting Streams to work with other resource managers, like Yarn.
Log Watch Demo (streamsx.demo.logwatch)
– This repository contains a set of applications to demonstrate basic concepts of SPL and Streams, while working through some real-world examples. The applications are self-contained, small, easy to understand, with well-defined problem statements.
Coming Soon…
Approved Project Proposals
– CDC Toolkit (streamsx.cdc)
• It provides support to efficiently read / write data from CDC.
– Parquet Toolkit (streamsx.parquet)
• Parquet is a columnar storage format for Hadoop. This repository is created for hosting operators for reading and writing data in Parquet format.
– Location Based Services (streamsx.locationbasedservices)
• It is intended to contain a toolkit that provides generic location based services.
– Logging Toolkit (streamsx.logging)
• It is intended to contain operators and functions for analysing log files.
– Social Toolkit (streamsx.social)
• It is intended to contain operators and functions for integrating Streams with social media sites.
– Patterns Repository (streamsx.pattern)
• This repository is intended to host pattern classes and common functionality for Java primitive operators.
Github Releases
These projects have official releases:
– HDFS, Messaging, HBase and Inet
Official releases are thoroughly tested and are recommended for production environments
Official releases can also be included in the Streams product
Inet v2.0 – a subset of operators is included in the product
Inet v2.5 – contains the full set of operators, and should be stable and work correctly with Streams v4.0.
Support
Operators / Toolkits that are part of the Streams v4.0 product
– Operators are supported by IBM support channels
– You may report problems by opening PMRs
– You may ask questions on DeveloperWorks
– You may ask questions by opening issues on Github
Toolkits / Projects that are only on Github
– Projects are supported by committers / contributors on the Github project
– You may ask questions by opening issues on Github; issues are monitored
– You may ask questions on DeveloperWorks
– Committers will make a best effort to support the project and provide the help that you need.
20 © 2015 IBM Corporation
How to get Github Toolkits?
Demo
– Download a release, and add it as a Streams toolkit in Studio
– Clone a repository and build it at the command line or in Studio
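The clone step can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper names `clone_command` and `clone` are hypothetical, the repository URL is derived from the https://github.com/IBMStreams organization address, and `git` is assumed to be installed. The build step is not shown because it depends on each project.

```python
import subprocess
from typing import List, Optional


def clone_command(repo: str, org: str = "IBMStreams",
                  dest: Optional[str] = None) -> List[str]:
    """Build the git command line that clones one toolkit repository.

    Hypothetical helper: `repo` is a repository name such as
    "streamsx.inet"; `dest` is an optional target directory.
    """
    url = f"https://github.com/{org}/{repo}.git"
    command = ["git", "clone", url]
    if dest is not None:
        command.append(dest)
    return command


def clone(repo: str, dest: Optional[str] = None) -> None:
    """Run the clone; raises CalledProcessError if git reports a failure."""
    subprocess.run(clone_command(repo, dest=dest), check=True)
```

After cloning, follow the build instructions in the repository itself, then add the built toolkit to your toolkit path or to Studio.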
21 © 2015 IBM Corporation
Enhancement Requests / Report a bug
Enter enhancement requests and bug reports in the project that you are working with.
Vote on existing enhancements / defects!
For enhancement requests, describe the use case and how you intend to use the feature.
For bug reports, it is really helpful if you can describe:
– Version of toolkit used
– SPL snippet
– Problems encountered
– Domain / PE logs, if possible
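The bug report checklist above can be turned into a reusable issue template. The following Python sketch is one possible layout, not an official format: the function name `format_bug_report` and the section headings are assumptions, and the four-space indentation is used so that Github renders the snippet and logs as preformatted text.

```python
from typing import Optional


def format_bug_report(toolkit: str, version: str, spl_snippet: str,
                      problem: str, logs: Optional[str] = None) -> str:
    """Assemble the text of a bug report from the checklist items."""

    def indent(text: str) -> str:
        # Indent each line by four spaces so it renders as preformatted text.
        return "\n".join("    " + line for line in text.splitlines())

    parts = [
        f"Toolkit: {toolkit}",
        f"Toolkit version: {version}",
        "",
        "Problem encountered:",
        problem,
        "",
        "SPL snippet:",
        "",
        indent(spl_snippet),
    ]
    if logs:
        # Logs are optional, matching "if possible" in the checklist.
        parts += ["", "Domain / PE logs:", "", indent(logs)]
    return "\n".join(parts)
```

The returned text can be pasted into a new issue in the project where the problem occurred.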
22 © 2015 IBM Corporation
Propose a new project
The process is documented here:
– https://github.com/IBMStreams/administration/blob/master/process.md
Enter a new issue in the Administration Project:
– https://github.com/IBMStreams/administration
A proposal should include:
– A meaningful title for the issue
– Goal of the project
– Content of the initial contribution
– Enough information for us to evaluate whether the project should be added
Project proposals will be evaluated and voted on by the Managing Committee.
23 © 2015 IBM Corporation
How to Participate?
Many new and cool projects to try!
– E.g. AdaptiveParser, Document
We want your feedback and input!
– Report any issues you have found
– If you think it’s cool, let us know!
– Which toolkit do you use? How do you use it?
– Do you want the toolkit to be included in the product?
Contribute code and samples!
– Got a clever way to do things? Contribute it to the samples project!
– Need a new parameter? Need support for a new type of server / data format? Work with us to try to implement it!
Got an idea?
– Propose a new project / new feature.
24 © 2015 IBM Corporation
Questions?