© 2015 IBM Corporation
Timeseries Toolkit – What’s New!
IBM InfoSphere Streams Version 4.0
James Cancilla
Streams Toolkit Developer
For questions about this presentation, contact James Cancilla: cancilla@ca.ibm.com
Important Disclaimer
THE INFORMATION CONTAINED IN THIS PRESENTATION IS PROVIDED FOR INFORMATIONAL PURPOSES ONLY.
WHILE EFFORTS WERE MADE TO VERIFY THE COMPLETENESS AND ACCURACY OF THE INFORMATION CONTAINED IN THIS PRESENTATION, IT IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED.
IN ADDITION, THIS INFORMATION IS BASED ON IBM’S CURRENT PRODUCT PLANS AND STRATEGY, WHICH ARE SUBJECT TO CHANGE BY IBM WITHOUT NOTICE.
IBM SHALL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING OUT OF THE USE OF, OR OTHERWISE RELATED TO, THIS PRESENTATION OR ANY OTHER DOCUMENTATION.
NOTHING CONTAINED IN THIS PRESENTATION IS INTENDED TO, OR SHALL HAVE THE EFFECT OF:
• CREATING ANY WARRANTY OR REPRESENTATION FROM IBM (OR ITS AFFILIATES OR ITS OR THEIR SUPPLIERS AND/OR LICENSORS); OR
• ALTERING THE TERMS AND CONDITIONS OF THE APPLICABLE LICENSE AGREEMENT GOVERNING THE USE OF IBM SOFTWARE.
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Agenda
What’s New!
AnomalyDetector Operator
KMeansClustering Operator
DSPFilterFinite Operator
Timeseries – What’s New!
9 new operators:
– AnomalyDetector
– AutoForecaster2
– CrossCorrelate2
– CrossCorrelateMulti
– DSPFilter2
– DSPFilterFinite
– DWT2
– KMeansClustering
– VAR2
7 new functions:
– laggedCrosscorrelate()
– laggedConvolve()
– dtw()
– dtw_itakura()
– dtw_sakoe_chiba()
– lcss()
– lpNorm()
AnomalyDetector Operator
Performs online anomaly detection on a time series
Detects anomalous subsequences
The sensitivity of the detection can be adjusted
AnomalyDetector Operator
Many industries need to detect anomalies as they occur, in real time:
– Energy & Utility
– Natural Resource
– Health Care
– Network Intrusion
AnomalyDetector Operator
How it works:
Assume the following graph:
AnomalyDetector Operator
How it works:
1. As data arrives, it is saved in memory. We will refer to this as the “reference pattern”.
2. Also as data arrives, another pattern is saved in memory. We will refer to this as the “current pattern”.
AnomalyDetector Operator
How it works:
3. Each time a tuple arrives, the current pattern is updated and then compared against a subsequence of the reference pattern
4. The compare operation generates a score
[Figure: the comparison produces a score, labeled X1; the slide shows scores X1 through X5]
AnomalyDetector Operator
How it works:
5. A final score is calculated from the scores of the individual subsequence comparisons
6. If the final score is equal to or greater than the confidence value specified in the operator, an output tuple is generated containing the current (anomalous) pattern
AnomalyDetector Operator
Parameters:
Parameter Name     Description
inputTimeseries    Specifies the input attribute containing the time series data
inputTimestamp     Specifies the input attribute containing timestamp data
patternLength      Specifies the length of the ‘current pattern’
referenceLength    The number of tuples to store as part of the ‘reference pattern’
patternCount       The number of subsequence patterns that the current pattern will be compared against
stepSize           Specifies how many steps the sliding window will shift (default value is 1)
confidence         Limits the output to only those sequences that have a score equal to or greater than the specified value

Output Functions:
Output function    Description
getSubsequence()   Returns a list<float64> that contains the anomalous pattern
getScore()         Returns the calculated score of the anomalous pattern
getStartTime()     Returns the start time of the anomalous pattern
getEndTime()       Returns the end time of the anomalous pattern
AnomalyDetector Operator
Additional Information
AnomalyDetector Operator – Info Center Page
– http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.toolkits.doc/doc/tk$com.ibm.streams.timeseries/op$com.ibm.streams.timeseries.analysis$AnomalyDetector.html
KMeansClustering Operator
Cluster analysis is a popular technique for finding natural groupings within a set of objects
Cluster analysis is useful in many fields, such as biology, medicine, business and social media
– In medicine, cluster analysis may be used to distinguish between different types of blood and tissue samples
– In social media, cluster analysis can be used to distinguish between different groups within large communities
KMeansClustering Operator
The KMeansClustering operator uses the K-Means algorithm to find groups within a set of data
Summary of the K-Means algorithm:
1. Determine how many clusters you want to find
2. Randomly define a mean value for each cluster
3. Using a set of training data, determine which mean value each data point is closest to
4. Once all of the data points have been assigned to a mean, recalculate the position of each mean value from the points assigned to it
5. Repeat steps 3 and 4 until the mean values no longer move
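The five steps can be written as a short, self-contained Python sketch. It is a plain textbook K-Means, not the operator's code. The function names, the choice of random training points as initial means, the squared Euclidean distance, and the `max_iterations` safety limit are assumptions made for illustration.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def closest(point, means):
    """Index of the mean nearest to point."""
    return min(range(len(means)), key=lambda i: squared_distance(point, means[i]))


def k_means(points, k, seed=0, max_iterations=100):
    """Return (means, assignments) for a list of equal-length points."""
    rng = random.Random(seed)
    # Step 2 (assumed variant): use k distinct training points as initial means.
    means = [list(p) for p in rng.sample(points, k)]
    assignments = []
    for _ in range(max_iterations):
        # Step 3: assign every point to its closest mean.
        assignments = [closest(p, means) for p in points]
        # Step 4: recompute each mean from the points assigned to it.
        new_means = []
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_means.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_means.append(means[i])  # keep the mean of an empty cluster
        # Step 5: stop once the means no longer move.
        if new_means == means:
            break
        means = new_means
    return means, assignments
```

Step 1 corresponds to choosing `k`. With two well-separated groups of 2-D points and `k=2`, the loop settles after a few iterations with one mean at the center of each group.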
KMeansClustering Operator
How the operator works:
[Figure: incoming data (sample1, sample2, sample3, sample4, ..., sampleN, sampleN+1, ...); the initial samples are used to generate the k-means model; output tuples take the form sampleN,<cluster#>, sampleN+1,<cluster#>, ...]
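The flow of buffering initial samples, building a model, and then labeling later samples can be sketched as a Python generator. The names `cluster_stream`, `nearest`, and `build_model` are hypothetical, and `build_model` stands in for whatever training routine produces the cluster means. The slide does not make clear whether the training samples themselves are also emitted; this sketch assumes they are not.

```python
def nearest(point, means):
    """Index of the mean closest to point (squared Euclidean distance)."""
    return min(
        range(len(means)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, means[i])),
    )


def cluster_stream(samples, init_samples, build_model):
    """Yield (sample, cluster_index) pairs once the model has been built."""
    buffered = []
    means = None
    for sample in samples:
        if means is None:
            # Training phase: collect the initial samples.
            buffered.append(sample)
            if len(buffered) == init_samples:
                means = build_model(buffered)  # hypothetical training callable
            continue
        # Scoring phase: label each later sample with its closest cluster.
        yield sample, nearest(sample, means)
```

The `init_samples` argument plays the role that the `initSamples` parameter plays for the operator: it fixes how many tuples are consumed before any cluster assignment is produced.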
KMeansClustering Operator
Inputs
The KMeansClustering operator can accept data in two formats:
– As a list<float64>, where each tuple represents a single data point with multiple dimensions
• For example, a list with a value [10, 20] may represent the x- and y-coordinates of a single data point
– As a single float64 value
• In this case, the operator must be configured with a window that has a fixed size
• The size of the window represents the number of dimensions in a single data point
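As a rough illustration of the second input format, the Python sketch below groups consecutive scalar values into data points whose dimension equals the window size. The function name is hypothetical, and the sketch assumes a tumbling (non-overlapping) window; the slides only say that the window has a fixed size.

```python
def points_from_scalars(values, dimensions):
    """Group consecutive scalar values into points of `dimensions` values."""
    window = []
    for v in values:
        window.append(v)
        if len(window) == dimensions:
            # A full window becomes one multi-dimensional data point.
            yield list(window)
            window.clear()
```

With a window size of 2, the scalar stream 10, 20, 30, 40 yields the two points [10, 20] and [30, 40], matching the list form described in the first input format. A trailing partial window is not emitted.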
KMeansClustering Operator
Parameters
Parameter Name        Description
initSamples           Specifies the initial number of tuples to use to build the cluster
clusters              Specifies the number of clusters to generate
inputData             Specifies the attribute that contains the data points
initMeans             Specifies the initial set of cluster means
seed                  Specifies the seed value to use when randomly generating the initial mean values
clusterLabels         Allows for setting the labels to use for each of the clusters

Output Functions
Output function       Description
getDataPoint()        Returns the data point that was scored against the cluster
getClusterIndex()     Returns the index of the cluster that the data point was assigned to
getClusterMean()      Returns the mean of the cluster that the data point was assigned to
getClusterVariance()  Returns the variance of the cluster that the data point was assigned to
getClusterLabel()     Returns the label of the cluster that the data point was assigned to
KMeansClustering Operator
Additional Information
KMeansClustering Operator – Info Center Page
– http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.toolkits.doc/doc/tk$com.ibm.streams.timeseries/op$com.ibm.streams.timeseries.modeling$KMeansClustering.html
DSPFilterFinite Operator
Unlike the DSPFilter operator, the DSPFilterFinite operator operates on signal segments
The operator ingests a complete time series signal and filters it
There are many applications where segments of a signal need to be filtered
– For example, a call center may want to filter out the noise of phone calls prior to analysis
• Each phone call from each of the call center employees can be considered a finite-length time series signal
DSPFilterFinite Operator
The DSPFilterFinite operator can set the filter parameters on a per-tuple basis
Each incoming time series segment can be filtered using a different set of filter parameters
This allows for a real-time, dynamic filter bank
– “a filter bank is an array of band-pass filters that separates the input signal into multiple components, each one carrying a single frequency sub-band of the original signal.” (Filter bank – Wikipedia)
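A minimal Python sketch of per-segment filtering follows. It is not the operator's implementation: it assumes a first-order Butterworth design obtained with the bilinear transform, a zero initial state for every segment, and that x-coefficients apply to input samples while y-coefficients apply to previous outputs. The function names are hypothetical.

```python
import math


def butterworth_coefficients(filter_type, sampling_rate, cutoff_frequency):
    """First-order Butterworth (x, y) coefficients via the bilinear transform."""
    k = math.tan(math.pi * cutoff_frequency / sampling_rate)
    a1 = (k - 1.0) / (k + 1.0)
    if filter_type == "lowPass":
        b = [k / (k + 1.0), k / (k + 1.0)]
    elif filter_type == "highPass":
        b = [1.0 / (k + 1.0), -1.0 / (k + 1.0)]
    else:
        raise ValueError("filter_type must be 'lowPass' or 'highPass'")
    return b, [1.0, a1]


def filter_segment(segment, b, a):
    """Filter one finite segment, starting from a zero state."""
    out = []
    for n in range(len(segment)):
        # Feed-forward part: weighted current and past inputs.
        acc = sum(b[i] * segment[n - i] for i in range(len(b)) if n - i >= 0)
        # Feedback part: weighted past outputs.
        acc -= sum(a[j] * out[n - j] for j in range(1, len(a)) if n - j >= 0)
        out.append(acc / a[0])
    return out


def filter_segments(tuples):
    """Filter each (segment, filter_type, sampling_rate, cutoff) independently."""
    for segment, filter_type, rate, cutoff in tuples:
        b, a = butterworth_coefficients(filter_type, rate, cutoff)
        yield filter_segment(segment, b, a)
```

Because the coefficients are recomputed for every tuple, the same segment can be sent through several different cutoffs in turn, which is the dynamic filter bank idea described above.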
DSPFilterFinite Operator
Parameters
Parameter Name        Description
inputTimeSeries       Specifies the input attribute containing the signal segment
filterType            Specifies the type of filter to apply (lowPass or highPass)
samplingRate          Specifies the sampling rate
cutOffFrequency       Specifies the cut-off frequency
xcoef                 Allows for specifying the x-coefficients of the Butterworth filter
ycoef                 Allows for specifying the y-coefficients of the Butterworth filter
coefParameterFile     Allows for specifying a file containing the x- and y-coefficients of the Butterworth filter

Outputs
Output function       Description
filteredTimeSeries()  Returns the filtered time series
getInputTimeSeries()  Returns the input time series (useful when using windowing)
DSPFilterFinite Operator
Additional Information
DSPFilterFinite Operator – Info Center Page
– http://www-01.ibm.com/support/knowledgecenter/SSCRJU_4.0.0/com.ibm.streams.toolkits.doc/doc/tk$com.ibm.streams.timeseries/op$com.ibm.streams.timeseries.analysis$DSPFilterFinite.html
Questions?