dave-callaghan
Big Fast Data
Enabling Perishable Insight at Scale
Perishable Insights
1. Data is perishable if there is only a limited amount of time to act upon it.
Market data, clickstream, mobile devices, sensors, and transactions may contain valuable but perishable insights.
2. Seize opportunities and avoid crises amidst explosive complexity.
Streaming Analytics
“Streaming analytics is anything but a sleepy, rearview mirror analysis of data. No, it is about knowing and acting on what’s happening in your business at this very moment — now. Forrester calls these perishable insights because they occur at a moment’s notice and you must act on them fast within a narrow window of opportunity before they quickly lose their value. The high velocity, white-water flow of data from innumerable real-time data sources such as market data, Internet of Things, mobile, sensors, clickstream, and even transactions remain largely unnavigated by most firms. The opportunity to leverage streaming analytics has never been greater.”
Big Fast Data
Big Data: Enable analytics of data at scale
Petabytes of data
Batch Processing (minutes to hours)
High latency
Fast Data: Enable analytics of data in motion
Millions of events per second
Stream Processing (nanoseconds to seconds)
Low latency
Data Explosion
Computational Explosion
Big Fast Data
• Batch Processing
  • In a traditional query model, you store data and then run queries on the data as needed.
  • Query-driven model
• Stream Processing
  • In a streaming data model, you store queries and then continuously run data through the queries.
  • Event-driven model
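The contrast between the two models can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `batch_query`, `StreamEngine`, and the `price` field are hypothetical names, not part of any product named in this deck.

```python
from typing import Callable, Dict, List

Event = Dict[str, float]
Predicate = Callable[[Event], bool]


def batch_query(stored: List[Event], predicate: Predicate) -> List[Event]:
    """Query-driven: the data is already stored; the query runs over it on demand."""
    return [event for event in stored if predicate(event)]


class StreamEngine:
    """Event-driven: queries are stored first; each arriving event runs through them."""

    def __init__(self) -> None:
        self.queries: Dict[str, Predicate] = {}
        self.matches: Dict[str, List[Event]] = {}

    def register(self, name: str, predicate: Predicate) -> None:
        self.queries[name] = predicate
        self.matches[name] = []

    def on_event(self, event: Event) -> None:
        # Every stored query sees every event as it arrives.
        for name, predicate in self.queries.items():
            if predicate(event):
                self.matches[name].append(event)


def is_high(event: Event) -> bool:
    return event["price"] > 100.0


events = [{"price": 101.0}, {"price": 99.5}, {"price": 104.2}]

# Batch: store everything, then ask.
batch_result = batch_query(events, is_high)

# Streaming: ask first, then let the data flow through.
engine = StreamEngine()
engine.register("high_price", is_high)
for event in events:
    engine.on_event(event)
stream_result = engine.matches["high_price"]
```

Both paths select the same events; the difference is that the streaming engine has its answer as soon as each event arrives rather than when someone later runs the query.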
Thinking in Streams
• Filtering
  • Streaming data can be filled with irrelevant or invalid data, since data typically does not go through a data governance step until it is stored in Hadoop.
• Aggregation/Correlation
  • Streaming data typically comes from multiple sources and can be combined in multiple ways.
• Location/Motion
  • Mobile devices, from phones to wearables, are ubiquitous.
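Filtering and aggregation across sources might look like the following sketch. The event shape (`customer`, `amount`) and the validity rule are assumptions chosen for illustration; real feeds would define their own.

```python
from collections import defaultdict


def is_valid(event):
    """Filtering: reject events with a missing customer or a non-positive amount."""
    return (
        isinstance(event.get("customer"), str)
        and isinstance(event.get("amount"), (int, float))
        and event["amount"] > 0
    )


def merge_and_total(*sources):
    """Aggregation/correlation: combine several sources into one total per customer."""
    totals = defaultdict(float)
    for source in sources:
        for event in source:
            if is_valid(event):
                totals[event["customer"]] += event["amount"]
    return dict(totals)


# Two hypothetical sources with some invalid records mixed in.
clickstream = [
    {"customer": "a", "amount": 10.0},
    {"customer": None, "amount": 5.0},   # invalid: no customer
]
transactions = [
    {"customer": "a", "amount": 2.5},
    {"customer": "b", "amount": -1.0},   # invalid: non-positive amount
    {"customer": "b", "amount": 4.0},
]

totals = merge_and_total(clickstream, transactions)
```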
Thinking in Streams
• Time Windows
  • By taking a snapshot of the data stream, time windows can be used to provide time series analysis in real time (weighted moving averages, Bollinger Bands, etc.).
• Temporal Patterns
  • Events often form interesting patterns relative to new data coming in, such as data streaming at different times and different patterns of data occurring at the same time.
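A time window can be sketched as a queue that evicts old readings as new ones arrive. The example below computes Bollinger Bands (moving average plus or minus k standard deviations) over the window. The `TimeWindow` class is a hypothetical helper, and the sketch assumes timestamps arrive in order.

```python
from collections import deque
from statistics import mean, pstdev


class TimeWindow:
    """Keeps only the readings from the last `width_seconds` of the stream."""

    def __init__(self, width_seconds):
        self.width = width_seconds
        self.items = deque()  # (timestamp, value) pairs, oldest first

    def add(self, timestamp, value):
        self.items.append((timestamp, value))
        # Evict readings that have fallen out of the window.
        while self.items and self.items[0][0] <= timestamp - self.width:
            self.items.popleft()

    def values(self):
        return [value for _, value in self.items]

    def bollinger(self, k=2.0):
        """Return (lower, middle, upper) bands for the current window."""
        vals = self.values()
        middle = mean(vals)
        spread = k * pstdev(vals)
        return middle - spread, middle, middle + spread


window = TimeWindow(width_seconds=3)
for t, price in enumerate([10.0, 12.0, 14.0, 16.0]):
    window.add(t, price)

# Only the last three seconds of prices remain in the window.
lower, middle, upper = window.bollinger()
```

The bands are recomputed from whatever is in the window at that moment, so the analysis tracks the stream instead of waiting for a batch job.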
Use Cases for Fast Data Analytics
• Manage Risk
  • Market Surveillance in a high-frequency world
• Maximize Reward
  • Customer Satisfaction in the Internet of Things
Real-Time Market Surveillance
• Convergent threat system
• Support for historical, real-time, and predictive modeling
• Support for Big Fast Data
• Support for multi- and cross-asset class monitoring
• Support for cross-border surveillance
Real-Time Market Surveillance
• Continuous predictive analysis
  • By leveraging historical data, a continuous predictive analysis can extrapolate events that have happened up to the current time and then predict a future event.
  • For example, the normal operating parameters of an algorithm (size, frequency, instrument, order-to-trade ratio) can be established strictly based on history. Deviation from this norm can trigger defensive measures. Consider Knight Capital.
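One simple way to express "deviation from the norm" is a baseline built from history plus a threshold in standard deviations. The sketch below is an assumption about how such a check could work, not a description of any surveillance product; the three-sigma threshold and the sample order-to-trade ratios are made up for illustration.

```python
from statistics import mean, stdev


def build_baseline(history):
    """Establish normal behavior strictly from historical observations."""
    return mean(history), stdev(history)


def is_deviant(value, baseline, max_sigmas=3.0):
    """Flag a live observation that strays too far from the historical norm."""
    mu, sigma = baseline
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > max_sigmas


# Hypothetical historical order-to-trade ratios for one algorithm.
history = [4.8, 5.1, 5.0, 4.9, 5.2]
baseline = build_baseline(history)

normal = is_deviant(5.05, baseline)   # within the usual range
rogue = is_deviant(40.0, baseline)    # far outside it; a defensive measure could fire here
```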
Real-Time Market Surveillance
• Convergent threat system
  • A rogue algorithm can be shut down if it is operating outside of its normal behavior pattern.
  • A rogue trader, market manipulator, or insider dealer can be identified based on rules specified by Compliance.
• The key is a unified framework.
  • Provide a unified data ingestion system to ingest data from the large number of disconnected (and unconnectable) data sources.
Disintermediation
• Mobile wallets pose the same threat level to banks as self-directed equities trading.
• DBS uses its vast amount of client information to categorize customers, statistically predict behavior, and send targeted promotional offers to clients' cell phones when they are engaged with a business partner.
IoT Smart Bank
• A location-aware promotion application requires continuous, event-driven queries that monitor, analyze, and respond to the changing status of subscribers and promotions.
• These queries are multidimensional, matching location, context, preferences, socioeconomic factors, and spending habits. These dimensions are in flux.
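A multidimensional match of this kind can be sketched as a function evaluated on every location update. All names here (`Promotion`, `Subscriber`, `matching_promotions`) are hypothetical, only three of the dimensions are modeled, and flat x/y coordinates stand in for real geolocation.

```python
from dataclasses import dataclass
from math import hypot
from typing import List, Set


@dataclass
class Promotion:
    name: str
    x: float
    y: float
    radius: float
    category: str
    min_monthly_spend: float


@dataclass
class Subscriber:
    subscriber_id: str
    interests: Set[str]
    monthly_spend: float


def matching_promotions(sub: Subscriber, x: float, y: float,
                        promotions: List[Promotion]) -> List[str]:
    """Run one location update through every stored promotion query."""
    return [
        p.name
        for p in promotions
        if hypot(p.x - x, p.y - y) <= p.radius          # location
        and p.category in sub.interests                 # preferences
        and sub.monthly_spend >= p.min_monthly_spend    # spending habits
    ]


promotions = [
    Promotion("coffee", 0.0, 0.0, 1.0, "food", 0.0),
    Promotion("watch", 0.0, 0.0, 1.0, "luxury", 5000.0),
    Promotion("noodles", 10.0, 10.0, 1.0, "food", 0.0),
]
subscriber = Subscriber("s1", {"food", "luxury"}, 1200.0)

# The same subscriber gets different offers as the location changes.
near_origin = matching_promotions(subscriber, 0.5, 0.5, promotions)
near_far_shop = matching_promotions(subscriber, 10.0, 10.5, promotions)
```

Because subscribers and promotions both change over time, the match has to be re-evaluated on every event rather than computed once.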
IoT Smart Bank
• The numbers
  • 1,000 people moving once per minute equals roughly 17 events per second
  • 1 million people would generate 16,667 events per second
  • Wells Fargo has 70 million customers.
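The arithmetic behind these figures is people times moves per minute, divided by 60. The sketch below reproduces the two numbers above and extends the same formula to 70 million customers, under the assumption (not stated in the deck) that every customer generates one location event per minute.

```python
def events_per_second(people, moves_per_minute=1):
    """Convert a population of moving subscribers into an event rate."""
    return people * moves_per_minute / 60


thousand = round(events_per_second(1_000))          # roughly 17
million = round(events_per_second(1_000_000))       # 16,667
seventy_million = round(events_per_second(70_000_000))

print(thousand, million, seventy_million)
```

At 70 million customers the same formula gives well over a million events per second, which is the scale the "Fast Data" slide refers to.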
Streaming Use Cases
• Network monitoring
• Intelligence and surveillance
• Risk management
• E-commerce
• Fraud detection
• Smart order routing
• Transaction cost analysis
• Pricing and analytics
• Market data management
• Algorithmic trading
• Data warehouse augmentation
Components of Fast Data Analytics
• To store data at velocity, Flume into HBase
• To analyze data at velocity, Kafka into Storm or Spark
Open Source Landscape
• Apache Storm
  • Provides massively scalable real-time event processing
• Apache Spark
  • General framework for large-scale data processing
• Apache Samza
Apache Storm
Apache Spark
Commercial Landscape
• Commercial products will differ from open source products in the following areas:
  • Development tools
  • Business applications and platform integration
  • Implementation support
  • Company financials
  • Licensing
Commercial Landscape
• IBM InfoSphere Streams
• Informatica Platform for Streaming Analytics
• SAP Event Stream Processor
• Software AG IBO
• SQLstream Blaze
• TIBCO StreamBase
• Vitria Operational Intelligence