REAL-TIME NETWORK ANALYTICS WITH STORM
Mauricio Vacas
Fausto Inestroza
Sonali Parthasarathy
The Team
• Mauricio Vacas, Big Data Architect
• Sonali Parthasarathy, Real-Time Processing
• Fausto Inestroza, Big Data Architect
• Anita Mehrotra, Data Scientist
• Susie Lu, Visualization
• Krista Schnell, Visualization
• Rick Drushal, Engineering Lead
• John Akred, Product Lead
WHY REAL-TIME?
Distributed Analytics
Real-Time Data Ingestion
Model Prototyping
Exploratory Analytics
Real-Time Rule Execution
PROCESS
UNDERSTAND
REACT
Accenture Cloud Platform
Recommender as a Service
…
Network Analytics Services
Big Data Platform
Drivers
consumer devices
video usage
Issues
Operational Costs
Understanding service quality degradation
Inefficient capacity planning
INGEST
PROCESS
VISUALIZE
ANALYZE
STORE
WHY STORM?
What do we need?
• Scalability
• Reliability
• Fault-tolerance
• Processing, computation, etc.
• Data types, size, velocity
• Mission critical data
• Time series / pattern analysis
• Multiple use cases
How do we get this from Storm?
• Scalability: parallelization
• Reliability: processing guarantees
• Fault-tolerance: robust fail-over strategies
• Processing, computation, etc.: low-level primitives
PRIMITIVES
• Stream: request info (IP, user-agent, etc.), carried as a sequence of tuples
• Spout: pulls messages from a distributed queue
• Bolt: sessionization, speed calculation
• Topology: suboptimal network speed, geospatial analysis
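The four primitives can be sketched in plain Python. This is not the Storm API: `queue_spout`, `parse_bolt`, `bytes_per_ip_bolt`, and the `ip|user_agent|bytes` message format are hypothetical names chosen only to show how a spout emits tuples into a stream that bolts then transform.

```python
from collections import deque
from typing import Iterable, Iterator, NamedTuple


class RequestTuple(NamedTuple):
    """One tuple in the stream: request info for a single event."""
    ip: str
    user_agent: str
    num_bytes: int


def queue_spout(queue: deque) -> Iterator[str]:
    """Spout: pull raw messages from a queue until it is drained."""
    while queue:
        yield queue.popleft()


def parse_bolt(stream: Iterable[str]) -> Iterator[RequestTuple]:
    """Bolt: turn each raw message into a structured tuple."""
    for raw in stream:
        ip, user_agent, num_bytes = raw.split("|")
        yield RequestTuple(ip, user_agent, int(num_bytes))


def bytes_per_ip_bolt(stream: Iterable[RequestTuple]) -> dict:
    """Bolt: total bytes per IP (a stand-in for the speed calculation)."""
    totals: dict = {}
    for t in stream:
        totals[t.ip] = totals.get(t.ip, 0) + t.num_bytes
    return totals


def run_topology(messages: Iterable[str]) -> dict:
    """Topology: wire spout -> bolt -> bolt."""
    return bytes_per_ip_bolt(parse_bolt(queue_spout(deque(messages))))
```

In a real topology each stage would run continuously on an unbounded stream; the generators here only mimic the wiring.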
PARALLELISM
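[Diagram: Nimbus and Zookeeper coordinating two Supervisors, each hosting worker processes (W) and tasks (T). A topology is spread across worker processes; each worker process runs executors, and each executor runs tasks.]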
FAULT TOLERANCE
[Diagram: Nimbus and three Supervisors, each hosting worker processes (W) and tasks (T)]
RELIABILITY
[Diagram: tuples labelled IP1, IP2 and IP3, and a node labelled A, shown in two frames]
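The slide only shows a diagram, but Storm's processing guarantee rests on tracking every tuple derived from a spout tuple. A simplified sketch of the XOR-based bookkeeping that Storm's acker tasks are documented to use follows; `AckTracker` and its method names are hypothetical, and real Storm keeps this state per spout tuple inside dedicated acker tasks.

```python
import random


class AckTracker:
    """Tracks one spout tuple's tree with a single XOR checksum."""

    def __init__(self) -> None:
        self.ack_val = 0

    def emit(self) -> int:
        """Register a new tuple in the tree and return its random 64-bit id."""
        tuple_id = random.getrandbits(64) or 1  # avoid the degenerate id 0
        self.ack_val ^= tuple_id
        return tuple_id

    def ack(self, tuple_id: int) -> None:
        """Mark a tuple as processed by XOR-ing its id in a second time."""
        self.ack_val ^= tuple_id

    def fully_processed(self) -> bool:
        """True once every emitted id has been XOR-ed in exactly twice."""
        return self.ack_val == 0
```

Because each id is XOR-ed in once on emit and once on ack, the checksum returns to zero only when every tuple in the tree has been acknowledged, regardless of tree size.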
SUBOPTIMAL NETWORK SPEED TOPOLOGY: AN EXAMPLE
KafkaSpout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Suboptimal Speed → Store in Cassandra
[Diagram: a tuple (ip 1) shown at each of the seven stages, ending in Cassandra]
Parallelism
[Diagram: the topology above with its stages run in parallel; tuples for ip 1 and tuples for ip 2 travel in separate rows through to Cassandra]
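Per-IP state such as "Update Speed per IP" only works in parallel if every tuple for a given IP reaches the same task. A minimal sketch of that routing, assuming something like Storm's fields grouping on the IP field: `task_for` and `partition` are hypothetical helpers, and CRC32 is used only because it is a stable hash, not because Storm uses it.

```python
import zlib


def task_for(ip: str, num_tasks: int) -> int:
    """Pick a task index from a stable hash of the grouping field."""
    return zlib.crc32(ip.encode("utf-8")) % num_tasks


def partition(ips, num_tasks: int) -> dict:
    """Group a stream of IPs by the task that would process each one."""
    lanes = {task: [] for task in range(num_tasks)}
    for ip in ips:
        lanes[task_for(ip, num_tasks)].append(ip)
    return lanes
```

Two different IPs may still hash to the same task; the guarantee is only that one IP never splits across tasks.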
Branching and Joins
Stream 1: KafkaSpout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Join
Stream 2: KafkaSpout → Speed by Location → Join
Join → Compare Speed → Store in Cassandra
[Diagram: tuple (ip 1) and tuple (NY) meet at the join, producing tuple (ip 1/NY)]
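The join step can be illustrated with two small functions. This is a sketch under assumptions: the slide does not say what "Compare Speed" compares, so the code assumes an IP's speed is checked against the speed for its location, and `join_speed_with_location` and `compare_with_location` are hypothetical names.

```python
def join_speed_with_location(ip_speeds: dict, ip_locations: dict) -> list:
    """Join per-IP speeds with a location lookup, keyed on the IP."""
    joined = []
    for ip, speed in ip_speeds.items():
        location = ip_locations.get(ip)
        if location is not None:  # inner join: skip IPs with no known location
            joined.append((ip, location, speed))
    return joined


def compare_with_location(joined: list, location_speeds: dict) -> list:
    """Return (ip, location) pairs whose speed is below the location's speed."""
    return [
        (ip, location)
        for ip, location, speed in joined
        if location in location_speeds and speed < location_speeds[location]
    ]
```

For example, joining `{"ip1": 2.0}` with `{"ip1": "NY"}` yields the combined record for ip 1 in NY, which can then be compared with the NY figure from the second stream.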
RULE EXECUTION
• Method 1: Storm
• Method 2: Storm + Drools
Storm + Drools
KafkaSpout → Pre-process → Sessionize → Calculate N/W Speed per Session → Update Speed per IP → Identify Suboptimal Speed → Store in Cassandra
[Diagram: the topology with Drools attached]
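One reading of the two methods is that Method 1 hard-codes the rule inside a bolt, while Method 2 lets the bolt pass facts to a separately defined rule set. The sketch below contrasts the two in plain Python; it is not Drools syntax, and the `Rule` type, the `speed_mbps` field, and the 1.0 threshold are all hypothetical.

```python
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    """A named condition evaluated against a fact (a plain dict here)."""
    name: str
    condition: Callable[[dict], bool]


# Method 1: the rule is hard-coded inside the bolt logic.
def is_suboptimal_hardcoded(fact: dict) -> bool:
    return fact["speed_mbps"] < 1.0  # hypothetical threshold


# Method 2: the bolt hands each fact to a rule set defined outside the bolt.
def fire_rules(fact: dict, rules: list) -> list:
    """Return the names of every rule whose condition matches the fact."""
    return [rule.name for rule in rules if rule.condition(fact)]


RULES = [
    Rule("suboptimal-speed", lambda f: f["speed_mbps"] < 1.0),
    Rule("video-session", lambda f: f.get("content") == "video"),
]
```

Keeping the rules as data means they can change without touching the bolt code, which is the usual motivation for pairing a stream processor with a rules engine.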
Copyright © 2012 Accenture. All rights reserved.
Integration with Cassandra
Cassandra
• Optimal for time series data
• Near-linear scalability
• Low read/write latency
Custom Bolt
• Uses Hector API to access Cassandra
• Creates dynamic columns per request
• Stores relevant network data
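"Dynamic columns per request" suggests a wide-row layout: one row per key, with a new time-ordered column added for each request. The sketch below models that idea in memory. It is not the Hector API; `TimeSeriesStore`, the choice of IP as row key, and the `speed_mbps` value are assumptions, and Python's `uuid.uuid1()` stands in for a time-based UUID column name.

```python
import uuid
from collections import defaultdict


class TimeSeriesStore:
    """In-memory stand-in for a wide-row column family keyed by IP."""

    def __init__(self) -> None:
        self.rows = defaultdict(dict)

    def store_request(self, ip: str, speed_mbps: float) -> uuid.UUID:
        """Add one dynamic column for this request under the IP's row."""
        column_name = uuid.uuid1()  # version-1 UUID, embeds a timestamp
        self.rows[ip][column_name] = speed_mbps
        return column_name

    def columns_in_time_order(self, ip: str) -> list:
        """Return the row's values ordered by the UUID's timestamp field."""
        row = self.rows[ip]
        return [row[name] for name in sorted(row, key=lambda u: u.time)]
```

Using a time-based UUID as the column name keeps columns unique even when two requests arrive within the same clock tick, while still allowing reads in time order.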
Lessons Learned
• Rebalance Topology
• Tweak Parallelism in bolt
• Isolation of Topologies
• Use TimeUUIDUtils
• Log4j level set to INFO by default
DEMO
Next Steps
• Trident
• Externalizing Rules
• Predictive Models
• Real-Time Notifications