REAL-TIME NETWORK ANALYTICS WITH STORM
Mauricio Vacas, Fausto Inestroza, Sonali Parthasarathy
The Team
Mauricio Vacas, Big Data Architect
Sonali Parthasarathy, Real-Time Processing
Fausto Inestroza, Big Data Architect
Anita Mehrotra, Data Scientist
Susie Lu, Visualization
Krista Schnell, Visualization
Rick Drushal, Engineering Lead
John Akred, Product Lead
WHY REAL-TIME?
Distributed Analytics
Real-Time Data Ingestion
Model Prototyping
Exploratory Analytics
Real-Time Rule Execution
PROCESS
UNDERSTAND
REACT
Accenture Cloud Platform
Recommender as a Service
……
Network Analytics Services
Big Data Platform
Drivers
consumer devices
video usage
Issues
Operational Costs
Understanding service quality degradation
Inefficient capacity planning
INGEST
PROCESS
VISUALIZE
ANALYZE
STORE
WHY STORM?
What do we need?
Scalability
Reliability
Data types, size, velocity
Mission critical data
Processing, computation, etc.
Time series / pattern analysis
Fault-tolerance
Multiple use cases
How do we get this from Storm?
Processing guarantees
Low-level Primitives
Parallelization
Robust fail-over strategies
Scalability
Reliability
Fault-tolerance
Processing, computation, etc.
PRIMITIVES
Tuple: request info (IP, user-agent, etc.)
Stream: sequence of tuples
Spout: pulls messages from a distributed queue
Bolt: sessionization, speed calculation
Topology: suboptimal network speed, geospatial analysis
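The primitives above can be illustrated with a minimal sketch in plain Python. It does not use the Storm API: the message format, the field names, and the function names are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class RequestTuple(NamedTuple):
    """One tuple in the stream: request info (field names are assumed)."""
    ip: str
    user_agent: str
    bytes_sent: int


def spout(queue: Iterable[str]) -> Iterator[RequestTuple]:
    """Spout: pull raw messages from a queue and emit them as tuples."""
    for message in queue:
        ip, user_agent, bytes_sent = message.split("|")
        yield RequestTuple(ip, user_agent, int(bytes_sent))


def bytes_per_ip_bolt(stream: Iterable[RequestTuple]) -> Dict[str, int]:
    """Bolt: consume a stream of tuples and aggregate bytes per IP."""
    totals: Dict[str, int] = {}
    for t in stream:
        totals[t.ip] = totals.get(t.ip, 0) + t.bytes_sent
    return totals


def run_topology(queue: Iterable[str]) -> Dict[str, int]:
    """Topology: the wiring that connects the spout to the bolt."""
    return bytes_per_ip_bolt(spout(queue))


# Hypothetical queue contents in an assumed "ip|user-agent|bytes" format.
messages = ["10.0.0.1|Mozilla|500", "10.0.0.2|curl|200", "10.0.0.1|Mozilla|300"]
totals = run_topology(messages)
```

In a real topology the spout and bolt would run as separate, parallel components; here they are chained in one process only to show how the terms relate.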
PARALLELISM
Diagram: Nimbus, Zookeeper, and two Supervisors, each running Worker processes (W) that hold Tasks (T).
Diagram: Topology > Worker Process > Executor > Task.
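The relationship between executors and tasks can be sketched as follows. This is a simplified illustration, not Storm's scheduler: the round-robin placement and the function name `assign_tasks` are assumptions.

```python
from typing import Dict, List


def assign_tasks(num_tasks: int, num_executors: int) -> Dict[int, List[int]]:
    """Spread task ids across executors as evenly as possible.

    Assumption: round-robin placement, chosen for simplicity.
    """
    if num_executors < 1 or num_tasks < num_executors:
        raise ValueError("need at least one executor and one task per executor")
    assignment: Dict[int, List[int]] = {e: [] for e in range(num_executors)}
    for task in range(num_tasks):
        assignment[task % num_executors].append(task)
    return assignment


# Four tasks of one component running on two executors (threads).
layout = assign_tasks(num_tasks=4, num_executors=2)
```

Each executor ends up running more than one task, which is the situation the diagram shows with several T boxes inside a single worker.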
FAULT TOLERANCE
Diagram: Nimbus and three Supervisors, each running Worker processes (W) that hold Tasks (T).
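One way to picture fail-over is as a reassignment of workers from a failed node to the surviving ones. The sketch below is a toy model under stated assumptions: the node and worker names are hypothetical, and the "least loaded node first" policy is an illustrative choice, not Nimbus's actual scheduling logic.

```python
from typing import Dict, List


def reassign(assignment: Dict[str, List[str]], failed: str) -> Dict[str, List[str]]:
    """Move the workers of a failed supervisor node onto the survivors."""
    survivors = {
        node: list(workers)
        for node, workers in assignment.items()
        if node != failed
    }
    if not survivors:
        raise RuntimeError("no supervisor left to take over the workers")
    for worker in assignment.get(failed, []):
        # Assumed policy: pick the node with the fewest workers, ties by name.
        target = min(survivors, key=lambda n: (len(survivors[n]), n))
        survivors[target].append(worker)
    return survivors


cluster = {
    "supervisor-1": ["w1", "w2"],
    "supervisor-2": ["w3", "w4"],
    "supervisor-3": ["w5"],
}
after_failure = reassign(cluster, failed="supervisor-1")
```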
RELIABILITY
Diagram: tuples labeled IP1, IP2, and IP3, with a node labeled A.
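Storm's processing guarantee rests on tracking every tuple derived from a spout tuple until all of them are acknowledged. Storm does this by XOR-ing tuple ids into a single value that returns to zero once every emitted tuple has been acked. The sketch below shows only that idea: the class name `AckTracker` is hypothetical, and the fixed ids stand in for the random 64-bit ids Storm generates.

```python
class AckTracker:
    """Track one tuple tree with a single XOR-ed value."""

    def __init__(self) -> None:
        self.ack_val = 0

    def emit(self, tuple_id: int) -> None:
        # Emitting a tuple XORs its id into the tracked value.
        self.ack_val ^= tuple_id

    def ack(self, tuple_id: int) -> None:
        # Acking XORs the same id again, cancelling it out.
        self.ack_val ^= tuple_id

    def fully_processed(self) -> bool:
        return self.ack_val == 0


# Fixed ids keep the example deterministic (an assumption for illustration).
ids = [0x1A2B, 0x3C4D, 0x5E6F]
tracker = AckTracker()
for tuple_id in ids:
    tracker.emit(tuple_id)

tracker.ack(ids[0])
tracker.ack(ids[1])
pending_after_two = not tracker.fully_processed()

tracker.ack(ids[2])
done_after_three = tracker.fully_processed()
```

Because XOR is commutative, the acks may arrive in any order, and the tracker needs constant memory regardless of how many tuples are in flight.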
SUBOPTIMAL NETWORK SPEED TOPOLOGY
AN EXAMPLE
KafkaSpout -> Pre-process -> Sessionize -> Calculate N/W Speed per Session -> Update Speed per IP -> Identify Suboptimal Speed -> Store in Cassandra -> Cassandra
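The processing steps in this topology can be sketched in plain Python to show what each stage computes. Everything specific here is an assumption for illustration: the `Event` fields, the 30-second session gap, the 500 bytes/second threshold, and the definition of speed as bytes divided by session duration are not taken from the slides.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Event(NamedTuple):
    ip: str
    timestamp: float  # seconds
    bytes_sent: int


SESSION_GAP_S = 30.0   # assumed: a longer silence starts a new session
MIN_SPEED_BPS = 500.0  # assumed threshold for "suboptimal"


def sessionize(events: Iterable[Event]) -> Dict[str, List[List[Event]]]:
    """Group each IP's events into sessions separated by long gaps."""
    sessions: Dict[str, List[List[Event]]] = defaultdict(list)
    for e in sorted(events, key=lambda ev: (ev.ip, ev.timestamp)):
        ip_sessions = sessions[e.ip]
        if ip_sessions and e.timestamp - ip_sessions[-1][-1].timestamp <= SESSION_GAP_S:
            ip_sessions[-1].append(e)
        else:
            ip_sessions.append([e])
    return dict(sessions)


def session_speed(session: List[Event]) -> float:
    """Bytes per second over the session, with a 1-second minimum duration."""
    duration = max(session[-1].timestamp - session[0].timestamp, 1.0)
    return sum(e.bytes_sent for e in session) / duration


def speed_per_ip(sessions: Dict[str, List[List[Event]]]) -> Dict[str, float]:
    """Average the session speeds for each IP."""
    return {
        ip: sum(session_speed(s) for s in ip_sessions) / len(ip_sessions)
        for ip, ip_sessions in sessions.items()
    }


def suboptimal(speeds: Dict[str, float]) -> List[str]:
    """IPs whose average speed falls below the threshold."""
    return sorted(ip for ip, speed in speeds.items() if speed < MIN_SPEED_BPS)


events = [
    Event("10.0.0.1", 0.0, 4000), Event("10.0.0.1", 10.0, 6000),
    Event("10.0.0.2", 0.0, 1000), Event("10.0.0.2", 20.0, 3000),
]
speeds = speed_per_ip(sessionize(events))
slow_ips = suboptimal(speeds)
```

In the topology each function would be a separate bolt working on a stream; the batch form here only makes the per-stage logic easy to follow.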
Parallelism
Diagram: the same topology (KafkaSpout through Store in Cassandra), with tuples for ip 1 and tuples for ip 2 flowing in separate parallel rows.
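Keeping all tuples for one IP on the same task is what lets a bolt hold per-IP state such as a running speed. Storm offers this through fields grouping, which partitions a stream by the value of chosen fields. A minimal sketch of the idea follows; the CRC32 hash and the function name `task_for` are assumptions, not Storm's internal hash.

```python
import zlib


def task_for(ip: str, num_tasks: int) -> int:
    """Pick a task index from the IP so equal IPs always land together."""
    # zlib.crc32 is used because it is stable across runs, unlike hash(str).
    return zlib.crc32(ip.encode("utf-8")) % num_tasks


NUM_TASKS = 4
stream = ["ip 1", "ip 2", "ip 1", "ip 1", "ip 2"]
routing = [(ip, task_for(ip, NUM_TASKS)) for ip in stream]
```

Every occurrence of "ip 1" maps to the same task index, so that task sees the complete history for that address.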
Branching and Joins
Components: KafkaSpout, Pre-process, Sessionize, Calculate N/W Speed per Session, Update Speed per IP, Speed by Location, Join, Compare Speed, Store in Cassandra, Cassandra
Streams: Stream 1, Stream 2
Tuple labels: (ip 1), (NY), (ip 1/NY)
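The join and compare steps can be sketched as follows. This is a guess at the intent based on the tuple labels (ip 1), (NY), and (ip 1/NY): the per-IP speed is matched with the speed for that IP's location and then compared against it. The input shapes, the names, and the comparison rule are all assumptions.

```python
from typing import Dict, List, Tuple


def join_and_compare(
    ip_speeds: Dict[str, float],
    ip_locations: Dict[str, str],
    location_speeds: Dict[str, float],
) -> List[Tuple[str, str, float, bool]]:
    """Join per-IP speed with per-location speed and flag slower IPs."""
    joined = []
    for ip, speed in sorted(ip_speeds.items()):
        location = ip_locations.get(ip)
        if location is None or location not in location_speeds:
            continue  # nothing to join against for this IP
        below_location = speed < location_speeds[location]
        joined.append((ip, location, speed, below_location))
    return joined


# Hypothetical values for the two streams.
ip_speeds = {"ip 1": 300.0, "ip 2": 900.0}
ip_locations = {"ip 1": "NY", "ip 2": "NY"}
location_speeds = {"NY": 600.0}
results = join_and_compare(ip_speeds, ip_locations, location_speeds)
```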
RULE EXECUTION
Method 1: Storm
Method 2: Storm + Drools
Storm + Drools
KafkaSpout -> Pre-process -> Sessionize -> Calculate N/W Speed per Session -> Update Speed per IP -> Identify Suboptimal Speed -> Store in Cassandra -> Cassandra
Diagram: the topology above, with Drools shown alongside it.
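The difference between the two methods is where the rule logic lives: hard-coded in a bolt, or held outside it and evaluated by a rule engine. The sketch below imitates the second approach with rules expressed as data. It is not Drools and does not use Drools syntax; the rule format, field names, and thresholds are hypothetical.

```python
import operator
from typing import Any, Callable, Dict, List

# Comparison operators a rule may refer to by name.
OPS: Dict[str, Callable[[Any, Any], bool]] = {
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "==": operator.eq,
}

# Rules as data: they could be loaded from a file without redeploying code.
RULES = [
    {"name": "suboptimal_speed", "field": "speed_bps", "op": "<", "value": 500.0},
    {"name": "heavy_session", "field": "session_bytes", "op": ">", "value": 1_000_000},
]


def fired_rules(fact: Dict[str, Any], rules: List[Dict[str, Any]]) -> List[str]:
    """Return the names of all rules whose condition holds for the fact."""
    return [
        rule["name"]
        for rule in rules
        if rule["field"] in fact
        and OPS[rule["op"]](fact[rule["field"]], rule["value"])
    ]


fact = {"ip": "ip 1", "speed_bps": 200.0, "session_bytes": 5000}
fired = fired_rules(fact, RULES)
```

Changing a threshold then means editing the rule data rather than the bolt's code.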
Copyright © 2012 Accenture All rights reserved.
Integration with Cassandra
Cassandra
• Optimal for time series data
• Near-linear scalability
• Low read/write latency
Custom Bolt
• Uses Hector API to access Cassandra
• Creates dynamic columns per request
• Stores relevant network data
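Creating a column per request amounts to a wide row: one row per key, with a new column for each measurement. The in-memory sketch below models that layout with a dictionary and shows one plausible reason to name columns with time-based UUIDs rather than raw timestamps. It does not call Hector or Cassandra; the row layout and function names are assumptions.

```python
import uuid
from typing import Dict


def store_by_millis(row: Dict[int, float], ts_ms: int, speed: float) -> None:
    """Column name = millisecond timestamp; equal timestamps overwrite."""
    row[ts_ms] = speed


def store_by_time_uuid(row: Dict[uuid.UUID, float], speed: float) -> None:
    """Column name = time-based UUID; every write gets its own column."""
    row[uuid.uuid1()] = speed


# Two requests from the same IP that arrive within the same millisecond.
row_millis: Dict[int, float] = {}
store_by_millis(row_millis, 1_350_000_000_000, 320.0)
store_by_millis(row_millis, 1_350_000_000_000, 410.0)

row_uuid: Dict[uuid.UUID, float] = {}
store_by_time_uuid(row_uuid, 320.0)
store_by_time_uuid(row_uuid, 410.0)
```

With timestamp column names the second measurement silently replaces the first, while the time-based UUID keeps both.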
Lessons Learned
• Rebalance Topology
• Tweak Parallelism in bolt
• Isolation of Topologies
• Use TimeUUIDUtils
• Log4j level set to INFO by default
DEMO
Next Steps
• Trident
• Externalizing Rules
• Predictive Models
• Real-Time Notifications