Apache Storm
• A Stream Processing framework
• Used to pull data from a stream and perform real-time analytics on the data
A Stream…
• Can be Apache Kafka or Amazon Kinesis.
• Normally has partitions / shards for better read & write throughput
Partition Metadata
• Storm uses INTEGERS (0,1…) to identify partitions.
• Whereas Amazon Kinesis uses STRINGS to identify partitions
So how can we process data?
• User sorts the STRINGS (shard IDs)
• User maps the sorted shard IDs to the integers 0...N
Shard-id-0001 <-> 0
Shard-id-0002 <-> 1
…..…..
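The sort-and-index step above can be sketched in a few lines. This is only an illustration in Python, not the Storm or Kinesis API: the function name `build_partition_map` is hypothetical, and the shard IDs are the example values from the slide.

```python
def build_partition_map(shard_ids):
    """Sort the string shard IDs and assign each one an integer index 0..N-1."""
    return {shard_id: index for index, shard_id in enumerate(sorted(shard_ids))}


# Example shard IDs from the slide, deliberately given out of order.
partition_map = build_partition_map(["Shard-id-0002", "Shard-id-0001"])
print(partition_map)
```

The integer assigned to a shard depends only on its position in the sorted list, so the mapping is stable only while the set of shards stays the same.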
Disturbance in the Force
• Storm partition metadata is NO longer valid, as the shard has been deleted.
• Storm partition metadata should now be:
shard-2 <-> 0
shard-3 <-> 1
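A small sketch of why the metadata goes stale. It assumes (this is not stated on the slide) that before the deletion the stream listed shard-1 together with its two child shards, and it uses a hypothetical helper `build_partition_map` that applies the sort-and-index rule.

```python
def build_partition_map(shard_ids):
    """Sort the string shard IDs and assign each one an integer index 0..N-1."""
    return {shard_id: index for index, shard_id in enumerate(sorted(shard_ids))}


# Assumed state before shard-1 is deleted: parent and both children are listed.
before = build_partition_map(["shard-1", "shard-2", "shard-3"])
# State after shard-1 is deleted.
after = build_partition_map(["shard-2", "shard-3"])

# Shards whose integer index changed: (old index, new index).
shifted = {s: (before[s], after[s]) for s in after if before[s] != after[s]}
print(shifted)
```

Every surviving shard moves down by one index, so any state that was stored under the old integers now points at the wrong shard.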
A Solution:
• WHITE_LIST of shards for a storm topology.
• A storm topology pulls from a specific set of shards.
• So in our case:
  – start topology-1 with WHITELIST = “shard-1”
  – split shard
  – start topology-2 with WHITELIST = “shard-2 & 3”
A Solution…
• When shard-1 gets deleted, topology 1 dies with it.
• Topology 2 continues processing data for the new shards.
So, there is NO metadata conflict, as there are 2 different topologies pulling data from different sets of shards.
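The whitelist idea can be illustrated with a short sketch. It is an assumption about how the filtering might look, not the actual spout code: the function `build_partition_map` and its `whitelist` parameter are hypothetical names, and the pre-deletion shard listing is assumed as well.

```python
def build_partition_map(shard_ids, whitelist):
    """Index only the shards that exist in the stream AND are whitelisted."""
    visible = sorted(s for s in shard_ids if s in whitelist)
    return {shard_id: index for index, shard_id in enumerate(visible)}


topology_2_whitelist = {"shard-2", "shard-3"}

# Topology 2's view before and after shard-1 is deleted from the stream.
before_delete = build_partition_map(["shard-1", "shard-2", "shard-3"], topology_2_whitelist)
after_delete = build_partition_map(["shard-2", "shard-3"], topology_2_whitelist)

# Topology 1 (whitelist "shard-1") has nothing left to read after the deletion.
topology_1_after = build_partition_map(["shard-2", "shard-3"], {"shard-1"})
print(before_delete, after_delete, topology_1_after)
```

Because topology 2 never indexes shard-1, its integer mapping is identical before and after the deletion, which is why no metadata conflict arises.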