Gossip Protocol & Key-Value Store
Theory and Practice
Dr. SAJEEV G P
April 16, 2016
Dr. SAJEEV G P Gossip Protocol & Key-Value Store 1 / 1
Contents
Outline
- Background
- Introduction to Gossip
- Gossip Model
- Key-Value Store
- Cassandra
- CAP Theorem
- Further study
Background
Real World Applications (YouTube)
- Resume a video
- Last-watched status
- Already watched?
Some Real World Applications
- Facebook search
- Term search
- Amazon search
- Amazon recommendations
Cloud Platform
- App Engine
- Compute Engine
- Cloud Storage
- Cloud Bigtable
- Google Dataflow
- Google Translate API
- Google BigQuery
- Cloud Prediction API
Gossip & Key-Value Store
- Gossip is for communication
- The key-value store is the database
Gossip Protocol
Multicast
Multicast is group communication in which information is addressed to a group of destination computers simultaneously.
Types of Casting
- Unicast
- Multicast
- Broadcast
- Multicast at the application level
- Multicast at the network level
Multicast Protocol: Centralized & Tree Based
- Centralized
- Tree based
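In a centralized protocol the sender transmits the message to every receiver itself; in a tree-based protocol the nodes form a tree rooted at the sender and each node forwards to its children. The sketch below compares the sender's load and the delivery latency of the two, under assumptions the slide does not state: a balanced k-ary tree, and each node sending one message per time step.

```python
def tree_depth(n_nodes: int, k: int) -> int:
    """Depth of a complete k-ary tree holding n_nodes nodes (root at depth 0)."""
    depth, capacity, level_size = 0, 1, 1
    while capacity < n_nodes:
        level_size *= k
        capacity += level_size
        depth += 1
    return depth


def centralized(n: int) -> tuple[int, int]:
    # The sender unicasts to all n receivers itself, one message per time step.
    sender_load = n
    latency = n
    return sender_load, latency


def tree_based(n: int, k: int) -> tuple[int, int]:
    # The sender sends only to its (at most k) children; every interior node
    # forwards to its own children. k steps per level is an upper bound on latency.
    sender_load = min(k, n)
    latency = k * tree_depth(n + 1, k)
    return sender_load, latency
```

For n = 1000 receivers and a binary tree (k = 2), the sender sends 2 messages instead of 1000, and delivery takes at most 18 steps instead of 1000. The price is that the tree has to be rebuilt or repaired when interior nodes fail.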
Gossip Protocol: Epidemic multicast
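In epidemic (push) gossip, every node that already has the message periodically picks b random targets and forwards the message to them. The sketch below simulates this from a single initial sender; the synchronous-round model, the fan-out b, the group size, and the fixed random seed are assumptions chosen for illustration.

```python
import random


def push_gossip(n_nodes: int, b: int, seed: int = 0, max_rounds: int = 1000) -> list[int]:
    """Simulate round-based push gossip that starts at node 0.

    Each round, every infected node sends the message to b targets chosen
    uniformly at random (a target may already be infected, or be the sender).
    Returns the number of infected nodes after each round, starting at round 0.
    """
    rng = random.Random(seed)
    infected = {0}
    history = [len(infected)]
    for _ in range(max_rounds):
        if len(infected) == n_nodes:
            break
        newly_reached = set()
        for _node in infected:
            for target in rng.sample(range(n_nodes), b):
                newly_reached.add(target)
        infected |= newly_reached
        history.append(len(infected))
    return history
```

With n_nodes = 1000 and b = 2, the infected count grows roughly exponentially at first and then saturates, reaching every node within a few dozen rounds. This S-shaped curve is what the analysis below models.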
Gossip Analysis
Basics
- A population of $(n+1)$ individuals mixing homogeneously
- The contact rate between any pair of individuals is $\beta$
- At any time, each individual is either uninfected (numbering $x$) or infected (numbering $y$)
- Then $x_0 = n$, $y_0 = 1$, and at all times $x + y = n + 1$
- A contact between an infected and an uninfected individual makes the latter infected, and it stays infected
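Gossip Analysis
Gossip Properties
- Lightweight in large groups
- Spreads quickly
- Fault-tolerant
Terms
- $n + 1$ nodes
- $x$: number of uninfected nodes
- $y$: number of infected nodes
- $x + y = n + 1$
Continuous time ..
- $\frac{dx}{dt} = -\beta x y$
- Contact rate $\beta = \frac{b}{n}$, where each node gossips to $b$ randomly chosen targets per round
- Solution:
$$x = \frac{n(n+1)}{n + e^{\beta(n+1)t}}, \qquad y = \frac{n+1}{1 + n\,e^{-\beta(n+1)t}}$$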
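One way to sanity-check the closed-form solution is to integrate $\frac{dx}{dt} = -\beta x y$ numerically, with $y = n + 1 - x$, and compare the result against the formula for $x(t)$. The sketch below uses a plain forward-Euler step; the values of $n$, $b$, the step size, and the end time are arbitrary choices for illustration.

```python
import math


def x_closed_form(n: int, beta: float, t: float) -> float:
    # x(t) = n(n+1) / (n + e^{beta (n+1) t})
    return n * (n + 1) / (n + math.exp(beta * (n + 1) * t))


def y_closed_form(n: int, beta: float, t: float) -> float:
    # y(t) = (n+1) / (1 + n e^{-beta (n+1) t})
    return (n + 1) / (1 + n * math.exp(-beta * (n + 1) * t))


def x_euler(n: int, beta: float, t_end: float, dt: float = 1e-4) -> float:
    # Forward-Euler integration of dx/dt = -beta * x * y with y = n + 1 - x,
    # starting from x(0) = n (a single infected node).
    x = float(n)
    for _ in range(round(t_end / dt)):
        x += dt * (-beta * x * (n + 1 - x))
    return x
```

With n = 100 and b = 2 (so beta = 0.02), the numerical and closed-form values of x(3) agree closely, and x + y stays at n + 1 for every t, matching the initial conditions $x_0 = n$, $y_0 = 1$.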
Gossip Analysis ..
No. of infected nodes ..
- At $t = c \log(n)$:
$$y \approx (n+1) - \frac{1}{n^{cb-2}}$$
Low Latency
- Set $c$ and $b$ to be small numbers independent of $n$
- Within $t = c \log(n)$ rounds, all nodes will receive the multicast except $\frac{1}{n^{cb-2}}$ of them
Lightweight
- Each node has transmitted no more than $cb \log(n)$ gossip messages
Fault Tolerance
- With 50% packet drop:
  - $b \leftarrow b/2$
  - Takes twice as many rounds
- With 50% node failures:
  - $n \leftarrow n/2$ and $b \leftarrow b/2$
  - Same as above
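Plugging numbers into these bounds makes them concrete. The sketch below evaluates the round count $c \log(n)$, the expected number of nodes that miss the multicast ($1/n^{cb-2}$), and the per-node message bound $cb \log(n)$. The natural logarithm and the example values $n = 1000$, $b = 2$, $c = 2$ are assumptions; halving $b$ models the 50% packet-drop case, and the values are approximations from the continuous-time model.

```python
import math


def gossip_bounds(n: int, b: float, c: float) -> dict[str, float]:
    """Evaluate the approximate gossip bounds (natural log assumed)."""
    rounds = c * math.log(n)                 # t = c log(n)
    missed = 1 / n ** (c * b - 2)            # expected number of nodes not infected
    messages_per_node = c * b * math.log(n)  # upper bound on gossips sent per node
    return {"rounds": rounds, "missed": missed, "messages_per_node": messages_per_node}


n = 1000
normal = gossip_bounds(n, b=2, c=2)          # about 13.8 rounds, 1e-6 nodes missed
lossy_same_c = gossip_bounds(n, b=1, c=2)    # 50% packet drop: b <- b/2, about 1 node missed
lossy_double_c = gossip_bounds(n, b=1, c=4)  # doubling the rounds restores reliability
```

Halving b while keeping c raises the expected number of missed nodes from $10^{-6}$ to 1; doubling c brings it back to $10^{-6}$ at the cost of twice as many rounds, which is the "takes twice as many rounds" claim above.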
Key-Value Stores
- The simplest form of database management system
- They store pairs of keys and values, and retrieve a value when its key is known
Examples
- twitter.com: tweet id ⇒ information about the tweet
- amazon.com: item number ⇒ information about the item
- kayak.com: flight number ⇒ information about the flight, e.g., availability
- yourbank.com: account number ⇒ information about the account
Key-Value Stores..
- It's a dictionary data structure
- NoSQL database
- Insert, lookup, and delete by key
- E.g., hash table, binary tree
- But distributed
- Key-value stores reuse many techniques from DHTs (distributed hash tables)
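At its core, the interface is just insert, lookup, and delete by key. A minimal single-node sketch backed by a Python dict (a hash table) follows; the class name `LocalKVStore` and the flight-number key are hypothetical. A distributed key-value store exposes the same operations but spreads the keys across many servers.

```python
from typing import Optional


class LocalKVStore:
    """Hypothetical single-node key-value store backed by a hash table (dict)."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def insert(self, key: str, value: str) -> None:
        # Add a new pair, or overwrite the existing value for this key.
        self._data[key] = value

    def lookup(self, key: str) -> Optional[str]:
        # Return the value stored under key, or None if the key is absent.
        return self._data.get(key)

    def delete(self, key: str) -> bool:
        # Remove the key and report whether it was present.
        if key in self._data:
            del self._data[key]
            return True
        return False
```

For example, `store.insert("AI-101", "available")` followed by `store.lookup("AI-101")` returns `"available"`, mirroring the flight-number example above.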
NoSQL Databases
Traditional RDBMS..
- Schema-based, i.e., structured tables
- Primary key that is unique within a table
- Queried using SQL; supports joins
Today's Workload
Big Data Era ..
- Data: large and unstructured
- Lots of random reads and writes
- Sometimes write-heavy
- Foreign keys rarely needed
- Joins infrequent
Needs of Today's Workload
- Speed
- Avoid a single point of failure (SPOF)
- Low TCO (total cost of ownership)
- Fewer system administrators
- Incremental scalability
- Scale out, not up
Key-Value or NoSQL
- NoSQL systems often use column-oriented storage
  - RDBMSs store an entire row together (on disk or at a server)
  - NoSQL systems typically store a column together (or a group of columns)
  - Entries within a column are indexed and easy to locate, given a key (and vice versa)
- Why useful?
  - Range searches within a column are fast, since you don't need to fetch the entire database
  - E.g., get me all the blog-ids from the blog table that were updated within the past month
    - Search in the last-updated column, then fetch the corresponding blog-id column
    - Don't need to fetch the other columns
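The blog example can be sketched by storing the same small table both ways. This is a toy model, not how any particular NoSQL system lays data out on disk: the table, its column names, and the dates are made up, and the column store simply keeps each column as its own list so that the range query reads only `last_updated` and `blog_id`.

```python
from datetime import date

# Row-oriented layout: each record (row) is stored together.
rows = [
    {"blog_id": 1, "title": "Gossip", "body": "post body", "last_updated": date(2016, 3, 2)},
    {"blog_id": 2, "title": "Cassandra", "body": "post body", "last_updated": date(2016, 4, 10)},
    {"blog_id": 3, "title": "CAP", "body": "post body", "last_updated": date(2016, 4, 14)},
]

# Column-oriented layout: each column is stored together, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}


def updated_since_rows(table: list[dict], since: date) -> list[int]:
    # Has to touch every full row, including the title and body fields.
    return [row["blog_id"] for row in table if row["last_updated"] >= since]


def updated_since_columns(cols: dict[str, list], since: date) -> list[int]:
    # Scans only the last_updated column, then fetches the matching blog_ids.
    hits = [i for i, d in enumerate(cols["last_updated"]) if d >= since]
    return [cols["blog_id"][i] for i in hits]
```

Both functions return `[2, 3]` for `since = date(2016, 3, 16)`; the difference is that the column version never reads the `title` or `body` data.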
CASSANDRA
What is Cassandra?
- A distributed key-value store, intended to run in a data center (and also across DCs)
- Originally designed at Facebook
- Open-sourced later; today an Apache project
In use
- Some of the companies that use Cassandra in their production clusters:
  - IBM, Adobe, HP, eBay, Ericsson, Symantec
  - Twitter, Spotify
  - PBS Kids
  - Netflix: uses Cassandra to keep track of your current position in the video you're watching
CASSANDRA
Objectives and Functions
- Performance
- Availability
- Scalability
- Fault tolerance
- P2P cluster
  - Decentralized design
  - Each node has the same role
- No single point of failure
  - Avoids the issues of master-slave DBMSs
- No bottlenecking
Cassandra Architecture
DHT-like
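In a DHT-like design, servers and keys are hashed onto the same ring, and a key is stored at the first server clockwise from its position. A minimal consistent-hashing sketch follows. It uses MD5 (the hash behind Cassandra's RandomPartitioner) but shrinks the ring to $2^{16}$ positions for readability; the ring size and server names are assumptions, and Cassandra's real token assignment is more involved.

```python
import bisect
import hashlib

RING_BITS = 16  # small ring for illustration; real partitioners use a much larger token space


def ring_position(name: str) -> int:
    # Hash a server name or a key to a point on the ring [0, 2^RING_BITS).
    digest = hashlib.md5(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


class Ring:
    def __init__(self, servers: list[str]) -> None:
        # Keep servers sorted by ring position so lookups can binary-search.
        self._points = sorted((ring_position(s), s) for s in servers)
        self._positions = [pos for pos, _ in self._points]

    def owner(self, key: str) -> str:
        # The first server at or clockwise after the key's position owns the key;
        # past the last server, wrap around to the start of the ring.
        i = bisect.bisect_left(self._positions, ring_position(key))
        return self._points[i % len(self._points)][1]
```

Adding or removing a server only moves the keys between that server and its predecessor on the ring, which is why this scheme suits incremental, scale-out growth.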
Cassandra Architecture ..
Replication Strategy
- SimpleStrategy: uses the Partitioner, of which there are two kinds
  - RandomPartitioner: Chord-like hash partitioning
  - ByteOrderedPartitioner: assigns ranges of keys to servers
    - Easier for range queries (e.g., get me all Twitter users starting with [a-b])
- NetworkTopologyStrategy: for multi-DC deployments
  - Two replicas per DC
  - Three replicas per DC
  - Per DC:
    - The first replica is placed according to the Partitioner
    - Then go clockwise around the ring until you hit a different rack
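Replica placement can be sketched on top of a list of nodes already sorted by ring position. In the simple case, a key's replicas are its owner plus the next nodes clockwise; in the rack-aware case described for one DC under NetworkTopologyStrategy, after the first replica we keep walking clockwise and prefer nodes on racks not used yet. The node names, rack labels, and the fallback to same-rack nodes when racks run out are assumptions of this sketch, not Cassandra's exact algorithm.

```python
def simple_replicas(ring: list[str], owner_index: int, rf: int) -> list[str]:
    """Owner plus the next rf - 1 nodes clockwise (ring is sorted by token)."""
    rf = min(rf, len(ring))
    return [ring[(owner_index + i) % len(ring)] for i in range(rf)]


def rack_aware_replicas(ring: list[str], racks: dict[str, str],
                        owner_index: int, rf: int) -> list[str]:
    """First replica comes from the partitioner; then walk clockwise, taking nodes
    on racks not used yet. If racks run out, fill with skipped nodes in ring order."""
    rf = min(rf, len(ring))
    replicas = [ring[owner_index]]
    used_racks = {racks[ring[owner_index]]}
    skipped = []
    for step in range(1, len(ring)):
        if len(replicas) == rf:
            break
        node = ring[(owner_index + step) % len(ring)]
        if racks[node] not in used_racks:
            replicas.append(node)
            used_racks.add(racks[node])
        else:
            skipped.append(node)
    replicas.extend(skipped[: rf - len(replicas)])
    return replicas
```

With nodes n1 to n6 in ring order and two nodes per rack (n1, n2 on r1; n3, n4 on r2; n5, n6 on r3), three replicas for a key owned by n1 are n1, n2, n3 under the simple rule but n1, n3, n5 under the rack-aware rule, so losing one rack never loses more than one copy.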
Cassandra Architecture ..
At node
On receiving a write
- Log it in the disk commit log (for failure recovery)
- Make changes to the appropriate memtables
  - Memtable = in-memory representation of multiple key-value pairs
Later, when the memtable is full or old, flush it to disk
- Data file: an SSTable (Sorted String Table), a list of key-value pairs sorted by key
- Index file: an SSTable of (key, position in data SSTable) pairs
- And a Bloom filter (for efficient search)
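The write path above can be modeled in a few lines. This is a toy, single-process sketch, not Cassandra's code: the commit log is an in-memory list rather than a file, the memtable flushes after a fixed number of keys (`MEMTABLE_LIMIT` is a made-up threshold), each SSTable is a sorted list with an index dict, and the Bloom filter derives its bit positions from salted SHA-256 hashes. The read path is included only to show where the Bloom filter saves work.

```python
import hashlib
from typing import Iterator, Optional

MEMTABLE_LIMIT = 3  # hypothetical flush threshold: number of keys in the memtable


class BloomFilter:
    """Answers "definitely absent" (False) or "maybe present" (True) for a key."""

    def __init__(self, size: int = 256, num_hashes: int = 3) -> None:
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos] = True

    def might_contain(self, key: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(key))


class SSTable:
    """Immutable 'data file' sorted by key, plus an 'index file' and a Bloom filter."""

    def __init__(self, items: dict[str, str]) -> None:
        self.data = sorted(items.items())  # (key, value) pairs, sorted by key
        self.index = {key: pos for pos, (key, _) in enumerate(self.data)}  # key -> position
        self.bloom = BloomFilter()
        for key, _ in self.data:
            self.bloom.add(key)

    def get(self, key: str) -> Optional[str]:
        if not self.bloom.might_contain(key):
            return None  # definitely not here: skip the index lookup entirely
        pos = self.index.get(key)
        return None if pos is None else self.data[pos][1]


class Node:
    def __init__(self) -> None:
        self.commit_log: list[tuple[str, str]] = []  # stands in for the on-disk log
        self.memtable: dict[str, str] = {}
        self.sstables: list[SSTable] = []

    def write(self, key: str, value: str) -> None:
        self.commit_log.append((key, value))  # 1. log it for failure recovery
        self.memtable[key] = value            # 2. update the memtable
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.flush()                      # 3. memtable is full: flush it

    def flush(self) -> None:
        self.sstables.append(SSTable(self.memtable))
        self.memtable = {}

    def read(self, key: str) -> Optional[str]:
        # Newest data wins: check the memtable, then SSTables from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            value = table.get(key)
            if value is not None:
                return value
        return None
```

Writing keys b, a, c produces one SSTable whose data is sorted as a, b, c; a fourth write stays in the memtable until the next flush.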
Cassandra Architecture ..
CAP
- Consistency: all nodes see the same data at any time, or reads return the latest value written by any client
- Availability: the system allows operations all the time, and operations return quickly
- Partition-tolerance: the system continues to work in spite of network partitions
CAP Theorem
In a distributed system you can satisfy at most 2 out of the 3 guarantees.
Eventual consistency: a weak consistency model
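Eventual consistency means replicas may disagree for a while but converge once updates have propagated. A minimal sketch follows, assuming last-write-wins reconciliation by timestamp (Cassandra resolves conflicting writes by per-write timestamps) and a hypothetical anti-entropy step in which two replicas exchange all their entries. Breaking timestamp ties by comparing values is an assumption of this sketch that keeps replicas in agreement.

```python
Versioned = tuple[int, str]  # (timestamp, value)


class Replica:
    def __init__(self, name: str) -> None:
        self.name = name
        self.store: dict[str, Versioned] = {}

    def write(self, key: str, value: str, ts: int) -> None:
        # Keep the write only if it is newer; equal timestamps are broken by value.
        current = self.store.get(key)
        if current is None or (ts, value) > current:
            self.store[key] = (ts, value)

    def read(self, key: str):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(a: Replica, b: Replica) -> None:
    # Each replica applies the other's entries; the newer timestamp wins everywhere.
    for key, (ts, value) in list(a.store.items()):
        b.write(key, value, ts)
    for key, (ts, value) in list(b.store.items()):
        a.write(key, value, ts)
```

If one replica holds ("x", "old", ts=1) and another holds ("x", "new", ts=2), reads can return different values until an anti-entropy exchange, after which both return "new".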
Cassandra Architecture ..
CAP Theorem
Cassandra Consistency
- Cassandra chooses Availability and Partition-tolerance, with eventual (weak) consistency
- Cassandra has consistency levels
CAP Theorem ..
Consistency: No. of Replicas ..
- The client is allowed to choose a consistency level for each operation (read/write)
  - ANY: any server (not necessarily a replica; writes only); fastest
  - ALL: all replicas; ensures strong consistency, but slowest
  - ONE: at least one replica; faster than ALL
  - QUORUM: a quorum across all replicas in all data centers (DCs)
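QUORUM works because any two majorities of the same replica set overlap. With $N$ replicas a quorum is $\lfloor N/2 \rfloor + 1$ replicas, and more generally a read that contacts $R$ replicas is guaranteed to include at least one replica holding a write acknowledged by $W$ replicas whenever $R + W > N$. The brute-force check below verifies this for small $N$; it treats a single replica set and ignores the multi-DC case.

```python
from itertools import combinations


def quorum_size(n_replicas: int) -> int:
    # A majority of the replicas.
    return n_replicas // 2 + 1


def always_overlap(n_replicas: int, r: int, w: int) -> bool:
    """True if every set of r replicas intersects every set of w replicas."""
    replicas = range(n_replicas)
    return all(set(rs) & set(ws)
               for rs in combinations(replicas, r)
               for ws in combinations(replicas, w))
```

With N = 3, QUORUM reads and QUORUM writes each touch 2 replicas, and 2 + 2 > 3, so `always_overlap(3, 2, 2)` is True; ONE reads with ONE writes (1 + 1 ≤ 3) give no such guarantee.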
Cassandra Architecture ..
Cassandra & Gossip
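Cassandra nodes use gossip to exchange state information about one another. A generic gossip-style membership sketch follows, in the spirit of heartbeat-based failure detection; it is not Cassandra's implementation, and the class name, the `FAIL_TIMEOUT` value, and the integer clock are assumptions. Each node increments its own heartbeat, sends its membership table to one random peer per round, and merges incoming tables by keeping the higher heartbeat; a member whose heartbeat has not increased for more than `FAIL_TIMEOUT` time units is suspected to have failed.

```python
import random

FAIL_TIMEOUT = 5  # hypothetical: time units without a heartbeat increase


class Member:
    def __init__(self, name: str, peers: list[str]) -> None:
        self.name = name
        # member name -> (heartbeat counter, local time it last increased)
        self.table: dict[str, tuple[int, int]] = {p: (0, 0) for p in peers}
        self.table[name] = (0, 0)

    def tick(self, now: int) -> None:
        # Bump our own heartbeat before gossiping.
        hb, _ = self.table[self.name]
        self.table[self.name] = (hb + 1, now)

    def merge(self, incoming: dict[str, tuple[int, int]], now: int) -> None:
        # Keep the higher heartbeat and stamp it with our own local time.
        for member, (hb, _) in incoming.items():
            mine = self.table.get(member)
            if mine is None or hb > mine[0]:
                self.table[member] = (hb, now)

    def suspected(self, now: int) -> set[str]:
        return {m for m, (_, t) in self.table.items()
                if m != self.name and now - t > FAIL_TIMEOUT}


def gossip_round(members: dict[str, Member], alive: set[str], now: int,
                 rng: random.Random) -> None:
    for name in alive:
        members[name].tick(now)
    for name in sorted(alive):
        target = rng.choice([m for m in members if m != name])
        if target in alive:  # messages sent to crashed nodes are simply lost
            members[target].merge(dict(members[name].table), now)
```

Once a node crashes, its heartbeat stops increasing, so after the last copy of its counter has spread, every surviving member's timestamp for it stops advancing and it is eventually suspected by all of them.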
For further learning
Gossip Applications
- Distributed Computing/Networking
- Information Dissemination
- Gossip Learning
- Distributed Data
Research/Projects
- Gossip Protocol:
  - Anti-entropy (simple epidemics)
  - Rumor mongering (complex epidemics)
  - Eager epidemic dissemination
- Key-Value Store:
  - Cassandra
  - Redis
  - python-Cassandra
Further Learning ..
Cassandra
Resources
- http://cassandra.apache.org/
- http://academy.datastax.com/
- http://www.planetcassandra.org/
- Use a locally installed apache-cassandra-3.4
- Use a Cassandra cluster service
Python-Cassandra
- Cassandra cluster setup
- cqlengine: Cassandra CQL object mapper for Python
- Python application
Thank You..