An Information-Theoretic Perspective of Consistent...

An Information-Theoretic Perspective of Consistent Distributed Storage

Viveck R. Cadambe

Pennsylvania State University

Joint with Prof. Zhiying Wang (UCI) Prof. Nancy Lynch (MIT), Prof. Muriel Medard (MIT) and Dr. Peter Musial (EMC Corporation)

Distributed Storage Systems

•  Failure tolerance, Low storage costs, Fast reads and writes

•  Failure tolerance, Low storage costs, Fast reads and writes •  This talk: Consistency

•  High-level principle: read the “latest” value stored in the system

•  Failure tolerance, Low storage costs, Fast reads and writes •  This talk: Consistency

•  High-level principle: read the “latest” value stored in the system •  Modern key-value stores - Amazon Dynamo DB, Couch DB, Apache Cassandra DB, Google Spanner, Voldermort DB ….. •  Used for transactions, reservation systems, multi-player gaming, social

networks, news feeds, distributed computing tasks etc.

Servers

Write Clients Read Clients (Decoders)

High level Distributed Storage Model

•  Asynchrony – packets don’t arrive at all the servers simultaneously

•  Distributed nature - nodes do not know which packets have been received by other nodes, or if they have failed.

•  Consistency – the reader/decoder needs the latest “possible” version.

Servers

•  Asynchrony, Distributed Nature, Consistency

Analytical understanding of storage costs, latency, is very limited Replication is used in every commercial solution to provide fault tolerance

Standard model in distributed systems

theory

Multi-version Coding

theory

Toy model for distributed storage

theory

The multi-version coding (MVC) problem [Wang-C, ISIT, Allerton 2014, arxiv 2015]

As the data gets updated

•  Asynchrony: all servers may not simultaneously get the new version of the data

•  Distributed nature: each node is unaware of the versions received by the other nodes

•  Consistency: A decoder must get the latest possible version of the data

Ver. 1

Ver. 1 Ver. 1 Ver. 1

The multi-version coding problem

Ver. 1

[Wang-C, ISIT, Allerton 2014, arxiv 2015]

Ver. 1 Ver. 2

Ver. 1 Ver. 1 Ver. 2 Ver. 1 Ver. 2

Ver. 1

Ver. 1 Ver. 2

Ver. 3 Ver. 4 ……

Ver. 1

Ver. 1 Ver. 2

Ver. 3 Ver. 4 ……

Ver. 1

Latest common or something later: Ver. 2

Ver. 1 Ver. 2

Latest common or something later: Ver.

Ver. 3 Ver. 4 ……

Ver. 1 Ver. 1 Ver. 2 Ver. 1 Ver. 2 Ver. 1

Ver. 1 Ver. 2

1 or Ver. 2

Ver. 3 Ver. 4 ……

In general, client connects to c servers, demands the latest common version among v versions

Ver. 1 Ver. 2

1 or Ver. 2

Ver. 3 Ver. 4 ……

The multi-version coding problem •  n servers •  v versions •  c connectivity

Ver. 1

Ver. 2

Latest common or something later: Ver. 1 or Ver. 2

Ver. 3

Ver. 4

……

Ver. 1

Ver. 2

Ver. 1

Ver. 2

Ver. 1

The multi-version coding problem •  n servers •  v versions •  c connectivity •  Goal: decode the latest common version among the c servers •  Minimize the storage cost – Worst case, across all “states” – across all servers

Ver. 1

Ver. 2

Latest common or something later: Ver. 1 or Ver. 2

Ver. 3

Ver. 4

……

Ver. 1

Ver. 2

Ver. 1

Ver. 2

Ver. 1

Solution 1: Replication

Ver 1 Ver 1 Ver 1 Ver 1 Version 1

Version 2

Storage size = size-of-one-version

Ver 1 Ver 1

Ver 2 Ver 2

Version 1

Version 2

Storage size = size-of-one-version

Solution 1: Replication

1/2 1/2 1/2 1/2

1/2 1/2

Version 1

Version 2

Solution 2: MDS code

Separate coding across versions. Each server stores all the versions received.

Storage size = (Number of versions / c)*size-of-one-version = v/c*size-of-one-version

Storage Cost Normalized by size-of-

value Replication 1

Naïve MDS codes

Constructions

Lower bound vc+v�1

v = Number of Versions

c = Connectivity

�o(size-of-one-version)

Storage Cost Normalized by size-of-

value Replication 1

Naïve MDS codes

Constructions

c = Connectivity

1dc/ve

�o(size-of-value)

Achievability

. . . . . . . . .

z}|{ z}|{ z}|{

Partition 1 Partition 2Partition v

Partition i: Version i is the latest version

Achievability

. . . . . . . . .

z}|{ z}|{ z}|{

There is at least one partition with dc/ve servers

Achievability

. . . . . . . . .

z}|{ z}|{ z}|{

There is at least one partition with dc/ve servers

Simple achievable scheme:

Server in partition i stores 1dc/ve of version i

Converse: v = 2, storage cost & 2c+1

Start with c servers

Ver 1 Ver 1 Ver 1 Ver 1 . . .

Ver 1 Ver 1 Ver 1 Ver 1

Ver 2 Ver 2

Propagate version 2 to a minimal set of servers such that it is decodable

Virtual server

Versions 1 and 2 decodable from c+1 symbols

Ver 2 Ver 2

Virtual server

=) Storage � 2c+1

Ver 2 Ver 2

Virtual server

=) Storage � 2c+1

Ver 2 Ver 2

•  Intuition: Find c+v-1 virtual servers, where all v versions can be decoded

•  A more intricate puzzle as compared to v=2. •  Multi-version coding problem related to index-coding/

multiple-unicast/non-multicast network coding –  More precisely, it is related to pliable index coding

Converse: v > 2

[Brahma-Fragouli 12]

Ver 2 Ver 2

Converse: v=3

Ver 1 Ver 1

Ver 3 Ver 3 Ver 3 . . .

Ver 2 Ver 2

Converse: v=3

Ver 1 Ver 1

Ver 3 Ver 3 Ver 3 . . .

Ver 2 Ver 1 Ver 3

Server a1

Converse: v=3

a1 is the smallest number such that, there is a version x, such that

Version x is decodable, given the symbols of the first a1 servers with all 3 versions

and the messages of versions {1, 2, 3}� {x}

Ver 2 Ver 2

Converse: v=3

Ver 1 Ver 1 Ver 1

Ver 3 Ver 3 Ver 3 . . .

. . . Ver 3

Ver 1 Ver 3

Server a2Server a1

Converse: v=3

a1 is the smallest number such that, there is a version x, such that

Version x is decodable, given the symbols of the first a1 servers with all 3 versions

and the messages of versions {1, 2, 3}� {x}

a2 is the smallest number such that, there is a version y 2 {1, 2, 3}� {x}, suchthat

Version y is decodable, given the symbols of the first a1 � 1 servers with all 3

versions and the remaining a2 � (a1 � 1) servers with versions {1, 2, 3}� {x}

and the message of version {1, 2, 3}� {x, y}

Ver 2 Ver 2

Converse: v=3

Ver 1 Ver 1 Ver 1

Ver 3 Ver 3 Ver 3 . . .

. . . Ver 3

Ver 1 Ver 3

Server a2Server a1

Ver 2 Ver 2

Converse: v=3

Ver 1 Ver 1 Ver 1

Ver 3 Ver 3 Ver 3 . . .

. . . Ver 3

Ver 1 Ver 3

Ver 3 Ver 3

Ver 1 Ver 3

Server a2Server a1

Storage Cost Normalized by size-of-one-

version Replication 1

Naïve MDS codes

Constructions

c = Connectivity

1dc/ve

Summary

These bounds can be improved.

See “Multi-version Coding – An Information Theoretic Perspective of Distributed Storage ”, Wang-Cadambe, arxiv, 2015

Multi-version codes – Main Insights

•  Redundancy required to ensure consistency in an asynchronous environment –  Redundancy increases with the number of parallel versions in the

system

•  Simple codes are (approximately) optimal –  Separate coding across versions –  Random linear codes within versions

•  More insights may be obtained by going beyond worst-case measures –  Correlated versions –  Allow a small fraction of “erroneous” statess

Multi-version codes – Main Insights

•  Redundancy required to ensure consistency in an asynchronous environment –  Redundancy increases with the number of parallel versions in the

system

•  Simple codes are (approximately) optimal –  Separate coding across versions –  Random linear codes within versions

•  More insights may be obtained by going beyond worst-case measures –  Correlated versions –  Allow a small fraction of “erroneous” statess

theory

Servers

Toy Model for packet arrivals, links

Servers

•  Arrival at client: One packet in every time slot. Sent immediately to the servers. •  Channel from the write client to the server: Delay is an integer in [0,T-1].

•  Channel from server to read client: instantaneous (no delay).

•  Goal: decoder invoked at time t, gets the latest common version among c servers

Servers

•  Arrival at client: One packet in every time slot. Sent immediately to the servers. •  Channel from the write client to the server: Delay is an integer in [0,T-1].

•  Channel from server to read client: instantaneous (no delay).

•  Goal: decoder invoked at time t, gets the latest common version among c servers

Insights from multi-version codes over toy model

Achievability “Theorem”:

Converse “Theorem”:

There exists an achievable storage strategy that achieves a storage cost of

⇥ size-of-one-version

There exists no achievable storage strategy that achieves a storage cost smaller

T + c� 1

⇥ size-of-one-version� o(size-of-one-version)

Insights from multi-version codes over toy model

Achievability “Theorem”:

Converse “Theorem”:

Number of versions ⌫, depends on degree of asynchrony T

There exists an achievable storage strategy that achieves a storage cost of

⇥ size-of-one-version

There exists no achievable storage strategy that achieves a storage cost smaller

T + c� 1

⇥ size-of-one-version� o(size-of-one-version)

literature

Servers

Model studied in distributed systems – Key features

•  Arrival at clients: arbitrary

•  Channel from clients to servers: arbitrary delay, reliable

•  Clients and servers are modeled as I/O automata, so their protocols can be designed.

Servers

Model studied in distributed systems – Key features

•  Arrival at clients: arbitrary

•  Channel from clients to servers: arbitrary delay, reliable

•  Clients and servers are modeled as I/O automata, so their protocols can be designed.

Multi-version coding converse for v=2 can be lifted to this setting.

Future Work – Many open questions •  Less conservative modeling assumptions,

-  Exploiting correlation between versions -  Allow for a “small” number of erroneous states -  Less distributed, knowledge of the state of other nodes.

•  Finer network and node models (beyond toy models). -  Can lead to finer insights in to communication and storage costs -  Allow for the design of protocols, for say, the read client (or the write client)

•  Study of errors/Byzantine adversaries instead of erasures -

useful assumption for ensuring security.s

useful assumption for ensuring security.

Thanks

An Information-Theoretic Perspective of Consistent...

Documents

ON-LINE MANUAL 24D143DB / 24W143DB / 24D343DB LED ...cdn.cnetcontent.com/aa/e1/aae1d6b6-340c-474c-b896... · ON-LINE MANUAL 24D143DB / 24W143DB / 24D343DB LED Backlight LCD

User Manual PVS-8MH-DB/ PVS-12MH-DB/PVS-16MH-DB PV …

B SERIES - EAR Customized Hearing Protection Datasheet 2017-1.pdfBTN 27 dB, Class 5 24 dB 24.3 dB 30 dB HM 26 dB, Class 5 23 dB 19.1 dB 31 dB Power Battery type Rechargeable - lithium

DOING BUSINESS IN KAZAKHSTAN · 76 53 51 35 36 28 25 0 10 20 30 40 50 60 70 80 db 2014 db 2015 db 2016 db 2017 db 2018 db 2019 db 2020 kazakhstan’s ratings in db by year doing business

TheSparseAwakens: StreamingAlgorithmsfor ...dimacs.rutgers.edu/~graham/pubs/papers/matching-esa.pdfTheSparseAwakens: StreamingAlgorithmsfor MatchingSizeEstimationinSparseGraphs Graham

Tugas db akademik & db rs

LCT 340 - Full Compass Systems · 139 dB, 0 dB pre-attenuation 145 dB, 6 dB pre-attenuation 151 dB, 12 dB pre-attenuation 157 dB, 18 dB pre-attenuation 6 dB, 12 dB, 18 dB, switchable

Patrick Huesler wooga GmbH › dl › goto-aar-2012 › slides › ...load balancer app app app app app app app app db db db db db db db db db db db db db db db db app app app app

db-tech MICROWAVE COMMUNICATIONS DB-08

Rapidly Deployable Wireless Networks for Emergency ...dimacs.rutgers.edu/Workshops/RutgersHomeland/... · Rapidly Deployable Networks: Rationale Failure of communication networks

HÉTVÉGE · 1 day ago · 2 MEGTAKARÍTÁS: 5.000 Ft Háztartás 22.990 Ft /db 17.990 Ft /db 4.799/db Ft /db-tól 3.499 Ft /db-tól 6.999 Ft 4.999 Ft /db 5.499 Ft /db-tól 4.499

Modeling Infectious Diseases from a Real World …dimacs.rutgers.edu › ... › Modeling_Infectious_Diseases.pdfBasic Elements deﬁne species: single pop, vectored system, ecological

DB-2D, DB-3D, DB-4D and DB-2TC Manual

N450 DB/N600 DB User Manual

1MA98 6e dB or Not dB

GREAT IDEAS THAT WORK...In A Factory & A Train Loud Office Quiet Office Hospital And Library Noise During Running Operation 120 dB 100 dB 80 dB 60 dB 40 dB 20 dB 0 dB 60.6 dB Roboshot

Pennsylvania - dot7.state.pa.us · PDF file3014 3014 CALN Db 25 Db 252 Db 113 Db 23 Db 926 Db 352 Db 3 Db 352 Db 842 Db 162 Db 100 Db 52 Db Db 82 Db 100 Db 52 Db 926

DAVID BROWN - L.O.G. Christian Heigl - Ferguson … Brown Engine DB 3 Gaskets DB 3 Fuel DB 4 Exhausts DBI 6 Cooling DB 10 Filters DB 16 Fuel Filter Assembly DB 17 Axle Parts DB 18

Duality for Simple Multiple Access Networksdimacs.rutgers.edu/Workshops/Next15/Slides/Duursma.pdf · Outline 1 - Reliability and Security 2 - Efﬁcient repair 3 - Constrained codes

inContact DB Connector Reference Manual DB Connector Manual.pdfLaunch the DB Connector.exe ... MySQL. Since DB Connector utilizes Microsoft’s standardized OLE DB ... DB Connector