Causality for the Masses: Offering Fresh Data, Low Latency, and High Throughput
Luís Rodrigues, OPODIS 2017 (opodis2017.campus.ciencias.ulisboa.pt/presentations/...)


  • Causality for the Masses: Offering Fresh Data, Low Latency, and High Throughput

    Manuel Bravo, Luís Rodrigues, Chathuri Gunawardhana, Peter van Roy

  • Causal consistency

    Strongest consistency criterion that can be offered without compromising availability

    3

  • Causal consistency

    Key ingredient of several consistency criteria:

    Parallel Snapshot Isolation [SOSP’11]

    RedBlue Consistency [OSDI’12]

    Session guarantees [SOSP’97]

    Explicit Consistency [EuroSys’15]

    4

  • Causal consistency

    Operations may arrive in the “wrong” order

    5

  • 6

    Alice posts “Dan is in the hospital!” and, later, “Dan is ok!”

    Bob sees “Dan is in the hospital!” followed by “Dan is ok!”, and replies “That’s great!”

    Dan sees “Dan is in the hospital!” immediately followed by Bob’s “That’s great!”, because “Dan is ok!” has not yet arrived at his replica

  • Causal consistency

    Data centers should delay the visibility of inconsistent operations

    7

  • 8

    With causal consistency, Bob’s “That’s great!” is delayed at Dan’s replica until “Dan is ok!” has been delivered, so Dan sees “Dan is in the hospital!”, then “Dan is ok!”, and only then “That’s great!”

    This presentation is about keeping Dan happy

    Requires maintaining and exchanging metadata!

    Lots of metadata!

  • Causal consistency

    Problem: if causal dependencies are not accurately tracked, the metadata may generate false positives

    False dependencies!

    9

  • 10

    Alice

    Bob

    Dan

    (animated diagram)

  • 11

    Metadata

    more metadata … less metadata

    precise … false positives

    expensive … cheap

    Matrix/vector clocks … Lamport’s clocks

    Matrix/vector clocks: one vector per item, with one entry in each vector per DC

    Lamport’s clocks: one scalar (sketch below)
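To make the two ends of this spectrum concrete, here is a small sketch (illustrative only, not code from the talk): a per-datacenter vector clock can distinguish concurrency from causal dependency, while a single Lamport scalar cannot and therefore creates false dependencies.

```python
# Illustrative sketch: vector clocks vs. a single Lamport scalar.

def vc_before_or_equal(v1, v2):
    """True if the event tagged v1 causally precedes (or equals) the one tagged v2."""
    return all(v1[dc] <= v2[dc] for dc in v1)

# Two updates issued concurrently at different datacenters (3 DCs: 0, 1, 2).
u1 = {"vc": {0: 1, 1: 0, 2: 0}, "lamport": 1}   # written at DC0
u2 = {"vc": {0: 0, 1: 2, 2: 0}, "lamport": 2}   # written at DC1

# Vector clocks: neither dominates the other, so the updates are concurrent
# and a datacenter may make them visible independently.
concurrent = (not vc_before_or_equal(u1["vc"], u2["vc"])
              and not vc_before_or_equal(u2["vc"], u1["vc"]))
print("vector clocks: concurrent =", concurrent)            # True

# A single scalar: 1 < 2 is indistinguishable from a real dependency, so u2
# must wait for u1 -- a false dependency (cheap metadata, less precision).
print("scalar clock suggests u1 -> u2:", u1["lamport"] < u2["lamport"])  # True
```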

  • Problems of the previous state-of-the-art: throughput vs. data staleness tradeoff

    (plots: throughput penalty (%) and data staleness overhead (%) vs. number of datacenters, for GentleRain and Cure)

    GentleRain [SoCC’14]: optimizes throughput; compresses metadata into a scalar

    Cure [ICDCS’16]: optimizes data freshness; relies on a vector clock with an entry per data center

    Metadata size affects throughput

    False dependencies damage data freshness

    12

  • Problems of the previous state-of-the-art: throughput vs. data staleness tradeoff

    Visibility latencies have a direct impact on client response latency

    Partial replication aggravates the problem

    13

  • Our take on the problem

    Saturn + Eunomia

    Saturn: god in ancient Roman religion who became the god of time; orders events across datacenters

    Eunomia: Greek goddess of law and legislation; orders events in each datacenter

    If this couple cannot fix the problem, nobody can…

    14-18

  • 19

  • Distributed metadata service

    Pluggable to existing geo-distributed data services

    Handles the dissemination of operations among data centers

    Ensures that clients always observe a causally consistent state, with a negligible performance overhead when compared to an eventually consistent system

    20

  • Key features

    Requires a constant and small amount of metadata, regardless of the system’s scale (servers, partitions, and locations), to avoid impairing throughput (sketch below)

    Mitigates the impact of false dependencies by relying on a tree-based dissemination, to enhance data freshness

    Implements genuine partial replication (data centers only manage data and metadata of the items replicated locally), to take full advantage of partial replication

    21
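As a rough sketch of the first feature, the hypothetical label below carries only a scalar timestamp, the origin datacenter, and the identifier of the updated item; its size does not grow with the number of servers, partitions, or locations. The field names are illustrative, not Saturn's exact label format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Label:
    """Constant-size metadata attached to each update (illustrative sketch):
    unlike a vector or matrix clock, its size is independent of the number
    of servers, partitions, and datacenters."""
    timestamp: int   # scalar clock value used to order the update
    origin_dc: int   # datacenter where the update was issued
    item_id: str     # key of the updated item (relevant for partial replication)

print(Label(timestamp=2, origin_dc=1, item_id="a"))
```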

  • Decoupling data and metadata

    (diagram: data centers 1-4; data transfer vs. metadata transfer paths)

    Data centers only make remote updates visible when they have received both the metadata and its corresponding data (sketch below)

    22
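The rule above can be sketched as follows (names are hypothetical, and for brevity the sketch ignores the additional requirement of applying labels in the order served by the metadata service): a datacenter buffers whichever part arrives first and applies the remote update only once it holds both the label and the payload.

```python
class RemoteUpdateBuffer:
    """Sketch: a remote update becomes visible only after BOTH its label
    (metadata channel) and its payload (bulk data channel) have arrived."""

    def __init__(self, store):
        self.store = store           # local key-value store (a plain dict here)
        self.pending_data = {}       # update_id -> (key, value) waiting for a label
        self.pending_labels = set()  # update_ids whose label arrived before the data

    def on_data(self, update_id, key, value):
        self.pending_data[update_id] = (key, value)
        self._maybe_apply(update_id)

    def on_label(self, update_id):
        self.pending_labels.add(update_id)
        self._maybe_apply(update_id)

    def _maybe_apply(self, update_id):
        if update_id in self.pending_labels and update_id in self.pending_data:
            key, value = self.pending_data.pop(update_id)
            self.pending_labels.discard(update_id)
            self.store[key] = value  # the update becomes visible locally

store = {}
buf = RemoteUpdateBuffer(store)
buf.on_data("u1", "a", 1)   # payload arrives first: not visible yet
assert "a" not in store
buf.on_label("u1")          # label arrives: now visible
assert store["a"] == 1
```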

  • Example: write request

    (diagram: Client 1 and data centers 1, 2, 3, …, N)

    Client 1 issues put(a1) at its local data center; the data (a1) is propagated to the other data centers through the data channel, while the corresponding label is propagated separately through the labels channel (sketch below)
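A hedged sketch of this write path, with hypothetical names for the two channels: the origin datacenter applies the put locally, builds a label, and then ships the payload and the label separately.

```python
class DataChannel:                      # stand-in for the bulk data transfer service
    def send(self, update_id, key, value):
        print("data  ->", update_id, key, value)

class MetadataService:                  # stand-in for the label/metadata service
    def submit(self, update_id, timestamp, origin_dc, key):
        print("label ->", update_id, timestamp, origin_dc, key)

def handle_put(store, clock, data_channel, metadata_service, origin_dc, key, value):
    """Sketch of put(a1) at the origin datacenter: apply locally, then ship
    the data and the label over their separate channels."""
    clock["t"] += 1
    update_id = f"{origin_dc}-{clock['t']}"
    store[key] = value                                   # visible locally right away
    data_channel.send(update_id, key, value)             # payload to the other DCs
    metadata_service.submit(update_id, clock["t"], origin_dc, key)  # label
    return update_id

handle_put({}, {"t": 0}, DataChannel(), MetadataService(),
           origin_dc=1, key="a", value="a1")
```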

  • Metadata propagation

    (diagram: data centers 1-4; labels a (2), b (4), c (6))

    a is concurrent with both b and c; c causally depends on b

    Which serializations may be served to a data center? a b c? b a c? b c a?

    24

  • Metadata propagation

    (diagram: data centers 1-4; 1ms between dc1-dc2 and between dc3-dc4, 10ms across; labels a (2), b (4), c (6))

    a is concurrent with both b and c; c causally depends on b

    Data arrival times: at dc1/dc2, a -> 3, b -> 14, c -> 16; at dc3/dc4, b -> 5, c -> 7, a -> 12

    At dc3/dc4, the served serialization determines visibility: with a b c, everything is delayed until 12; with b a c, c is delayed until 12; with b c a, visibility matches data arrival (b -> 5, c -> 7, a -> 12) and nothing is delayed (sketch below)

    25
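To make the numbers above concrete, here is a small sketch that computes when each update can become visible at dc3/dc4 under a given serialization: an update cannot become visible before its own payload arrives, nor before every update served earlier in the serialization is visible.

```python
def visibility_times(serialization, data_arrival):
    """Visibility time of each update at a datacenter, given the order in which
    labels are served and the arrival time of each payload."""
    visible, latest = {}, 0
    for u in serialization:
        latest = max(latest, data_arrival[u])  # wait for its own data ...
        visible[u] = latest                    # ... and for everything served before it
    return visible

arrival = {"b": 5, "c": 7, "a": 12}            # data arrival times at dc3/dc4
for order in (["a", "b", "c"], ["b", "a", "c"], ["b", "c", "a"]):
    print(order, visibility_times(order, arrival))
# ['a', 'b', 'c'] -> everything delayed until 12
# ['b', 'a', 'c'] -> c delayed until 12
# ['b', 'c', 'a'] -> b at 5, c at 7, a at 12: no artificial delay
```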

  • Metadata propagation

    Causal consistency is a partial order

    Saturn exploits this fact by serving possibly different linear extensions of the causal order to each data center

    Each served serialization aims at optimising data freshness, and thus reducing the impact of false dependencies

    (e.g., serving a, b, c to dc1/dc2 and b, c, a to dc3/dc4)

    26

  • Metadata propagation: architecture

    Saturn leverages a set of cooperating servers, namely serializers, forming a tree

    Causal consistency is trivially enforced by the tree, assuming FIFO links among serializers and that serializers propagate metadata preserving the observed order (sketch below)

    Serializers are geographically distributed to optimize data freshness

    27
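A minimal sketch of this architecture, under the stated assumptions (FIFO links, forwarding in observed order); class names are illustrative.

```python
class Serializer:
    """Sketch of a serializer node: forwards every label to its other tree
    neighbours in the exact order it observed them (FIFO per link)."""
    def __init__(self, name):
        self.name, self.links = name, []
    def receive(self, label, sender=None):
        for peer in self.links:
            if peer is not sender:          # do not echo back to the sender
                peer.receive(label, sender=self)

class DatacenterFrontend:
    """Leaf of the tree: records the serialization it is served."""
    def __init__(self, name):
        self.name, self.links, self.delivered = name, [], []
    def receive(self, label, sender=None):
        self.delivered.append(label)

def link(a, b):
    a.links.append(b)
    b.links.append(a)

# Tiny tree: dc1 -- S5 -- S6 -- dc4
s5, s6 = Serializer("S5"), Serializer("S6")
dc1, dc4 = DatacenterFrontend("dc1"), DatacenterFrontend("dc4")
link(dc1, s5); link(s5, s6); link(s6, dc4)

s5.receive("a", sender=dc1)   # label issued at dc1 enters the tree at S5
s6.receive("b", sender=dc4)   # label issued at dc4 enters the tree at S6
print(dc1.delivered, dc4.delivered)   # ['b'] ['a', 'b']
```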

  • Metadata dissemination graph

    (diagram: data centers 1-4 attached to serializers S1-S4; internal serializers S5 and S6 complete the Saturn tree)

  • Optimal dissemination graph

    Weighted Minimal Mismatch

    (excerpt from the SATURN paper shown on the slide:)

    5.4 Configuring SATURN’s Metadata Service. The quality of the serialization served by SATURN to each datacenter depends on how the service is configured. A SATURN configuration defines: (i) the number of serializers to use and where to place them; (ii) how these serializers are connected (among each other and with datacenters); and (iii) what delays (if any) a serializer should artificially add when propagating labels (in order to match the optimal visibility time). Let $\delta_{ij}$ denote the artificial delay added by serializer i when propagating metadata to serializer j.

    In practice, when deploying SATURN, one does not have complete freedom to select the geo-location of serializers. Instead, the list of potential locations for serializers is limited by the availability of suitable points-of-presence that results from business constraints. Therefore, the task of setting up a serializer network is based on:

    • The set V of datacenters that need to be connected (we denote by N = |V| the total number of datacenters).

    • The latencies of the bulk data transfer service among these datacenters; $lat_{ij}$ denotes the latency between datacenters i and j.

    • The set W of potential locations for placing serializers (M = |W|). Since each datacenter is a natural potential serializer location, M ≥ N. Let $d_{ij}$ denote the latency between two serializer locations i and j.

    However, given a limited set of potential locations to place serializers, it is unlikely (impossible in most cases) to match the optimal label propagation latency for every pair of datacenters. Therefore, the best one can aim for when setting up SATURN is to minimize the mismatch between the achievable label propagation latency and the optimal label propagation latency. More precisely, consider that the path for a given topology between two datacenters i and j, denoted $P^M_{i,j}$, is composed of a set of serializers $P^M_{i,j} = \{S_k, \ldots, S_o\}$, where $S_k$ connects to datacenter i and $S_o$ connects to datacenter j. The latency of this path, $\Lambda_M(i,j)$, is defined by the latencies (d) between adjacent nodes in the path, plus any artificial delays that may be added at each step, i.e.:

    $\Lambda_M(i,j) = \sum_{S_k \in P^M_{i,j} \setminus \{S_o\}} \left( d_{k,k+1} + \delta_{k,k+1} \right)$

    and the mismatch between the resulting latency and the optimal label propagation latency is given by:

    $mismatch_{i,j} = |\Lambda_M(i,j) - \Delta(i,j)|$

    Finally, one can observe that, in general, the distribution of client requests among items and datacenters may not be uniform, i.e., some items and some datacenters may be more accessed than others. As a result, a mismatch that affects the data visibility of a highly accessed item may have a more negative effect on the user experience than a mismatch on a seldom accessed item. Therefore, in the scenario where it is possible to collect statistics regarding which items and datacenters are more used, it is possible to assign a weight $c_{i,j}$ to each metadata path $P^M_{i,j}$ that reflects the relative importance of that path for the business goals of the application. Using these weights, we can now define precisely an optimization criterion that should be followed when setting up the serializer topology:

    DEFINITION 2 (Weighted Minimal Mismatch). The configuration that best approximates the optimal visibility time for data updates, considering the relative relevance of each type of update, is the one that minimizes the weighted global mismatch, defined as:

    $\min \sum_{\forall i,j \in V} c_{i,j} \cdot mismatch_{i,j}$

    5.5 Configuration Generator. The problem of finding a configuration that minimizes the Weighted Minimal Mismatch criterion, among all possible configurations that satisfy the constraints of the problem, is NP-hard (a reduction from the Steiner tree problem [35] can be used to prove this). Therefore, we have designed a heuristic that approximates the optimal solution using a constraint solver as a building block. We have modeled the minimization problem captured by Definition 2 as a constraint problem that, for a given tree, finds the optimal location of serializers (for a given set of possible location candidates) and the optimal (if any) propagation delays.

    The proposed algorithm, depicted in Alg. 3, works as follows. Iteratively, starting with a full binary tree with only two leaves (Alg. 3, line 3), it generates all possible isomorphism classes of full binary trees with N labeled leaves (i.e., datacenters). The algorithm adds one labeled leaf (datacenter) at each iteration until the number of leaves is equal to the total number of datacenters. For a given full binary tree T with f leaves, there exist 2f - 1 isomorphism classes of full binary trees with f + 1 leaves. One can obtain a new isomorphism class either by inserting a new internal node within an edge of T from which the new leaf hangs (Alg. 3, line 14), or by creating a new root from which the new leaf and T hang (Alg. 3, line 10). We could iterate until generating all possible trees of N leaves. Nevertheless, in order to avoid a combinatorial explosion (for nine datacenters there would already be 2,027,025 possible trees), the algorithm selects at each iteration the most promising trees and discards the rest. In order to rank the trees at each iteration, we use the constraint solver. Therefore, given a totally ordered list of ranked trees, if the difference between the rankings of two consecutive trees T1 and T2 is greater than a given threshold, T2 and all following trees are discarded (Alg. 3, line 18). At the last iteration, among all trees with N leaves, we pick the one that produces the smallest global mismatch from the optimal visibility times by relying on the constraint solver.

    Note that Algorithm 3 always returns a binary tree. Nevertheless, SATURN does not require the tree to be binary. One can easily fuse two serializers into one if both are directly connected, placed in the same location, and the artificial propagation delays between them are zero. Any of these…

    The goal is to build the tree such that the metadata-path latencies (through the tree) match the data-path latencies: the mismatch is the absolute difference between label-path and data-path latencies, and the configuration minimizes the mismatch of the busiest paths (sketch below)

    30
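The Weighted Minimal Mismatch objective above can be sketched in a few lines: for a candidate configuration, add up the link latencies and artificial delays along the serializer path between each pair of datacenters, compare the result with the optimal label propagation latency for that pair, and weight the difference by the relative importance of the pair. All numbers below are illustrative.

```python
def path_latency(path, d, delta):
    """Lambda_M(i, j): sum of link latencies plus artificial delays along the path."""
    return sum(d[(a, b)] + delta.get((a, b), 0.0) for a, b in zip(path, path[1:]))

def weighted_mismatch(pairs, paths, d, delta, optimal, weight):
    """Objective of Definition 2: sum over pairs of c_ij * |Lambda_M(i,j) - Delta(i,j)|."""
    return sum(weight[p] * abs(path_latency(paths[p], d, delta) - optimal[p])
               for p in pairs)

# Illustrative single pair: dc1 -> S5 -> S6 -> dc4
paths   = {(1, 4): ["dc1", "S5", "S6", "dc4"]}
d       = {("dc1", "S5"): 1.0, ("S5", "S6"): 10.0, ("S6", "dc4"): 1.0}
delta   = {}                   # no artificial delays in this configuration
optimal = {(1, 4): 10.0}       # optimal label propagation latency (ms)
weight  = {(1, 4): 1.0}        # relative importance (c_ij) of this path
print(weighted_mismatch([(1, 4)], paths, d, delta, optimal, weight))  # 2.0
```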

  • Metadata propagation: building the tree

    Finding the optimal tree is modelled as a constraint optimization problem (sketch below)

    Input:

    Data-path average latencies

    Candidate locations for serializers (and latencies among them)

    Access patterns: to minimize the impact of mismatches

    31
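As a toy sketch of the configuration generator described in the excerpt on the previous slide: full binary trees are grown one labeled leaf (datacenter) at a time, each tree with f leaves yielding 2f - 1 extensions, and only the most promising trees are kept at each step. The ranking function here is a placeholder for the constraint solver used in the real heuristic.

```python
# A full binary tree is either a leaf (a datacenter name) or a pair of subtrees.

def extensions(tree, new_leaf):
    """All 2f - 1 ways of adding one labeled leaf to a full binary tree with f
    leaves: hang tree and new_leaf under a new internal node (the edge above
    `tree`), or recurse into one of the two subtrees."""
    yield (tree, new_leaf)
    if not isinstance(tree, str):
        left, right = tree
        for l in extensions(left, new_leaf):
            yield (l, right)
        for r in extensions(right, new_leaf):
            yield (left, r)

def grow(datacenters, keep=4, rank=lambda t: 0):
    """Toy heuristic: add one datacenter per iteration, rank the candidate
    trees (placeholder rank; the real system asks a constraint solver), and
    keep only the `keep` most promising ones."""
    trees = [datacenters[0]]
    for dc in datacenters[1:]:
        candidates = [ext for t in trees for ext in extensions(t, dc)]
        trees = sorted(candidates, key=rank)[:keep]
    return trees

for t in grow(["dc1", "dc2", "dc3", "dc4"]):
    print(t)
```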

  • Example

    Four datacenters (1, 2, 3, 4): 1 ms between dc1 and dc2, 1 ms between dc3 and dc4, and 10 ms between the two groups

    Serializers S1, S2, S3, S4 are attached to the datacenters; internal serializers S5 and S6 connect them, with S5 placed at dc1 (or dc2) and S6 at dc4 (or dc3)

    Three updates: a is concurrent to both b and c; c causally depends on b

    Figure (animation): the timeline advances (Time: 0, 2, 3, 4, 5, 6, 7, ..., 12, ..., 14, ..., 16) as the labels travel through the serializer tree; the annotations b -> 5, c -> 7, a -> 12 mark when each label is delivered

    32-35
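    Continuing the example, the snippet below computes metadata-path latencies through that serializer tree. The per-link latencies (S5 co-located with dc1, S6 with dc4, 10 ms between S5 and S6) are assumptions chosen to match the figure, not values taken from the paper.

```python
# Serializer tree of the example: S1..S4 attach dc1..dc4; S5 and S6 are internal.
# parent[node] = (parent_node, link latency in ms); S6 is the root.
parent = {"S1": ("S5", 0), "S2": ("S5", 1),
          "S3": ("S6", 1), "S4": ("S6", 0),
          "S5": ("S6", 10)}

def metadata_latency(i, j):
    """Sum of the link latencies on the tree path between serializers i and j."""
    up, node, acc = {i: 0}, i, 0
    while node in parent:
        node, hop = parent[node]
        acc += hop
        up[node] = acc
    node, acc = j, 0
    while node not in up:
        node, hop = parent[node]
        acc += hop
    return acc + up[node]

print(metadata_latency("S1", "S2"))   # 1  -- dc1 to dc2, matching the 1 ms data path
print(metadata_latency("S2", "S3"))   # 12 -- dc2 to dc3, dominated by the 10 ms link
```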


  • …put(a1)

    Figure: Client 1 issues put(a1); data and labels flow to data centers 1, 2, 3, ..., N

    Saturn requires each datacenter to output a SERIAL sequence of updates

  • 38

    each datacenter has many machines

    Eunomia

    how to serialize updates performed on different machines without losing performance?

    usually one relies on sequencers that limit concurrency

  • conceived to replace sequencers used to serialize updates in replicated storage systems

    39

    totally orders local updates, consistently with causality, before shipping them to Saturn and the other DCs (a sketch of this ordering follows the animation below)

    the ordering is done in the background, out of the client's critical path

    Eunomia

  • 40

    Figure: machines A, B, and C inside a datacenter, coordinated by a Sequencer

  • 41

    Figure: machines A, B, and C inside a datacenter, coordinated by Eunomia

  • 42-50

    Figure (animation): put(a), put(b), put(c), put(d), put(e), and put(f) arrive at machines A, B, and C, which tag them with local timestamps (1, 1, 2, 1, 2, 3); Eunomia gathers the timestamped updates in the background and outputs them as a single sequence ordered by timestamp
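    The animation above can be mimicked with a small sketch of an Eunomia-style background serializer: machines timestamp their updates locally and periodically report their clocks, and the serializer releases updates in timestamp order once no smaller timestamp can still arrive. This is a simplified illustration (the mapping of puts to machines and the API names are assumptions), not the prototype's actual protocol.

```python
import heapq

class BackgroundSerializer:
    """Sketch of an Eunomia-style ordering service: machines timestamp updates
    with local, monotonically increasing clocks; the serializer emits updates
    in timestamp order once every machine's reported clock has passed them."""

    def __init__(self, machines):
        self.clock = {m: 0 for m in machines}   # latest timestamp heard from each machine
        self.pending = []                       # min-heap of (timestamp, machine, update)

    def on_update(self, machine, timestamp, update):
        heapq.heappush(self.pending, (timestamp, machine, update))
        self.clock[machine] = max(self.clock[machine], timestamp)

    def on_heartbeat(self, machine, timestamp):
        # idle machines still advance the serializer's view of their clock
        self.clock[machine] = max(self.clock[machine], timestamp)

    def stable_updates(self):
        """Pop every update whose timestamp is <= the minimum reported clock:
        no machine can still produce a smaller timestamp, so these updates can
        be shipped, already serialized, to Saturn and the other datacenters."""
        horizon = min(self.clock.values())
        out = []
        while self.pending and self.pending[0][0] <= horizon:
            out.append(heapq.heappop(self.pending))
        return out

# Timestamps loosely follow the animation; which machine runs which put is assumed.
e = BackgroundSerializer(["A", "B", "C"])
e.on_update("A", 1, "put(a)"); e.on_update("C", 1, "put(b)")
e.on_update("A", 2, "put(c)"); e.on_update("B", 1, "put(d)")
e.on_update("C", 2, "put(e)"); e.on_update("B", 3, "put(f)")
print(e.stable_updates())   # everything up to timestamp 2, in timestamp order;
                            # put(f) waits until A and C report clocks >= 3
```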

  • 51


    Evaluation

  • Evaluation: goals

    • How important is the configuration of Saturn?

    • Can Saturn+Eunomia optimize both throughput and data freshness simultaneously?

    • How does Eunomia compare to sequencers?

    • Why does Saturn's genuine partial replication matter?

    52-53

  • Saturn vs state-of-the-art

    Partitioning among 7 data centers 
Replication factor min. 2 max. [2,5] 
[J. M. Pujol et al. The little engine(s) that could: Scaling online social networks. SIGCOMM ’10]

    Workload based on real social network workloads 
[F. Benevenuto et al. Characterizing user behavior in online social networks. IMC '09]

    Public Facebook dataset 
[B. Viswanath et al. On the evolution of user interaction in Facebook. WOSN ’09]

    Figure: the seven EC2 regions used for the deployment (including N. Virginia, Oregon, Ireland, Frankfurt, and Tokyo)

    54

  • Saturn vs state-of-the-art
    Throughput

    Figure: throughput (ops/sec) of Eventual, Saturn, GentleRain, and Cure as a function of the maximum replication degree (5, 4, 3, 2)

    Saturn exhibits a throughput very close to eventual consistency

    Cure significantly penalizes throughput due to the metadata size

    55

  • Saturn vs state-of-the-art
    Data freshness: average

    Figure: average remote visibility latency (ms) for Eventual, Saturn, GentleRain, and Cure

    Saturn remains very close to eventual, and still better than Cure, which uses vector clocks

    56

  • Evaluation: goals

    • How important is the configuration of Saturn?

    • Can Saturn+Eunomia optimize both throughput and data freshness simultaneously?

    • How does Eunomia compare to sequencers?

    • Why does Saturn's genuine partial replication matter?

    57

  • Saturn
Genuine partial replication matters

    58

    N. Virginia Ireland Sydney

    • Sydney receives updates from Ireland
    • Ireland updates depend on N. Virginia updates
    • Sydney does not replicate all N. Virginia data

  • Saturn
    Genuine partial replication matters

    59

    Figure: CDF of remote update visibility (milliseconds) for Saturn and Vector; latency of updates from Ireland when applied at Sydney

    Updates from Ireland are stalled until metadata is received from N. Virginia

  • Evaluation: goals

    • How important is the configuration of Saturn?

    • Can Saturn+Eunomia optimize both throughput and data freshness simultaneously?

    • How does Eunomia compare to sequencers?

    • Why does Saturn's genuine partial replication matter?

    60

  • Saturn configuration matters


         NC      O       I       F       T        S
    NV   37 ms   49 ms   41 ms   45 ms   73 ms    115 ms
    NC   -       10 ms   74 ms   84 ms   52 ms    79 ms
    O    -       -       69 ms   79 ms   45 ms    81 ms
    I    -       -       -       10 ms   107 ms   154 ms
    F    -       -       -       -       118 ms   161 ms
    T    -       -       -       -       -        52 ms

    Table 1: Average latencies (half RTT) among Amazon EC2 regions: N. Virginia (NV), N. California (NC), Oregon (O), Ireland (I), Frankfurt (F), Tokyo (T), and Sydney (S)

    In order to compare SATURN with other solutions from the state-of-the-art (data services), we attached SATURN to an eventually consistent geo-replicated storage system we have built. Throughout the evaluation, we use this eventually consistent data service as the baseline, as it adds no overheads due to consistency management, to better understand the overheads introduced by SATURN. Note that this baseline represents a throughput upper-bound and a latency lower-bound. Thus, when we refer to the optimal visibility latency throughout the experiments, we are referring to the latencies provided by the eventually consistent system.

    Implementation. Our SATURN prototype implements all functionality described in the paper. It has been built using the Erlang/OTP programming language. In our prototype, gears rely on physical clocks to generate monotonically increasing timestamps. To balance the load among frontends at each datacenter, we use Riak Core [14], an open source distribution platform. The optimization problem (Definition 2) used to configure SATURN is modeled using OscaR [44], a Scala toolkit for solving Operations Research problems.

    Setup. We use Amazon EC2 m4.large instances running Ubuntu 12.04 in our experiments. Each instance has two virtual CPU cores and 8 GB of memory. We use seven different regions in our experiments. Table 1 lists the average latencies we measured among regions. Our experiments simulate one datacenter per region. Clients are co-located with their preferred datacenter in separate machines. Each client machine runs its own instance of a custom version of Basho Bench [13], a load-generator and benchmarking tool. Each client eagerly sends requests to its preferred datacenter with zero thinking time. We deploy as many clients as necessary in order to reach the system's maximum capacity, without overloading it. Each experiment runs for more than 5 minutes. In our results, the first and the last minute of each experiment are ignored to avoid experimental artifacts. We measure the visibility latencies of remote update operations by storing the physical time at the origin datacenter when the update is applied locally, and subtracting it from the physical time at the destination datacenter when the update becomes visible. To reduce the errors due to clock skew, physical clocks are synchronized using the NTP protocol [43] before each experiment, making the remaining clock skew negligible in comparison to inter-datacenter travel time.
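    The visibility-latency measurement described above boils down to a timestamp that travels with each update; a minimal sketch, with hypothetical hook names rather than the prototype's API, could look like this:

```python
import time

def on_applied_at_origin(update):
    """Origin datacenter: record the local wall-clock time when the update is
    applied locally; the timestamp is carried along with the replicated update."""
    update["applied_at"] = time.time()

def on_visible_at_destination(update):
    """Destination datacenter: clocks are NTP-synchronized before each run, so
    the difference approximates the remote visibility latency, in seconds."""
    return time.time() - update["applied_at"]
```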

    Figure 4: CDF of remote update visibility (milliseconds) for the P-conf, M-conf, and S-conf configurations. Left: Ireland to Frankfurt (10ms); Right: Tokyo to Sydney (52ms)

    7.1 SATURN Configuration Matters

    We compare different configurations of SATURN to better understand their impact on the system performance. We compare three alternative configurations: (i) a single-serializer configuration (S), (ii) a multi-serializer configuration (M), and (iii) a peer-to-peer version of SATURN that relies on the conservative label's timestamp order to apply remote operations (P). We focus on the visibility latencies provided by the different implementations.

    For the S-configuration, we placed the serializer in Ireland. For the M-configuration, we build the configuration tree by relying on Algorithm 5.4. We run an experiment with a read-dominant workload (90% reads). Figure 4 shows the cumulative distribution of the latency before updates originating in Ireland become visible in Frankfurt (left plot) and before updates originating in Tokyo become visible in Sydney (right plot). Results show that both the S and M configurations provide comparable results for updates being replicated in Frankfurt. This is because we placed the serializer of the S-configuration in Ireland, and therefore, the propagation of labels is done efficiently among these two regions. Unsurprisingly, when measuring visibility latencies before updates originating in Tokyo become visible in Sydney, the S-configuration performs poorly because labels have to travel from Tokyo to Ireland and then from Ireland to Sydney. Plus, results show that the P-configuration, which relies on the label's timestamp order, is not able to provide low visibility latencies in these settings. This is expected as, when tracking causality with a single scalar, latencies tend to match the longest network travel time (161ms in this case) due to false dependencies. In turn, the M-configuration is able to provide significantly lower visibility latencies to all locations (deviating only 8.2ms from the optimal on average).

    7.2 Impact of Latency Variability on SATURN

    The goal of this section is to better understand how changes in the link latency affect SATURN's performance. We have just seen that the correct configuration of the serializers' tree has an impact on performance. Therefore, if changes in the link latencies are large enough to make the current configuration no longer suitable, and these changes are permanent, a reconfiguration of SATURN should be triggered. In practice, transient changes in the link latencies are unlikely to justify a reconfiguration; therefore, we expect their effect on performance to be small.


  • Saturn configuration matters


    Figure 5: Dynamic workload throughput experiments. Throughput (ops/sec) of Eventual, Saturn, GentleRain, and Cure while varying (a) value size in bytes (8 to 2048), (b) read:write ratio (50:50 to 99:1), (c) correlation distribution (exponential, proportional, uniform, full), and (d) percentage of remote reads (0% to 40%).

    Figure 6: Impact of latency variability on remote update visibility in SATURN. Extra visibility latency (ms) as a function of the injected delay (0 to 125 ms) for configurations T1 and T2.


    To validate this assumption, we set up a simple experiment with three datacenters, each located in a different EC2 region: N. California, Oregon, and Ireland. For the experiment, we artificially inject extra latency between the N. California and Oregon datacenters (the average measured latency is 10ms). From our experience, we expect the latency among EC2 regions to deviate from its average only slightly and transiently. Nevertheless, to fully understand the consequences of latency variability, we also experimented with unrealistically large deviations (up to 125ms).

    Figure 6 shows the extra remote visibility latency that two different configurations of SATURN add on average when compared to an eventually consistent storage system which makes no attempt to enforce causality. Both configurations, T1 and T2, use a single serializer: configuration T1 places the serializer in Oregon, representing the optimal configuration under normal conditions, while configuration T2 places the serializer in Ireland.

    As expected, under normal conditions, T1 performs significantly better than T2, confirming the importance of choosing the right configuration. As we add extra latency, T1 degrades its performance, but only slightly. One can observe that, in fact, slight deviations in the average latency have no significant impact on SATURN: even with an extra delay of 25ms (more than twice the average delay), T1 only adds 14ms of extra visibility latency on average. Interestingly, it is only with more than 55ms of injected latency that T2 becomes the optimal configuration, exhibiting lower remote visibility latency than T1. Observing a long and sustained increase of 55ms of delay on a link that averages 10ms is highly unlikely. Indeed, this scenario has the same effect as migrating the datacenter from N. California to São Paulo. Plus, if such a large deviation becomes the norm, system operators can always rely on SATURN's reconfiguration mechanism to change SATURN's configuration.

    7.3 SATURN vs. the State-of-the-art

    We compare the performance of SATURN against eventual consistency and against the most performant causally consistent storage systems in the state-of-the-art.

    7.3.1 GentleRain and Cure

    We consider GentleRain [26] and Cure [3] the current state-of-the-art. We have also experimented with solutions based on explicit dependency checking such as COPS [39] and Eiger [40]. Nevertheless, we concluded that approaches based on explicit dependency checking are not practical under partial geo-replication. Their practicability depends on the capability of pruning the client's list of dependencies after update operations due to the transitivity rule of causality [39]. Under partial geo-replication, this is not possible, causing the client's list of dependencies to potentially grow up to the entire database.

    At their core, both GentleRain and Cure implement causal consistency very similarly: they rely on a background stabilization mechanism that requires all partitions in the system to periodically exchange metadata. This equips each partition with sufficient information to locally decide when remote updates can be safely made visible to local clients, with no violation of causality. In our experiments, GentleRain and Cure's stabilization mechanisms run every 5ms, following the authors' specifications. The interested reader can find more details in the original papers [3, 26]. We recall that SATURN does not require such a mechanism, as the order in which labels are delivered to each datacenter already determines the order in which remote updates have to be applied.
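    For intuition, the scalar stabilization rule used by GentleRain-style systems can be sketched as follows; the vector variant used by Cure keeps one entry per datacenter instead. Names and data structures are illustrative, not taken from either system's code.

```python
def global_stable_time(reported_clocks):
    """Each local partition periodically reports the latest timestamp it has
    seen; the Global Stable Time (GST) is the minimum over all partitions."""
    return min(reported_clocks.values())

def newly_visible(pending_remote_updates, reported_clocks):
    """A remote update can be made visible once its timestamp is <= GST:
    everything it may causally depend on has already been received locally."""
    gst = global_stable_time(reported_clocks)
    return [u for u in pending_remote_updates if u["ts"] <= gst]

# Cure replaces the scalar with a vector (one entry per datacenter) and the
# check becomes elementwise: all(u["vts"][dc] <= gsv[dc] for dc in gsv),
# trading larger metadata for more precise causality tracking.
```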

    The main difference between GentleRain and Cure resides in the way causal consistency is tracked. While GentleRain summarizes causal dependencies in a single scalar, Cure uses a vector clock with an entry per datacenter. This enables Cure to track causality more precisely, lowering remote visibility latency, but the metadata management increases the computation and storage overhead, harming throughput. Concretely, by relying on a vector, Cure's remote update visibility latency lower-bound is determined by the latency between the originator of the update and the remote datacenter. Differently, in GentleRain, the lower-bound is


  • Evaluation: goals

    • How important is the configuration of Saturn?

    • Can Saturn+Eunomia optimize both throughput and data freshness simultaneously?

    • How does Eunomia compare to sequencers?

    • Why does Saturn's genuine partial replication matter?

    63

  • Evaluation

    Figure: maximum throughput (Kops/sec) achievable by Eunomia (Eunomia 15, 30, 45, 60, 75) vs. a classical sequencer

    Eunomia achieves up to 7.7x the throughput of the sequencer

    64

  • Other features…

    • On-line reconfiguration protocols
    • Efficient migration of clients among data centers
    • Support for fault-tolerance
    • Impact of stragglers
    • More experiments

    65

  • Check the papers for more details.

    66

    EuroSys 2017    ATC 2017

  • Take-away messages

    • It is possible to preserve concurrency inside a DC, and to capture causality with a single scalar, by using an ordering service that is out of the client's critical path

    • These scalar clocks can be propagated using tree-based dissemination, which can effectively mitigate the impact of false dependencies

    • Combined, they allow the optimisation of both throughput and data freshness

    67
