Composable Consistency for Wide Area Replication
Sai Susarla
Advisor: Prof. John Carter
Overview
Goal: middleware support for wide area caching in diverse distributed applications
Key Hurdle: flexible consistency management
Our Solution: novel consistency interface/model - Composable Consistency
Benefit: supports a broader set of sharing needs than any existing consistency model. Examples:
file systems, databases, directories, collaborative apps
Demo Platform: novel P2P middleware data store - Swarm
Caching: Overview
The Idea: cache frequently used items locally for quick retrieval
Benefits
Within a cluster: load-balancing, scalability
Across the WAN: lower latency, improved throughput & availability
Applications
Data stored in one place, accessed from multiple locations. Examples:
» File system: personal files, calendars, log files, software, …
» Database: online shopping, inventory, auctions, …
» Directory: DNS, LDAP, Active Directory, KaZaa, …
» Collaboration: chat, multi-player games, meetings, …
Centralized Service
[Diagram: user clients access a primary server cluster across the Internet]
Proxy-based Caching
[Diagram: user clients access a nearby caching proxy server cluster, which runs a consistency protocol with the primary server cluster across the Internet]
Caching: The Challenge
Applications have diverse consistency needs
Application | Sharing Characteristics | Consistency Needs
Static web content, media, s/w updates | Read-mostly | Stale data, manual reload ok
Chat, whiteboard | Concurrent appends | Real-time sync, causal msg order
Auctions, ticket sales, financial DB | Write-sharing, conflicts, varying contention | Serializability, latest data, atomicity (ACID)
… | … | …
Caching: The Problem
Consistency requirements are diverse
Caching is difficult over WANs: variable delays, node failures, network partitions, admin domains, …
Thus, most WAN applications either: Roll their own caching solution, or
Do not cache and live with the latency
Can we do better?
Thesis
"A consistency management system that provides
a small set of customizable consistency mechanisms
can efficiently satisfy the data sharing needs of
a wide variety of distributed applications."
Outline
Further Motivation
Application study: a new taxonomy to classify application sharing needs
Composable Consistency (CC) model: a novel interface to express consistency semantics for each access; a small option set can express more diverse semantics
Evaluation
Existing Models are Inadequate
Provide a few packaged consistency semantics for specific needs:
e.g., optimistic/eventual, close-to-open, strong
Or, lack the flexibility to support diverse needs: TACT (cannot express weak consistency or session semantics), Bayou (cannot support strong consistency)
Or, leave consistency management burden on applications
e.g., Oceanstore, Globe
Existing Middleware is Inadequate
Existing middleware supports specific sharing needs:
Read-only data: PAST, BitTorrent
Rare write-sharing: file systems (NFS, Coda, Ficus, …)
Master-slave (read-only) replication: storage vendors, mySQL
Scheduled (nightly) replication: storage and DB services
Read-write replication within a cluster: commercial DB vendors, Petal
Application Survey
40+ applications with diverse consistency needs
Application | Sharing Characteristics | Consistency Needs
Static web content, media, s/w updates | Read-mostly | Stale data, manual reload ok
Stock quotes | Read-only | Limit max. staleness to T secs
Chat, whiteboard | Concurrent appends | Real-time sync, causal msg order
Multiplayer game | Heavy write-sharing | Real-time sync, totally order play moves
Auctions, ticket sales, financial DB | Write-sharing, conflicts, varying contention | Serializability, latest data, atomicity (ACID)
Personal file access | Rare write-sharing | Eventual consistency
Mobile file access, collaborative sharing | Sequential write-sharing | Latest data, session semantics
Directory, calendars, groupware | Write-sharing, mergeable writes | Tight sync within campus, relaxed sync across campuses
Survey Results
Found common issues and overlapping choices:
Are parallel reads and writes ok?
How often should replicas synchronize?
Does update order matter?
What if some copies are inaccessible?
…
Can we exploit this commonality?
Composable Consistency: a novel interface to express consistency semantics
Option | Choices
Access mode | Concurrent, Exclusive
Sync frequency | Manual push/pull; T seconds stale; N missed writes
Strength | Hard, Soft
Causality | Yes, No
Atomicity | Yes, No
Update ordering | None, Total, Serial
Inaccessible copy | Ignore, Fail access
Accept updates | At session start, Immediately
Reveal updates | On close(), Immediately

Option categories: concurrency control, replica synchronization, failure handling, view isolation, update visibility
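To make the option vector concrete, here is a minimal sketch in C of how these per-session options could be represented. All type and constant names (cc_options_t, CC_CONCURRENT, and so on) are hypothetical illustrations, not Swarm's actual interface.

/* Hypothetical encoding of the composable-consistency option vector.
 * Names are illustrative only, not Swarm's real API. */
typedef enum { CC_CONCURRENT, CC_EXCLUSIVE } cc_access_mode;            /* concurrency control */
typedef enum { CC_SOFT, CC_HARD } cc_strength;                          /* how strictly to enforce sync */
typedef enum { CC_ORDER_NONE, CC_ORDER_TOTAL, CC_ORDER_SERIAL } cc_ordering;
typedef enum { CC_IGNORE, CC_FAIL_ACCESS } cc_failure_action;           /* failure handling */
typedef enum { CC_ACCEPT_AT_OPEN, CC_ACCEPT_IMMEDIATELY } cc_view_isolation;
typedef enum { CC_REVEAL_ON_CLOSE, CC_REVEAL_IMMEDIATELY } cc_update_visibility;

typedef struct {
    cc_access_mode       mode;              /* concurrent vs. exclusive access           */
    int                  max_staleness_s;   /* tolerate data up to T seconds stale       */
    int                  max_missed_writes; /* ...or up to N writes behind               */
    cc_strength          strength;          /* hard: block/fail; soft: best effort       */
    int                  causal;            /* preserve causal order of updates?         */
    int                  atomic;            /* apply a session's updates atomically?     */
    cc_ordering          ordering;          /* none, totally ordered, or serialized      */
    cc_failure_action    on_unreachable;    /* ignore inaccessible copies or fail        */
    cc_view_isolation    accept_updates;    /* take in remote updates at open or always  */
    cc_update_visibility reveal_updates;    /* expose local updates at close or always   */
} cc_options_t;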
Example: Close-to-open (AFS)
[CC option table with the close-to-open choices highlighted; staleness set to 0 seconds]
Allow parallel reads and writes
Latest data guaranteed at open()
Fail access when partitioned
Accept remote updates only at open()
Reveal local updates to others only on close()
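Using the hypothetical cc_options_t sketched after the option table, close-to-open might be composed like this (illustrative only, not Swarm's interface):

/* Close-to-open (AFS-style) semantics as one setting of the hypothetical
 * option vector: latest data at open(), updates revealed only at close(). */
cc_options_t close_to_open = {
    .mode            = CC_CONCURRENT,       /* allow parallel reads and writes       */
    .max_staleness_s = 0,                   /* latest data guaranteed at open()      */
    .strength        = CC_HARD,             /* enforce strictly, even over the WAN   */
    .on_unreachable  = CC_FAIL_ACCESS,      /* fail the access when partitioned      */
    .accept_updates  = CC_ACCEPT_AT_OPEN,   /* take in remote updates only at open() */
    .reveal_updates  = CC_REVEAL_ON_CLOSE,  /* expose local updates only on close()  */
};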
Example: Eventual Consistency (Bayou)
[CC option table with the eventual-consistency choices highlighted; staleness set to 10 minutes]
Allow parallel reads and writes
Sync copies at most once every 10 minutes
Syncing should not block or fail operations
Accept remote updates as they arrive
Reveal local updates to others as they happen
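The same hypothetical struct can express Bayou-style eventual consistency; again a sketch, not Swarm's actual interface:

/* Eventual consistency (Bayou-style): soft, periodic synchronization that
 * never blocks or fails an access, with updates flowing in both directions. */
cc_options_t eventual = {
    .mode            = CC_CONCURRENT,         /* parallel reads and writes             */
    .max_staleness_s = 600,                   /* sync copies at most every 10 minutes  */
    .strength        = CC_SOFT,               /* syncing must not block or fail ops    */
    .on_unreachable  = CC_IGNORE,             /* keep going despite unreachable copies */
    .accept_updates  = CC_ACCEPT_IMMEDIATELY, /* apply remote updates as they arrive   */
    .reveal_updates  = CC_REVEAL_IMMEDIATELY, /* propagate local updates right away    */
};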
Handling Conflicting Semantics
What if two sessions have different semantics?
If conflicting, block one session until the conflict goes away (serialize)
Otherwise, allow them in parallel
Simple rules for checking conflicts (conflict matrix)
Examples:
Exclusive write vs. exclusive read vs. eventual write: serialize
Write-immediate vs. session-grain isolation: serialize
Write-immediate vs. eventual read: no conflict
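A minimal sketch of how such a conflict matrix might be consulted before admitting a new session. The helper is hypothetical, and the real rules cover more option combinations than shown here:

/* Hypothetical check: does a requested session conflict with one already
 * in progress?  Nonzero means the new session must wait (serialize). */
int cc_conflicts(const cc_options_t *held, const cc_options_t *requested)
{
    /* An exclusive-mode session conflicts with any other session on the data. */
    if (held->mode == CC_EXCLUSIVE || requested->mode == CC_EXCLUSIVE)
        return 1;

    /* Immediate update visibility conflicts with session-grain isolation:
     * the isolated session must not observe writes made mid-session. */
    if (held->reveal_updates == CC_REVEAL_IMMEDIATELY &&
        requested->accept_updates == CC_ACCEPT_AT_OPEN)
        return 1;

    /* e.g., write-immediate vs. an eventual read: no conflict, run in parallel. */
    return 0;
}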
Using Composable Consistency
Perform data access within a session, e.g.:
session_id = open(object, CC_option_vector);
read(session_id, buf); write(session_id, buf);
OR, update(session_id, incr_counter(value));
close(session_id);
Specify consistency semantics per-session at open() via the CC option vector
Concurrency control, replica synchronization, failure handling, view isolation and update visibility.
System enforces semantics by mediating each access
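A minimal sketch of such a session in C, with the slide's schematic open/read/write/close calls prefixed (cc_) so the sketch is self-contained; these prototypes are assumptions, not Swarm's real signatures:

#include <stddef.h>

/* Assumed prototypes mirroring the session interface on this slide. */
int  cc_open(const char *object, const cc_options_t *opts); /* returns a session id */
int  cc_read(int session_id, void *buf, size_t len);
int  cc_write(int session_id, const void *buf, size_t len);
int  cc_close(int session_id);

/* One session: the semantics are fixed at cc_open() by the option vector,
 * and the system mediates every access within the session accordingly. */
void append_entry(const char *object, const cc_options_t *opts,
                  const char *entry, size_t len)
{
    int sid = cc_open(object, opts);  /* e.g., &close_to_open or &eventual        */
    cc_write(sid, entry, len);        /* enforced per the chosen semantics        */
    cc_close(sid);                    /* session ends; updates revealed per options */
}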
Composable Consistency Benefits
Powerful: Small option set can express diverse semantics
Customizable: allows different semantics for each access
Effective: amenable to efficient WAN implementation
Benefit to middleware
Can provide read-write caching to a broader set of apps
Benefit for an application
Can customize consistency to diverse and varying sharing needs
Can simultaneously enforce different semantics on the same data for different users
Evaluation
Swarm: A Middleware Providing CC
Swarm: shared file interface with CC options
Location-transparent page-grained file access
Aggressive P2P caching
Dynamic cycle-free replica hierarchy per file
Prototype implements CC (except causality & atomicity)
Per-file, per-replica, and per-session consistency
Network economy (exploit nearby replicas)
Contention-aware replication (RPC vs caching)
Multi-level leases for failure resilience
Client-server BerkeleyDB Application
[Diagram: app users on two LANs reach the primary app server across the Internet; the server runs the app logic over a DB stored in the kernel's local FS]
BerkeleyDB Application using Swarm
[Diagram: same setup, but the primary app server's logic now calls the DB through an RDB wrapper backed by a co-located Swarm server with an RDB plugin that stores the DB]
Caching Proxy App Server using Swarm
[Diagram: a proxy app server near the users runs the same stack (app logic, RDB wrapper, Swarm server with RDB plugin and DB replica); its Swarm server caches the database from the primary app server's Swarm server across the Internet]
Swarm-based Applications
SwarmDB: Transparent BerkeleyDB database replication across WAN
SwarmFS: wide area P2P read-write file system
SwarmProxy: Caching WAN proxies for an auction service with strong consistency
SwarmChat: Efficient message/event dissemination
No single model can support the sharing needs of all these applications
SwarmDB: Replicated BerkeleyDB
Replication support built as wrapper library
Uses unmodified BerkeleyDB binary
Evaluated with five consistency flavors:
Lock-based updates, eventual reads
Master-slave writes, eventual reads
Close-to-open reads and writes
Staleness-bounded reads and writes
Eventual reads and writes
Compared against BerkeleyDB-provided RPC version
Order-of-magnitude throughput gains over RPC by relaxing consistency
SwarmDB Evaluation
BerkeleyDB B-tree index replicated across N nodes
Nodes linked via 1 Mbps links to a common router, 40ms RTT to each other
Full-speed workload:
30% writes: inserts, deletes, updates
70% reads: lookups, cursor scans
Varied # replicas from 1 to 48
SwarmDB Write Throughput/replica
[Chart: per-replica write throughput vs. number of replicas for locking writes + eventual reads, master-slave writes + eventual reads, close-to-open, 10/20 msec staleness-bounded, and optimistic flavors, compared against RPC over the WAN and a local SwarmDB server]
SwarmDB Query Throughput/replica
[Chart: per-replica query throughput vs. number of replicas for optimistic, 10 msec staleness-bounded, and close-to-open flavors, compared against RPC over the WAN and a local SwarmDB server]
SwarmDB Results
Customizing consistency can improve WAN caching performance dramatically
App can enforce diverse semantics by simply modifying CC options
Updates & queries with different semantics possible
SwarmFS Distributed File System
Sample SwarmFS path: /swarmfs/swid:0x1234.2/home/sai/thesis.pdf
Performance summary:
Achieves >80% of local FS performance on the Andrew Benchmark
More network-efficient than Coda for wide-area access
Correctly supports fine-grain collaboration across WANs
Correctly supports file locking for RCS repository sharing
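Because SwarmFS exports a shared-file interface under a normal path, an unmodified program can read the sample path above with ordinary POSIX calls. A minimal sketch, assuming the SwarmFS mount from the slide and omitting error handling beyond the open():

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    /* The path below is the sample SwarmFS path from the slide. */
    int fd = open("/swarmfs/swid:0x1234.2/home/sai/thesis.pdf", O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }
    char buf[4096];
    ssize_t n;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        write(STDOUT_FILENO, buf, (size_t)n);   /* copy the file to stdout */
    close(fd);
    return 0;
}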
SwarmFS: Distributed Development
Replica Topology
SwarmFS vs. Coda Roaming File Access
[Chart: compile latency in seconds from a cold cache at nodes U1, I1 (24 ms), C1 (50 ms), T1 (160 ms), F1 (130 ms RTT to home U1), comparing Coda-s and SwarmFS]
Coda-s always gets files from distant U1.
SwarmFS gets files from nearest copy.
Network Economy
SwarmFS vs. Coda Roaming File Access
[Chart: same compile-latency comparison as above; x-axis is LAN node and RTT to home (U1)]
Coda-s writes files through to U1 for close-to-open semantics.
Swarm’s P2P pull-based protocol avoids this.
Hence, SwarmFS performs better for temporary files.
P2P protocol more efficient
SwarmFS vs. Coda Roaming File Access
[Chart: same compile-latency comparison, now including Coda-w (weak mode)]
Eventual consistency inadequate
Coda-w behaves incorrectly:
`make' skipped files
linker found corrupt object files
Trickle reintegration pushed huge object files to U1, clogging the network link
(Coda-w runs ended with compile errors)
Evaluation Summary
SwarmDB: gains of customizable consistency
SwarmFS: network economy under write-sharing
SwarmProxy: strong consistency over WANs under varying contention
SwarmChat: update dissemination in real-time
By employing CC, the Swarm middleware data store can support diverse application needs effectively
Related Work
Flexible consistency models/interfaces: Munin, WebFS, Fluid Replication, TACT
Wide area caching solutions/middleware
File systems and data stores: AFS, Coda, Ficus, Pangaea, Bayou, Thor, …
Peer-to-peer systems: Napster, PAST, Farsite, Freenet, Oceanstore, BitTorrent, …
Future Work
Security and authentication
Fault-tolerance via first-class replication
Thesis Contributions
Survey of sharing needs of numerous applications
New taxonomy to classify application sharing needs
Composable consistency model based on taxonomy
Demonstrated that the CC model is practical and supports diverse applications across WANs effectively
Conclusion
Can a storage service provide effective WAN caching support for diverse distributed applications? YES
Key enabler: a novel flexible consistency interface called Composable Consistency
Allows an application to customize consistency to diverse and varying sharing needs
Allows middleware to serve a broader set of apps effectively
SwarmDB Control Flow
Composing Master-slave
Master-slave replication
Serialize updates:
» Concurrent-mode writes (WR)
» Serial update ordering (apply updates at the central master)
Eventual consistency for queries:
» Options mentioned earlier
Use: mySQL DB read-only replication across WANs
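Composed with the hypothetical cc_options_t sketched earlier, this flavor might look like the following (a sketch, not Swarm's interface):

/* Master-slave replication: writes are serialized through a central master,
 * while queries run against any replica with eventual consistency. */
cc_options_t ms_write = {
    .mode     = CC_CONCURRENT,     /* concurrent-mode (WR) writes                  */
    .ordering = CC_ORDER_SERIAL,   /* serial ordering: apply updates at the master */
};
cc_options_t ms_read = {           /* queries: the eventual-consistency options    */
    .mode            = CC_CONCURRENT,
    .strength        = CC_SOFT,
    .on_unreachable  = CC_IGNORE,
    .accept_updates  = CC_ACCEPT_IMMEDIATELY,
    .reveal_updates  = CC_REVEAL_IMMEDIATELY,
};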
Clustered BerkeleyDB
BerkeleyDB Proxy using Swarm
A Swarm-based Chat Room
callback(handle, newdata) {         /* invoked by Swarm when remote updates arrive */
    display(newdata);
}

main() {
    handle = sw_open(kid, "a+");    /* open the chat transcript in append mode */
    sw_snoop(handle, callback);     /* register for update notifications       */
    while (!done) {
        read(&newdata);             /* read the user's next message            */
        display(newdata);
        sw_write(handle, newdata);  /* append it to the shared transcript      */
    }
    sw_close(handle);
}
Chat transcript: WR mode, 0 second soft staleness, immediate visibility, no isolation
[Diagram: sample chat client code (above) and the update propagation path among the chat replicas]