marat-zhanikeev
DESCRIPTION
Cloud platforms increasingly rely on vertical integration to save cost and effort. This leads to well-known cases like June 2013, when a power outage in one Amazon data center caused prolonged outages across entire ecosystems such as Heroku. Businesses are extremely sensitive to prolonged storage outages. This paper proposes a client-side solution that addresses the problem by distributing storage among multiple service providers, referred to as substores; the abstraction accommodates any storage technology, such as over-the-network APIs or local HDD and SSD disks. The tool also implements other useful features, namely a social metadata layer and throughput awareness, which allow for a new formulation of smart distribution.
High Availability
Mission Statement
1. high availability business-level cloud data store
2. federated clouds = diversification
3. many DCs and/or cloud providers
4. we care mostly about performance = high availability
5. practical solutions are needed
Marat Zhanikeev -- [email protected] High Availability Cloud Storage: Social, Throughput, Smart -- http://tinyurl.com/marat140417 2/21...
haStore : The Short Story
haStore: One DC is Not Enough
• remember June 2013?
• most services today use vertical integration -- no diversity
• Hitachi does not share DCs with NEC
• regional diversity of one provider is bad
  ◦ how many Amazon DCs in Japan?
(The only possible) solution is to sign contracts with multiple DCs and manage them on the client side.
haStore: One DC is Not Enough
[Figure: locations (Kansai, Kyushu, Okinawa) with data centers DC1 and DC2, the Osaka and Naha offices, and a business trip, connected over a storage network with varying network distance. Proposed software stack, top to bottom: Employee A ..., Content / Social Metadata, High Availability Data Store, Store APIs, DC1, DC2, ...]
haStore: Store Diversification
• store = sum of multiple substores
• in software: not a priority list -- optimization engine!
• realtime performance monitoring, read/write optimization, etc.
• sub-file data unit -- chunks
[Figure: substores at growing network distance from the user: SSD, HDD, then DC1, DC2, ... over the network]
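The difference between a fixed priority list and an optimization engine driven by realtime monitoring can be illustrated with a minimal sketch. It is not taken from haStore itself: the `Substore` class, the exponentially weighted moving average, and the `alpha` smoothing parameter are all assumptions introduced here for illustration.

```python
from dataclasses import dataclass


@dataclass
class Substore:
    """Hypothetical record of one substore and its measured throughput."""
    name: str
    throughput: float = 0.0  # smoothed estimate, bytes per second

    def observe(self, nbytes: int, seconds: float, alpha: float = 0.3) -> None:
        """Fold one finished transfer into the smoothed throughput estimate."""
        sample = nbytes / seconds
        if self.throughput == 0.0:
            self.throughput = sample  # first measurement seeds the estimate
        else:
            self.throughput = alpha * sample + (1 - alpha) * self.throughput


def pick_substores(substores, copies):
    """Return the `copies` substores with the highest measured throughput."""
    ranked = sorted(substores, key=lambda s: s.throughput, reverse=True)
    return ranked[:copies]


stores = [Substore("ssd"), Substore("dc1"), Substore("dc2")]
stores[0].observe(100_000, 0.001)  # local SSD: fast
stores[1].observe(100_000, 0.2)    # nearby data center
stores[2].observe(100_000, 1.0)    # distant data center
chosen = [s.name for s in pick_substores(stores, copies=2)]
```

Because the ranking is recomputed from measurements, a substore that slows down loses its place automatically, which a static priority list cannot do.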
haStore: Socially Aware Store
• content relevance based on social graph
• relevance is a distribution
• individual redundancy based on distribution
• other link types: same time, location, filetype, ...
• link strength != 1
[Figure: relevance distribution over content in descending order, marking redundancy (user setting), the physical limit of redundancy, and the end of content]
[Table: "There is a link between ... when a file is ..."; columns: Created, Viewed, Edited, Deleted]
haStore: Software Design
Design: Specs
• many substores, heterogeneous e2e performance and capacity
• each substore has its own API (Dropbox, GDrive, SSD, etc.), but haStore exports a generic API
• data unit: sub-file blobs, for now fixed 100kb size
• social graph is used to define priority lists of files
  ◦ different for each user
• optimization is key element of software engines
  1. sync logic
  2. redundancy logic
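The generic API over heterogeneous substores and the fixed-size sub-file blobs can be sketched as below. This is an illustration under stated assumptions, not haStore's real interface: the `Substore` base class, its `put`/`get` methods, the in-memory driver, and the round-robin placement are hypothetical, and "100kb" is read here as 100,000 bytes.

```python
from abc import ABC, abstractmethod

CHUNK_SIZE = 100 * 1000  # assumption: "100kb" taken as 100,000 bytes


class Substore(ABC):
    """Hypothetical generic interface that every substore driver implements."""

    @abstractmethod
    def put(self, key: str, blob: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class MemorySubstore(Substore):
    """In-memory stand-in for a real driver (Dropbox, GDrive, SSD, ...)."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, blob):
        self._blobs[key] = blob

    def get(self, key):
        return self._blobs[key]


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE):
    """Cut a file's bytes into fixed-size chunks; the last one may be shorter."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def store_file(name, data, substores):
    """Spread chunks over substores round-robin and return the chunk map."""
    placement = []
    for index, chunk in enumerate(split_into_chunks(data)):
        target = index % len(substores)  # placeholder policy, not an optimizer
        key = f"{name}.{index}"
        substores[target].put(key, chunk)
        placement.append((key, target))
    return placement


def load_file(placement, substores):
    """Reassemble a file from its chunk map."""
    return b"".join(substores[target].get(key) for key, target in placement)


stores = [MemorySubstore(), MemorySubstore()]
payload = bytes(250_000)
chunk_map = store_file("report.pdf", payload, stores)
restored = load_file(chunk_map, stores)
```

Callers only ever see `put` and `get`, so a new storage technology can be added by writing one more driver class; the round-robin line is where an optimization engine would make its placement decision.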
Design: API Stack
• Generic API starts from Level 2, similar to drivers
• the stack is implemented by each client = each user
[Figure: proposed software stack, top to bottom: Employee A ..., Content / Social Metadata, High Availability Data Store, Store, DC1, DC2, ...]
Design: Sync Engine
• optimization for throughput minimization
• same logic for SSD, HDD and over-the-network
[Figure: haStore components: Storage, SyncEngine, Optimization, LocalCache; GUI and clients (1) check and (2) use]
Design: Sync Engine Logic
[Figure: sync engine logic; labels: bulk, throughput, history data, increase timeout, performance tradeoff]
Design: Redundancy Logic (1)
[Figure: relevance distribution over content in descending order, marking redundancy (user setting), the physical limit of redundancy, and the end of content]
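One way to read the redundancy figure is that files are ranked by relevance and each receives a copy count that follows the relevance distribution, bounded by the user setting and by the physical limit. The sketch below is only an assumption-laden illustration: the function name `plan_redundancy`, the linear scaling, and the rule that every file keeps at least one copy are not from the slides.

```python
def plan_redundancy(relevance, user_max, n_substores):
    """Map each file to a copy count scaled by its relevance.

    relevance   -- dict of file name -> non-negative relevance score
    user_max    -- redundancy the user requests for the most relevant file
    n_substores -- physical limit: no more copies than there are substores
    """
    cap = min(user_max, n_substores)
    top = max(relevance.values(), default=0)
    plan = {}
    # Walk files in descending order of relevance.
    ranked = sorted(relevance.items(), key=lambda kv: kv[1], reverse=True)
    for name, score in ranked:
        share = score / top if top > 0 else 0
        # Assumption: linear scaling, and every stored file keeps one copy.
        plan[name] = max(1, round(share * cap))
    return plan


scores = {"budget.xlsx": 9.0, "minutes.txt": 4.0, "old_logo.png": 0.5}
plan = plan_redundancy(scores, user_max=5, n_substores=3)
```

Here the user asks for up to 5 copies, but only 3 substores exist, so the most relevant file is capped at 3 copies and the least relevant files fall back to a single copy.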
Design: Redundancy Logic (2)
haStore: Social Graph
Social Graph : Basics
• current version: only simple types of links
• no link strength
[Table: "There is a link between ... when a file is ..."; columns: Created, Viewed, Edited, Deleted]
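A graph with only simple, unweighted links can be built directly from file events. The sketch below assumes that a link connects a user to a file whenever that user creates, views, edits, or deletes it; the slides do not spell out the endpoints of a link, so this reading, the event tuple layout, and the function names are hypothetical.

```python
from collections import defaultdict

LINK_EVENTS = {"created", "viewed", "edited", "deleted"}


def build_links(events):
    """Build an unweighted user-file graph from (user, action, file) events."""
    files_of = defaultdict(set)  # user -> files the user touched
    users_of = defaultdict(set)  # file -> users who touched it
    for user, action, filename in events:
        if action in LINK_EVENTS:
            files_of[user].add(filename)
            users_of[filename].add(user)
    return files_of, users_of


def related_files(user, files_of, users_of):
    """Files touched by users who share at least one file with `user`."""
    own = files_of.get(user, set())
    peers = set()
    for f in own:
        peers |= users_of[f]
    peers.discard(user)
    found = set()
    for p in peers:
        found |= files_of[p]
    return found - own


log = [
    ("a-san", "created", "plan.doc"),
    ("b-san", "viewed", "plan.doc"),
    ("b-san", "edited", "budget.xlsx"),
    ("c-san", "created", "notes.txt"),
]
files_of, users_of = build_links(log)
suggested = related_files("a-san", files_of, users_of)
```

With no link strength, every event counts equally, so relevance can only be expressed as "linked" or "not linked" at a given distance in the graph.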
Social Graph : Advanced
• community detection
• files that could be linked:
  1. touched at roughly the same time
  2. touched by the same user
  3. same location, filetype, size, etc.
• link strength, different for each kind of relation, variable e2e cost on paths
• discovery based on e2e cost, not hop count
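Discovery by end-to-end cost rather than hop count can be sketched with a bounded shortest-path search (Dijkstra's algorithm). The graph layout, the `budget` parameter, and the idea that a stronger link maps to a lower cost are assumptions made for this illustration, not details given in the slides.

```python
import heapq


def discover(graph, start, budget):
    """Return nodes reachable from `start` with total path cost <= budget.

    graph -- dict: node -> list of (neighbor, cost) pairs, with cost >= 0
    """
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, step in graph.get(node, []):
            total = cost + step
            if total <= budget and total < best.get(neighbor, float("inf")):
                best[neighbor] = total
                heapq.heappush(queue, (total, neighbor))
    return best


graph = {
    "plan.doc": [("budget.xlsx", 5.0), ("notes.txt", 1.0)],
    "notes.txt": [("slides.ppt", 1.0)],
    "slides.ppt": [("budget.xlsx", 1.0)],
}
reached = discover(graph, "plan.doc", budget=3.0)
```

In this example `budget.xlsx` is one hop away over a weak (costly) link but three hops away over strong (cheap) links; a hop-count search would take the direct link, while the cost-based search reaches it within the budget only through the longer path.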
Implementation, Tests
Performance : Demo
[Figure: demo screenshot, 2014-01-22 12:13:30; users A-san and B-san; substores DBX and GDR; legend: block done, block upload, block download]
• also demo
Wrapup
• haStore: high availability cloud store
• main features
  ◦ throughput-aware sync/redundancy optimization
  ◦ sub-file blocks, smart distribution
  ◦ social graph
• current status: v1.0 in operation, v2.0 on the way
That’s all, thank you ...