Talk structure
Why object storage;
How OpenStack Swift is made;
How to make Swift serve these purposes better
About me
COO of servers.com;
Take the joy of life and happiness from our developers and engineers and give it to our customers;
Still use vim in my daily life
Our values
We stand for three principles in product development and customer communication
Creative engineering
Accessible performance
Customer power
Dallas
Amsterdam
Moscow
Luxembourg
Hyderabad
Singapore
About us
Global Private Network
40 Gbps to each server
Global presence and standards
Demanding and professional customers
Why do we need storage
CDN origin;
Backups (servers and infrastructure);
Content delivery without CDN
Anything else our customers can think of.
?
Current workloads
~ 10 petabytes of data
110 Gbps traffic
60% of data is stored in multiple locations
Backups of thousands of servers
CDN Origin
When we sell CDN, we sell a hassle-free premium service;
Global CDN requires not only distributed caching, but distributed storage as well;
We take responsibility for quality of service, which is why we need control over storage.
Backups
We must be able to accept large volumes of data within a short timeframe;
Data should be protected from accidental loss of any kind.
Why Swift
Popular API, plenty of tools and developers;
Scales horizontally;
Fault tolerance;
The most mature product on the market.
Swift Drawbacks
Some add-ons are poorly tested;
Has some serious design flaws;
Requires an experienced operations team.
Implementation at servers.com
Two roles
Storage – object node + account node + container node
Proxy – swift-proxy + nginx + haproxy + ftp-cloudfs
Partitions
Their number is set once in a cluster’s life and cannot be changed;
Must be a power of 2;
Should be between 100 and 1000 per drive
The best you can guarantee is a 500% scaling limit
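One way to read the 500% figure, sketched below under stated assumptions: because the partition count must be a power of 2, the largest allowed count can land anywhere between just over 500 and 1000 partitions per drive on day one, and the cluster can grow until that drops to 100 per drive. The function names are hypothetical, and the sketch treats "partitions per drive" as the partition count divided by the number of drives, ignoring replicas and drive weights.

```python
def choose_part_power(initial_drives, max_per_drive=1000):
    """Largest p such that 2**p / initial_drives <= max_per_drive."""
    budget = initial_drives * max_per_drive
    return budget.bit_length() - 1  # floor(log2(budget))


def max_drives(part_power, min_per_drive=100):
    """Drive count at which partitions per drive falls to min_per_drive."""
    return (2 ** part_power) // min_per_drive


if __name__ == "__main__":
    for drives in (60, 65, 66):
        power = choose_part_power(drives)
        per_drive = 2 ** power / drives
        growth = max_drives(power) / drives
        print(f"{drives} drives: part power {power}, "
              f"{per_drive:.0f} partitions per drive, growth up to {growth:.1f}x")
```

With these assumptions the growth factor is never below 5x and only approaches 10x when the initial drive count happens to sit just above a power-of-2 boundary.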
Scaling
500% scaling limit;
Adding drives to the cluster should be slow;
You should not let the cluster get more than 80% full;
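The two rules above can be expressed as a small planning helper. This is a minimal sketch, not the tooling used at servers.com: the function names, the number of steps and the rounding are all assumptions. The idea is that a new drive's ring weight is raised in several small steps, each followed by a rebalance, rather than set to its final value at once.

```python
def weight_ramp(target_weight, steps):
    """Evenly spaced, increasing weights that end at target_weight."""
    return [round(target_weight * i / steps, 2) for i in range(1, steps + 1)]


def below_fill_limit(used_bytes, capacity_bytes, max_fill=0.80):
    """True while the cluster is under the fill limit (80% on the slide)."""
    return used_bytes / capacity_bytes < max_fill


if __name__ == "__main__":
    print(weight_ramp(100, 4))
    print(below_fill_limit(790, 1000))
```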
Hardware failure
Losing a zone is not a disaster. It’s resurrecting it that really hurts.
With this many drives you have to automate things. We’ve written a tool that automatically asks datacenter engineers to replace dead drives.
“1 copy left” is an ultimate alert
Hardware failure
Each cell lists values for: 5 healthy zones / 4 healthy zones / 4 healthy and 1 recovering zone

                     Mean         95th percentile
Reads per second     100/88/65    48/38/9
Writes per second    100/76/36    24/18/7

                     Mean         99.95% SLA failures
Failed reads         0/0/0.03     0/0/45%
Failed writes        0/0/0.04     0/0/63%
Intensive rebalancing leads to a 3-5x increase in response time. Without it, 20% of requests are served more slowly.
SQLite
Container and account data is stored in SQLite
SSD saves the situation: on SATA you can’t have more than 1 million objects in a container. With SSD, 100 million is not the limit (using proper naming)
1 TB of SSD per 1 PB of SATA costs next to nothing and enhances performance dramatically
Operational notes
Replication traffic and client traffic coexist in the same network, and no QoS is possible
Internal network and CPU should be under-used in normal mode
Post-accident recovery and scaling must be very gradual
Syncing
By default, syncing runs in a single thread per region;
We’ve completely rewritten it and made it multi-threaded;
Then we added monitoring to the sync tool and a watchdog to catch stuck syncs;
For deletes we have a synchronous mode: instead of eventual replication, this one is truly synchronous
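The multi-threaded sync with a watchdog described above might look roughly like the sketch below. It is not the servers.com implementation: `sync_all`, `sync_one`, the container names and the thresholds are hypothetical. Each worker records when it started, and the main loop flags any sync that has been running longer than `stuck_after` seconds. Python threads cannot be killed from outside, so the watchdog here only reports stuck syncs.

```python
import threading
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def sync_all(containers, sync_one, workers=4, stuck_after=5.0, poll=0.05):
    """Run sync_one(name) for every container in a thread pool.

    Returns (results, stuck): results maps name -> exception or None,
    stuck is the set of names that ran longer than stuck_after seconds.
    """
    started = {}              # name -> start time of syncs still running
    lock = threading.Lock()

    def run(name):
        with lock:
            started[name] = time.monotonic()
        try:
            sync_one(name)
        finally:
            with lock:
                started.pop(name, None)

    results, stuck = {}, set()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pending = {pool.submit(run, name): name for name in containers}
        while pending:
            done, _ = wait(list(pending), timeout=poll,
                           return_when=FIRST_COMPLETED)
            for future in done:
                results[pending.pop(future)] = future.exception()
            # Watchdog pass: flag every sync running past the threshold.
            now = time.monotonic()
            with lock:
                stuck.update(n for n, t in started.items()
                             if now - t > stuck_after)
    return results, stuck
```

A real watchdog would also need a way to abort or restart a stuck sync, for example by running each sync in a separate process.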
Content distribution vs. backup storage
Economically, it is a match made in heaven;
Technically, you have to process both GETs and PUTs efficiently;
Content distribution requires accounting: a high-performance tool that analyses traffic at the application level.
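As an illustration of application-level accounting, the sketch below totals outbound traffic per account from request records. The record layout (method, path, bytes sent) and the function names are assumptions made for the example, not the format of the tool mentioned above. The only Swift detail relied on is the `/v1/<account>/<container>/<object>` request path layout.

```python
from collections import defaultdict


def account_from_path(path):
    """Extract the account from a /v1/<account>/... path, or return None."""
    parts = path.split("/")
    if len(parts) > 2 and parts[1] == "v1" and parts[2]:
        return parts[2]
    return None


def aggregate(records):
    """Sum requests and bytes sent per account for GET and HEAD requests."""
    totals = defaultdict(lambda: {"requests": 0, "bytes_out": 0})
    for method, path, bytes_sent in records:
        if method not in ("GET", "HEAD"):
            continue  # uploads and deletes are not counted in this sketch
        account = account_from_path(path)
        if account is None:
            continue
        totals[account]["requests"] += 1
        totals[account]["bytes_out"] += bytes_sent
    return dict(totals)


if __name__ == "__main__":
    sample = [
        ("GET", "/v1/AUTH_demo/videos/a.mp4", 1000),
        ("HEAD", "/v1/AUTH_demo/videos/a.mp4", 0),
        ("PUT", "/v1/AUTH_demo/backups/db.tar", 0),
    ]
    print(aggregate(sample))
```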
Content distribution vs. backup storage
[Diagram: request flow. Endpoint: HAProxy. Components shown: Keystone (authentication), billing (accounting data), HAProxy, delete proxy, nginx, swift-proxy. Request types shown: GET, HEAD, PUT, POST, DELETE.]
FTP
FTP is not dead, unfortunately. FTP is still by far the most popular upload path (thanks to legacy code)
ftp-cloudfs works and scales horizontally: you can think of object storage as unlimited FTP;
We’ve added large-file deletion and renaming and part-file hiding, and contributed the changes back to the community;