
OpenStack Swift tailored for heavy-duty workloads

What’s the story

What we’ve done to build a large storage;

Why we’ve done that

Talk structure

Why object storage;

How OpenStack Swift is made;

How to make Swift serve these purposes better

About me

COO of servers.com;

Take the joy of life and happiness from our developers and engineers and give it to our customers;

Still use vim in my daily life

Our values

We stand for three principles in product development and customer communication

Creative engineering

Accessible performance

Customer power

Dallas

Amsterdam

Moscow

Luxembourg

Hyderabad

Singapore

About us

Global Private Network

40 Gbps to each server

Global presence and standards

Demanding and professional customers

Why do we need storage

CDN origin;

Backups (servers and infrastructure);

Content delivery without CDN

Anything else our customers can think of.


Current workloads

~ 10 petabytes of data

110 Gbps traffic

60% of data is stored in multiple locations

Backups of thousands of servers

CDN Origin

When we sell CDN, we sell a hassle-free premium service;

Global CDN requires not only distributed caching, but distributed storage as well;

We’re taking responsibility for the quality of service, and that is why we need to take control over storage.

CDN Origin

Data replication across the globe;

Content protection

(Pseudo)-streaming support

Backups

We should be capable of accepting large volumes of data within a short timeframe;

Data should be protected from accidental loss of any nature;

Why Swift

Popular API, plenty of tools and developers;

Scales horizontally;

Fault tolerance;

The most mature product on the market.

Swift Drawbacks

Some add-ons are poorly tested;

Has some serious design flaws;

Requires an experienced operations team.

Swift architecture

Swift architecture

[Diagram: the data hierarchy: Account → Container → Objects]
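The hierarchy maps directly onto the API. A minimal sketch with python-swiftclient follows; the endpoint, credentials, and container/object names are placeholders.

    # Minimal account -> container -> object walkthrough with
    # python-swiftclient; endpoint and credentials are placeholders.
    from swiftclient.client import Connection

    conn = Connection(
        authurl='https://swift.example.com/auth/v1.0',  # placeholder endpoint
        user='account:user',
        key='secret',
    )

    conn.put_container('photos')                            # container in the account
    conn.put_object('photos', 'cat.jpg', contents=b'...')   # object in the container
    headers, objects = conn.get_container('photos')         # list it back
    print([o['name'] for o in objects])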

Swift architecture

[Diagram: cluster layout; Region 1 and Region 2 each contain storage nodes grouped into Zone 1 and Zone 2]

Swift architecture

[Diagram: each disk on a storage node is split into partitions; together the partitions form the partition space]
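Conceptually, Swift maps an object to a partition by hashing its path and keeping the top part_power bits. A simplified sketch follows; the real ring code also mixes a cluster-wide hash suffix into the path.

    # Simplified version of Swift's path-to-partition mapping: the top
    # part_power bits of the md5 of the object path select the partition.
    from hashlib import md5
    import struct

    PART_POWER = 18                  # example value, fixed at ring creation
    PART_SHIFT = 32 - PART_POWER

    def object_partition(account, container, obj):
        path = f'/{account}/{container}/{obj}'.encode()
        return struct.unpack_from('>I', md5(path).digest())[0] >> PART_SHIFT

    print(object_partition('AUTH_demo', 'photos', 'cat.jpg'))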

Swift architecture

[Diagram: the replicator compares partition hashes across storage nodes and copies data where they differ]


Swift architecture

[Diagram: client requests arrive at the proxy servers, which issue storage requests to the storage nodes]

servers.com implementation

Two roles

Storage – object node + account node + container node

Proxy – swift-proxy + nginx + haproxy + ftp-cloudfs

Partitions

Their number is set once in a cluster’s life and cannot be changed;

Must be a power of 2;

Should be between 100 and 1000 per drive;

The best you can guarantee is a 500% scaling limit
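As a back-of-the-envelope illustration of how these constraints interact, the sketch below picks a partition power for a hypothetical cluster; the drive and replica counts are assumptions, not production numbers.

    # Illustrative partition sizing: pick part_power so that partitions per
    # drive stay in the 100-1000 range both now and after 5x (500%) growth.
    replicas = 3
    drives_now = 120                 # assumed drives at cluster launch
    drives_max = drives_now * 5      # the 500% growth ceiling

    for part_power in range(10, 30):
        partitions = 2 ** part_power            # must be a power of 2
        per_drive_now = partitions * replicas / drives_now
        per_drive_max = partitions * replicas / drives_max
        if per_drive_now <= 1000 and per_drive_max >= 100:
            print(f'part_power={part_power}: '
                  f'{per_drive_now:.0f}/drive now, {per_drive_max:.0f}/drive at 5x')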

Scaling

500% scaling limit;

Adding drives to the cluster should be slow;

You should not let the cluster get over 80% full;

Hardware failure

Losing a zone is not a disaster. It’s resurrecting it that really hurts.

With such a number of drives you have to automate things. We’ve written a tool that automatically asks datacenter engineers to replace dead drives (a sketch follows below).

“1 copy left” is the ultimate alert
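The in-house tool is not public; below is a purely hypothetical sketch of that kind of automation. The mountpoint layout and the dc-ticket CLI are invented for illustration.

    # Hypothetical drive-replacement automation: compare expected mountpoints
    # against /proc/mounts and file a ticket for anything that dropped out.
    import subprocess

    EXPECTED = {f'/srv/node/d{i}' for i in range(1, 13)}   # assumed layout

    def mounted():
        with open('/proc/mounts') as f:
            return {line.split()[1] for line in f}

    for missing in sorted(EXPECTED - mounted()):
        # In this assumed workflow an unmounted device means a dead drive,
        # so we ask datacenter engineers for a swap instead of paging anyone.
        subprocess.run(['dc-ticket', 'create', f'replace drive at {missing}'])  # invented CLI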

Hardware failure

                      Mean            95th percentile
Reads per second      100 / 88 / 65   48 / 38 / 9
Writes per second     100 / 76 / 36   24 / 18 / 7

                      Mean            99.95% SLA failures
Failed reads          0 / 0 / 0.03    0 / 0 / 45%
Failed writes         0 / 0 / 0.04    0 / 0 / 63%

Values: 5 healthy zones / 4 healthy zones / 4 healthy and 1 recovering zone

Intensive rebalancing leads to a 3-5x increase in response time. Without it, 20% of requests are served slower

SQLite

Container and account data is stored in SQLite

SSD saves the situation: on SATA you can’t have more than 1 million objects in a container. With SSD, 100 million is not the limit (with proper naming)

1 TB of SSD per 1 PB of SATA costs next to nothing and improves performance dramatically
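What “proper naming” means here is our reading of the slide; one common pattern, sketched below, is sharding objects across several containers by name hash so each container’s SQLite database stays small. The shard count is an assumption.

    # One way to keep per-container SQLite databases small: shard objects
    # across N containers by a hash of the object name. N is an assumption.
    from hashlib import md5

    N_SHARDS = 16

    def container_for(name, base='backups'):
        shard = int(md5(name.encode()).hexdigest(), 16) % N_SHARDS
        return f'{base}-{shard:02d}'

    print(container_for('server42/2016-04-01.tar.gz'))   # e.g. backups-07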

Operational notes

Replication traffic and client traffic co-exist in the same network, and no QoS is possible

The internal network and CPU should be under-used in normal mode

Post-accident recovery and scaling must be very gradual

Dallas

Amsterdam

Moscow

Luxembourg

Hyderabad

Singapore

Synchronisation

Syncing

By default, syncing runs in a single thread per region;

We’ve completely rewritten it and made it multi-threaded;

Then we’ve added monitoring to the sync tool, and a watchdog to catch stuck syncs;

For deletes we have a synchronous mode: instead of eventual replication, this one is truly synchronous
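The rewritten sync tool is in-house; for reference, stock Swift container sync, the mechanism it replaces, is driven by two headers. Realm, cluster, account, and key below are placeholders.

    # Configuring stock Swift container sync with python-swiftclient;
    # realm/cluster/account names and the key are placeholders.
    from swiftclient.client import Connection

    conn = Connection(authurl='https://swift.example.com/auth/v1.0',
                      user='account:user', key='secret')

    conn.post_container('backups', headers={
        'X-Container-Sync-To': '//realm/dallas/AUTH_demo/backups',
        'X-Container-Sync-Key': 'shared-secret',   # must match on both sides
    })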

Content distribution vs. backup storage

Economically, it is a match made in heaven;

Technically, you have to process both GETs and PUTs efficiently;

Content distribution requires accounting: a high-performance tool analysing traffic at the application level (a sketch follows below).
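The accounting tool itself is in-house; a minimal sketch of the idea follows. The log format and field order are assumptions, not the real tool.

    # Sketch of application-level accounting: aggregate bytes served per
    # account from proxy access-log lines. The log format is an assumption.
    from collections import defaultdict

    def account_bytes(lines):
        totals = defaultdict(int)
        for line in lines:
            ts, method, path, status, size = line.split()
            if method in ('GET', 'HEAD') and status == '200':
                account = path.split('/')[2]   # /v1/<account>/<container>/<object>
                totals[account] += int(size)
        return totals

    sample = ['1459500000 GET /v1/AUTH_acme/photos/cat.jpg 200 523764']
    print(dict(account_bytes(sample)))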

Content distribution vs. backup storage

[Diagram: HAProxy is the client endpoint; GET/HEAD and PUT/POST requests go to nginx + swift-proxy, DELETEs are routed through a dedicated delete proxy; Keystone handles authentication; accounting data from GET/HEAD traffic feeds billing]

FTP

FTP is not dead, unfortunately. FTP is still by far the most popular upload path (thanks to legacy code);

ftp-cloudfs works and scales horizontally: you can think of object storage as unlimited FTP;

We’ve added large-file deletion and renaming, and part-file hiding, and contributed it back to the community;
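Since ftp-cloudfs exposes the store as an ordinary FTP server, any stock FTP client can talk to it; a small sketch with Python’s ftplib, where the host and credentials are placeholders.

    # ftp-cloudfs presents the object store as a regular FTP server, so a
    # stock client works; host and credentials are placeholders.
    from ftplib import FTP

    ftp = FTP('ftp.storage.example.com')        # placeholder gateway host
    ftp.login('account:user', 'secret')
    ftp.cwd('backups')                          # a container appears as a directory
    with open('dump.tar.gz', 'rb') as f:
        ftp.storbinary('STOR dump.tar.gz', f)   # upload becomes an object PUT
    ftp.quit()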

Backup tools

Duplicity

Veeam

Cloudberry Backup

Nick Dvas

COO @ Servers.com

E-mail: [email protected]
Skype: dvas.nicholas
Phone: +357 99 32 28 16

[Diagram: CDN edge nodes pulling content from the origin]

Content protection

[Diagram: container pairs Data / Data-versions and .Trash / .Trash-versions]
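The Data / Data-versions pairing in this diagram matches Swift’s stock object versioning; a sketch of enabling it with python-swiftclient, with placeholder endpoint and credentials.

    # Enabling stock Swift object versioning on the Data container so
    # overwritten objects are preserved in Data-versions.
    from swiftclient.client import Connection

    conn = Connection(authurl='https://swift.example.com/auth/v1.0',
                      user='account:user', key='secret')

    conn.put_container('Data-versions')
    conn.post_container('Data',
                        headers={'X-Versions-Location': 'Data-versions'})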