Building AuroraObjects - Ceph Day Frankfurt

Wido den Hollander, 42on.com


Page 1: Building AuroraObjects- Ceph Day Frankfurt

Building AuroraObjects

Page 2: Building AuroraObjects- Ceph Day Frankfurt

Who am I?

● Wido den Hollander (1986)
● Co-owner and CTO of PCextreme B.V., a Dutch hosting company
● Ceph trainer and consultant at 42on B.V.
● Part of the Ceph community since late 2009

– Wrote the Apache CloudStack integration

– libvirt RBD storage pool support

– PHP and Java bindings for librados

Page 3: Building AuroraObjects- Ceph Day Frankfurt

PCextreme?

● Founded in 2004
● Medium-sized ISP in the Netherlands
● 45,000 customers
● Started as a shared hosting company
● Datacenter in Amsterdam

Page 4: Building AuroraObjects- Ceph Day Frankfurt

What is AuroraObjects?

● Under the name “Aurora” my hosting company PCextreme B.V. has two services:
  – AuroraCompute, a CloudStack-based public cloud backed by Ceph's RBD
  – AuroraObjects, a public object store using Ceph's RADOS Gateway
● AuroraObjects is a public RADOS Gateway service (S3 only) running in production

Page 5: Building AuroraObjects- Ceph Day Frankfurt

The RADOS Gateway (RGW)

● Serves objects using either Amazon's S3 or OpenStack's Swift protocol
● All objects are stored in RADOS; the gateway is just an abstraction layer between HTTP/S3 and RADOS

Page 6: Building AuroraObjects- Ceph Day Frankfurt

The RADOS Gateway

Page 7: Building AuroraObjects- Ceph Day Frankfurt

Our ideas

● We wanted to cache frequently accessed objects using Varnish
  – Only possible with anonymous clients
● SSL should be supported
● Storage shared between the Compute and Objects services
● 3x replication

Page 8: Building AuroraObjects- Ceph Day Frankfurt

Varnish

● A caching reverse HTTP proxy
  – Very fast

● Up to 100k requests/s

– Configurable using the Varnish Configuration Language (VCL)

– Used by Facebook and eBay

● Not a part of Ceph, but can be used with the RADOS Gateway
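The VCL itself is not shown in the slides; a minimal sketch of the idea could look roughly like this (Varnish 3 syntax, and the backend address is an assumption): signed S3 requests carry an Authorization header and are passed straight to RGW, while anonymous GET/HEAD requests are served from the cache.

backend rgw {
    .host = "127.0.0.1";    # assumption: RGW reachable locally on each gateway node
    .port = "80";
}

sub vcl_recv {
    # Signed (authenticated) S3 requests must never be answered from cache
    if (req.http.Authorization) {
        return (pass);
    }
    # Only anonymous read requests are cacheable
    if (req.request != "GET" && req.request != "HEAD") {
        return (pass);
    }
    return (lookup);
}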

Page 9: Building AuroraObjects- Ceph Day Frankfurt

The Gateways

● SuperMicro 1U
  – AMD Opteron 6200 series CPU
  – 128GB RAM
● 20Gbit LACP trunk
● 4 nodes
● Varnish runs locally with RGW on each node
  – Uses the RAM to cache objects
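A RAM-backed cache like this is typically started with Varnish's malloc storage backend; the cache size and VCL path below are assumptions, not figures from the slides.

$ varnishd -a :80 -f /etc/varnish/rgw.vcl -s malloc,100G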

Page 10: Building AuroraObjects- Ceph Day Frankfurt

The Ceph cluster

● SuperMicro 2U chassis
  – AMD Opteron 4334 CPU
  – 32GB RAM
  – Intel S3500 80GB SSD for the OS
  – Intel S3700 200GB SSD for journaling
  – 6x Seagate 3TB 7200RPM drives for OSDs
● 2Gbit LACP trunk
● 18 nodes
● ~320TB of raw storage
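That raw figure follows directly from the drive count: 18 nodes × 6 drives × 3TB ≈ 324TB of raw capacity, i.e. roughly 320TB.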

Page 11: Building AuroraObjects- Ceph Day Frankfurt

Our problems

● When we cache objects in Varnish, they don't show up in the usage accounting of the RGW
  – The HTTP request never reaches RGW
● When an object changes we have to purge all caches to maintain cache consistency
  – A user might change an ACL or modify an object with a PUT request
● We wanted to make cached requests cheaper than non-cached requests

Page 12: Building AuroraObjects- Ceph Day Frankfurt

Our solution: Logstash

● All requests go from Varnish into Logstash and into ElasticSearch
  – From ElasticSearch we do the usage accounting
● When Logstash sees a PUT or DELETE request it makes a local request which sends out a multicast to all other RGW nodes to purge that specific object
● We also store bucket storage usage in ElasticSearch so we have an average over the month
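The slides don't show how each node accepts those purges; assuming the standard Varnish 3 HTTP PURGE pattern, the receiving side could look like this sketch (the ACL is an assumption, and the multicast distribution itself happens outside Varnish):

acl purgers {
    "127.0.0.1";    # assumption: purges arrive via a local helper on each node
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purgers) {
            error 405 "Not allowed";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}

sub vcl_miss {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged";
    }
}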


Page 14: Building AuroraObjects- Ceph Day Frankfurt

LogStash and ElasticSearch

● varnishncsa → logstash → redis → elasticsearch

input {
    pipe {
        command => "/usr/local/bin/varnishncsa.logstash"
        type => "http"
    }
}

● And we simply execute varnishncsa:

varnishncsa -F '%{VCL_Log:client}x %{VCL_Log:proto}x %{VCL_Log:authorization}x %{Bucket}o %m %{Host}i %U %b %s %{Varnish:time_firstbyte}x %{Varnish:hitmiss}x'
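The slides only show the input side of that chain; the shipping step towards Redis could be a Logstash output block along these lines (the Redis host and key are assumptions), with a second Logstash instance reading from that list and indexing into ElasticSearch:

output {
    redis {
        host      => "127.0.0.1"    # assumption: local Redis broker
        data_type => "list"
        key       => "logstash"
    }
}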

Page 15: Building AuroraObjects- Ceph Day Frankfurt

%{Bucket}o?

● With %{<header>}o you can display the value of the response header <header>:
  – %{Server}o: Apache 2
  – %{Content-Type}o: text/html
● We patched RGW (now in master) so that it can optionally return the bucket name in the response:

200 OK
Connection: close
Date: Tue, 25 Feb 2014 14:42:31 GMT
Server: AuroraObjects
Content-Length: 1412
Content-Type: application/xml
Bucket: "ceph"
X-Cache-Hit: No

● Setting 'rgw expose bucket = true' in ceph.conf makes RGW return the Bucket header
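In ceph.conf that setting sits in the RGW client section, roughly like this (the exact section name is an assumption):

[client.radosgw.gateway]
    rgw expose bucket = true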

Page 16: Building AuroraObjects- Ceph Day Frankfurt

Usage accounting

● We only query RGW for storage usage and also store that in ElasticSearch

● ElasticSearch is used for all traffic accounting
  – Allows us to differentiate between cached and non-cached traffic
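The slides don't say which interface is used to pull the storage numbers out of RGW; one way to get per-bucket storage usage is radosgw-admin, for example:

$ radosgw-admin bucket stats --bucket=ceph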

Page 17: Building AuroraObjects- Ceph Day Frankfurt

Back to Ceph: CRUSHMap

● A good CRUSHMap design should reflect the physical topology of your Ceph cluster
  – All machines have a single power supply
  – The datacenter has an A and a B powercircuit
    ● We use an STS (Static Transfer Switch) to create a third powercircuit
● With CRUSH we store each replica on a different powercircuit
  – When a circuit fails, we lose 2/3 of the Ceph cluster
  – Each powercircuit has its own switching / network

Page 18: Building AuroraObjects- Ceph Day Frankfurt

The CRUSHMap

type 7 powerfeed

host ceph03 {
    alg straw
    hash 0
    item osd.12 weight 1.000
    item osd.13 weight 1.000
    ..
}

powerfeed powerfeed-a {
    alg straw
    hash 0
    item ceph03 weight 6.000
    item ceph04 weight 6.000
    ..
}

root ams02 {
    alg straw
    hash 0
    item powerfeed-a
    item powerfeed-b
    item powerfeed-c
}

rule powerfeed {
    ruleset 4
    type replicated
    min_size 1
    max_size 3
    step take ams02
    step chooseleaf firstn 0 type powerfeed
    step emit
}

Page 19: Building AuroraObjects- Ceph Day Frankfurt

The CRUSHMap

Page 20: Building AuroraObjects- Ceph Day Frankfurt

Testing the CRUSHMap

● With crushtool you can test your CRUSHMap

$ crushtool -c ceph.zone01.ams02.crushmap.txt -o /tmp/crushmap
$ crushtool -i /tmp/crushmap --test --rule 4 --num-rep 3 --show-statistics

● This shows you the result of the CRUSHMap:

rule 4 (powerfeed), x = 0..1023, numrep = 3..3
CRUSH rule 4 x 0 [36,68,18]
CRUSH rule 4 x 1 [21,52,67]
..
CRUSH rule 4 x 1023 [30,41,68]
rule 4 (powerfeed) num_rep 3 result size == 3: 1024/1024

● Manually verify those locations are correct

Page 21: Building AuroraObjects- Ceph Day Frankfurt

A summary

● We cache anonymously accessed objects with Varnish
  – Allows us to process thousands of requests per second
  – Saves us I/O on the OSDs

● We use LogStash and ElasticSearch to store all requests and do usage accounting

● With CRUSH we store each replica on a different power circuit

Page 22: Building AuroraObjects- Ceph Day Frankfurt

Resources

● LogStash: http://www.logstash.net/
● ElasticSearch: http://www.elasticsearch.net/
● Varnish: http://www.varnish-cache.org/
● CRUSH: http://ceph.com/docs/master/
● E-Mail: [email protected]
● Twitter: @widodh