Architecting An Enterprise Storage Platform Using Object Stores

Presented at SNIA SDC 2013

© mekuria getinet / www.mekuriageti.net

Niraj Tolia

Chief Architect, Maginatics

@nirajtolia

A Whirlwind Tour

Awesome Questions == Awesome T-shirts

80% YoY Growth in Unstructured Data

41% Growth in IaaS Systems through 2016

Sources:

Gartner, IT Market Clock for Storage, Sep 2011

Gartner, Forecast Overview: Public Cloud Services, Worldwide, 2011-2016, Feb 2013

MagFS: The File System for the Cloud

Consistent, Elastic, Secure, Mobile-Enabled

Layered on Object Stores

“Software-Defined”

No (Initial) Legacy Support (NFS/CIFS)

Native Clients: Push Intelligence to Edges

Strong Consistency w/ Full-Spectrum Caching

File System Design Goals

Low Cost, High Scale

Intelligent Clients

Span Devices and Networks

Support Rapid Iteration

Use Cases

In-Cloud File System

NAS Replacement and Consolidation

Enterprise File Sharing

10,000 Foot View

Object Storage (public, on-premises, or hybrid)

Data

Metadata

Metadata Servers

Clients

Koukouvaya / flickr.com/photos/jackoughton/6535137981/

Heavy (Data) Lifting via Clients

Encryption

Inline Deduplication

Compression

Persistent Data Caching

Bulk Data Transfers

Cloud Object Storage

Scale Out, Low Cost

Handles Placement + Replication

Tolerates Failures

High Aggregate Performance

Virtualized Metadata Servers

Enforce Strong Consistency

Enforce Authentication and Integrity

Runtime Performance Optimization

Share-level Deduplication

Data Scrubbing & Garbage Collection

Architecture

Client Architecture

Diagram labels:

• Application

• Redirector (e.g., FUSE)

• File System

• OS Glue

• Data Manager

• Metadata Transport Layer

• Local, Remote

• Userspace, Kernel

• Deduplication, Encryption, Compression

• Locking, Leases

Simplified Write: Deduplication + Encryption

Components: File System Layer, Data Manager (Local Cache, Remote Transfer)

• Write Request

• Plaintext: Variable-Length Chunking

• Encryption Key (K): SHA-256

• Encrypted Text (E): AES-256 (K)

• Object Name (N): SHA-256

• <File, Offset, N, K>

• Optional(<URI>)

• Local Cache: <N, E>

• Remote Transfer: <URI, E>, <E>

• No Encryption Keys in the Cloud

• No Encryption Keys in Local Cache
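The write path above can be sketched in a few lines. This is only an illustration under stated assumptions: the slide labels K and N with SHA-256 but does not spell out their inputs, so the sketch assumes K is the SHA-256 of the plaintext chunk and N is the SHA-256 of the encrypted text. The names `seal_chunk`, `ChunkRecord`, and `toy_cipher` are hypothetical, and `toy_cipher` is an insecure stand-in used only so the example runs on the standard library; a real client would plug in AES-256 there.

```python
import hashlib
from typing import Callable, NamedTuple


class ChunkRecord(NamedTuple):
    name: str          # N: hex SHA-256 of the encrypted text (assumed input)
    key: bytes         # K: SHA-256 of the plaintext chunk (assumed input)
    ciphertext: bytes  # E: what goes to the local cache and the object store


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """Insecure stand-in for AES-256: XOR with a SHA-256 counter keystream.

    Applying it twice with the same key returns the original data.
    """
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def seal_chunk(
    plaintext: bytes,
    encrypt: Callable[[bytes, bytes], bytes] = toy_cipher,
) -> ChunkRecord:
    """Derive K from the content, encrypt, then name the object by its ciphertext."""
    key = hashlib.sha256(plaintext).digest()
    ciphertext = encrypt(key, plaintext)
    name = hashlib.sha256(ciphertext).hexdigest()
    return ChunkRecord(name=name, key=key, ciphertext=ciphertext)
```

Because both K and N depend only on the chunk contents, two clients writing the same chunk produce the same object name, which is what lets deduplication work on encrypted data. Only `<N, E>` needs to reach the cache or the object store; K travels with the file metadata.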

Simplified Read: Deduplication + Encryption

Components: File System Layer, Data Manager (Local Cache, Remote Transfer)

• Read Request: <File, Offset, Range>

• <N, K, URI>

• Encryption Key (K)

• Local Cache / Remote Transfer: <N, URI>

• Remote Transfer: <URI>, <E>

• Encrypted Text (E)

• AES-256 (K)

• Plaintext
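The cache-or-fetch part of the read path can be sketched as follows. The sketch assumes, as a labeled guess, that N is the hex SHA-256 of the encrypted text, so a fetched object can be checked against its own name before it is cached. `fetch_encrypted`, the dictionary cache, and the `remote_get` callable are hypothetical stand-ins; decryption with K (AES-256 on the slide) would follow and is left out.

```python
import hashlib
from typing import Callable, Dict


def fetch_encrypted(
    name: str,
    uri: str,
    local_cache: Dict[str, bytes],
    remote_get: Callable[[str], bytes],
) -> bytes:
    """Return the encrypted text E for object name N, preferring the local cache."""
    blob = local_cache.get(name)
    if blob is not None:
        return blob
    # Cache miss: fetch from the object store via the URI.
    blob = remote_get(uri)
    # Assumed integrity check: the name is the SHA-256 of the ciphertext.
    if hashlib.sha256(blob).hexdigest() != name:
        raise ValueError("object contents do not match content-based name")
    local_cache[name] = blob
    return blob
```

Since the cache holds only encrypted text keyed by N, a stolen cache reveals nothing without the keys held in file metadata.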

The Client in Real Life Does a Lot More!

• File and Directory Leases (data and metadata caching)

• Asynchronous Operations (including writes)

• Operation Compounding

• Runtime Optimizations (e.g., read ahead)

• Optimizing for High Bandwidth Delay Product (BDP)

• …

Communication Details

Object Storage (public, on-premises, or hybrid)

Data

Metadata

Metadata Servers

Clients

Thrift (HTTPS)

REST (HTTPS)

Server Architecture

Metadata Server Internals

Diagram labels:

• Metadata Storage Layer

• Storage Core

• Backups

• GC

• Scrubbing

• Quotas, Dedup, Leases, Security

• HA

• MagFS

• Ext. Sharing

• Multi-Cloud, Versioning, Offline Mode

• Cloud Abstraction Layer

• Legend: Production, Development

Bootstrapping: Virtualized Namespaces

Legacy: \\server.example.com\share (HOST FQDN, FOLDER)

MagFS: \\server.example.com\share (Dynamic mapping to host:port)

Discovery Service

Metadata Server

Metadata Server (HA)

Metadata Server

ZooKeeper

ZooKeeper

ZooKeeper

Monitoring

Management Console

Config + Scheduler

Virtual Filer Host:Port Mapping
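The idea of a virtual filer name that resolves dynamically to a host:port can be sketched as a simple lookup. This is not the MagFS discovery protocol, which the slides do not describe; `parse_unc`, `resolve`, and the in-memory `directory` table are hypothetical, and the address in the usage note is made up.

```python
from typing import Dict, Tuple


def parse_unc(path: str) -> Tuple[str, str]:
    """Split \\\\host\\share into (host FQDN, folder)."""
    if not path.startswith("\\\\"):
        raise ValueError("not a UNC path")
    parts = path[2:].split("\\")
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("UNC path needs both a host and a share")
    return parts[0].lower(), parts[1]


def resolve(path: str, directory: Dict[str, Tuple[str, int]]) -> Tuple[str, int, str]:
    """Map the virtual filer name to the host:port currently serving it."""
    fqdn, share = parse_unc(path)
    host, port = directory[fqdn]  # KeyError if the virtual filer is unknown
    return host, port, share
```

For example, with `directory = {"server.example.com": ("10.0.0.5", 8443)}`, resolving `\\server.example.com\share` yields `("10.0.0.5", 8443, "share")`. Updating the table entry moves the share to another metadata server without changing the path users see.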

Leases: Performance and Strong Consistency

Lease Types: Read, Write, Handle

Lease States: Read; Read + Handle; Read + Write + Handle

Valid File Leases

Valid Directory Leases

Cloud Storage Interaction

Object Storage (public, on-premises, or hybrid)

Object Storage systems are like snowflakes!

Object Store API Compatibility

Q: Has anyone come across a near 100% Amazon S3 API compatible object storage system?

A: It is hard to find a near-100% compatible product…

- Vendor w/ S3 Compatible Product

Direct Client Access: Security Problem?

Object Storage (public, on-premises, or hybrid)

Data

Metadata

Metadata Servers

Clients

Request Signing

Server-Driven Request Signing

SignString = HTTP-Verb + "\n"

+ Content-MD5 + "\n"

+ Content-Type + "\n"

+ Date + "\n"

+ Resource + "\n"

+ ...

Server-Driven Request Signing

SignString = PUT + "\n"

+ 07BzhNET7exJ6qYjitX/AA== + "\n"

+ image/jpeg + "\n"

+ Tue, 11 Jun 2013 00:27:41 + "\n"

+ /container/example.jpeg + "\n"

+ ...

Signature = Base64(HMAC-SHA1(SecretKey, SignString))
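The signing step can be sketched with Python's standard `hmac`, `hashlib`, and `base64` modules. The function name `sign_request` and the secret in the usage note are hypothetical. The slide ends the string with "+ ...", so any fields after Resource are simply omitted here; each object store defines its own exact string-to-sign, and this sketch is not guaranteed to match any of them.

```python
import base64
import hashlib
import hmac


def sign_request(
    secret_key: bytes,
    verb: str,
    content_md5: str,
    content_type: str,
    date: str,
    resource: str,
) -> str:
    """Build the SignString from the slide and return Base64(HMAC-SHA1(key, SignString))."""
    fields = [verb, content_md5, content_type, date, resource]
    # Each field is followed by "\n", as on the slide; the elided "..." part is left out.
    sign_string = "".join(field + "\n" for field in fields)
    digest = hmac.new(secret_key, sign_string.encode("utf-8"), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

A call such as `sign_request(b"example-secret", "PUT", "07BzhNET7exJ6qYjitX/AA==", "image/jpeg", "Tue, 11 Jun 2013 00:27:41", "/container/example.jpeg")` returns a signature bound to that verb, checksum, content type, date, and resource. Because the Content-MD5 is part of the signed string, a server that signs on a client's behalf authorizes one specific upload rather than handing over the secret key.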

Safe Direct Client Access via Request Signing

Object Storage (public, on-premises, or hybrid)

Data

Metadata

Metadata Servers

Clients

1. Read/Write Request

2. HTTP Request + Signature

3. HTTP Request + Signature + Encrypted Data

Dealing with Lost Client Writes

• Clients can lose connectivity or, in the worst case, be malicious

• Naively trusting client writes can "corrupt" data when deduplication is global

• MagFS server scrubs all writes:

  • Client acknowledges write

  • Server verifies object existence (object store performed MD5 at PUT)

  • Server can also read and verify object data (stronger SHA-256 check)

• The object will be available for deduplication only after scrubbing
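The scrubbing steps above can be sketched against an in-memory stand-in for the object store. The dictionary `store`, the set `dedup_index`, and the function `scrub_write` are hypothetical, and the sketch assumes object names are the hex SHA-256 of the stored bytes.

```python
import hashlib
from typing import Dict, Set


def scrub_write(
    store: Dict[str, bytes],
    name: str,
    dedup_index: Set[str],
    deep: bool = True,
) -> bool:
    """Admit an acknowledged write to the dedup index only after verifying it."""
    data = store.get(name)
    if data is None:
        # The client acknowledged the write, but the object never arrived.
        return False
    if deep and hashlib.sha256(data).hexdigest() != name:
        # Contents do not match the content-based name: reject the object.
        return False
    dedup_index.add(name)
    return True
```

Until `scrub_write` succeeds, other clients are never told the object exists, so a lost or forged write cannot be deduplicated against.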

Handling Object Store Eventual Consistency

• Treat objects as immutable (even if modifications are allowed)

• Use content-based names (generated using cryptographic hashes)

• Tombstone names after Garbage Collection

  • Suffix generation number to content-based names in case of resurrection
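One way to picture the tombstone and generation-number idea is the sketch below. The class `ContentNamer` and the `<hash>.<generation>` name format are hypothetical; the slides do not give the actual naming scheme.

```python
import hashlib
from typing import Dict


class ContentNamer:
    """Content-based names that are never reused once garbage-collected."""

    def __init__(self) -> None:
        # Content hash -> next generation number to hand out.
        self._next_gen: Dict[str, int] = {}

    def name_for(self, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        return f"{digest}.{self._next_gen.get(digest, 0)}"

    def tombstone(self, name: str) -> None:
        """Record that this name was garbage-collected and must not come back."""
        digest, gen = name.rsplit(".", 1)
        self._next_gen[digest] = max(self._next_gen.get(digest, 0), int(gen) + 1)
```

If the same content reappears after its object was collected, it is stored under a fresh name, so a delayed delete of the old name on an eventually consistent store cannot remove the new copy.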

Security Architecture

Recap: On-Premises Security Model

• User authentication and permissions derived from native Active Directory setup

• Encryption keys are never exposed to the cloud

• Data and metadata are always encrypted, both at rest and in flight

Slides (with speaker notes) at http://tolia.org

Try MagFS at http://maginatics.com
