
The Road to a Hyper-Converged OpenStack
OpenStack Israel, June 2015

Muli Ben-Yehuda, Chief Scientist, Stratoscale

A Brief History of Time: Datacenter Architectures


Standalone “Converged” Servers

Performance / Reliability / Price

• Reliability (locality)
• Manageability
• Efficiency


Split Infrastructure

LAN

SAN

Performance / Reliability / Price

• VM Admin
• Storage Admin
• Expensive Fabric
• Expensive Appliances


Hyper-Converged Infrastructure

Performance / Reliability / Price

Every node is both a compute node and a storage node

The commoditization of the interconnect allows everyone to build a cluster of servers

A Briefer History of Time: Hyper-Converged Infrastructure


The Recipe for 1st Gen Hyper-Convergence

[Diagram: servers built from RAM, disk, and NIC, each running both Storage and Compute]

Two Black Boxes:
1) Control Plane
2) Data Plane

• Distributed Storage
• Virtualization

But: Shared Fabric

Let’s Build a Great Hyper-Converged Solution Based on OpenStack


Features/Benefits

1. Software-Only

2. Run Anything (VMs & Containers)

3. Store Everything (Enterprise-grade Storage)

4. Single Infrastructure (Anti-Silo, Cloud-like)

[Diagram: four servers, each with RAM, disk, and NIC; the hyper-converged control and data planes run across all of them, hosting the workloads]

1. Performant Data Center/Cloud

2. Efficient Resource Utilization

3. Single Pane of Glass (Manageability)

4. Scalability & Reliability


Considerations

Failure Domain

Storage dictates the Failure Domain

Hardware Heterogeneity

The initial cluster needs similar node types so that storage is balanced equally across nodes

Large deployments: heterogeneity works well, e.g. blades for compute and large servers for storage

Storage pools can be used to separate failure domains as well (see the sketch after this list):

Flash and HDD – speed/density of storage

Persistent and ephemeral storage, cold and hot data: smarter allocation of the physical storage while still adhering to failure-domain rules

Pass-through to directly attached storage (an optimization for ephemeral use cases)
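To make the pool idea concrete, here is a minimal sketch (hypothetical node names, racks, and tiers, not Stratoscale's actual placement code) of picking a storage pool by tier and spreading replicas across distinct racks so they land in separate failure domains:

from collections import namedtuple

Node = namedtuple("Node", "name rack tier")

# Hypothetical pools: one per storage tier (flash vs. HDD).
POOLS = {
    "ssd": [Node("n1", "rackA", "ssd"), Node("n2", "rackB", "ssd"), Node("n3", "rackC", "ssd")],
    "hdd": [Node("n4", "rackA", "hdd"), Node("n5", "rackB", "hdd")],
}

def place_replicas(tier, replica_count):
    """Pick replica_count nodes from the requested tier, at most one per rack."""
    chosen, used_racks = [], set()
    for node in POOLS[tier]:
        if node.rack not in used_racks:
            chosen.append(node)
            used_racks.add(node.rack)
        if len(chosen) == replica_count:
            return chosen
    raise RuntimeError(f"not enough distinct racks for {replica_count} replicas")

print(place_replicas("ssd", 2))  # two SSD nodes in two different racks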

Containers

Most Containers are stateless/ephemeral

Performance:

Serving storage requires CPU cycles, memory and bandwidth.

A node serving a disproportionate share of the storage should be configured with more CPU capacity, memory, and NICs (a rough sizing sketch follows below)

Topology

A node serving a disproportionate share of the storage will also be hit with more storage requests, so there are topology considerations too
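A rough back-of-the-envelope sketch of that sizing rule (the capacities and the simple proportionality assumption are illustrative, not measured figures):

# Storage capacity per node, in GB (illustrative numbers).
nodes = {"n1": 10_000, "n2": 10_000, "n3": 40_000}
total = sum(nodes.values())

# A node holding a disproportionate share of the storage will, on average,
# serve a proportional share of the storage requests, so give it a matching
# share of the CPU, memory, and NIC budget reserved for serving storage.
for name, capacity in nodes.items():
    share = capacity / total
    print(f"{name}: ~{share:.0%} of storage traffic expected "
          f"-> ~{share:.0%} of the storage CPU/RAM/NIC budget")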

The Building Blocks (a bit more than that, actually)


The Sub-Systems

Storage

Compute

Networking


Storage

Server-side Storage

Performant & Scalable (No Metadata Bottlenecks)

Predictable Performance

Robust & Resilient

Heterogeneity (SSD/HDD)

Storage Tiering

Predominantly Volume-Only Management

Fine-Grained Control of the system

[Diagram: a single namespace of virtual volumes spread across the physical nodes; each workload sees its volume as a local mount point]


Compute

Predictable Performance

Robust & Resilient

Heterogeneity (Windows/Linux/Containers)

Workload SLAs

Scalable (Eliminate Metadata Bottlenecks)

Predominantly VM-Only Management

Fine-Grained Control of the system


Inter-Connect (It’s Shared!)

The Control Plane (or, how things are really done)


Traits of the Solution

1. Scalable Installation Process

2. Distributed Systems Best Practices (Control Plane; a sketch of the discovery and self-healing practices follows this list):
   - No single point of failure
   - Service Discovery
   - Load Balancing
   - Self-Healing
   - Eventual Consistency

3. Cluster-Wide Resource Load-Balancing:
   - Interference
   - Contention
   - Optimizations

4. Managing Interference (Analytics)

5. Multiple Data Centers
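A minimal, self-contained sketch of two of those control-plane practices, service discovery and self-healing (the in-memory registry, the health-check input, and the fixed replica target are stand-ins for illustration, not the actual implementation):

import random

registry = {}  # service name -> set of nodes running it (service discovery)

def register(service, node):
    registry.setdefault(service, set()).add(node)

def discover(service):
    """Load balancing: hand back any currently registered instance."""
    return random.choice(sorted(registry[service]))

def self_heal(service, healthy_nodes, desired=3):
    """Self-healing: drop instances on dead nodes, start new ones elsewhere."""
    registry[service] &= healthy_nodes
    for node in sorted(healthy_nodes - registry[service]):
        if len(registry[service]) >= desired:
            break
        register(service, node)

for n in ("node1", "node2", "node3"):
    register("api", n)
self_heal("api", healthy_nodes={"node2", "node3", "node4"})  # node1 died
print(sorted(registry["api"]))  # ['node2', 'node3', 'node4'] -> still 3 copies
print(discover("api"))          # any one of them serves the next request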

Run Anything

Software Driven

VM

Store everything

Single infrastructure

Developer Friendly

Open Platform


Single Image

Management plane not dedicated to a specific VM or bare-metal server

No sizing exercise to figure out how to deploy management systems

Consensus-based decision making

Even if more than half of the cluster is lost, all management processes remain available!

All nodes run a subset of the processes required for the cloud to function
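A toy sketch of that "single image" idea: management services are not pinned to a dedicated controller node; every node is assigned a subset of them (the service names and the round-robin assignment are placeholders, not the actual component list):

SERVICES = ["api", "scheduler", "kv-store", "metrics", "image-service"]
NODES = ["node1", "node2", "node3", "node4", "node5"]
COPIES = 3  # each management service runs on three nodes

def assign(services, nodes, copies):
    """Round-robin each service onto `copies` nodes; no node runs everything."""
    plan = {n: [] for n in nodes}
    for i, svc in enumerate(services):
        for j in range(copies):
            plan[nodes[(i + j) % len(nodes)]].append(svc)
    return plan

for node, svcs in assign(SERVICES, NODES, COPIES).items():
    print(node, svcs)  # every node runs some, but not all, of the services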


Distributed


Distributed Storage

[Diagram: each VM accesses its virtual disks through a VirtIO driver; a block storage client and a block storage server run on every physical server, with block storage management coordinating the shared storage pool spread across all servers]
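A minimal sketch of the client/server split in that picture: the block storage client on each host maps a (virtual disk, block) pair to the physical server that stores it. The hash-based placement below is purely illustrative; it just shows that any client can reach any block without a central metadata server.

import hashlib

SERVERS = ["server1", "server2", "server3", "server4", "server5"]

def owner(volume_id, block_index, servers=SERVERS):
    """Deterministically map a block of a virtual disk to the server holding it."""
    key = f"{volume_id}:{block_index}".encode()
    digest = int(hashlib.md5(key).hexdigest(), 16)
    return servers[digest % len(servers)]

# The VirtIO-backed virtual disk the VM sees is just a range of blocks;
# each block I/O is forwarded to the owning server's block storage service.
for block in range(4):
    print("vol-123 block", block, "->", owner("vol-123", block))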


QoS & Traffic aggregation

Workload SLA: min/max bandwidth
Workload SLA: max latency
Workload SLA: absolute priority
Block storage requires bandwidth
Re-balancing requires min latency

Rate Shaper, Link Scheduler, Policy Arbiter, Traffic Aggregator
High-Speed Data Path
(a toy scheduling sketch follows below)
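A toy sketch of how a rate shaper and policy arbiter could share the fabric: each traffic class carries an SLA (here just a bandwidth budget and a priority), and the scheduler grants higher-priority classes first. Class names, priorities, and numbers are illustrative only.

# Per-class SLA: a priority (lower = more important) and a bandwidth budget.
classes = {
    "block-storage": {"priority": 1, "budget_mbps": 400},
    "workload-a":    {"priority": 2, "budget_mbps": 300},
    "rebalancing":   {"priority": 3, "budget_mbps": 100},
}

def schedule(pending, link_mbps=1000):
    """pending: {class_name: requested_mbps}; grant bandwidth by priority."""
    grants = {}
    for name in sorted(pending, key=lambda n: classes[n]["priority"]):
        grant = min(pending[name], classes[name]["budget_mbps"], link_mbps)
        grants[name] = grant
        link_mbps -= grant
    return grants

print(schedule({"block-storage": 500, "workload-a": 600, "rebalancing": 200}))
# -> block-storage capped at 400, then workload-a at 300, then rebalancing at 100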


Topology/Failure Domain

Rack A / Datacenter 1, Rack B / Datacenter 1, Rack C / Datacenter 2

"Rack-Scale Computing": a rack failure should not create havoc

Async replication across datacenters (a small placement sketch follows below)
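A small sketch of that topology rule (the rack and datacenter layout is invented for illustration): replicas are written synchronously to distinct racks in the local failure domain and shipped asynchronously to the remote datacenter.

# rack -> datacenter (illustrative layout)
TOPOLOGY = {"rackA": "dc1", "rackB": "dc1", "rackC": "dc2"}

def replicate(block, local_dc="dc1", sync_copies=2):
    """Sync replicas stay in the local DC on different racks; the remote DC gets an async copy."""
    local_racks = [r for r, dc in TOPOLOGY.items() if dc == local_dc][:sync_copies]
    remote_racks = [r for r, dc in TOPOLOGY.items() if dc != local_dc]
    print(f"{block}: sync write to {local_racks}, async ship to {remote_racks}")

replicate("vol-9/block-42")  # sync to rackA+rackB (dc1), async to rackC (dc2)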


Cluster Load-Balancing & Workload Profiling

Workload → Admission → Profiling → Classification Engine → Placement Engine → Running

● Analytics layer collecting time-series performance metrics
● CPU scheduling, network throttling, low-latency live migration
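A toy version of that pipeline: profile a workload from its time-series metrics, classify it, and let the placement engine pick the node with the most headroom for that class. The thresholds, classes, and node data are made up for illustration.

def classify(cpu_series, io_series):
    """Classification engine: label the workload from its profiled metrics."""
    if sum(io_series) / len(io_series) > 70:
        return "io-bound"
    if sum(cpu_series) / len(cpu_series) > 70:
        return "cpu-bound"
    return "balanced"

NODES = {
    "node1": {"free_cpu": 20, "free_iops": 90},
    "node2": {"free_cpu": 80, "free_iops": 30},
}

def place(workload_class):
    """Placement engine: pick the node with the most headroom for this class."""
    key = "free_iops" if workload_class == "io-bound" else "free_cpu"
    return max(NODES, key=lambda n: NODES[n][key])

cls = classify(cpu_series=[30, 40, 35], io_series=[80, 90, 85])
print(cls, "->", place(cls))  # io-bound -> node1 (most free IOPS)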


Config Space

Consul.io

• Consensus-based
• Key/Value store
• Exposed as DNS
• Provides a local cache
• Used for HA
• Used for LB
• Problem: strongly consistent
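For illustration, a minimal sketch of using Consul's KV store as that config space over its HTTP API (it assumes a local Consul agent on the default port 8500; the key name is made up):

import base64
import json
import urllib.request

BASE = "http://127.0.0.1:8500/v1/kv"

def put(key, value):
    req = urllib.request.Request(f"{BASE}/{key}", data=value.encode(), method="PUT")
    return urllib.request.urlopen(req).read() == b"true"

def get(key):
    with urllib.request.urlopen(f"{BASE}/{key}") as resp:
        entry = json.loads(resp.read())[0]            # Consul returns a list of entries
    return base64.b64decode(entry["Value"]).decode()  # values come back base64-encoded

put("cluster/storage/replica_count", "3")
print(get("cluster/storage/replica_count"))  # -> "3"

Services registered with the agent are also resolvable over DNS (for example, a hypothetical storage-api.service.consul), which is how discovery and load balancing are exposed to clients without a Consul-specific library.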


Post Copy Live Migration (PCLM)

The mechanism Stratoscale uses to migrate VMs from one node to another.

Only moves the "working set" memory, that is, the pages actively being used by CPU threads at that time.

Moves very small amounts of memory at a time. Example: 200 MB of RAM now, then the rest of the memory image at a later time, once the network has freed up or when a page is faulted in over the network.
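A toy simulation of that behaviour (the page counts and data are invented): push only the working set up front, resume the VM on the destination, and fault the remaining pages over the network on first access.

class PostCopyMigration:
    def __init__(self, all_pages, working_set):
        self.remote = dict(all_pages)  # pages still resident on the source node
        # Push only the working set before resuming the VM on the destination.
        self.local = {p: self.remote.pop(p) for p in working_set}

    def read(self, page):
        if page not in self.local:                    # post-copy page fault:
            self.local[page] = self.remote.pop(page)  # fetch it over the network
        return self.local[page]

pages = {i: f"data-{i}" for i in range(1000)}          # the VM's memory image
mig = PostCopyMigration(pages, working_set=range(50))  # the hot pages only
print(len(mig.local), "pages moved before resuming")       # 50
mig.read(700)                                              # cold page, faulted in on demand
print(len(mig.local), "pages resident after one fault")    # 51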

Wrapping Up


To Conclude

1. Software-Only

2. Run Anything (VMs & Containers)

3. Store Everything (Enterprise-grade Storage)

4. Single Infrastructure (Anti-Silo, Cloud-like)

[Diagram: four servers, each with RAM, disk, and NIC; the hyper-converged control and data planes run across all of them, hosting the workloads]

1. Performant Data Center/Cloud

2. Efficient Resource Utilization

3. Single Pane of Glass (Manageability)

4. Scalability & Reliability