
WINDOWS SERVER 2016

Microsoft Storage Spaces Direct – the future of Hyper-V and Azure Stack (EN)

Carsten Rachfahl
Microsoft Cloud & Datacenter MVP
Microsoft Regional Director Germany

Carsten Rachfahl
Microsoft CDM MVP, Microsoft Regional Director
Organizer of the Cloud & Datacenter Conference Germany (http://cdc-gemany.de)
@hypervserver
One of the Hyper-V Amigos
I blog, do screencasts and interviews at https://www.hyper-v-server.de

Agenda

• S2D overview

• S2D in depth

• Deployment options

• Performance

• S2D in the {X} Stack

• Q&A


Storage Overview

Traditional Storage Array (diagram): Compute connects over Fibre Channel / iSCSI / FCoE / SAS to a storage array with redundant controllers running the storage software, a backplane, and disks.

Scale-out File Server (diagram): Compute connects over SMB3 to Scale-out File Server nodes running the storage software, attached via SAS to shared JBOD enclosures (Shared Storage Spaces).


S2D Overview


Converged with Storage Spaces Direct (diagram): Compute connects over SMB3 to Scale-out File Server nodes that run the storage software on their local drives.

Hyper-converged with Storage Spaces Direct (diagram): Compute and storage run together on the same nodes, with the storage software using each node's local drives.

Microsoft Storage Spaces Direct

What is Storage Spaces Direct?

■ Software-defined storage

■ Highly available and scalable

■ Storage for Hyper-V and Private Cloud

Why Storage Spaces Direct?

■ Servers with local storage

■ Industry standard hardware

■ Lower cost flash with SATA SSDs

■ Better flash performance with NVMe SSDs

■ Ethernet/RDMA network as storage fabric

Hyper-V cluster with locally attached storage


S2D in Depth

Storage Stack

File System (CSVFS with ReFS)

■ Fast VHDX creation, expansion and checkpoints

■ Cluster-wide data access

Storage Spaces

■ Scalable pool with all disk devices

■ Resilient virtual disk

Software Storage Bus

■ Storage Bus Cache

■ Leverages SMB3 and SMB Direct

Servers with local disks

■ SATA, SAS and NVMe

(Diagram: the S2D stack from bottom to top: servers with local disks, Software Storage Bus, Storage Spaces storage pool, Storage Spaces virtual disk, ReFS on-disk file system, CSVFS cluster file system. In the converged case a Scale-Out File Server exposes the volumes over SMB 3 to separate compute nodes; in the hyper-converged case the virtual machines run directly on top.)
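For orientation, a minimal PowerShell sketch of standing up this stack, assuming four nodes with placeholder names; Test-Cluster, New-Cluster, Enable-ClusterStorageSpacesDirect and New-Volume are the standard cmdlets:

    # Validate the nodes and create the failover cluster (node and cluster names are placeholders)
    Test-Cluster -Node Node01,Node02,Node03,Node04 -Include "Storage Spaces Direct",Inventory,Network,"System Configuration"
    New-Cluster -Name S2DCluster -Node Node01,Node02,Node03,Node04 -NoStorage

    # Enable S2D: claims the eligible local drives, builds the pool and configures the cache
    Enable-ClusterStorageSpacesDirect

    # Create a cluster-wide CSVFS/ReFS volume on top of the pool
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Volume01" -FileSystem CSVFS_ReFS -Size 2TB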

Software Storage Bus

Virtual storage bus spanning all servers

Virtualizes physical disks and enclosures

Consists of:

■ ClusPort: initiator (virtual HBA)

■ ClusBflt: target (virtual disks / enclosures)

SMB3/SMB Direct transport

■ RDMA-enabled networks for low latency and low CPU usage

Bandwidth management

■ Fair device access from any server

■ IO prioritization (App vs System)

■ De-randomization of random IO

■ Drives sequential IO pattern on rotational media

(Diagram: two nodes; on each node the stack runs Application, CSVFS, File System, Virtual Disks, SpacePort and ClusPort as the initiator, with ClusBflt in front of the physical devices as the target; block traffic flows between the nodes over SMB.)

Built-In Cache

Integral part of Software Storage Bus

Cache scoped to local machine

Agnostic to storage pools and virtual disks

Automatic configuration when enabling S2D

■ Special partition on each caching device

■ Leaves 32 GB for pool and virtual disk metadata

■ Round-robin binding of SSDs to HDDs

■ Rebinding with topology change

Cache behavior

■ All writes up to 256 KB are cached

■ Reads of 64 KB or less are cached on first miss

■ Reads larger than 64 KB are cached on second miss (two misses within 10 minutes)

■ Sequential reads of 32 KB or larger are not cached

■ On all-flash systems, only writes are cached

(Diagram: caching devices bound to capacity devices.)
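To see what the automatic configuration produced, a small sketch using the standard Failover Clustering and Storage cmdlets (the exact output properties vary by build):

    # S2D cache settings for the cluster (cache state, cache modes, etc.)
    Get-ClusterStorageSpacesDirect

    # Cache devices are claimed with Usage = Journal; capacity devices remain Auto-Select
    Get-StorageSubSystem Cluster* | Get-PhysicalDisk | Group-Object Usage, MediaType -NoElement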

Storage Pool

Metadata on select devices

■ Improves pool scalability

■ Improves pool update performance

Device selection

■ Faster media is preferred

■ Metadata on up to 10 devices

■ Evenly spread across fault domains

■ Dynamic update on node or device failure

(Diagram: potential metadata devices within the pool.)

Volume Types

■ Performance volumes (mirror): usually 3-way or 2-way mirror

■ Capacity volumes (parity): should be double parity

■ Hybrid volumes: combination of 3-way mirror and double parity

(Diagram: mirror and parity data layouts.)
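A sketch of how each volume type is typically created with New-Volume; the pool pattern, names and sizes are placeholder assumptions, and Performance/Capacity are the default tier names created when S2D is enabled:

    # Performance volume: mirror (3-way by default with 4+ nodes)
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Mirror01" -FileSystem CSVFS_ReFS -ResiliencySettingName Mirror -Size 1TB

    # Capacity volume: dual parity
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Parity01" -FileSystem CSVFS_ReFS -ResiliencySettingName Parity -Size 4TB

    # Hybrid (multi-resilient) volume: mirror tier plus parity tier, ReFS required
    New-Volume -StoragePoolFriendlyName "S2D*" -FriendlyName "Hybrid01" -FileSystem CSVFS_ReFS -StorageTierFriendlyNames Performance,Capacity -StorageTierSizes 200GB,1800GB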

Hybrid Volumes

Volume with mirror and parity

(Diagram: LRC symbols X1, X2, PX, Y1, Y2, PY and Q placed on Servers 1 to 7 of an 8-server cluster.)

Requires at least 4 nodes

Requires ReFS

Mirror for hot data: optimized for write performance, little CPU or storage churn

Parity for cold data: erasure-coding storage efficiency, CPU or storage churn only on cold data

Local Reconstruction Codes (LRC) algorithm

Nodes | Mirror efficiency | Parity efficiency (SSD + HDD) | Parity efficiency (all-flash) | Resiliency
4     | 33%               | 50%                           | 50%                           | 2 nodes
8     | 33%               | 66%                           | 66%                           | 2 nodes
12    | 33%               | 72%                           | 75%                           | 2 nodes
16    | 33%               | 72%                           | 80%                           | 2 nodes

(Diagram: mirrored copies A, A', A'' and B, B', B'' next to the parity layout.)
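For reference, the mirror column follows directly from the copy count:

    \text{efficiency} = \frac{\text{usable capacity}}{\text{raw capacity}} = \frac{1}{\text{copies}}, \quad \text{3-way mirror: } \tfrac{1}{3} \approx 33\%, \quad \text{2-way mirror: } \tfrac{1}{2} = 50\%

The parity columns depend on the LRC group layout chosen for each node count, which is why their efficiency improves with scale.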

LRC data reconstruction

Most common failure is 1 fault domain

1 disk failure (X2)

■ Read X1 and PX

■ Recalculate X2

■ Write X2 to a different disk

■ Total of 2 reads and 1 write

Traditional Reed-Solomon

■ 4 data, 2 parity

■ Total of 4 reads and 2 writes

LRC requires 50% less disk IO

Tolerant to failure of 2 fault domains

2 disk failures (X1 and X2)

■ Read PX, Y1, Y2 and Q

■ Recalculate and write X1 to a different disk

■ Recalculate and write X2 to a different disk

■ Total of 4 reads and 2 writes

Traditional Reed-Solomon

■ 4 data and 2 parity

■ Total of 4 reads and 2 writes

LRC and RS require the same disk IO

(Diagram: the same LRC layout as above, with X1, X2, PX, Y1, Y2, PY and Q on Servers 1 to 7.)
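To make the single-failure arithmetic concrete, assume the local parities are plain XOR parities over their group (the usual LRC construction; the slide does not spell this out):

    P_X = X_1 \oplus X_2 \quad\Rightarrow\quad X_2 = P_X \oplus X_1

so reconstructing X2 takes exactly the two reads (X1 and PX) and one write listed above.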

ReFS Real-Time Tiering

Writes go to mirror tier (hot data)

Rotate data into parity tier as needed (cold data)

Erasure Code calculation only on rotation

Updates to data stored in parity tier

■ Updated data is written to mirror tier

■ Old data in parity tier is invalidated (metadata operation)

(Diagram: ReFS writes land in the mirror tier and rotate into the parity tier.)

ReFS VM Optimizations

Basics

■ Metadata checksums with optional user data checksums

■ Data corruption detection and repair

■ On-volume backup of critical metadata with online repair

Efficient VM Checkpoints and Backup

■ VHD(X) checkpoints cleaned up without physical data copies

■ Data migrated between parent and child VHD(X) files as a ReFS metadata operation

■ Reduction of I/O to disk and increased speed

■ Reduces impact of checkpoint clean-up on foreground workloads

Accelerated Fixed VHD(X) Creation

■ Fixed VHD(X) files zeroed with metadata operations

■ Minimal impact on workloads

■ Decreases VM deployment time

Quick Dynamic VHD(X) Expansion

■ Dynamic VHD(X) files zeroed with metadata operations

■ Minimal impact on workloads

■ Reduces latency spikes for foreground workloads
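One way to see the accelerated fixed-VHDX creation in practice, sketched with the standard Hyper-V cmdlet (path and size are placeholders); on a CSVFS/ReFS volume the zeroing is a metadata operation, so even a large fixed file is created almost instantly:

    # Time the creation of a 100 GB fixed VHDX on an S2D volume
    Measure-Command {
        New-VHD -Path C:\ClusterStorage\Volume01\test.vhdx -Fixed -SizeBytes 100GB
    }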


S2D Deployment Options

Scale

2 nodes (minimum)

■ Only 2-way mirror

3 nodes

■ 2-way and 3-way mirror

4 to 16 nodes (maximum)

■ 2-way and 3-way mirror

■ Parity possible

■ Hybrid volumes

Minimum of 6 devices (2 cache + 4 capacity drives); with 16 nodes, a maximum of 416 devices (26 per node).
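A quick sketch for checking a cluster against these limits with the standard cmdlets (pool name pattern assumed):

    # Number of cluster nodes
    (Get-ClusterNode).Count

    # Number of drives claimed by the S2D pool (cache + capacity)
    (Get-StoragePool -FriendlyName "S2D*" | Get-PhysicalDisk).Count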

Deployment Options

SQL Server 2016 and storage resources together

Easy deployment and management (I hope)

(Diagram: SQL Server VMs running on the hyper-converged cluster.)


Hyper-Converged: compute and storage resources together, easy deployment and management

Converged: compute and storage resources separate, scaling for larger deployments, SMB 3 fabric between compute and storage

Vendors who are committed to Storage Spaces Direct

DELL PowerEdge R730XD
HPE ProLiant DL380 Gen9
Cisco UCS C240 M4
Intel MCB2224TAF3
Quanta D51B-2U (MSW6000)
DataON S2D-3110
Fujitsu Primergy RX2540 M2
Inspur NF5280M4
Lenovo X3650 M5
NEC Express5800 R120f-2M
RAID Inc. Ability™ HCI Series S2D200
SuperMicro SYS-2028U-TRT+

2 Node PoC Project Kepler-47

Mini-ITX motherboard
Intel Xeon E3v5 1235L 4C 2.00 GHz
2 x 16 GB ECC DDR4
6 x 4 TB SATA HDD
2 x 200 GB SATA SSD
USB3 DOM
U-NAS NSC-800 chassis


Server and drive fault tolerance

20+ TB of mirrored storage capacity

50+ GB of memory for 5-10 mid-sized VMs

Great for remote/branch office!


S2D Performance


Microsoft and Intel showcase at IDF’15

Load Profile           | Total IOPS | IOPS/Server
100% 4K Read           | 4.2M IOPS  | 268K IOPS
90%/10% 4K Read/Write  | 3.5M IOPS  | 218K IOPS
70%/30% 4K Read/Write  | 2.3M IOPS  | 143K IOPS

Showcase hardware: 16 Intel® Server System S2600WT (2U) nodes

• Dual Intel® Xeon® E5-2699 v3 processors

• 128 GB memory (16 GB DDR4-2133 1.2V DR x4 RDIMMs)

Storage per Server

• 4 x Intel® SSD DC P3700 Series (800 GB, 2.5" SFF)

• Boot Drive: 1 Intel® SSD DC S3710 Series (200 GB, 2.5” SFF)

Network per server

• 1 Chelsio® 10GbE iWARP RDMA Card (CHELT520CRG1P10)

• Intel® Ethernet Server Adapter X540-AT2 for management

Load Generator (8 VMs per Compute Node => 128 VMs)

• 8 virtual cores and 7.5 GB memory

• DISKSPD with 8 threads and Queue Depth of 20 per thread
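For context, a DISKSPD invocation approximating the stated 90/10 profile might look like the line below; the target file, file size, duration and latency switch are assumptions, the remaining flags follow the documented diskspd options:

    # 4K random IO, 90% read / 10% write, 8 threads, queue depth 20 per thread, caching disabled
    diskspd.exe -b4K -c64G -r -w10 -t8 -o20 -d120 -Sh -L C:\ClusterStorage\Volume01\testfile.dat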

Performance Video on Channel 9

Configuration:

4x Dell R730XD, each with:

■ 2x Xeon E5-2660 v3 2.6 GHz (10c/20t)

■ 256 GB DRAM (16x 16 GB DDR4 2133 MHz DIMM)

■ 4x Samsung PM1725 3.2 TB NVMe SSD (PCIe 3.0 x8 AIC)

■ Dell HBA330

■ 4x Intel S3710 800 GB SATA SSD

■ 12x Seagate 4 TB Enterprise Capacity 3.5" SATA HDD

■ 2x Mellanox ConnectX-4 100 Gb (dual-port 100 Gb PCIe 3.0 x16)

■ Mellanox FW v. 12.14.2036

■ Mellanox ConnectX-4 driver v. 1.35.14894

■ Device PSID MT_2150110033

■ Single port connected per adapter

My own Benchmarks

Benchmark:

■ Microsoft VMFleet with 60 VMs on 4 nodes

■ Diskspd testing 64 KB blocks at 70% read / 30% write

Top: Fujitsu, 2x E5-2680 CPUs with 2x 800 GB NVMe + 4x 1.9 TB SSD

Mid: Dell, 2x E5-2640 with 18x 800 GB SSDs

Bottom: HPE, 2x E5-2660 with 2x 800 GB SSDs + 4x 4 TB HDD


S2D in the {X} Stack

Azure Stack Integrated System

(Diagram: rack with a BMC switch and two ToR switches.)

Architecture, hardware, and topology

Security and privacy

Deployment, configuration, provisioning

Validation

Monitoring, diagnostics

Business continuity

Patching and updating

Field replacement of parts

S2D in OpenStack

Based on S2D


Q&A


Thank you!

<Next session 16:00 – 17:00>

Get your work/life balance in check with Hyper-V 24/7/365 High Availability

Didier Van Hoye

Microsoft Cloud & Datacenter Management