Expert Panel: Cloud Storage Initiatives: A Storage Developer Conference Preview August 4, 2015


2015 Storage Developer Conference. © All Rights Reserved.

Expert Panel: Cloud Storage Initiatives:

A Storage Developer Conference Preview

August 4, 2015


Today’s Presenters

•  David Slik, SNIA Cloud Storage TWG Co-Chair, NetApp
•  Mark Carlson, Cloud Storage TWG Co-Chair, Toshiba
•  Yong Chen, Assistant Professor, Texas Tech University
•  Alex McDonald, SNIA Cloud Storage Initiative Chair, NetApp


SNIA Legal Notice

•  The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted.
•  Member companies and individual members may use this material in presentations and literature under the following conditions:
  •  Any slide or slides used must be reproduced in their entirety without modification
  •  The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
•  This presentation is a project of the SNIA Education Committee.
•  Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney.
•  The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.


Storage Developer Conference 2015

•  Santa Clara, CA, September 21-24, 2015
•  The conference for storage developers since 1998
  •  400 developers, engineers and technical professionals worldwide from the storage community
•  Three focused tracks dedicated to disruptive technologies:
  •  Persistent Memory
  •  Object Drives
  •  SMR Drives
•  Visit http://www.snia.org/events/storage-developer


Mobile and Secure: Cloud Encrypted Objects using CDMI

David Slik, NetApp, Inc.


Data Security in the Cloud

•  Organizations (and individuals) are increasingly concerned about storing unencrypted data in the cloud
  •  What happens if the cloud provider is compromised?
  •  What happens if the cloud account is compromised?
  •  What happens if a system that can access the cloud account is compromised?
  •  What happens if the cloud provider goes out of business?
•  All of these scenarios can result in massive data breaches, and are often undetected due to lack of audit


What do we need to standardize?

•  How are encrypted objects recognized by clouds?
  •  In CDMI, based on a mimetype of “application/cms”
•  How are requests for plaintext distinguished from requests for ciphertext?
  •  In CDMI, based on the HTTP Accept header
•  How are requests that require a key made?
  •  In CDMI, based on metadata attached to the object
•  How does the cloud determine where to get the key?
  •  In CDMI, based on metadata attached to the object
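As a rough illustration of the first two answers, the sketch below shows how a client might ask for ciphertext or plaintext through the Accept header, and how a cloud might recognize an encrypted object by its mimetype. Only the application/cms mimetype and the use of the Accept header come from the slide; the function names and the default plaintext type (text/plain) are assumptions made for illustration.

```python
# Mimetype named on the slide for encrypted (CMS) objects.
CMS_MIMETYPE = "application/cms"


def build_read_headers(want_ciphertext, plaintext_mimetype="text/plain"):
    """Build HTTP headers for a read request (hypothetical helper).

    Asking for application/cms signals that the client wants the stored
    ciphertext; asking for the plaintext mimetype signals that the cloud
    is expected to obtain a key and return decrypted content.
    """
    accept = CMS_MIMETYPE if want_ciphertext else plaintext_mimetype
    return {"Accept": accept}


def is_encrypted_object(object_mimetype):
    """Return True when the stored object's mimetype marks it as encrypted."""
    return object_mimetype.strip().lower() == CMS_MIMETYPE
```

A client that only relays data would call build_read_headers(True) and never see a key; a client that needs the content would call build_read_headers(False).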


What do we need to standardize?

•  How does a cloud request a key?
  •  New work to standardize a key request message
  •  Includes information about the client to allow access control decisions and audit
•  How does the cloud receive a key?
  •  New work to standardize a key response message
  •  Includes information about how the key can be used, and what is to be passed to the client
•  How is audit requested and managed?


Why is this important?

•  Allows edge systems to use clouds as untrusted repositories
•  Allows trusted edge clouds to federate with untrusted clouds
•  Allows clients and clouds to be abstracted from the remote access control provider
•  Allows access control decisions for distributed cloud operations to be locally controlled
•  Allows audit for distributed cloud operations to be locally collected


Object Drives: A New Architectural Partitioning

Mark Carlson, Toshiba


What are Object Drives?

•  Key/Value semantics (Object store) among others
•  Hosted software in some cases
•  Interface changed from SCSI based to IP based (TCP/IP, HTTP)
•  Channel (FC/SAS/SATA) interconnect moves to Ethernet network


Object Drive use Interface

•  Key Value example is Kinetic
  •  Metadata is not part of the interface
  •  Metadata understood by higher levels would be stored as values
  •  Not expected to be used directly by end user applications (similar to existing Block I/F in that regard)
•  Higher level example is CDMI
  •  Includes rich metadata (both user and system)
  •  Expected to be used by end user applications (and is)
•  Object Drives, because of their abstraction, allow many more degrees of innovation freedom in the implementations behind this abstraction
  •  Hosts needn’t manage the mapping of objects to a block storage device
  •  An object drive may manage how it places objects on the media without reference to host specified locations
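To make the abstraction concrete, here is a minimal in-memory sketch of a key/value drive in which the host only ever sees keys and values, while the drive chooses where bytes land on its media. The class and method names are hypothetical and are not the Kinetic API; the append-only placement policy is just one assumed choice a drive could make.

```python
class ToyKeyValueDrive:
    """Toy model of a key/value object drive (not the Kinetic protocol)."""

    def __init__(self):
        self._media = bytearray()  # stands in for the recording medium
        self._index = {}           # key -> (offset, length), private to the drive

    def put(self, key, value):
        # The drive picks the location: here it simply appends to the media.
        offset = len(self._media)
        self._media += value
        self._index[key] = (offset, len(value))

    def get(self, key):
        # Raises KeyError when the key is unknown or has been deleted.
        offset, length = self._index[key]
        return bytes(self._media[offset:offset + length])

    def delete(self, key):
        # Space reclamation is left to the drive; the host never sees offsets.
        self._index.pop(key, None)
```

Metadata that higher levels care about would simply be stored as another value, for example under a key such as b"photo-1.meta".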


Types of Object Drives

•  Key Value Protocol (Object Drive)
  •  Minimal incremental CPU/Memory requirements
  •  Simple mapping to underlying storage
•  In-Storage Compute (Object Drive)*
  •  Enough CPU/Memory for Object Node Software to be embedded on the drive
  •  General purpose download or factory installed
  •  May have additional requirements such as solid state media and more/higher bandwidth networking connections
•  In both cases, the interface abstracts the recording technology

*Jim Gray memorial


Key Value Protocol Object Drives

•  Eliminates existing parts of the usual storage stack
  •  Block drivers, logical volume manager, file system
  •  And the associated bugs and maintenance costs
•  Existing applications need to be re-written, or adapted
•  Mainly used by green field developed applications
•  Firmware is upgraded as an entire image
•  Hyperscale customers are already doing this
  •  Using open source software and creating their own “apps” (Facebook, Google, etc.)
•  Key Value organization of data is growing in popularity
  •  Examples: Cassandra and other NoSQL databases


In-Storage Compute Object Drives

•  Same advantages as Key Value protocol plus
•  No need for a separate server to run Object Node service (other services still need a server but scale separately)
•  Scaling is smoother: only adding drives
•  Additional features of the Object Node software can be deployed independently
•  Fewer hardware types that need to be maintained (for selected use cases)
•  Failure domains are more fine grained, thus overall data availability is enhanced


Ethernet Connectivity

•  Ethernet speeds standardized at 1 Gbps, 10 Gbps
•  Object Drives are high volume, low margin devices
  •  10 Gbps too expensive for the drive form factor for the foreseeable future
•  Desire to have standardized speeds of 2.5 Gbps, 5 Gbps
  •  With auto-negotiation among them
•  SFF 8601 effort to specify auto-negotiation using existing silicon implementations (switch firmware upgrade, done in a few months)
•  802.3 proposed effort to standardize single lane 2.5 and 5 Gbps speeds in silicon (multi-year effort)
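Auto-negotiation among these speeds can be pictured as both link partners advertising what they support and settling on the fastest speed they share. The sketch below is a toy model under that assumption only; it does not reproduce the SFF 8601 or 802.3 mechanisms, and the function name and speed lists are hypothetical.

```python
def negotiate_speed(local_gbps, peer_gbps):
    """Return the highest speed (in Gbps) both sides advertise, or None."""
    common = set(local_gbps) & set(peer_gbps)
    return max(common) if common else None


# A drive limited to 5 Gbps attached to a switch port that also offers 10 Gbps.
drive_speeds = [1, 2.5, 5]
switch_speeds = [1, 2.5, 5, 10]
link_speed = negotiate_speed(drive_speeds, switch_speeds)
```

With the lists above the link settles on 5 Gbps; against a switch that only offers 1 and 10 Gbps the same drive would fall back to 1 Gbps, which is the gap the intermediate speeds are meant to fill.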


Unistore: A Unified Storage Architecture for Cloud Computing

Yong Chen
Associate Director, Cloud and Autonomic Computing Site
Director, Data-Intensive Scalable Computing Laboratory
Assistant Professor, Computer Science Department, Texas Tech University
(in collaboration with Nimboxx, Inc)

Background

•  Enterprise computing apps are highly data intensive
  •  Facebook: O(100PB) of storage, with 220M+ photos uploaded / 25+TB growth per week
  •  Google: 1 trillion pages by 2008, 1 billion pages growth per day
  •  Info. retrieval, data mining, online business, social network, etc.
•  Scientific computing apps follow a similar trend
  •  Scientific computing in Cloud
  •  “Research as a service”
•  Pressure on the storage system capability increases

PI                      Project                                           On-Line Data  Off-Line Data
Lamb, Don               FLASH: Buoyancy-Driven Turbulent Nuclear Burning  75TB          300TB
Fischer, Paul           Reactor Core Hydrodynamics                        2TB           5TB
Dean, David             Computational Nuclear Structure                   4TB           40TB
Baker, David            Computational Protein Structure                   1TB           2TB
Worley, Patrick H.      Performance Evaluation and Analysis               1TB           1TB
Wolverton, Christopher  Kinetics and Thermodynamics of Metal and          5TB           100TB
Washington, Warren      Climate Science                                   10TB          345TB
Tsigelny, Igor          Parkinson's Disease                               2.5TB         50TB
Tang, William           Plasma Microturbulence                            2TB           10TB
Sugar, Robert           Lattice QCD                                       1TB           44TB

Source: R. Ross et al., Argonne National Laboratory


Traditional Storage Media

•  Over 90% of all data in the world is being stored on magnetic media (hard disk drives, HDDs)
•  Invented by IBM in 1956
•  Mechanism has remained the same since then
•  Various mechanical moving parts
•  High latency, slow random access performance, unreliable, power hungry
•  Large capacity, low cost (USD 0.10/GB), impressive sequential access performance

Source: online


Emerged Storage Media

•  Non-volatile storage-class memory (SCM)
  •  Flash-memory based Solid State Drives (SSDs), PCRAM, NRAM, …
•  Use microchips which retain data in non-volatile memory (array of floating gate transistors isolated by an insulating layer)
•  High throughput, low latency (esp. random accesses), less susceptible to physical shock, power efficient
•  Low capacity, high cost (USD 0.90-2/GB), block erasure, memory wear out (10K-100K P/E cycles)

[Figure: Intel® X25-E SSD]


Storage System Needs and Challenges

•  Conventional HDDs and emerged SCM complement each other well
•  Traditional parallel/distributed file systems designed for HDDs do not manage SCM well
•  Straightforward solution not optimal
  •  Placing SCM as an additional memory hierarchy level in the current storage tier
  •  Intensive writes go through SCM
  •  Inclusive setting limits capacity
  •  Does not distinguish criticalness of data


Our Solution - Unistore: A Unified Storage Architecture

•  A distributed file system store
•  Distinguish workload and access patterns
•  Consider SCM and HDD features
  •  Block erasure, internal concurrency, wear-leveling
•  High performance, high cost SCM keeps
  •  Semantically critical data, e.g. metadata blocks
  •  Performance critical data, e.g. frequently accessed
•  High capacity, low cost HDD keeps
  •  Low-priority large data sets
•  In-situ processing runtime optimizations

Goals
•  Performance close to SCM devices
•  Capacity close to the SCM+HDDs
•  Combines merits and avoids drawbacks

[Figure: Cloud Applications on top of Unistore]
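The placement rule on this slide (critical or hot data on SCM, everything else on HDD) can be sketched as a small decision function. This is an illustration under stated assumptions, not the Unistore implementation: the Block fields, the function name, and the hot threshold of 100 accesses per hour are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Block:
    is_metadata: bool         # semantically critical data, e.g. metadata blocks
    accesses_per_hour: float  # observed access frequency


# Assumed cutoff for "frequently accessed"; a real system would tune this.
HOT_THRESHOLD = 100.0


def choose_tier(block, hot_threshold=HOT_THRESHOLD):
    """Return "SCM" for critical or hot blocks, "HDD" otherwise."""
    if block.is_metadata:
        return "SCM"  # semantically critical
    if block.accesses_per_hour >= hot_threshold:
        return "SCM"  # performance critical
    return "HDD"      # low-priority, capacity-oriented data
```

Because the two tiers hold disjoint data in this sketch, total capacity is the sum of SCM and HDD capacity, which matches the stated goal of capacity close to SCM+HDDs.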


Unistore: High-level View

[Figure: Unistore high-level view]

•  Characterization Component
  •  Workloads: access patterns
  •  Devices: bandwidth, throughput, block erasure, concurrency, wear-leveling
  •  I/O Pattern: random/sequential, read/write, hot/cold
•  Data Placement Component
  •  Placement Algorithm, e.g. modified consistent hashing
•  In-situ Processing Component
  •  Active storage
  •  Deduplication

•  We are developing a prototype based on Sheepdog, a distributed object storage system for volume and container services
  •  https://sheepdog.github.io/sheepdog/

[Figure: Sheepdog architecture]
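For readers unfamiliar with the placement technique named above, the following is a minimal sketch of plain consistent hashing with virtual nodes. It is the textbook scheme, not the modified variant used in Unistore and not Sheepdog's code; the class name, the choice of SHA-256, and the default of 64 virtual nodes per device are assumptions.

```python
import bisect
import hashlib


def _hash(value):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    """Plain consistent hashing with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=64):
        # Each node contributes `vnodes` points so load spreads more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def locate(self, key):
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]
```

The useful property is that removing a node only relocates the keys that node owned; all other keys stay where they were.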

SUORA Algorithm

•  SUORA: Scalable and Uniform storage via Optimally-adaptive and Random number Addressing
•  Divides heterogeneous devices into a multi-dimensional space
  •  Assigns devices to different segments in each dimension
  •  Maps data among segments and dimensions
•  Model

SUORA Algorithm (cont.)

•  SUORA models devices as buckets and segments in numbered lines
•  Different lines (dimensions) represent different factors considered (e.g. throughput, capacity, etc.)

Node  Bucket  Capacity  Assigned Segments
A     HDD     1TB       HSegment0 (-1, 0)
B     HDD     0.8TB     HSegment3 (-3.8, -3)
C     HDD     1.5TB     HSegment1 (-2, -1), HSegment2 (-2.5, -2)
D     SSD     0.6TB     SSegment0 (0, 1)
E     SSD     0.3TB     SSegment1 (1, 1.5)
F     SSD     0.8TB     SSegment2 (2, 3), SSegment3 (3, 3.3)

[Figure: number line from -4 to 4, with the HDD bucket segments (nodes A, B, C) on the negative side and the SSD bucket segments (nodes D, E, F) on the positive side]
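The segment table can be turned into a small lookup to show how random number addressing over segments might work. This is a sketch under explicit assumptions, not the published SUORA algorithm: it assumes a pseudo-random number seeded by the object key is redrawn until it lands inside an assigned segment of the chosen bucket, and it takes the line ranges (-4 to 0 for HDD, 0 to 4 for SSD) from the figure axis. The function name and retry limit are hypothetical.

```python
import hashlib
import random

# (low, high, node) per bucket, copied from the segment table above.
SEGMENTS = {
    "HDD": [(-3.8, -3.0, "B"), (-2.5, -2.0, "C"), (-2.0, -1.0, "C"), (-1.0, 0.0, "A")],
    "SSD": [(0.0, 1.0, "D"), (1.0, 1.5, "E"), (2.0, 3.0, "F"), (3.0, 3.3, "F")],
}
# Assumed extent of each bucket's number line, read off the figure axis.
LINE_RANGE = {"HDD": (-4.0, 0.0), "SSD": (0.0, 4.0)}


def locate(key, bucket, max_tries=1000):
    """Map an object key to a node within the given bucket (illustrative)."""
    seed = int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")
    rng = random.Random(seed)  # the same key always yields the same sequence
    low, high = LINE_RANGE[bucket]
    for _ in range(max_tries):
        x = rng.uniform(low, high)
        for seg_low, seg_high, node in SEGMENTS[bucket]:
            if seg_low <= x < seg_high:
                return node
        # x fell in an unassigned gap: draw the next number and retry.
    raise RuntimeError("no segment found; is the bucket empty?")
```

Under this assumed rule a node with longer segments receives proportionally more objects, and adding a device only requires assigning it a previously unused stretch of the line.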

SUORA Algorithm (cont.)

•  Initial comparison with different algorithms: CRUSH, consistent hashing, ASURA
  •  In terms of computation time (of data distribution), memory footprint, uniform distribution, adaptation to device additions and removal
•  The initial comparison has shown the promise of the SUORA algorithm in the Unistore architecture
•  More details will be in the SNIA SDC presentation

Summary

•  Unistore is an ongoing effort to build a unified storage architecture for Cloud storage systems
  •  With the co-existence and efficient integration of heterogeneous HDDs and SCM devices
•  An initial prototype is being developed based on Sheepdog
•  The SUORA algorithm is being developed to manage heterogeneous devices in a multi-dimensional space

Thank You! Please visit: http://discl.cs.ttu.edu/, http://cac.ttu.edu/


Using CDMI to Manage Swift, S3, and Ceph Object Repositories

David Slik, NetApp, Inc.


Converged Data Management

•  Objects, Files, LUNs, NoSQL databases, and many other types of persistent data entities can all be managed through the Cloud Data Management Interface (CDMI)
•  This presentation focuses on how CDMI provides a common management interface that sits beside the predominant data access interfaces, such as NFS, CIFS, iSCSI, S3, Swift, etc.
•  This is especially valuable for converged storage infrastructure that allows data objects to move between platforms and interfaces


CDMI Namespaces

•  CDMI supports arbitrary tree-based hierarchies
•  CDMI references allow symlink-like cross-tree links
•  Superset of Swift, S3, Azure and other cloud namespaces
•  Superset of standard filesystem namespaces
•  Superset of LUN/Zone hierarchies
•  This allows a single CDMI namespace to encapsulate all of these different data types into a single namespace for management purposes:


Varied Namespace Restrictions

[Figure: CDMI object model and its overlays for other namespaces]

•  CDMI Object Model: Root Container > Container > Data Object (or Queue), with System Capabilities, Container Capabilities, and Data Object Capabilities / Requirements
•  Overlay for Swift Object Model: Root > Account > Container > Object, represented as CDMI Root Container > CDMI Container (Representing Account) > CDMI Container > CDMI Data Object
•  Overlay for S3 Object Model: Account (Root) > Bucket > Object, represented as CDMI Root Container > CDMI Container > CDMI Data Object
•  Overlay for File System Model: Root Directory > Directory > File, represented as CDMI Root Container > CDMI Container > CDMI Data Object
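The overlays can be read as a mapping from each native namespace onto CDMI containers and data objects. The sketch below expresses that mapping as path construction. It is illustrative only: the function names and the exact path layout are assumptions, and it does not call any CDMI, Swift, or S3 API.

```python
def swift_overlay(account, container, obj):
    """Swift Root > Account > Container > Object as CDMI levels."""
    return [
        ("CDMI Root Container", ""),
        ("CDMI Container (Representing Account)", account),
        ("CDMI Container", container),
        ("CDMI Data Object", obj),
    ]


def s3_overlay(bucket, key):
    """S3 Account (Root) > Bucket > Object as CDMI levels."""
    return [
        ("CDMI Root Container", ""),
        ("CDMI Container", bucket),
        ("CDMI Data Object", key),
    ]


def cdmi_path(levels):
    """Join the level names below the root into one hypothetical CDMI path."""
    return "/" + "/".join(name for _, name in levels[1:])
```

For example, cdmi_path(swift_overlay("acct", "photos", "cat.jpg")) gives "/acct/photos/cat.jpg", while the same object reached through the S3 overlay would appear one level shallower because the account maps to the root container.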


After This Webcast

•  This webcast and a copy of the slides will be posted to the SNIA-CSI website and available on-demand
  •  http://www.snia.org/forum/csi/knowledge/webcasts
•  A full Q&A from this webcast, including answers to questions we couldn't get to today, will be posted to the SNIA Cloud blog
  •  http://www.sniacloud.com/
•  Follow us on Twitter @SNIACloud
•  Google Groups:
  •  http://groups.google.com/group/snia-cloud


Questions?

•  Thank You
•  Please rate this talk!