Trends in Managing Data at the Petabyte Scale
Steve Kleiman, Sr. VP & CTO



Before we begin…

Disk reliability
– SIGMETRICS ’07: An Analysis of Latent Sector Errors in Disk Drives
  • Lakshmi Bairavasundaram, Garth Goodson, Jiri Schindler, Shankar Pasupathy
– Symposium on Reliability and Maintainability ’03, ’04, ’05
  • John Elerath and Sandeep Shah


Petabyte Environments are Here!

2006 Q1-Q3 NAS+SAN Petabytes Shipped

Vendor     PB
EMC        214
NetApp*    199
HP         180
IBM         92
Hitachi     64
Other      247
Total      996

Source: IDC, Dec 2006
*Current quarterly run rate >100PB; YoY growth >100%

~25 NetApp customers with >1PB
Largest: ~33PB


The Growing Burden of Data Ownership

Operational burdens
– Managing the data explosion
  • 50-100%
  • Unstructured, semi-structured, structured
– Increasing dependence on data
  • Ensuring 100% availability
  • Protecting data from disasters
– Rapidly deploying new applications
– Global operations
  • Multiple data centers
  • Many remote offices

Financial burdens
– Controlling costs
  • Equipment, people, processes
  • Utilization


New/Hidden Burdens of Data Ownership

Legal burdens
– Complying with regulations
  • Discovery
  • Preventing unauthorized access
  • Retention

Social burdens
– Protecting your reputation
  • Disclosing data loss

Geo-political burdens
– Multiple cultures & legal systems


Traditional Infrastructure Build-out: Application-centric Silos

Incompatible hardware

Incompatible software

Different processes

Lots of experts

Low utilization

[Diagram labels: Applications; Primary Storage; Good Quality of Service; Tier 1, Tier 2, Tier 3]


It’s Not Just the Primary Storage

[Diagram labels: Primary; DR; Test & Dev; Backup; Archive]


Separation of Data from Physical Containers

[Diagram labels: DAS; Networked Storage; Snapshots, Clones, Thin-provisioning, Data mirroring; Global Namespace, Scale-out, Multiple Tiers]


Unified Protection & Enablement Environment

[Diagram: several sets of data, each with its own mix of copies (Backup, DR, Archive, Test/Dev, Mining), grouped under Data Protection and Application Enablement]


Unified Protection & Enablement Environment

[Diagram: several sets of data, each with its own mix of copies (Backup, DR, Archive, Test/Dev, Mining), grouped under Data Protection and Application Enablement]


Non-copy Data Properties

[Diagram: Data and its non-copy properties: Security Classification, Access Control, QOS, Compliance, Namespace]


Unified Protection & Enablement Environment

[Diagram repeated: sets of data with their copies (Backup, DR, Archive, Test/Dev, Mining), grouped under Data Protection and Application Enablement]


The Storage Admin’s Challenge

[Diagram labels: Storage Manager; Application Managers]


Managing The Copies

1 Oracle database:
  17 tables on primary
+ 17 tables on remote DR site
+ 17 mirror relationships between primary and DR
+ 17 tables on secondary dev & test
+ 17 mirror relationships between primary and secondary
+ Backups
+ Archive copies

Or

1 Dataset
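
To make the contrast concrete, the tally below adds up the objects the slide enumerates for one database. It is only an illustrative sketch: the variable names are made up here, and backups and archive copies are left out because the slide gives no counts for them.

```python
# Illustrative tally of the objects listed for one 17-table Oracle database.
TABLES = 17  # figure taken from the slide

# (description, count) pairs. Backups and archive copies are omitted because
# the slide does not say how many there are.
enumerated = [
    ("tables on primary", TABLES),
    ("tables on remote DR site", TABLES),
    ("mirror relationships, primary to DR", TABLES),
    ("tables on secondary dev & test", TABLES),
    ("mirror relationships, primary to secondary", TABLES),
]

individually_managed = sum(count for _, count in enumerated)
managed_as_dataset = 1

print(individually_managed)  # 85 objects, before backups and archives
print(managed_as_dataset)    # 1 dataset
```

Even without the uncounted backups and archive copies, the administrator tracks 85 separate objects in the first model and a single one in the second.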


What’s a “Dataset”?

Dataset: A collection of data meaningful to the user or data administrator having similar properties
– A set of database tables
– A home directory
– A server root LUN

Datasets have properties
– Redundancy, Disaster recovery
– Compliance, Saved versions
– QOS
– Security, Access control
– ???

Datasets can span storage servers
– A higher level of abstraction allows automation
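
One way to picture a dataset is as a named group of members that may live on different storage servers and share one set of properties. The sketch below is an assumption-laden illustration, not a description of any product: the class names, field names, server names, and paths are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Member:
    """One piece of a dataset, e.g. a table's volume or a LUN (hypothetical)."""
    server: str  # storage server that holds this member
    path: str    # volume, LUN, or directory on that server


@dataclass
class Dataset:
    """A logical collection of data that shares one set of properties."""
    name: str
    members: list[Member] = field(default_factory=list)
    properties: dict[str, str] = field(default_factory=dict)

    def servers(self) -> set[str]:
        """Storage servers this dataset spans."""
        return {m.server for m in self.members}


# Hypothetical example: one database whose tables sit on two servers.
orders_db = Dataset(
    name="orders-db",
    members=[
        Member("filer-a", "/vol/orders/tables01"),
        Member("filer-b", "/vol/orders/tables02"),
    ],
    properties={"disaster_recovery": "required", "qos": "tier-1"},
)

print(sorted(orders_db.servers()))  # ['filer-a', 'filer-b']
```

Because the properties hang off the dataset rather than off each member, a tool can apply them to every member regardless of which server holds it.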


Simplification Through Integrated Data Management

Application admins set data properties

Properties assigned to logical sets of data

Properties define business requirements for data

Storage admins create & manage processes

Processes deliver on data requirements

Automation & service delivery become possible



Simplification Through Integrated Data Management

Properties
– Recovery time objective: 0 sec
– Applicable regulations: SEC-17A
– Security level: high

Policies
– Low RTO: use synchronous mirroring
– SEC-17A: enable SnapLock; delete after 7 years
– High security: enable encryption
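
The split between properties and policies can be sketched as a small rule function: application admins supply the properties, and storage admins own the rules that turn them into actions. This is a minimal illustration only. The function name, dictionary keys, and the 60-second cutoff for "low RTO" are assumptions; the three rules themselves come from the slide.

```python
# Hypothetical cutoff for what counts as a "low" recovery time objective.
# The slide only says "Low RTO"; the number here is an assumption.
LOW_RTO_SECONDS = 60


def derive_policies(properties: dict) -> list[str]:
    """Map dataset properties to storage actions using the slide's three rules."""
    policies = []

    rto = properties.get("rto_seconds")
    if rto is not None and rto <= LOW_RTO_SECONDS:
        policies.append("use synchronous mirroring")

    if "SEC-17A" in properties.get("regulations", ()):
        policies.append("enable SnapLock; delete after 7 years")

    if properties.get("security_level") == "high":
        policies.append("enable encryption")

    return policies


# The property values shown on the slide.
slide_properties = {
    "rto_seconds": 0,
    "regulations": ["SEC-17A"],
    "security_level": "high",
}

for action in derive_policies(slide_properties):
    print(action)
```

Replacing a rule, for example swapping the mirroring technology, changes only `derive_policies`; the properties attached to the data stay the same.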


Simplification Through Integrated Data Management

Right decisions are made by the right people

Easier to change and automate
– Goal: Automate 80% of workflow

Data properties can remain constant while processes adapt to new technologies


“Two Worlds” vs. Storage Virtualization Architecture

[Diagram labels: Vendor 1; Vendor 2]


Long Term Trends

Unification of capabilities in a single storage infrastructure

Property-based dataset management adopted for simplification and automation

It’s starting to happen now
– Unified model
– Scale-out & Grid
– Value-added copies
– Virtualization
– Data sets & properties
– Heterogeneous replication


Summary

It’s good to be in storage!