System Management Planners: Transforming (High-level Specifications) (Configuration Actions) Sandeep Uttamchandani IBM Almaden Research Center

Page 1: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

System Management Planners: Transforming (High-level Specifications) (Configuration Actions)

Sandeep Uttamchandani

IBM Almaden Research Center

Page 2: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

2

Jim Gray's Turing award speech “What next? - A dozen IT research goals”, 1999

Build a system
• used by millions of people each day
• administered and managed by a ½ time person
• On hardware fault, order replacement part
• On overload, adjust automatically

Page 3: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

3

Automated Management: A Growing Necessity!

Demand for IT Management:

– Growing number of applications and data footprint (988 exabytes in 2010 compared to 161 exabytes in 2006, per IDC)

– Government regulatory compliance (e.g., HIPAA, Sarbanes-Oxley), Disaster Recovery Planning, Application Performance requirements, Provisioning Planning

– Growing number of heterogeneous devices, management protocols, application requirements and policies

Supply of Administrators

– One storage administrator manages approximately 300 GB to 1000 GB of storage, while enterprises are moving towards petabyte-scale systems

– Lack of end-to-end knowledge: Application + Servers + Networks + Storage

– Skilled administrators are scarce and costly

Page 4: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

4

Talk Outline

Problem Drill-down: Understanding the System Administrative tasks

Taxonomy of Approaches for Automation

Management Planners

– Building Blocks

– Putting it together

Page 5: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

5

Deploying a new Application within an Enterprise Data-Center

1. Find a server – create a new Virtual Machine

2. Install and configure application

3. Select a Storage controller

4. Find a storage pool -- create a new volume with required capacity

5. Select a FC switch with available ports -- connect to server and storage controller

6. Zone the switch followed by LUN Masking and mapping

Page 6: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

6

Application Performance Management (I)

Workload access variations
- read/write ratio
- rand/seq ratio
- request-size
- …

Failures
- Hardware failures
- Software bugs
- Operator errors

Heterogeneous Component Models
[Figure: SPC OLTP throughput (TpmC, 0 to 12000) versus number of HDDs (12, 24, 42) for DAS, NAS, and iSCSI, each with CPU = 1 and CPU = 2]

Load Surges
[Figures: request size versus time; IOPS versus time]

Observe, Analyze, Act

Page 7: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

7

Application Performance Management (II)

Problem Determination

– Impact Analysis

– Root-cause diagnosis

– Event mining and configuration changes

Load balancing

– Which application to move? Where? When?

Adding Hardware

– Servers, Network, Storage? Where?

Page 8: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

8

Post deployment Tasks

Performance Management

Disaster Recovery (Availability Management)

Regulatory Compliance

Security

Hardware changes

Changing applications and IT goals

Page 9: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

9

Administrator’s Dream

[Diagram: Current State, Task Requirements, and Objective Functions feed into a box marked "?", which produces System Configuration Details/Corrective Actions]

Page 10: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

Taxonomy of Existing Approaches

Page 11: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

11

Expert Systems: Capturing Human Problem Solving

[Diagram: Series of disease symptoms, fed into the Mycin Expert System, which pin-points the Bacteria/Medication]

Rule-based Inference

IF the infection is primary-bacteremia
AND the site of the culture is one of the sterile sites
AND the suspected portal of entry is the gastrointestinal tract
THEN there is suggestive evidence (0.7) that the organism is bacteroides.
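The rule on this slide can be read as data plus a small matching loop. The following is a minimal sketch of that idea, not Mycin's actual implementation: the `Rule` class, the fact strings, and the `infer` function are names invented for illustration, and the combination step uses the standard certainty-factor formula for two positive factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    conditions: frozenset  # facts that must all be present
    conclusion: str        # what the rule suggests
    certainty: float       # strength of the suggestion, 0.0 to 1.0


def infer(facts, rules):
    """Return {conclusion: certainty} for every rule whose conditions hold."""
    conclusions = {}
    for rule in rules:
        if rule.conditions <= facts:
            prior = conclusions.get(rule.conclusion, 0.0)
            # Combine two positive certainty factors: cf1 + cf2 * (1 - cf1).
            conclusions[rule.conclusion] = prior + rule.certainty * (1.0 - prior)
    return conclusions


# The rule from the slide, with hypothetical fact labels.
RULES = [
    Rule(
        conditions=frozenset({
            "infection is primary-bacteremia",
            "culture site is sterile",
            "portal of entry is gastrointestinal tract",
        }),
        conclusion="bacteroides",
        certainty=0.7,
    ),
]

observed = frozenset({
    "infection is primary-bacteremia",
    "culture site is sterile",
    "portal of entry is gastrointestinal tract",
})
result = infer(observed, RULES)
print(result)
```

Keeping the rules as data separate from the `infer` loop mirrors the knowledge-base and reasoning-engine split that the later slides build on.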

Page 12: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

12

Taxonomy of existing approaches

Policy-based
– Knowledge representation (facts): Event-Condition-Action rules; “canned recipes”
– Knowledge usage (formalisms): Scanning for applicable rules
– Limitations/challenges: Complexity; brittleness

Pure Feedback-based
– Knowledge representation (facts): Little or no information about system details; uses instantaneous reaction as a basis for future action
– Knowledge usage (formalisms): Incrementally explore different permutations within the state-space
– Limitations/challenges: Infeasible for production systems with a large solution-space

Empirical/Learning-based
– Knowledge representation (facts): Recording system behavior in different states
– Knowledge usage (formalisms): Finding a recorded state that is “closest” to the current state
– Limitations/challenges: Error-prone and infeasible in real-world systems with a large number of parameters

Model-based
– Knowledge representation (facts): Mathematical or logical functions that act as predictors of system behavior; originally proposed for system diagnostics
– Knowledge usage (formalisms): Optimizing based on predicted system-state for different permutations of parameters
– Limitations/challenges: Representation of models; creation and evolution of models; formalisms for reasoning; inaccuracies in predicted values
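To make the empirical/learning-based entry concrete, here is a minimal sketch of "finding a recorded state that is closest to the current state". The recorded states, parameter names, scales, and actions are all hypothetical values chosen for illustration; the slide does not prescribe a distance measure, so scaled Euclidean distance is an assumption.

```python
import math

# Hypothetical history: (observed state, action that worked in that state).
RECORDED = [
    ({"req_rate": 500.0, "read_ratio": 0.8}, "no action"),
    ({"req_rate": 3000.0, "read_ratio": 0.2}, "throttle low-priority workload"),
    ({"req_rate": 1500.0, "read_ratio": 0.9}, "increase prefetch"),
]

# Assumed per-parameter scales so that no single parameter dominates.
SCALES = {"req_rate": 1000.0, "read_ratio": 0.1}


def closest_state(current, recorded, scales):
    """Return the recorded (state, action) pair nearest to the current state."""
    def distance(state):
        return math.sqrt(sum(
            ((current[key] - state[key]) / scales[key]) ** 2 for key in scales
        ))
    return min(recorded, key=lambda entry: distance(entry[0]))


state, action = closest_state(
    {"req_rate": 2900.0, "read_ratio": 0.25}, RECORDED, SCALES
)
print(action)
```

With only two parameters the lookup is trivial; the limitation named on the slide shows up as the number of parameters grows, because the history needed to cover the state space grows with it.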

Page 13: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

Management Planners: Model-based + Declarative Specifications

Page 14: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

14

[Diagram: management planner architecture]

Declarative Specification
– Example: Minimize high-priority workloads violating SLOs

Knowledge-base: Predictors of System Behavior
– Component capabilities
– Workload dependencies on individual components
– Effects of action invocation
– Self-evolving predictors using machine learning

Reasoning Engine: Constrained Optimizer

Managed [Storage] System

Labels on the connections: Current Status, Objective, Capability, Configuration/Action Selection, Corrective Action Trigger

Page 15: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

15

Building Blocks

Requirements (Declarative Specifications)

Collecting data from devices

Creating device performance models

Formalizing the optimization problem

Page 16: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

16

Knowledge-base: Intuition for generating models

Models are mathematical functions e.g. r = ax + by

Input variables = {x, y}; Output variables = {r}; Constants {a, b}

Generating a function (curve-fitting approach):

Step 1: Designer Specification: Enumerates related parameters (r is a function of x, y, and z)

Step 2: Creating a Baseline model: Off-line data collection; for values of r, x, y, z, determine a best-fit curve (i.e. values of coefficients, and the form of function)

Step 3: Continuous on-line refinement of the functions with additional monitored data
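The three steps above can be sketched for the example function r = ax + by. This is a minimal illustration under stated assumptions: the class name `LinearModel`, the sample values, and the choice of ordinary least squares are not from the talk, and the form of the function is fixed in advance rather than discovered.

```python
class LinearModel:
    """Least-squares fit of r = a*x + b*y, refined as new samples arrive."""

    def __init__(self):
        # Running sums for the normal equations; kept so refinement is incremental.
        self.sxx = self.sxy = self.syy = self.sxr = self.syr = 0.0
        self.a = self.b = 0.0

    def add_samples(self, samples):
        """Fold in (x, y, r) observations and re-solve for a and b."""
        for x, y, r in samples:
            self.sxx += x * x
            self.sxy += x * y
            self.syy += y * y
            self.sxr += x * r
            self.syr += y * r
        det = self.sxx * self.syy - self.sxy * self.sxy
        if abs(det) > 1e-12:  # skip the update until the data determines a and b
            self.a = (self.sxr * self.syy - self.sxy * self.syr) / det
            self.b = (self.sxx * self.syr - self.sxy * self.sxr) / det

    def predict(self, x, y):
        return self.a * x + self.b * y


# Step 1: the designer states that r depends on x and y.
model = LinearModel()
# Step 2: baseline model from off-line measurements (hypothetical values).
model.add_samples([(1.0, 0.0, 2.0), (0.0, 1.0, 3.0)])
# Step 3: on-line refinement with additional monitored data.
model.add_samples([(1.0, 1.0, 5.0), (2.0, 1.0, 7.0)])
print(model.a, model.b, model.predict(3.0, 2.0))
```

Because only the running sums are stored, refinement does not need the earlier samples to be kept around, which is one way to make the continuous on-line step cheap.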

Page 17: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

17

Component Models

Representation:

Response time = c( req_size, r/w_ratio, rand/seq_ratio, req_rate, cache_hit_rate)

Bootstrapping:

– Offline calibration tests OR

– Performance specifications from vendor

Challenges:

– Interleaving of workload streams

– Caching effects due to sharing

Related Work:

– CART model [CMU]

– Table-based approach [HP]

Linear fit (Non-saturated case): S = 0.2509, r = 0.989

Quadratic Fit: S= 3.284, r = 0.838

Page 18: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

18

Workload Models

Representation:

Component Load = wn(application request_rate)

e.g. load at the storage controller and switch for 1000 database (OLTP) transactions

Bootstrapping:
– Initial monitoring phase OR
– Libraries for application workloads (e.g. OLTP, Decision support, E-mail)

Challenges:
– Mean-value not sufficient for real-world workloads; using Cumulative Distribution Functions (CDF)

Related Work:
– ClockWork (trend prediction) [IBM]
– Using ARIMA for Predictive IO prefetching [UIUC]
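A small example shows why a mean value alone can mislead and what a cumulative distribution adds. The two request-rate traces below are made-up numbers with the same mean; the function names are illustrative only.

```python
def empirical_cdf(samples):
    """Return (value, fraction of samples <= value) pairs in ascending order."""
    ordered = sorted(samples)
    count = len(ordered)
    return [(value, (index + 1) / count) for index, value in enumerate(ordered)]


def percentile(samples, fraction):
    """Smallest observed value v such that P(X <= v) >= fraction."""
    for value, cumulative in empirical_cdf(samples):
        if cumulative >= fraction:
            return value
    return max(samples)


# Hypothetical controller request rates (IOPS) over ten intervals.
steady = [1000] * 10
bursty = [500] * 9 + [5500]

mean_steady = sum(steady) / len(steady)
mean_bursty = sum(bursty) / len(bursty)
p95_steady = percentile(steady, 0.95)
p95_bursty = percentile(bursty, 0.95)
print(mean_steady, mean_bursty, p95_steady, p95_bursty)
```

Both traces average 1000 IOPS, but the 95th percentile separates them, so a planner sizing for the mean would miss the burst.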

[Figure: request rate at controller (iops) plotted against application request-rate (transactions/sec); labels: Capturing Mean, Capturing Variance; traces: SPC OLTP, Harvard Campus]

Page 19: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

19

Optimization Formalism: Linear Programming

Objective function: While resolving the SLA violation, minimize the throttling of high-priority workloads

Variables: Throttle value for each workload

Constraints:

– The response time of the components for a given component load

– The request-rate at the component arriving from the workload streams

– Change in the application request-rate with throttling

– Latency SLA of the workloads
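The formulation above can be tried out on a toy instance. The sketch below is an assumption-laden illustration: it swaps the linear-programming solver for an exhaustive grid search so that it runs with the standard library only, and the workloads, priorities, and the linear `response_time_ms` component model are hypothetical. Here a throttle value is the fraction of a workload's request rate that is held back.

```python
from itertools import product

# Hypothetical workloads: (name, priority, request rate in IOPS, latency SLA in ms).
WORKLOADS = [
    ("oltp", 10, 2000.0, 10.0),
    ("backup", 1, 3000.0, 50.0),
]


def response_time_ms(load_iops):
    """Assumed component model: latency grows linearly with offered load."""
    return 2.0 + 0.002 * load_iops


def plan_throttles(workloads, steps=10):
    """Grid search for throttles that meet every SLA at least priority-weighted cost."""
    grid = [k / steps for k in range(steps + 1)]
    best, best_cost = None, float("inf")
    for throttles in product(grid, repeat=len(workloads)):
        # Constraint: request rate arriving at the shared component after throttling.
        load = sum(rate * (1.0 - t)
                   for (_, _, rate, _), t in zip(workloads, throttles))
        latency = response_time_ms(load)
        # Constraint: latency SLA of every workload.
        if any(latency > sla for (_, _, _, sla) in workloads):
            continue
        # Objective: penalize throttling in proportion to workload priority.
        cost = sum(priority * t
                   for (_, priority, _, _), t in zip(workloads, throttles))
        if cost < best_cost:
            best, best_cost = throttles, cost
    return best


best = plan_throttles(WORKLOADS)
print(dict(zip([name for name, *_ in WORKLOADS], best)))
```

In this instance the search leaves the high-priority workload untouched and holds back part of the low-priority one. A grid search scales exponentially with the number of workloads, which is the practical reason to use an LP formulation instead.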

Page 20: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

20

Management Planners: A reality!

Throttling Planner [Usenix’05]

SMART: Performance Management Planner [Usenix’06]

SAN Planner in IBM TotalStorage Productivity Center

Disaster Recovery Planner

End-to-end Provisioning Planner

Page 21: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

21

Ongoing RADLab Research

Page 22: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

22

Summary…

Data centers are growing to petabyte scale and beyond

Need for Automation

– Administrative Tasks range from simple firmware upgrades to complex provisioning and disaster recovery planning

– Back-of-the-envelope calculations are no longer feasible

Management Planners

– Map high-level declarative specification to configuration commands

– Hide the underlying device configuration, performance, event details

– Automatically create and continuously refine device models

Page 23: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

23

Food for thought…

Representation & creation of domain knowledge

- Understanding of system details

- Feature-set selection

- Machine learning techniques

Formalisms for selection & execution of actions

- Constrained optimization techniques

- Handling uncertainty and inaccuracies

- Variably aggressive action execution

Research Spectrum: “How accurate can the models be?” versus “How accurate do the models need to be?”

Pragmatic rules-of-thumb

- Models don’t need to be perfectly accurate

- Not critical to select the most “optimal” action invocation, but rather to avoid the worst ones

- Creating domain knowledge is not a one-time activity – incremental addition and evolution

- Automate the common-case

Page 24: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

24

Thank you!

Sandeep Uttamchandani ([email protected])

http://www.almaden.ibm.com/StorageSystems/Storage_Management_and_Solutions/

Page 25: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

25

Backup

Page 26: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

26

IBM Systems and Technology Group

Fall 2005 STG IT Analyst Event

Policy-Based Interface

The term “policy” means different things to different people:

– Service Class
– Goals
– Constraints
– Best Practices
– Rules of thumb
– If-Then-Scope-Priority (IBM’s PMAC model)

Some users want a lot of control over the policy specification

Other users want pre-packaged service classes (like Gold/Silver/Bronze) and they subsequently want to fine-tune the parameters and create customized service classes

Page 27: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

27

Aperi: Open standard initiatives

Aperi: An open-source storage management community

[Diagram: the Aperi Common Open Source Platform, surrounded by Storage, System, Fabric, “Innovators”, Academia, VC’s, and Startups]

Aperi’s goal
– Deliver an open-source common management platform through the contribution and development of actual code.
– The common platform will implement SNIA’s SMI-S specification for management of heterogeneous devices.

Targeted Benefits
– Improve speed to market of new advanced tools designed for ease of use
– Reduce the need for customers to replace storage management platforms when purchasing new hardware or software
– Encourage vendors to support industry standards for management in their hardware implementations

Initial members

Page 28: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

28

Optimization Formalism (cont.)

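Objective function:

$$\text{Minimize} \sum_i p_{a_i}\, p_{b_i}\, \frac{A_i}{SLA_i}$$

where $p_{a_i}$ = workload priority, $p_{b_i}$ = quadrant priority, and

$$A_i = \begin{cases} SLA_i - a(\text{current\_throughput}_i, t_i) & \text{if } SLA_i > a(\text{current\_throughput}_i, t_i) \\ 0 & \text{otherwise} \end{cases}$$

that is,

$$\text{Minimize} \sum_i p_{a_i}\, p_{b_i}\, \frac{SLA_i - a(\text{current\_throughput}_i, t_i)}{SLA_i}$$

Constraints:

$$\text{cachehit}_i \cdot \text{hittime}_i + (1 - \text{cachehit}_i)\, c\!\left(\sum a(\text{current\_throughput}_i, t_i)\right) \le SLA_i$$

$$0 \le t_i \le 1$$

[Figure: workload quadrants FAILED, EXCEED, LUCKY, and MEET on axes IOps (%) and Latency (%), each ranging from 0 to 1]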

Page 29: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

29

Related Fields: A Lot to Learn!

Expert Systems Research
• Low Road: Dendral
• Middle Road: Mycin, R1
• High Road: Sophie
- Architecture: Knowledge-base & Reasoning engine
- Knowledge-base encodes (generic) domain knowledge
- Reasoning engine can interpret knowledge in multiple ways

Non-Procedural Specifications Research
• Logic-based
• Network-based
• Relational model
- “Procedural-is-best” controversy
- Separation of facts and formalisms
- Strategies such as “backtracking” to search the knowledge-base

Machine Learning Research
• Supervised/Reinforcement
• Boosting
- Correlating observed behavior with system parameters
- Statistical learning techniques: Neural Networks, SVMs
- Gray-box approaches such as the Snowball project

Page 30: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

IBM Research

© 2005 IBM Corporation30

A Typical Data-center Application

[Diagram: storage stack of a typical data-center application]
– SAP Application Server: executables on an NTFS file system
– DB Server (Windows): IBM DB2 (Database Managed Storage)
– DB Server (AIX): IBM DB2 (System Managed Storage)
– DB Server (Windows): Oracle (Database Managed Storage)
– Three databases (DB) on seven volumes, labeled Data, Log, Data, Log, Data, Log, Temp
– Logical Volume Manager with two logical volumes, each carrying a JFS file system

Page 31: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

31

Application Downtime = $$$ Losses

Businesses require 24 x 7 availability of business-critical applications

Application Availability = Ensuring availability of multiple tiers
– Storage Controllers
– SAN Appliances
– Servers
– Virtual Machines
– Databases/File-systems

Failures come in several flavors
– Virus failures
– Mis-configuration errors
– Subsystem failures
– Site failures (hurricanes, planes)

Page 32: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

32

Preparing for IT Disasters: Administrator’s Task-list

Planning
– Understand DR requirements
– Evaluate replication services available at storage and other levels
– Analyze existing copy services configuration (if any); generate a DR plan

Deployment
– Configure various replication technologies from IBM and non-IBM vendors
– Replication at different levels, namely the database, server, operating system, and storage level (e.g., RM, HACMP, SRDF, MSCS, VCS)

Validation
– Validate DR plans against configuration changes (e.g., changes in zoning) and changes in application characteristics (e.g., write rate)

Continuous Optimization
– Optimize DR plans for unused copy relationships
– Recommend updates to existing configuration based on hardware and software changes

Page 33: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

33

Preparing for IT Disasters: A Consultant’s Gold-mine

Planning: Complex Search Space, Manual & Error-prone
– DR Requirements (RMAF questionnaire); Storage and Server Characteristics
– Replication Technology Characteristics
– Constraints: Interoperability, # sites, dollar cost
– Best Practices

Deployment: Requires expertise in multiple replication technologies
– Vendor-specific CLI commands/API for creating/updating/deleting copy pairs, sessions, consistency groups; done manually today by administrators

Validation and Continuous Optimization: 24x7 impact analysis
– Analyze impact of configuration changes and application properties
– Periodic sampling at primary and secondary storage for Recovery Point Objective (RPO)

Page 34: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

34

Related Work: Creating models

Spectrum: Analytical Approaches, Monitor Mining, Black-box Approaches

Analytical Approaches (e.g., y = a·x5 + b·x11)
- John Wilkes Ecosystem: Minerva, Ergastulum, Hippodrome
- Modeling disk behavior, formulas for data prefetching, modelling migration
- [Bre94], [Bre95], [Mat97], [Agr00], [Nob97], [Men01], [Laz84], [Vap95], [Vud01]
- Limitations: Brittle, error-prone

Black-box Approaches (e.g., y = a1·x1 + a2·x2 + … + a100·x100)
- Case-based reasoning
- Multi-relational mining
- Table-based models
- CART [CMU], CMiner [UIUC]
- Limitations: Convergence, accuracy

In between: y = f(x5, x11), with f derived from monitored data
- Representation of models?
- Evolution of models?
- Incomplete designer specifications?

Page 35: System Management Planners:  Transforming  (High-level Specifications)    (Configuration Actions)

35

Related Work: Policy-based Management (Pattern-based Procedure Invocation [Hewitt67])

Complexity

– Level of details in terms of thresholds and invocation values

– Deciding among the action set

– Number of rules and conflicts analysis:

O( Resource-state x Workload x Action-sets x Current-behavior)

Brittleness

– Closely tied with system configurations, workloads and action-sets

– No systematic model/approach for refining specifications

Example: Rules for the Prefetch knob

[Event]: Latency_violation
[Condition]: If ((Memory_available > 70) && (access_pattern < 0.4 sequential) && (read/write > 0.4))
[Action]: Prefetch = 1.2*Prefetch

[Event]: Latency_violation
[Condition]: If ((15 < Memory_available < 70 && FC_interconnect_available > 60) && (access_pattern > 0.7 sequential && read/write > 0.4))
[Action]: Prefetch = 1.4*Prefetch

[Event]: Latency_violation
[Condition]: If ((Memory_available > 70 && FC_interconnect_available > 60) && (0.4 < access_pattern < 0.7 sequential && read/write > 0.4))
[Action]: Prefetch = 1.3*Prefetch

[Event]: Latency_violation
[Condition]: If ((Memory_available < 15) && (access_pattern > 0.8 sequential && read/write > 0.4))
[Action]: Prefetch = 1.2*Prefetch

[Event]: Latency_not_met
[Condition]: If ((Memory_available < 15) && (access_pattern < 0.8 sequential && read/write > 0.4))
[Action]: Prefetch = 1.05*Prefetch

[Event]: Latency_not_met
[Condition]: If ((FC_interconnect_available < 20) && (access_pattern > 0.8 sequential && read/write > 0.4))
[Action]: Prefetch = 1.3*Prefetch

 ………AND MORE…………………..
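The brittleness of such rule sets is easy to see once the rules are written as data. The sketch below encodes the four Latency_violation rules from this slide; the dictionary keys (`mem`, `fc`, `seq`, `rw`) are shorthand invented here, and the second rule's memory condition is read as 15 < Memory_available < 70, which is an assumption about the intended range.

```python
# Each rule: (event, condition on the observed state, prefetch multiplier).
# State keys (hypothetical shorthand): mem = Memory_available,
# fc = FC_interconnect_available, seq = sequential fraction, rw = read/write.
RULES = [
    ("Latency_violation",
     lambda s: s["mem"] > 70 and s["seq"] < 0.4 and s["rw"] > 0.4,
     1.2),
    ("Latency_violation",
     lambda s: 15 < s["mem"] < 70 and s["fc"] > 60
     and s["seq"] > 0.7 and s["rw"] > 0.4,
     1.4),
    ("Latency_violation",
     lambda s: s["mem"] > 70 and s["fc"] > 60
     and 0.4 < s["seq"] < 0.7 and s["rw"] > 0.4,
     1.3),
    ("Latency_violation",
     lambda s: s["mem"] < 15 and s["seq"] > 0.8 and s["rw"] > 0.4,
     1.2),
]


def matching_multipliers(event, state):
    """Return the multipliers of all rules that fire for this event and state."""
    return [mult for ev, cond, mult in RULES if ev == event and cond(state)]


# A state the rules anticipate: exactly one rule fires.
covered = matching_multipliers(
    "Latency_violation", {"mem": 80, "fc": 80, "seq": 0.3, "rw": 0.5})

# A nearby state nobody wrote a rule for: nothing fires.
uncovered = matching_multipliers(
    "Latency_violation", {"mem": 50, "fc": 80, "seq": 0.5, "rw": 0.5})

print(covered, uncovered)
```

An empty result marks a gap in coverage and a result with more than one entry marks a conflict; both have to be found and fixed by hand as configurations and workloads change.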