1-NYS Forum PPT Template2 sw v2 · % Uptime % Downtime Downtime Per Year Downtime Per Week 98% 2% 7.3 Days 3 Hours, 22 Minutes 99% 1% 3.65 Days 1 Hour, 41 Minutes ... – Redundant

NYS Forum Business Continuity

Architecting a Cost Effective Business Continuity Solution for Physical and Virtual Environments

March 2nd 2011

Joe D’Angelo, Director of Technical Services

Agenda

– Welcome & Introductions

– Defining Business Continuity: Addressing Downtime

– Data Protection + Application/Data Availability

– Backup & Restore: Making Disk Backups Affordable with De-Duplication and Snapshots

– Storage Technologies: Thin Provisioning, Data Migration & Storage Virtualization

– Server Virtualization: Embedded Tools vs. Third Party Options & Managing Virtual Server Sprawl

– High Availability & Disaster Recovery Concepts: Cluster Topologies & Multi-Platform Support

– Latest Trends in IT Energy Efficiency: Reducing Cooling Costs and Increasing MTBF

– Principles of High Availability Design

– Q & A

Defining Business Continuity

– Maintaining SLAs, RPO & RTO

– How Many Nines are Necessary?

– Planned or Unplanned, Downtime is still Downtime

– Clearly define SLAs; otherwise, how will you ever achieve them?

– As the SLA increases, so does the cost (Good, Fast and Cheap: Pick 2)

% Uptime    % Downtime   Downtime Per Year         Downtime Per Week
98%         2%           7.3 Days                  3 Hours, 22 Minutes
99%         1%           3.65 Days                 1 Hour, 41 Minutes
99.8%       0.2%         17 Hours, 30 Minutes      20 Minutes, 10 Seconds
99.9%       0.1%         8 Hours, 45 Minutes       10 Minutes, 5 Seconds
99.99%      0.01%        52 Minutes, 30 Seconds    1 Minute
99.999%     0.001%       5 Minutes, 15 Seconds     6 Seconds
99.9999%    0.0001%      31.5 Seconds              0.6 Seconds
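The downtime figures in the table follow directly from the uptime percentage. A minimal Python sketch of that arithmetic, assuming a 365-day year and a 7-day week (the function names are illustrative and not from the presentation). The table rounds its values, so computed results can differ from it slightly:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumes a non-leap year
SECONDS_PER_WEEK = 7 * 24 * 3600


def downtime_seconds(uptime_percent, period_seconds):
    """Seconds of allowed downtime in a period for a given uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * period_seconds


def describe(seconds):
    """Format a duration in seconds as days, hours, minutes and seconds."""
    minutes, secs = divmod(seconds, 60)
    hours, minutes = divmod(minutes, 60)
    days, hours = divmod(hours, 24)
    return f"{int(days)}d {int(hours)}h {int(minutes)}m {secs:.1f}s"


if __name__ == "__main__":
    for uptime in (98.0, 99.0, 99.8, 99.9, 99.99, 99.999, 99.9999):
        per_year = describe(downtime_seconds(uptime, SECONDS_PER_YEAR))
        per_week = describe(downtime_seconds(uptime, SECONDS_PER_WEEK))
        print(f"{uptime}%  per year: {per_year}  per week: {per_week}")
```

Running it for a target SLA makes the cost of each additional "nine" concrete before committing to it.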

Defining Business Continuity (cont…)

– Data Protection + Application & Data Availability = Business Continuity

– Backup & Restore Downtime Concerns

– Failed Backups

– Time to Recover & Backup Window

– Retention of data can outlast the life expectancy of media (Compliance)

– Restore Tests are Cumbersome with Offsite Managed Backups

– Tape Rotation and Management

– High Availability and Disaster Recovery Considerations

– Protection for localized and catastrophic system and application failure

– Not all HA Solutions are created equal (Storage vs. Application)

– Multiplatform/OS Dependency & Visibility

– Configuration Drift (Why did My Failover Fail?)

– Complexity is the last thing you want during a DR scenario

– Manageability
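Configuration drift, mentioned on this slide, is a difference between the settings of a primary node and its failover partner that goes unnoticed until failover. One way to surface it is to compare the two configurations key by key. The sketch below is a minimal illustration only; the setting names and the idea of capturing each node's configuration as a dictionary are assumptions, not something the presentation prescribes:

```python
MISSING = "<missing>"  # placeholder for a setting that one node lacks


def find_drift(primary, standby):
    """Return {setting: (primary_value, standby_value)} for every difference."""
    drift = {}
    for key in sorted(set(primary) | set(standby)):
        p_val = primary.get(key, MISSING)
        s_val = standby.get(key, MISSING)
        if p_val != s_val:
            drift[key] = (p_val, s_val)
    return drift


if __name__ == "__main__":
    # Hypothetical settings captured from each node
    primary = {"os_patch": "2011-02", "app_version": "4.2", "data_mount": "san_lun_7"}
    standby = {"os_patch": "2010-11", "app_version": "4.2"}
    for setting, (p_val, s_val) in find_drift(primary, standby).items():
        print(f"{setting}: primary={p_val} standby={s_val}")
```

Run on a schedule, a check like this reports mismatches while they are still cheap to fix rather than during a failover.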

Business Continuity Options

Disk-Based Data Protection

– Reducing Outage Windows

– What does downtime really cost?

– Revenue, SLAs & Image

– “It’s all in the Restore”

– Distinct advantage over Tape

– Snapshot backups can reduce system overhead

– Test & Development Provisioning

– Isn’t Disk more expensive?

– De-duplication (Where and When)

– Leveling the playing field against Tape Solutions

– Reduce Total Storage Need for Backups

– Less Bandwidth Consumption

– Reduces Infrastructure Costs

– Future Proofing

– Integrates with most Enterprise Backup Software Solutions
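To illustrate why de-duplication reduces the total storage needed for backups, here is a minimal Python sketch that stores each unique chunk only once and keeps a "recipe" of chunk hashes for restore. It assumes fixed-size chunks and an in-memory dictionary as the chunk store. Both are simplifications for illustration, since real products often use variable-size chunking and persistent indexes, and the function names are hypothetical:

```python
import hashlib


def dedup_store(data, chunk_size, store):
    """Split data into fixed-size chunks, store each unique chunk once.

    Returns the list of chunk digests needed to rebuild the data.
    """
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # only the first copy is kept
        recipe.append(digest)
    return recipe


def restore(recipe, store):
    """Rebuild the original data from its recipe of chunk digests."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    store = {}
    backup = b"A" * 4096 * 3 + b"B" * 4096  # three identical chunks plus one
    recipe = dedup_store(backup, 4096, store)
    stored_bytes = sum(len(chunk) for chunk in store.values())
    print(f"logical: {len(backup)} bytes, stored: {stored_bytes} bytes")
    assert restore(recipe, store) == backup
```

Only the digests of repeated chunks need to travel to the backup target, which is where the bandwidth saving listed above comes from.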

Cost Effective Storage Strategies

– Centralized Storage Management

– Multiprotocol Access (NAS and SAN)

– Multitier Storage Options (The Right Storage for the Right Application)

– Heterogeneous Manufacturer Environments

– Array Based Features

– Storage Reconciliation

– How much do I own vs. How much am I using?

– Thin Provisioning: does my file system support it?

– Storage forecasting (Physical and Virtual)

– Non-disruptive upgrades?

– Disk drive hibernation (Energy Savings for Long Term Storage)

– “Out of the Puddle and Into the Pool”

– Combining Storage Resources with Enterprise-wide visibility

– Advanced features that are array agnostic

– Transparent Data Migrations when storage is EOL & EOSL
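The storage reconciliation and forecasting questions on this slide ("how much do I own vs. how much am I using") reduce to a few ratios. A minimal Python sketch, assuming capacity is tracked in terabytes and that growth is roughly linear (both are assumptions for illustration, and the function names are hypothetical). With thin provisioning, the provisioned figure can exceed physical capacity, which the oversubscription ratio makes visible:

```python
def reconcile(physical_tb, provisioned_tb, used_tb):
    """Compare what is owned, what is promised to hosts, and what is written."""
    return {
        "utilization": used_tb / physical_tb,            # share of owned capacity in use
        "oversubscription": provisioned_tb / physical_tb,  # > 1.0 means thin-provisioned beyond capacity
        "free_tb": physical_tb - used_tb,
    }


def days_until_full(physical_tb, used_tb, growth_tb_per_day):
    """Linear forecast of days until physical capacity is exhausted."""
    if growth_tb_per_day <= 0:
        return None  # no growth, so no exhaustion date to forecast
    return (physical_tb - used_tb) / growth_tb_per_day


if __name__ == "__main__":
    report = reconcile(physical_tb=100, provisioned_tb=150, used_tb=40)
    print(report)
    print("days until full:", days_until_full(100, 40, 0.5))
```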

System & Service Virtualization

– Driving Factors

– Improve Utilization & Management

– Reduce Administrative Costs

– Decrease Deployment Time

– Advantages

– Management Capabilities

– Dashboard View of an expansive Server Environment

– Portability & Provisioning

– Reduction in Maintenance Costs

– Considerations

– Virtual server sprawl (OS footprint vs. server footprint)

– Application visibility from embedded HA tools?

– What am I really trying to Virtualize? Server or Service?

– Storage Impact on your SAN: Did I “Remove” the cost or just “Move” it?

– Disruption factor, outage impact & multi-OS dependency (P & V)

System & Service Virtualization (cont…)

– Virtualize without Compromise

– Expand upon both Native and Third Party Virtualization tools

– Bare Metal Virtualization (Server)

– OS Based Virtualization (Service)

– Minimal Cost Options with significant ROI

– Storage Impact

– De-Duplication of multiple OS images

– Ideal for Virtual Desktop

– Array-Based Replication does not account for Application Consistency

– Reconcile your Storage Needs: Application & Virtual Environments

– Centralize Reporting and Provisioning

– IP based Storage

– Keep Production Data and Production Traffic isolated

– Considerations

– What does an outage now mean?

– Redirect Systems savings to software tools to help increase availability

High Availability Concepts

– Localized Hardware Redundancy

– OS mirrors & hot swap boot drives

– Boot from SAN (Less moving Parts)

– Redundant power supplies

– Removing Single Points of Failure

– Proper Airflow and Cabling

– Storage Availability

– Multipathing (Dual HBAs)

– Native vs. Third-Party

– Redundant Fabric/SAN

– Active/Active vs. Active/Passive Storage Controllers

– Non-Disruptive upgrades and storage growth

– Cloud (SAS) models

– Data migration support

– What happens when you change vendors?

– How transparent is it? What is my outage?
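The value of removing single points of failure can be estimated with basic availability arithmetic: components that must all work multiply their availabilities, while redundant components fail only if every copy fails. A minimal Python sketch, assuming component failures are independent (a simplification, since shared power, firmware, or operator error can correlate failures); the function names are illustrative:

```python
def series(*availabilities):
    """Availability of a chain in which every component must be up."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities):
    """Availability of redundant components in which any one suffices."""
    all_fail = 1.0
    for a in availabilities:
        all_fail *= (1.0 - a)
    return 1.0 - all_fail


if __name__ == "__main__":
    hba = 0.99  # hypothetical availability of a single HBA path
    print("single path:       ", hba)
    print("dual paths:        ", parallel(hba, hba))
    print("server then switch:", series(0.999, 0.999))
```

The series case is the "weakest link" in numeric form: a chain is never more available than its least available component.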

High Availability Concepts (cont…)

– Host Based Clustering (Local & Metro)

– Native

– Easy to deploy (Integration)

– Learning curve can be shorter

– Limited coverage from applications

– Typically customizations are complex or unsupported

– Point solution that varies between all platforms

– FTE Resource Needs are Greater

– Third-Party

– Cross platform standardization (Operational Support)

– Central Management, reporting and visibility of all applications

– Physical to Virtual Clustering

– Cross Platform & multi-tier dependencies (Servers, Apps and Storage)

– Easily adaptable to custom applications

– Training Costs

– Implementation costs

– Reduce FTE Costs

– Local HA and remote DR in the same solution

Disaster Recovery Concepts

– Backups vs. Replication

– Protection from Data Corruption vs. Resuming Business Processing

– One does not remove the need for the other

– G.I.G.O. Model

– RPO & RTO

– 24 Hours, 24 Minutes or 24 Seconds means the same solution

– Host Based vs. Array Based

– Heterogeneous Options

– Management and Visibility

– Dovetail off of HA design (One tool for both)

– Integrate with storage solutions

– Active/Active design coupled with Virtualization

– Non-Disruptive testing procedures

– Bi-Directional

– Remediate Infrastructure Issues Proactively not Reactively
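For the RPO point on this slide, a quick way to compare a nightly backup with frequent replication is to estimate the worst-case data loss of each and check it against the target. The Python sketch below assumes a simplified model in which a copy starts every `copy_interval` and takes `copy_duration` to complete, so the worst case is the sum of the two; the names and the model are illustrative assumptions:

```python
from datetime import timedelta


def worst_case_data_loss(copy_interval, copy_duration=timedelta(0)):
    """Oldest possible age of the newest completed copy at the moment of failure."""
    return copy_interval + copy_duration


def meets_rpo(copy_interval, rpo, copy_duration=timedelta(0)):
    """True if the worst-case data loss stays within the recovery point objective."""
    return worst_case_data_loss(copy_interval, copy_duration) <= rpo


if __name__ == "__main__":
    nightly = timedelta(hours=24)
    for rpo in (timedelta(hours=24), timedelta(minutes=24), timedelta(seconds=24)):
        print(f"RPO {rpo}: nightly backup sufficient? {meets_rpo(nightly, rpo)}")
```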

Latest Trends in IT Energy Efficiency

Current Issue:

50% of IT data center power consumption comes from cooling the IT equipment

What if we could significantly reduce this value?

How:

Utilize Precision Cooling Technologies at the rear of the IT cabinet over traditional computer room air conditioning methods

Advantages of Precision Cooling

–Save up to 95% on Cooling Costs

-Reduced footprint in data center

• Floor space efficiency (fewer racks required)

• Rack space efficiency (densification)

-NYSERDA grant funding available

-New York State Energy Research and Development Authority

-No raised floor design required (Save $$)

-New technologies are refrigerant-based

• No water near electronics

-Enhanced server life

• Cooler equipment = higher availability

• More time between equipment refreshes

• Longer Duty Cycle
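Combining the two figures from these slides (cooling at 50% of data center power, and savings of up to 95% on cooling) gives a rough upper bound on the overall reduction. A minimal Python sketch of that arithmetic, assuming cooling cost is proportional to cooling power and treating 95% as a best case rather than a typical result; the function name is illustrative:

```python
def total_power_reduction(cooling_share, cooling_savings):
    """Fraction of total data center power saved by reducing cooling power."""
    return cooling_share * cooling_savings


if __name__ == "__main__":
    best_case = total_power_reduction(cooling_share=0.50, cooling_savings=0.95)
    print(f"best-case overall reduction: {best_case:.1%}")
    # A more conservative, hypothetical savings figure for comparison
    print(f"at 50% cooling savings: {total_power_reduction(0.50, 0.50):.1%}")
```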

HA Design Principles

20. The Total Cost of Ownership

– Availability is driven by a business need, and financial cost is only one factor

19. Assume Nothing

– Availability is not necessarily bought from a vendor but architected by a customer

18. Remove SPOFs

– Weakest Link Effect

17. Enforce Security

– Know who is doing what and where

16. Consolidate your Systems

– Go from the “Fog to the Cloud.”

– Redundancy can be superfluous.

15. Watch your Speed

– Ensure Availability efforts do not jeopardize performance (Monitor)

14. Enforce Change Control

– Mitigate User Error

13. Document Everything

– Don’t rely on FTE knowledge.

– Proper Audience

12. SLAs

– Uptime, Hours of Service and Support

– Application Priority

11. Plan Ahead

– Anticipate overruns on projects and schedule Fire Drills.

HA Design Principles (cont…)

10. Test Everything

– Backups, Restores and DR Scenarios

9. Separate Your Environments

– Dev, Test & Staging vs. Prod & DR (Use Snapshots and VMs)

8. Learn from History

– Audit system state information to diagnose root cause of downtime

7. Design for Growth

– Goldfish Effect

6. Choose Mature Software

– Avoid Bleeding-edge whenever possible (Lack of R & D)

5. Choose Mature, Reliable Hardware

– MTBF and Parts replacement is key to maintaining Availability

4. Reuse Configurations

– “Rinse and Repeat” Do not design on the Fly

3. Exploit External Resources

– NYSIT Forum, User Groups, Vendor Conferences

2. One Problem, One Solution

– Software to address Software issues and Hardware to Address Hardware issues

– When all you have is a Hammer…

1. K.I.S.S.

– Remove guesswork by automating daily tasks & routines

– Review Business Process first before deciding on a technology

– Avoid ambiguity by clearly identifying responsible parties

Questions

Joseph B. D’Angelo

Serverware Corp.

585-785-6100

www.serverwarecorp.com
