NYS Forum
Business Continuity
Architecting a Cost Effective Business Continuity Solution for Physical and Virtual Environments
March 2nd 2011
Joe D’Angelo, Director of Technical Services
Agenda
– Welcome & Introductions
– Defining Business Continuity
  – Addressing Downtime
  – Data Protection + Application/Data Availability
– Backup & Restore
  – Making Disk Backups Affordable with De-Duplication and Snapshots
– Storage Technologies
  – Thin Provisioning, Data Migration & Storage Virtualization
– Server Virtualization
  – Embedded Tools vs. Third Party Options & Managing Virtual Server Sprawl
– High Availability & Disaster Recovery Concepts
  – Cluster Topologies & Multi-Platform Support
– Latest Trends in IT Energy Efficiency
  – Reducing Cooling Costs and Increasing MTBF
– Principles of High Availability Design
– Q & A
Defining Business Continuity
– Maintaining SLAs, RPO & RTO
  – How Many Nines are Necessary?
  – Planned or Unplanned, Downtime is still Downtime
  – Clearly define SLAs; otherwise, how will you ever achieve them?
  – As the SLA increases, so does the cost (Good, Fast and Cheap: Pick 2)
% Uptime    % Downtime   Downtime Per Year         Downtime Per Week
98%         2%           7.3 Days                  3 Hours, 22 Minutes
99%         1%           3.65 Days                 1 Hour, 41 Minutes
99.8%       0.2%         17 Hours, 30 Minutes      20 Minutes, 10 Seconds
99.9%       0.1%         8 Hours, 45 Minutes       10 Minutes, 5 Seconds
99.99%      0.01%        52 Minutes, 30 Seconds    1 Minute
99.999%     0.001%       5 Minutes, 15 Seconds     6 Seconds
99.9999%    0.0001%      31.5 Seconds              0.6 Seconds
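
The downtime figures in the table follow directly from the uptime percentage: allowed downtime = (100% - uptime) x length of the period. A minimal Python sketch of that arithmetic is below. The function names are illustrative, and a 365-day year is assumed, so results can differ slightly from the rounded values on the slide.

```python
# Convert an uptime percentage into the downtime it allows per period.
SECONDS_PER_YEAR = 365 * 24 * 3600   # assumes a non-leap year
SECONDS_PER_WEEK = 7 * 24 * 3600


def allowed_downtime_seconds(uptime_percent: float, period_seconds: int) -> float:
    """Seconds of downtime permitted in a period at the given uptime."""
    return (100.0 - uptime_percent) / 100.0 * period_seconds


def humanize(seconds: float) -> str:
    """Render seconds as days/hours/minutes/seconds, dropping zero units."""
    remaining = round(seconds, 1)
    parts = []
    for name, size in (("d", 86400), ("h", 3600), ("m", 60)):
        count = int(remaining // size)
        if count:
            parts.append(f"{count}{name}")
        remaining -= count * size
    if remaining or not parts:
        parts.append(f"{remaining:g}s")
    return " ".join(parts)


if __name__ == "__main__":
    for uptime in (98, 99, 99.8, 99.9, 99.99, 99.999, 99.9999):
        year = allowed_downtime_seconds(uptime, SECONDS_PER_YEAR)
        week = allowed_downtime_seconds(uptime, SECONDS_PER_WEEK)
        print(f"{uptime}%: {humanize(year)} per year, {humanize(week)} per week")
```

Running the same function against a monthly or quarterly period gives the budget for whatever reporting window an SLA actually uses.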
Defining Business Continuity (cont…)
– Data Protection + Application & Data Availability = Business Continuity
– Backup & Restore Downtime Concerns
  – Failed Backups
  – Time to Recover & Backup Window
  – Retention of data can outlast the life expectancy of media (Compliance)
  – Restore Tests are Cumbersome with Offsite Managed Backups
  – Tape Rotation and Management
– High Availability and Disaster Recovery Considerations
  – Protection for localized and catastrophic system and application failure
  – Not all HA Solutions are created equal (Storage vs. Application)
  – Multiplatform/OS Dependency & Visibility
  – Configuration Drift (Why did My Failover Fail?)
  – Complexity is the last thing you want during a DR scenario
  – Manageability
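
Configuration drift, named in the considerations above, is easier to catch when the settings of the production node and its failover target are compared routinely rather than during an outage. The Python sketch below shows one simple way to do that. The function name, the setting names and the values are all hypothetical, and how the settings are collected from each host is left out.

```python
# Compare two hosts' settings and report every key whose value differs.
MISSING = "<missing>"  # marker for a setting present on only one host


def find_drift(primary: dict, standby: dict) -> dict:
    """Return {setting: (primary_value, standby_value)} for each mismatch."""
    drift = {}
    for key in sorted(primary.keys() | standby.keys()):
        p = primary.get(key, MISSING)
        s = standby.get(key, MISSING)
        if p != s:
            drift[key] = (p, s)
    return drift


if __name__ == "__main__":
    # Hypothetical inventories gathered from each node.
    primary = {"os_patch_level": "2011-02", "app_version": "4.2", "multipath": "enabled"}
    standby = {"os_patch_level": "2010-11", "app_version": "4.2"}
    for setting, (p, s) in find_drift(primary, standby).items():
        print(f"{setting}: primary={p} standby={s}")
```

An empty result means the two nodes agree on every setting that was collected; it says nothing about settings that were never inventoried.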
Business Continuity Options
Disk-Based Data Protection
– Reducing Outage Windows
  – What does downtime really cost?
  – Revenue, SLAs & Image
  – “It’s all in the Restore”
  – Distinct advantage over Tape
  – Snapshot backups can reduce system overhead
  – Test & Development Provisioning
  – Isn’t Disk more expensive?
  – De-duplication (Where and When)
  – Leveling the playing field against Tape Solutions
  – Reduce Total Storage Need for Backups
  – Less Bandwidth Consumption
  – Reduces Infrastructure Costs
  – Future Proofing
  – Integrates with most Enterprise Backup Software Solutions
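
De-duplication reduces the storage needed for backups because a block that already exists in the backup store is recorded by reference instead of being written again. The Python sketch below illustrates the idea with fixed-size chunks and SHA-256 digests. It is a simplified illustration under those assumptions, not how any particular backup product works, and the names and sample data are hypothetical.

```python
import hashlib


def dedup_store(backups, chunk_size=4096):
    """Split each backup into fixed-size chunks and keep one copy per unique chunk.

    Returns (store, recipes): store maps digest -> chunk bytes, and each
    recipe is the ordered list of digests needed to rebuild one backup.
    """
    store = {}
    recipes = []
    for data in backups:
        recipe = []
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            store.setdefault(digest, chunk)  # written only the first time it is seen
            recipe.append(digest)
        recipes.append(recipe)
    return store, recipes


def restore(store, recipe):
    """Rebuild a backup by concatenating its chunks in order."""
    return b"".join(store[digest] for digest in recipe)


if __name__ == "__main__":
    # Two hypothetical nightly backups that share most of their content.
    monday = b"A" * 8192 + b"B" * 4096
    tuesday = b"A" * 8192 + b"C" * 4096
    store, recipes = dedup_store([monday, tuesday])
    raw = len(monday) + len(tuesday)
    stored = sum(len(chunk) for chunk in store.values())
    print(f"raw={raw} bytes, stored={stored} bytes, ratio={raw / stored:.1f}:1")
    assert restore(store, recipes[1]) == tuesday
```

The same reference-instead-of-copy idea is what reduces bandwidth when de-duplication happens before data leaves the source.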
Cost Effective Storage Strategies
– Centralized Storage Management
  – Multiprotocol Access (NAS and SAN)
  – Multitier Storage Options (The Right Storage for the Right Application)
  – Heterogeneous Manufacturer Environments
– Array Based Features
  – Storage Reconciliation
  – How much do I own vs. how much am I using?
  – Thin Provisioning: does my file system support it?
  – Storage forecasting (Physical and Virtual)
  – Non-disruptive upgrades?
  – Disk drive hibernation (Energy Savings for Long Term Storage)
– “Out of the Puddle and Into the Pool”
  – Combining Storage Resources with Enterprise-wide visibility
  – Advanced features that are array agnostic
  – Transparent Data Migrations when storage is EOL & EOSL
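
The "how much do I own vs. how much am I using" question becomes concrete with thin provisioning, where the capacity promised to hosts can exceed the physical pool. The Python sketch below shows that reconciliation as plain arithmetic. The volume names, sizes and pool capacity are hypothetical, and a real array would report these figures through its own management tools.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    name: str
    provisioned_gb: float  # capacity presented to the host
    used_gb: float         # capacity actually written


def reconcile(volumes, pool_capacity_gb):
    """Summarize provisioned vs. used capacity against the physical pool."""
    provisioned = sum(v.provisioned_gb for v in volumes)
    used = sum(v.used_gb for v in volumes)
    return {
        "provisioned_gb": provisioned,
        "used_gb": used,
        "overcommit_ratio": provisioned / pool_capacity_gb,
        "pool_utilization": used / pool_capacity_gb,
        "free_gb": pool_capacity_gb - used,
    }


if __name__ == "__main__":
    # Hypothetical thin-provisioned volumes carved from a 1000 GB pool.
    volumes = [
        Volume("mail", 500, 120),
        Volume("database", 1000, 300),
        Volume("file_share", 500, 80),
    ]
    for key, value in reconcile(volumes, pool_capacity_gb=1000).items():
        print(f"{key}: {value}")
```

An overcommit ratio above 1.0 is the expected state for thin provisioning; the number to forecast against is pool utilization.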
System & Service Virtualization
– Driving Factors
  – Improve Utilization & Management
  – Reduce Administrative Costs
  – Decrease Deployment Time
– Advantages
  – Management Capabilities
  – Dashboard View of an expansive Server Environment
  – Portability & Provisioning
  – Reduction in Maintenance Costs
– Considerations
  – Virtual server sprawl (OS footprint vs. server footprint)
  – Application visibility from embedded HA tools?
  – What am I really trying to Virtualize? Server or Service?
  – Storage Impact on your SAN: Did I “Remove” the cost or just “Move” it?
  – Disruption factor, outage impact & multi-OS dependency (P & V)
System & Service Virtualization (cont…)
– Virtualize without Compromise
  – Expand upon both Native and Third Party Virtualization tools
  – Bare Metal Virtualization (Server)
  – OS Based Virtualization (Service)
  – Minimal Cost Options with significant ROI
– Storage Impact
  – De-Duplication of multiple OS images
  – Ideal for Virtual Desktop
  – Array-Based Replication does not account for Application Consistency
  – Reconcile your Storage Needs: Application & Virtual Environments
  – Centralize Reporting and Provisioning
  – IP based Storage
  – Keep Production Data and Production Traffic isolated
– Considerations
  – What does an outage now mean?
  – Redirect Systems savings to software tools to help increase availability
High Availability Concepts
– Localized Hardware Redundancy
  – OS mirrors & hot swap boot drives
  – Boot from SAN (Fewer moving Parts)
  – Redundant power supplies
  – Removing Single Points of Failure
  – Proper Airflow and Cabling
– Storage Availability
  – Multipathing (Dual HBAs)
  – Native vs. Third-Party
  – Redundant Fabric/SAN
  – Active/Active vs. Active/Passive Storage Controllers
  – Non-Disruptive upgrades and storage growth
  – Cloud (SAS) models
  – Data migration support
  – What happens when you change vendors?
  – How transparent is it? What is my outage?
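
The value of removing single points of failure can be estimated with the standard series/parallel availability formulas: components that must all work multiply their availabilities, while redundant components fail only if every copy fails. The Python sketch below applies them to a hypothetical I/O path. The component availabilities are made-up illustration values, and the formulas assume failures are independent, which shared power or a shared firmware bug would violate.

```python
from math import prod


def series(*availabilities: float) -> float:
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """Availability of redundant components where any one being up is enough."""
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    # Hypothetical per-component availabilities.
    server, hba, fabric, array = 0.999, 0.99, 0.99, 0.999

    single_path = series(server, hba, fabric, array)
    dual_path = series(server, parallel(hba, hba), parallel(fabric, fabric), array)

    print(f"single HBA and fabric: {single_path:.4%}")
    print(f"dual HBAs and fabrics: {dual_path:.4%}")
```

The chain can never be more available than its weakest series component, which is why the remaining single server and single array dominate the dual-path result.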
High Availability Concepts (cont…)
– Host Based Clustering (Local & Metro)
  – Native
    – Easy to deploy (Integration)
    – Learning curve can be less
    – Limited coverage from applications
    – Typically customizations are complex or unsupported
    – Point solution that varies between all platforms
    – FTE Resource Needs are Greater
  – Third-Party
    – Cross platform standardization (Operational Support)
    – Central Management, reporting and visibility of all applications
    – Physical to Virtual Clustering
    – Cross Platform & multi-tier dependencies (Servers, Apps and Storage)
    – Easily adaptable to custom applications
    – Training Costs
    – Implementation costs
    – Reduce FTE Costs
    – Local HA and remote DR in the same solution
Disaster Recovery Concepts
– Backups vs. Replication
  – Protection from Data Corruption vs. Resuming Business Processing
  – One does not relieve the need for the other
  – G.I.G.O. Model
  – RPO & RTO
  – 24 Hours, 24 Minutes or 24 Seconds means the same solution
  – Host Based vs. Array Based
  – Heterogeneous Options
– Management and Visibility
  – Dovetail off of HA design (One tool for both)
  – Integrate with storage solutions
  – Active/Active design coupled with Virtualization
  – Non-Disruptive testing procedures
  – Bi-Directional
  – Remediate Infrastructure Issues Proactively, not Reactively
Latest Trends in IT Energy Efficiency
Current Issue: 50% of IT data center power consumption comes from cooling the IT equipment.
What if we could significantly reduce this value?
How: Use Precision Cooling Technologies at the rear of the IT cabinet instead of traditional computer room air conditioning methods.
Advantages of Precision Cooling
– Save up to 95% on Cooling Costs
– Reduced footprint in data center
  – Floor space efficiency (fewer racks required)
  – Rack space efficiency (densification)
– NYSERDA grant funding available
  – New York State Energy Research and Development Authority
– No raised floor design required (Save $$)
– New technologies are refrigerant-based
  – No water near electronics
– Enhanced server life
  – Cooler equipment = higher availability
  – More time between equipment refreshes
  – Longer Duty Cycle
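
Combining the two figures from these slides gives a rough upper bound on the overall effect: if cooling is 50% of data center power and cooling is cut by up to 95%, total power falls by up to 0.50 x 0.95 = 47.5%. The short Python sketch below shows the arithmetic. It assumes cooling cost scales directly with cooling power, which the slides do not state, and the function name is illustrative.

```python
def total_power_savings(cooling_share: float, cooling_reduction: float) -> float:
    """Fraction of total power saved when only the cooling share is reduced."""
    return cooling_share * cooling_reduction


if __name__ == "__main__":
    cooling_share = 0.50      # share of power spent on cooling (from the slide)
    cooling_reduction = 0.95  # "up to" figure from the slide, so a best case
    saved = total_power_savings(cooling_share, cooling_reduction)
    print(f"best-case total savings: {saved:.1%}")
    print(f"remaining consumption:   {1 - saved:.1%}")
```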
HA Design Principles
20. The Total Cost of Ownership
– Availability is driven by a business need, and financial cost is only one factor
19. Assume Nothing
– Availability is not necessarily bought from a vendor but architected by a customer
18. Remove SPOFs
– Weakest Link Effect
17. Enforce Security
– Know who is doing what and where
16. Consolidate your Systems
– Go from the “Fog to the Cloud.”
– Redundancy can be superfluous.
15. Watch your Speed
– Ensure Availability efforts do not jeopardize performance (Monitor)
14. Enforce Change Control
– Mitigate User Error
13. Document Everything
– Don’t rely on FTE knowledge.
– Proper Audience
12. SLAs
– Uptime, Hours of Service and Support
– Application Priority
11. Plan Ahead
– Anticipate overruns on projects and schedule Fire Drills.
HA Design Principles (cont…)
10. Test Everything
– Backups, Restores and DR Scenarios
9. Separate Your Environments
– Dev, Test & Staging vs. Prod & DR (Use Snapshots and VM’s)
8. Learn from History
– Audit system state information to diagnose root cause of downtime
7. Design for Growth
– Goldfish Effect
6. Choose Mature Software
– Avoid Bleeding-edge whenever possible (Lack of R & D)
5. Choose Mature, Reliable Hardware
– MTBF and Parts replacement is key to maintaining Availability
4. Reuse Configurations
– “Rinse and Repeat”: do not design on the fly
3. Exploit External Resources
– NYSIT Forum, User Groups, Vendor Conferences
2. One Problem, One Solution
– Software to address Software issues and Hardware to Address Hardware issues
– When all you have is a hammer...
1. K.I.S.S.
– Remove guesswork by automating daily tasks & routines
– Review Business Process first before deciding on a technology
– Avoid ambiguity by clearly identifying responsible parties
Questions
Joseph B. D’Angelo
Serverware Corp.
585-785-6100
www.serverwarecorp.com