© 2014 IBM Corporation
Raise Expectations for the Always On Enterprise
Bertrand Portier - Executive IT Architect,
Member of the IBM Academy of Technology
Scott Simmons - Executive IT Architect,
Member of the IBM Academy of Technology
Please Note
IBM’s statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM’s sole discretion.
Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision.
The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code or functionality. Information about potential future products may not be incorporated into any contract. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user’s job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.
No Words Needed ….
The Need for Always On
Background & Findings
Recommendations
Design Principles & Patterns
Agenda
Natural Disasters
IT Failures
Human “Attacks”
Human “Mishaps”
The Imperative for Continuous Availability
Flat world - no time for planned outage
Social - reputation damage
Mobile - raised expectations in a hyper-connected world
Analytics - increased availability demands with real-time analytics
Today’s environment and growth initiatives accentuate the need for Always On.
What is the IBM Academy of Technology?
• The IBM Academy of Technology is a society of about 1,000 technical leaders in IBM organized to
– advance the understanding of key technical areas,
– improve communications in and development of IBM's global technical community, and
– engage clients in technical pursuits of mutual value.
14 case studies
• Case studies represent different industries and patterns of continuous availability.
• Broad set of hardware technologies, infrastructure implementations, software vendors and solutions.
• Solutions represent single site, dual site, and multi-site solutions.
Key Findings
• Focus on the application implementation for Continuous Availability
• Design for non-disruptive maintenance
• Databases are the key Continuous Availability focus area
• Use scale out for certain workloads, instead of clustering
• Mainframe GDPS is a success story
• No solutions are “bulletproof”
• Governance is mandatory
Our approach to the Always On challenge focused on client case studies
Not all business functions are fit for Always On.
Continuous Availability lies in the ability to:
1. Transparently withstand component failures
2. Withstand a catastrophe transparently
3. Introduce changes non-disruptively
Continuous Availability Principles
• The speed of light is a limiting factor
– Replication can happen at ½ the speed of light
– Data consistency constraints
• Brewer's CAP theorem holds true
– Absolute data consistency (RPO=0) requires metro distances (100 km or less)
– Only “eventual data consistency” is possible with out of region
• Continuous Availability is perceived as expensive
– Reality: between 2x and 8x the cost of HA
– But if HA and DR are considered, Continuous Availability doesn’t look as expensive…
Myth or Fact? … No surprises!
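To make the speed-of-light point concrete, here is a minimal back-of-the-envelope sketch. It assumes the slide's "½ speed of light" figure as the effective propagation speed and counts propagation delay only (no switching, queuing, or disk time). The function name and the sample distances other than 100 km are illustrative choices, not figures from the deck.

```python
# Rough propagation-delay estimate for synchronous replication.
# Assumption: signals travel at half the vacuum speed of light, per the slide.
SPEED_OF_LIGHT_KM_S = 299_792.458   # vacuum speed of light in km/s
REPLICATION_FRACTION = 0.5          # the slide's "1/2 speed of light"


def round_trip_ms(distance_km: float, fraction: float = REPLICATION_FRACTION) -> float:
    """Return the minimum round-trip time in milliseconds over distance_km."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * fraction)
    return 2 * one_way_seconds * 1000


# 100 km is the metro limit quoted on the slide; the larger distances are
# hypothetical out-of-region examples.
for km in (100, 1_000, 5_000):
    print(f"{km:>5} km: at least {round_trip_ms(km):.2f} ms added per synchronous write")
```

A synchronously replicated write cannot be acknowledged before at least one round trip completes, so the delay grows linearly with distance. That is one way to read why the deck ties RPO=0 to metro distances and accepts eventual consistency out of region.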
Architecture / Description
Active / Standby
• Traditional DR or warm standby environment
• RTO = hours to days
• RPO = 0?
Partitioned Active (No WAN clustering, unidirectional DB replication w/failover)
• Each site application cluster runs independently, as do the DBs. Users are directed to one or the other site. DBs send records to the Source of Record DB
• RTO = hours
• RPO = 0?
Active / Query (WAN replication, unidirectional DB replication w/failover)
• Each site application cluster is live; reads are performed from the local DB, writes are performed on the primary DB only (aka active/query)
• RTO = minutes to hours
• RPO = 0 to seconds
Active / Active (WAN replication & bidirectional DB replication)
• All applications uncoupled and databases read-writeable
• RTO = seconds to minutes
• RPO = 0 to seconds
[Figure: the four patterns (Active / Standby, Partitioned Active, Active/Query, Active / Active), with each site's database labelled read-write (RW) or read-only (RO)]
Assess Application Patterns (App vs Data Maturity)
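The four application patterns above can be treated as a small lookup table when shortlisting options against a recovery-time target. The sketch below is only an illustration: the coarse `Scale` ordering, the `Pattern` tuple, and the `candidates` helper are hypothetical names, and the RTO ranges are copied from the slide rather than measured.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Scale(IntEnum):
    """Coarse, ordered time scale (an assumption; the slide gives only words)."""
    SECONDS = 1
    MINUTES = 2
    HOURS = 3
    DAYS = 4


class Pattern(NamedTuple):
    name: str
    best_rto: Scale   # optimistic end of the slide's RTO range
    worst_rto: Scale  # pessimistic end of the slide's RTO range


# RTO ranges as stated on the slide.
PATTERNS = [
    Pattern("Active / Standby", Scale.HOURS, Scale.DAYS),
    Pattern("Partitioned Active", Scale.HOURS, Scale.HOURS),
    Pattern("Active / Query", Scale.MINUTES, Scale.HOURS),
    Pattern("Active / Active", Scale.SECONDS, Scale.MINUTES),
]


def candidates(max_rto: Scale) -> List[str]:
    """Return patterns whose worst-case RTO still meets the target."""
    return [p.name for p in PATTERNS if p.worst_rto <= max_rto]


print(candidates(Scale.HOURS))
print(candidates(Scale.MINUTES))
```

Filtering on the worst-case end of each range is a deliberately conservative choice. RPO is left out because the slide marks two of the four RPO values with a question mark.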
“Continuous Availability
infers that solutions consumers wish to use
are available whenever they desire to use them
... independent of the time of day, or day of the week.”
In other words, Continuous Availability means “Always On”
Factors in Assessing “Always On”
• Perceived Availability
• Response Time
Conversation is moving away from # of 9’s or ROI and towards competitive advantage.
Chief Marketing Officer becomes a sponsor for Continuous Availability
Further reading: IBM Redguide: The business aspects of Continuous Availability
What does Continuous Availability mean to a non-technical person?
Continuous Availability becomes part of the organization’s DNA.
• Any employee can link Continuous Availability to the mission.
• Continuous Availability is part of all decisions.
• Continuous Availability is the only accepted design strategy.
The end goal for Always On Adoption is changing the culture.
Our Recommendations: What Organizations Should Consider for Always On
Need | Action | Owner
Business Demand | Internal Analysis; Business Case | LOB Sponsors; 3rd Party Research
Awareness | Conferences; Lunch & Learn; Papers | CTO; Technical Team
Governance | Create enduring governance structure | TBD
Strategy | Define business strategy for CA; Define technical strategy for CA | Strategy Team; Technical Leadership
Developing an Always On Program | Communicate with the product teams; Incentives; Education; Certification | Governance Body; Product teams
Commission a team to look at the business case for Continuous Availability (for your company to invest in Continuous Availability)
• Market Analysis
• This effort is about applications and end-to-end environment, not infrastructure.
• Enabler to Social, Mobile, Analytics, and Cloud.
• Example drivers:
– Exceptional Client Experiences
– Brand reputation
• Continuous Availability needs for CMO-led initiatives for these clients.
• Understand ROI model for Continuous Availability.
• Create a compelling value proposition for Continuous Availability.
Evaluating Market Demand for Always On
Publishing internal technical point of view (e.g., business aspects, Ref Arch, Implementing Continuous Availability)
Host lunch and learn internally and with clients
Develop and support client workshops
Participate in internal and external conferences
Always On Awareness
Create an enduring Continuous Availability leadership and governance structure.
• “Volunteer” Centre of Competency model is not enough.
• Support client experiences of access to service of choice, anywhere, any time.
• Facilitate an end-to-end approach for Continuous Availability.
• Lead with:
– Policies & Standards
– Principles & Guidelines
– Product/Offering/Service Road Map Prioritizations
Always On Governance
• Depth and breadth of capability in an integrated and easily consumable hardware and software platform with patterns to support client success with Continuous Availability
• Enterprise qualities of service as a core principle of your company's brand
• Using cloud as a springboard for Continuous Availability enabling on premise and off premise cloud integration to simplify access and to provide enterprise resiliency
• Advance and complement the Continuous Availability approach through strong partnerships with software partners, hardware partners and service integrators to ensure a comprehensive Continuous Availability approach
• Providing leadership and innovation across standards/open solutions with ISO, OMG, OASIS, OpenStack and other key organizations driving open resilient solution approaches
• Continue to deliver on an evolving ecosystem built around the platforms based on our industry leading hardware, software and services
Always On Strategy
Addressing the End-to-End Aspects of Continuous Availability
1. Core Principles – transparently withstand component failures, provide non-disruptive changes, and enable disaster transparency
2. Think Differently – legacy architectural practices no longer apply
3. KISS – Keep It Simple, Stupid; complexity adds obfuscation and prolongs service recovery
4. Concurrent Versioning – non-disruptive change requires the ability to run two versions at once
5. Continuous Operations – design in platform concurrency to enable non-disruptive changes
6. Design each “cloud” identically – best practices should be followed per “cloud”, then interconnect
7. Fail Small – everything breaks, minimize the impact in design
8. Virtualize Nearly Everything – virtualization provides flexibility and mobility, both essential
9. Automate Nearly Everything – avoid human error and inconsistency
10. Design For Failure – know how it works, know how it breaks, and know how to mitigate its impact
11. Applications Must be Designed for Failure – fail gracefully, minimize impact to consumer
12. Avoid HA Takeover – service parallelism (clustering) is more reliable and faster
13. Availability is provided by peer “clouds” – failure in one “cloud” doesn’t impact the others; the fault domain is isolated to each “cloud”, and service is still functional in the other(s)
14. Share Nothing – each cloud must be able to provide the business service independently
15. Availability Zones – CA, near CA, and HA environments have their own architectural requirements and change windows; keep them separate, share nothing
Guiding Principles for Continuous Availability Design
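Principles 11 and 13 (fail gracefully; availability comes from independent peer “clouds”) can be sketched as a client-side fallback loop. This is a minimal illustration under assumptions, not the deck's implementation: `CloudUnavailable`, `call_with_peer_fallback`, and the two fake peers are hypothetical names, and a real deployment would normally rely on global traffic management rather than client retries alone.

```python
from typing import Callable, List, Sequence


class CloudUnavailable(Exception):
    """Raised when a peer "cloud" cannot serve the request (hypothetical)."""


def call_with_peer_fallback(peers: Sequence[Callable[[str], str]], request: str) -> str:
    """Try each independent peer in order and return the first success.

    Each peer shares nothing with the others, so a failure in one
    is contained and the request is simply served elsewhere.
    """
    failures: List[str] = []
    for peer in peers:
        try:
            return peer(request)
        except CloudUnavailable as exc:
            failures.append(str(exc))  # record the failure and move on
    raise CloudUnavailable(f"all {len(peers)} peers failed: {failures}")


# Hypothetical peers standing in for two independent sites.
def metro_a(request: str) -> str:
    raise CloudUnavailable("metro-a is down")


def metro_b(request: str) -> str:
    return f"metro-b handled {request}"


print(call_with_peer_fallback([metro_a, metro_b], "GET /balance"))
```

Only the explicit unavailability error triggers fallback here; other exceptions propagate, so application bugs are not masked as site outages.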
16. Add Global Traffic Management – routes consumers to the best “cloud” to consume the service. Domain Name Service based, closely coupled with DNS service
17. If the application must maintain state across “clouds”, use an in-memory application grid – fast & tolerant; sessions must be small to take advantage of this technology, else don’t use sessions beyond an individual “cloud”
18. Add Application Level Data Replication – capture and apply changes to all peers. In order to provide fast failover or transparent service bypass, logical data replication is required to avoid human tasks. Bi-directional peer-to-peer allows writes anywhere, but OoR induces eventual data consistency.
19. Never stretch a cluster across “clouds” – it extends the fault domain beyond the individual cloud
20. Include Out of Region – must mitigate the 3 F’s (Fire, Flood, and Fools) outside the region
21. Don’t Forget Security – the “Fools” can cause unexpected damage
22. Don’t Forget Performance Engineering – Development must embrace performance engineering. Business must make development and operations aware, very early, of any planned media events that may bring “flash mobs”. Applications must be efficient. IT must be sufficient.
These guiding principles build upon the many guiding principles common in HA and DR design and are here to guide practitioners beyond core HA design.
Guiding Principles for Continuous Availability Design (cont.)
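Principle 18's trade-off (bi-directional replication allows writes anywhere but yields only eventual consistency) can be shown with a toy two-site model. The deck does not say how conflicting writes are resolved; the last-writer-wins rule, the logical clock, and the `Replica` and `sync` names below are assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

Version = Tuple[int, str]        # (logical clock, site name), compared as a tuple
Change = Tuple[str, Version, Any]


@dataclass
class Replica:
    site: str
    clock: int = 0
    data: Dict[str, Tuple[Version, Any]] = field(default_factory=dict)
    outbox: List[Change] = field(default_factory=list)

    def write(self, key: str, value: Any) -> None:
        """Accept a local write and queue it for the peer (capture)."""
        self.clock += 1
        version = (self.clock, self.site)
        self.data[key] = (version, value)
        self.outbox.append((key, version, value))

    def apply(self, changes: List[Change]) -> None:
        """Apply peer changes; the higher version wins (assumed LWW rule)."""
        for key, version, value in changes:
            self.clock = max(self.clock, version[0])
            current = self.data.get(key)
            if current is None or version > current[0]:
                self.data[key] = (version, value)

    def read(self, key: str) -> Any:
        return self.data[key][1]


def sync(a: Replica, b: Replica) -> None:
    """Exchange queued changes in both directions (bi-directional replication)."""
    to_b, to_a = a.outbox, b.outbox
    a.outbox, b.outbox = [], []
    b.apply(to_b)
    a.apply(to_a)


metro, oor = Replica("metro"), Replica("oor")
metro.write("balance", 100)
oor.write("balance", 120)                 # concurrent write at the other site
print(metro.read("balance"), oor.read("balance"))   # sites disagree before sync
sync(metro, oor)
print(metro.read("balance") == oor.read("balance"))  # sites converge after sync
```

Between the two writes and the `sync` call the sites return different answers, which is the window the deck calls eventual data consistency. Last-writer-wins silently discards one of the conflicting writes, so a real design would need to decide whether that is acceptable for the data involved.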
Architecture (Capacity) / Description
Active Standby Metro / OoR DR (300%? capacity)
• HA: 100% Active, 100% Standby, <100% DR
• RTO = minutes within metro, hours to days for OoR DR
• RPO = 0?
2 Active Metro / OoR DR (300%? capacity)
• nCA: <100% Active, <100% Active, <100% DR
• RTO = seconds within metro, hours to days for OoR DR
• RPO = 0?
2 Active Metro / OoR Query (300%? capacity)
• nCA: <100% Active, <100% Active, <100% Standby
• RTO = seconds within metro, minutes to hours to warm OoR
• RPO = 0 to seconds OoR
3 Active OoR (or 2-Active metro, Active OoR) (150% capacity)
• CA: 50% Active, 50% Active, 50% Active
• RTO = seconds to minutes
• RPO = 0 to seconds OoR
• RISK = Eventual Data Consistency
[Figure: the four platform patterns: Active Standby w/OoR DR, 2 Active w/OoR DR, 2 Active/Query OoR, 3 Active w/OoR]
Design: Availability Platform Patterns
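The 150% figure for the “3 Active” pattern follows from simple arithmetic if one assumes that load is spread evenly and that the surviving sites must carry full peak load after one site is lost. The sketch below reproduces that number. The sizing rule and function names are assumptions for illustration; the “300%?” figures in the other rows are the deck's own estimates and are not derived from this formula.

```python
from fractions import Fraction


def per_site_share(active_sites: int, tolerated_failures: int = 1) -> Fraction:
    """Fraction of peak load each active site must be sized for.

    Assumed rule: after `tolerated_failures` sites are lost, the
    remaining sites together must still carry 100% of peak load.
    """
    survivors = active_sites - tolerated_failures
    if survivors < 1:
        raise ValueError("at least one site must survive")
    return Fraction(1, survivors)


def total_capacity(active_sites: int, tolerated_failures: int = 1) -> Fraction:
    """Total provisioned capacity as a multiple of peak load."""
    return active_sites * per_site_share(active_sites, tolerated_failures)


for sites in (2, 3, 4):
    share = per_site_share(sites)
    total = total_capacity(sites)
    print(f"{sites} active sites: {float(share):.0%} each, {float(total):.0%} total")
```

With three active sites each sized at 50% of peak, total capacity is 150%, and losing any one site still leaves 100%. Under the same rule, adding more active sites lowers the total further, which is consistent with the deck's remark that Continuous Availability looks less expensive once HA and DR capacity are counted together.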
Active-Standby Metro Pair with OoR Disaster Recovery
Active Active Metro Pair with OoR Disaster Recovery
Active Active Metro Pair with OoR Query/FastFailover
Active geographically distributed
“This isn’t just a change in tools.
It’s a change in mindset and organizational culture”
– Ginni Rometty, Council on Foreign Relations speech (March, 2013)
A SOCIAL BUSINESS is an organization that is engaged, transparent and nimble
Q&A
Domain Area/ IBM Group # Recs
Organizational 4
Product Des/Dev (General) 6
Product Des/Dev (SWG) 4
Product Des/Dev (C&SI) 8
Product Des/Dev (AIM) 8
Product Des/Dev (IM) 17
Product Des/Dev (Rational) 3
Product Des/Dev (ICS) 2
Product Des/Dev (Pure App) 3
Product Des/Dev (STG) 9
Product Des/Dev (Cloud) 6
Client Success 13
Cost/Economic 2
Approach
Define Challenges
Recommend Approaches
Prioritize Key Requirements
Raise to executives for sponsorship
Developing a CA Program - Example: IBM recommendations classified by domain and assigned to a receiving organization.
IBM.com’s electronic support services
System Z Continuous Availability
Legal Disclaimer
• © IBM Corporation 2014. All Rights Reserved.
• The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.
• References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.