Upload
talor
View
35
Download
0
Tags:
Embed Size (px)
DESCRIPTION
NoCOUG Fall Conference 2005. High Availability & Disaster Recovery: A 360 o View Alok Pareek. Agenda. Introduction High Availability - 2005 Industry Shift from MTTF to MTTR Understanding transaction failure models Evaluating HA technologies Real-World HA Examples About GoldenGate - PowerPoint PPT Presentation
Citation preview
NoCOUG Fall Conference 2005
High Availability & Disaster Recovery: A 360o View Alok Pareek
Agenda
IntroductionHigh Availability - 2005Industry Shift from MTTF to MTTRUnderstanding transaction failure modelsEvaluating HA technologies Real-World HA ExamplesAbout GoldenGateQuestions & Answers
Speaker Introduction/Background
GoldenGate Software ArchitectTransactional Data Management and High Availability Solutions
10 years in Oracle kernel developmentData Storage TechnologiesRecovery and High Availability GroupHigh Speed Data Movement
Key FeaturesCross platform/Heterogeneous transportable tablespaces (10g)Multi threaded redo generation (9.2)Implementation of multiple block size caches (9.0)(TSPITR) Tablespace point in time recovery (8.0)Owner of LGWR process (redo generation) algorithms (8i
10.2)
High Availability, Disaster Recovery
DefinitionRatio of system uptime to sum of uptime and downtime
Availability = MTTF/(MTTF+MTTR)
ChallengesAddressing Performance vs. Reliability in computer systemsHardware Faults, Software Bugs, Human errors are realities in any
complex system deploymentEnterprise applications need to function 24x7Disasters are no longer a distant threatInadequate planning to handle outages
Shift in industry (and academic research) focusFrom fault tolerance to reducing MTTR
HA/DR – Systematic View
Unplanned outage
System Failure
Data Failure
System Changes
Data Changes
Active
Database
Planned outage
•Physical Media•Logical corruption
•Node death•Power failure
•Upgrades•Migrations
•Maintenance
1
2 3
1) High Availability Concerns – No Outage
Active
Database
1
• Performance
• DSS vs. OLTP conflicting requirments
• Mixed workload
• Data validation
• Data Transformation
2) Availability Concerns – Unplanned
Unplanned outage
System failure
Data failure
Database
•Database Restore/Recovery
•RAID
•Shared Disk Clusters
•Standby database
Common Approaches
Understanding Database Failures
Failure pointsStatementProcessInstanceDatabaseSite
Failure typesPhysical (Media, corruption, inconsistency amongst redundant
copies)Logical (Incorrect DML, out-of-synch, accidental table drop)
Failure HandlingAutomaticManual
3) Availability Concerns – Planned Outage
System Changes
Data Changes
Database
Planned Outage
•Selected windows of downtime
•Phased approach to maintenance
Key HA Evaluation Criteria
Availability Is the failover/DR solution available for real use?
MTTR (RTO) In the event of a failure, how soon can the data be recovered?
Performance Speed and support for high transaction volumes
Data Loss (RPO) What is the impact of an unplanned outage in terms of lost data?
Manageability Configuration, Install, Monitoring
Impact on Deployed Systems
How intrusive? What is the impact on data itself?
Cost Licensing, Maintenance, Deployment
Differentiating HA Technologies
Conventional Backup/Recovery
RAID multiple hard disks behaving as a single large fast drive (mirrors/stripes/duplexing/parity)
Snapshots
Block Level Database Replication
Change Level Database Replication
Remote Mirroring Solutions
Transactional Data Management
Roll Forward / File Protection High Availability and Disaster Recovery
HA Technologies & Tradeoffs
Block based database replication Standby kept in constant recovery (inactive) mode
Useful for strict disaster recovery only, not HACannot be used for reporting in recovery modeNo write access for distributed load balancingApplication response times suffer after failoverCannot address availability across heterogeneous systems
Change based database replication Trigger or log based
Not optimized for real time performanceIntrusive, ComplexCannot address availability across heterogeneous systems
HA Technologies & Tradeoffs
Remote mirroring solutions Volume managers maintain mirrors of local writes on a set of remote volumes
Useful for file protectionPhysical distance to remote volumes is a critical limitationNo protection from logical corruption, or storage stack corruption
Message based logical writes sent by primary host over IP to remote hosts (synchronously/asynchronously)
Write ordering must be maintained by primary hostRemote volumes are standby-only, applications cannot access themNo protection from logical corruption
Hardware basedStorage arrays propagate IOs to storage arrays at a secondary siteSecondary arrays are inaccessible during replicationNo protection from logical corruptionOnly useful for block availability during DR
HA Technologies & Tradeoffs
Transactional Data Management Addresses low-latency in HA hybrid computing environments (built on 1 Safe)Management of transactional streams -- captures, transforms, routes, delivers and verifies data transactions in real time
Real time, heterogeneous, data integrity, low impactUse cases for HA, DR, data integration, distributed computingNot for file-level replication
Log-based data
capture
(can be selective)
RouteTransform (if
needed)
Source Target
Apply
Sub-second latency
HA/DR – Solution Examples
Unplanned outage
System Failure
Data Failure
System Changes
Data Changes
Active
Database
Planned outage
•Physical Media•Logical corruption
•Node death•Power failure
•Upgrades•Migrations
•Maintenance
1
2 3
Real-World HA Configuration (Active)
Primary “Master”
Secondary “Master”
• Uni/Bi-directional configuration – if one system fails, the other is immediately available and ready to take over.
• For…• Highest Availability• Maximized ROI on hardware (transaction balancing)•Data Centers with Active/Passive
• Example areas:• 24x7 (ATMs, Online Banking)• Online Retail•Real-time Analytics, Warehousing
• Uni/Bi-directional configuration – if one system fails, the other is immediately available and ready to take over.
• For…• Highest Availability• Maximized ROI on hardware (transaction balancing)•Data Centers with Active/Passive
• Example areas:• 24x7 (ATMs, Online Banking)• Online Retail•Real-time Analytics, Warehousing
Active
Live/Active
Real-World HA Configuration (Transaction Off-Loading)
Processing Commit Transactions
Lookup Query Handling
• Improve availability and performance of transaction processing by offloading query load to lower-cost DBs/platforms
• For…• Horizontal scalability• Improved performance
• Example areas:• Online Reservations• Online Lookups
• Improve availability and performance of transaction processing by offloading query load to lower-cost DBs/platforms
• For…• Horizontal scalability• Improved performance
• Example areas:• Online Reservations• Online Lookups
Active
Live/Active
Real-World DR Configuration (Failover)
Unplanned outage
System failure
Data failure
Database
Active
Failover Database
• An HA implementation that captures and applies data to a failover system in real time.
• For…•Fast failover (No restore)•Do root-cause analysis later!•Surgical Repair (Dynamic,
Selective undo)
•Example areas:•24x7/mission-critical
applications•Strict SLA requirements
• An HA implementation that captures and applies data to a failover system in real time.
• For…•Fast failover (No restore)•Do root-cause analysis later!•Surgical Repair (Dynamic,
Selective undo)
•Example areas:•24x7/mission-critical
applications•Strict SLA requirements
Real-World Configuration (Switchover)
Current Database
Planned Outage
System Changes
Data Changes
• Zero-Downtime Migrations• Rolling Upgrades• Zero-Downtime Maintenance
• For…•24x7 availability•Reduced windows for system
maintenance
• Example areas:•Can’t afford downtime to do in-
place upgrade
• Zero-Downtime Migrations• Rolling Upgrades• Zero-Downtime Maintenance
• For…•24x7 availability•Reduced windows for system
maintenance
• Example areas:•Can’t afford downtime to do in-
place upgrade “Switchover” Database
About GoldenGate Software
Established, Loyal Customer Base
Leading Industry Solutions
250 customers... 1500+ solutions implemented… in 35 countries
18,000 Node ATM Network with 24/7
Availability
Saving $ millions with real-time DW
and zero downtime migrations.
Database tiering handles average of
300,000 updates/hour, peaks at 800,000/hour
Achieving paperless enterprise for this
visionary healthcareprovider
GoldenGate Software is a privately held software company that offers Transactional Data
Management solutions.
GoldenGate & Transactional Data Management
Provides guaranteed capture, routing, transformation, delivery, and verification of data transactions across heterogeneous environments in real time.
TDM must be:
Real timeMoves with sub-second latency
Heterogeneous Moves transactions across different databases and platforms
TransactionalMaintains transaction integrity
GoldenGate differentiates on:
PerformanceHandles thousands of transactions per second with very low impact on IT systems
Extensibility & FlexibilityOpen architecture to meet demanding customer needs and data environments
ReliabilitySupports continuous operations and availability
How GoldenGate Works
Trail files: Stages and queues data for routing.
Delivery: Applies transactional data with guaranteed integrity.
Route: Data is compressed, encrypted for routing to targets.
Capture: Committed changes are captured as they occur by reading the transaction logs.
GoldenGate Veridata™: Reports on data discrepancies between in-use DBs
TDM & HA Evaluation Criteria
AvailabilityStandby available for reporting, logical repair, load balancing, failover with warm cache
MTTRNo restore requiredProportional to runtime cost of transactions in last log write (~ seconds)
PerformanceNear zero time latencyShip only committed transactions
Data Protection / Loss
Redo validation using SQL ApplyNo Loss (read access to last IO in current log)
ManageabilityDirector GUI, CLI, STATS
ImpactLow impact on deployed systemsMetadata outside the database
Questions & Answers
Contact Information:Alok Pareek
www.goldengate.com(415) 777-0200
301 Howard Street, Suite 2100, San Francisco CA 94105
Lag points address Logical failures
One-to-One One-to-Many
SourceSource
Target
Target
(1 day lag)
Target
(1 hour lag)
Target
(current)