Everything You Need to Know About Oracle Exadata Backup and Recovery: Best Practices

Andrew Babb, Consulting Member of Technical Staff, Oracle
Donna Cooksey, Principal Product Manager, Oracle
Harpreet Singh, Vice President, Database Management, Fidelity Investments


Copyright © 2013, Oracle and/or its affiliates. All rights reserved.

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.


Program Agenda

Evolving IT Infrastructure

Recovery, Recovery, Recovery

Architecting Your Backup Infrastructure

Customer Case Study – Fidelity Investments

New Modern Cloud Paradigm

Summary and Q & A


Evolution of Data Protection
Business Requirements Meeting IT Head-on

IT consumers are increasingly involved in technology decisions
– The flexible, fast-moving opportunities of the “3rd Platform” translate to more IT initiatives being driven by lines of business (LOB)
– Applications, storage, servers … even data protection?

Technology in stealth mode makes sound data protection even more important!

Greater Complexity Causing More Data Center Downtime: http://www.datacenterdynamics.com/focus/archive/2012/09/greater-complexity-causing-more-data-center-downtime-0


Critical Databases Get Poor Protection Today

What Business Wants
– Never lose business data
– Keep critical apps available

What IT Wants
– Private and public cloud solution
– Ensured end-to-end protection

What Business Gets
– Data loss on restore, typically full day
– End-user slowdown during backup

What IT Gets
– Sprawl of non-scalable solutions
– Uncertain protection, poor visibility


Primary Causes of Downtime 2012 IOUG Survey – Enterprise Data and The Cost of Downtime*

Human Error

Storage Failure

Application Errors

Network Outages

Server Failure

Recovery plan / Training / Oversight

Interoperability / Scalability / Performance

Failover / Fallback capabilities

System Monitoring

Unplanned Downtime

*http://www.oracle.com/us/products/database/2012-ioug-db-survey-1695554.pdf


Recovery, Recovery, Recovery


What are Your Recovery Requirements?
Four Key Points to Define

– Recovery Point Objective (RPO)
– Retention Period
– Recovery Time Objective (RTO)
– Disaster Recovery (onsite/offsite)


Group Databases Into Protection Tiers
Basic Grouping Strategy - Example

Category | Gold | Silver | Bronze
RTO | Seconds | < 6 hours | Up to 24 hours
RPO | Current | Up to 3 hours* | Up to 6 hours*
Backup retention: critical restores | Up to one week | One day | One day norm / not critical
Backup retention: retention | 7 years | 6 months | 1 month
DR / long-term | Two sites for one week | Offsite copy within 3 days | No specific DR requirement

*Stay tuned to the new paradigm.


Exadata Environments
Common Restore Scenarios / Planning

The criticality and workloads of typical Exadata databases make recovery strategies especially important:

– Batch load / NOLOGGING operation went south
– Long-term, periodic archival backups (keep forever / until)
– Application patches and upgrades
– Backing out a bad transaction


Oracle Recovery Strategies
Complementary and Integrated Technologies

Category | Technology / Solution | Recovery Time Objective (RTO) | Recovery Point Objective (RPO)
Physical Data Protection | Recovery Manager (RMAN), Oracle Secure Backup (OSB) | Days/Hours | As of last backup
Physical Data Protection | Data Guard or Active Data Guard | Minutes/Seconds | Current
Logical Data Protection | Flashback Technologies | Hours/Minutes | Minutes
Recovery Analysis | Data Recovery Advisor (DRA): minimizes time for problem identification & recovery planning | Optimized | Optimized


Oracle Logical Data Protection Technologies

Flashback Technologies are a suite of logical error investigation and correction capabilities built into the Oracle database:

– Error investigation: Flashback query, version query and transaction query
– Error correction: Flashback database, table, drop and transaction

Flashback Database operates on physical data blocks and is similar in effect to point-in-time recovery - other Flashback features operate at the logical level

– The only Flashback feature that must be explicitly enabled by the user, as it generates logs

In applicable scenarios, Flashback features are more efficient than media recovery

Complements Physical Data Protection Strategy

Flashback Technologies should be part of ALL recovery plans!
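Because Flashback Database is the one Flashback feature that has to be switched on explicitly, a minimal sketch of enabling it from an RMAN session might look like the following. It assumes the database already runs in ARCHIVELOG mode with a Fast Recovery Area configured, and the 2880-minute (48-hour) retention target is only an illustrative value, not a recommendation from the slides.

```rman
# Assumption: ARCHIVELOG mode and a Fast Recovery Area are already configured.
# The retention target is expressed in minutes; 2880 (48 hours) is illustrative.
SQL "ALTER SYSTEM SET db_flashback_retention_target=2880 SCOPE=BOTH";

# Start generating Flashback logs in the Fast Recovery Area.
SQL "ALTER DATABASE FLASHBACK ON";
```

The retention target is a goal rather than a guarantee: how far back the database can actually be rewound depends on the space available for Flashback logs.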


Restore Points
What They Are and Why Use Them

A restore point is a user-defined name assigned to an SCN or specific point in time – a user-friendly “bookmark”

FLASHBACK DATABASE TO RESTORE POINT 'before_upgrade';

User-defined restore point names may be used as aliases for SCNs with the following supported commands:

– RECOVER DATABASE and FLASHBACK DATABASE commands in RMAN
– FLASHBACK TABLE in SQL

There are two types of restore points – Normal and Guaranteed
– Guaranteed restore points must be explicitly deleted by the user
– Normal restore points age out of the control file

For archival backups, use the PRESERVE keyword to retain the restore point until backup expiration
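The life cycle of a guaranteed restore point around a risky change can be sketched as follows. This is a hedged illustration run from RMAN against a single instance; the backup path and the `fy13_archive` name are hypothetical, and on Oracle RAC the shutdown and startup steps would normally be done for all instances with cluster tooling instead.

```rman
# Before the change: create a guaranteed restore point.
SQL "CREATE RESTORE POINT before_upgrade GUARANTEE FLASHBACK DATABASE";

# To back the change out, the database must be mounted, not open.
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
FLASHBACK DATABASE TO RESTORE POINT 'before_upgrade';
ALTER DATABASE OPEN RESETLOGS;

# Guaranteed restore points never age out, so drop them when finished.
SQL "DROP RESTORE POINT before_upgrade";

# Archival backup that records its own restore point.
# Hypothetical destination; KEEP backups cannot go to the Fast Recovery Area.
BACKUP DATABASE
  FORMAT '/backup/archive/%U'
  TAG 'fy13_archive'
  KEEP UNTIL TIME 'SYSDATE+365'
  RESTORE POINT fy13_archive;
```

Dropping the guaranteed restore point matters in practice, because the Flashback logs it pins will otherwise keep consuming space in the Fast Recovery Area.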


Flashback Database VS Point-in-Time Recovery
Different Approaches and Multiple Use Cases

Flashback Database: rewinds the database to SCN

Advantages
• Significantly faster than point-in-time recovery - no restore and only limited redo needed
• Useful during database upgrades, application deployments, and as an efficient alternative to rebuilding a failed primary database after a Data Guard failover
• Provides continuous data protection
• Compatible with restore points

Disadvantages
• Requires Flashback logs and associated storage
• Works at whole database level only
• Flashback logging has some (minimal) overhead on database server

Traditional Point-in-Time Recovery: restores then recovers the database to SCN

Advantages
• Works at the database or tablespace level
• No additional logs necessary beyond redo
• Compatible with restore points

Disadvantages
• Time consuming, especially for larger databases
• Database is down until fully recovered


Data Recovery Advisor (DRA)

Oracle Database tool that automatically diagnoses data failures, presents repair options, and executes repairs at the user's request

Determines failures based on symptoms
– Failure information recorded in the Automatic Diagnostic Repository (ADR)
– Flags problems before the user discovers them, via automated health monitoring

Intelligently determines recovery strategies
– Aggregates failures for efficient recovery, presents only feasible recovery options, and indicates any data loss for each option

Can automatically perform selected recovery steps

Accessed via RMAN or EM

Reduces Downtime by Eliminating Confusion!
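From the RMAN side, a typical Data Recovery Advisor session can be sketched with the four commands below. Release support varies (the 11g documentation restricts the advisor to single-instance databases), so check the documentation for your release before relying on it in an Oracle RAC configuration.

```rman
LIST FAILURE;            # show failures recorded by the health checks
ADVISE FAILURE;          # generate manual and automated repair options
REPAIR FAILURE PREVIEW;  # display the generated repair script without running it
REPAIR FAILURE;          # run the repair (asks for confirmation first)
```

Reviewing the `PREVIEW` output first keeps the DBA in control of what is actually restored or recovered.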


How Good is Your Backup Infrastructure?
You Never Know – Unless You Periodically Test It!

1. Documented recovery plan for database and object level recovery
2. Perform periodic recovery tests for various recovery scenarios:
   1. Full database
   2. Objects
   3. Control file
3. Refresh test environments with RMAN
4. If hardware isn’t available to perform full database recovery tests, use RMAN RESTORE VALIDATE

Job Security Tip # 1 – Successful recovery is all that matters!
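When no spare hardware exists for a full restore test, the validation mentioned in point 4 can be sketched as below. These commands read the backups and check that they are usable without writing any data files; the one-day archived log window is an illustrative assumption.

```rman
# Verify that usable backups exist for every data file.
RESTORE DATABASE VALIDATE;

# Verify that a control file backup can be restored.
RESTORE CONTROLFILE VALIDATE;

# Verify the archived log backups needed for recent recovery
# (the one-day window is illustrative).
RESTORE ARCHIVELOG FROM TIME 'SYSDATE-1' VALIDATE;
```

Validation confirms that the backup pieces are readable, but it does not measure a realistic restore time, so it complements rather than replaces periodic full recovery tests.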


Architecting Your Backup Infrastructure


RMAN Traditional Backup Strategies

Full Backup
Two types of RMAN full backups:
– Image copy – Disk only
  • Same size as the database less temp files
– Backupset – Disk or tape
  • Smaller than image copy full
  • Can be compressed and/or encrypted by RMAN
A full backup consumes more overhead on the production server and takes more time than an incremental backup
Restoration may be faster than from an incremental

Full / Incremental Schedule
– Backupset backups – Disk or tape
– Typical schedule – Weekly full with daily incremental backups
– Typical retention:
  • Days to weeks – On disk
  • Weeks to years – On tape
  • Full and corresponding incremental backups should be treated as a group
• Reduces backup window and overhead on servers
• Ideal with low-medium change rate e.g. <20%
• Database must be in archived log mode
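The weekly full with daily incremental schedule above can be sketched with two RMAN commands run by an external scheduler (not shown). The choice of days is an assumption for illustration.

```rman
# Weekly (for example, Sunday): level 0 backup, the baseline of the group.
BACKUP INCREMENTAL LEVEL 0 DATABASE PLUS ARCHIVELOG;

# Daily (for example, Monday to Saturday): level 1 backup containing the
# blocks changed since the most recent level 0 or level 1 backup.
BACKUP INCREMENTAL LEVEL 1 DATABASE PLUS ARCHIVELOG;
```

A restore needs the level 0 backup plus the level 1 backups taken after it, which is why the slide treats a full backup and its incrementals as one group for retention purposes.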


RMAN Incremental Forever Strategy
Incrementally Updated Backups

Oracle Database 10g Release 2 Enterprise Edition >
– Incremental forever after initial full image copy
– Full image copy is rolled forward on user-defined schedule
  • Roll-forward / merge does incur overhead on server
  • Offers SWITCH TO COPY capability
– Typical retention – One to seven days
– Backup full or incremental to tape
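A minimal sketch of the incrementally updated backup cycle follows, run once per schedule interval. The tag name `incr_update` is a hypothetical label that ties the image copy and its incrementals together.

```rman
RUN {
  # Roll the image copy forward using the previous level 1 backup.
  # On the first runs there is nothing to apply yet and RMAN only reports that.
  RECOVER COPY OF DATABASE WITH TAG 'incr_update';

  # Take the next level 1 backup for the copy. If no image copy exists yet,
  # this command creates the initial level 0 image copy instead.
  BACKUP INCREMENTAL LEVEL 1 FOR RECOVER OF COPY WITH TAG 'incr_update' DATABASE;
}
```

Applying the incrementals with a lag, for example `RECOVER COPY OF DATABASE WITH TAG 'incr_update' UNTIL TIME 'SYSDATE-7'`, keeps the copy a fixed number of days behind, which is one way to get the one-to-seven-day retention the slide mentions. In a failure, `SWITCH DATABASE TO COPY` lets the database run from the image copy without a restore.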


Processing Offloaded From Database Nodes
Incremental Backup Scans Occur on Exadata Storage Cells

Block Change Tracking (BCT) enables fast incremental backups
– RMAN tracks 32k data file sections which include one or more changed blocks
– During an incremental backup, RMAN scans these 32k file sections to determine which blocks have changed
– Only these changed blocks are included in the incremental backup

Note: Incremental backup without Block Change Tracking (BCT) enabled – all database blocks are scanned to determine what has changed
– Database Server: Scan of blocks occurs on the database server
– Exadata: Scan of blocks is offloaded to the Exadata Storage Cells
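Enabling Block Change Tracking is a one-time step, sketched here from an RMAN session. The sketch assumes `db_create_file_dest` is set so that the tracking file is created as an Oracle Managed File; otherwise a `USING FILE` clause naming a location would be needed.

```rman
# Assumption: db_create_file_dest is set, so no file name has to be given.
SQL "ALTER DATABASE ENABLE BLOCK CHANGE TRACKING";
```

Whether tracking is active can be checked afterwards in the `V$BLOCK_CHANGE_TRACKING` view. Only incremental backups taken after a subsequent level 0 backup can use the tracking data.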


Backup of Compressed Data
Effects on Sizing and Processes

Compressed data remains compressed in the backup
– HCC data
– OLTP compressed tables
– SecureFiles compressed/deduplicated

– This data will not benefit from further compression during the backup (e.g. RMAN backup or tape drive compression)
– Deduplication software cannot deduplicate compressed data

RMAN backup compression is effective on non-compressed database files

Avoid using RMAN backup compression on HCC tablespaces by separating the backups as shown below:

CONFIGURE EXCLUDE FOR TABLESPACE historical_data;
CONFIGURE COMPRESSION ALGORITHM 'low';
BACKUP TABLESPACE historical_data;
BACKUP AS COMPRESSED BACKUPSET database;

Restore is no different than if the backups had not been separated


Protecting Exadata Operating System Files

On the Exadata Storage Cells, the internal USB stick provides the backup

On the Exadata database nodes, back up the operating system (OS) files in the same manner as with any other database server

Please refer to the documentation for more information: http://wd0338.oracle.com/archive/cd_ns/E13877_01/doc/doc.112/e13874/maintenance.htm#CHDIDGAI


Exadata Backup Targets

20 – 25 TB / hour

All Exadata smart features

Considerations - Performance and Cost Trade-offs

Highest Performance High Performance and Added Flexibility

Cost – Varies with hardware configuration

Exadata Storage

Exadata Storage Expansion Rack

ZFS Storage Appliance (ZFS/SA)

StorageTek Tape Library

27 TB / hour
• Fastest backup and restore
• ILM historical archive
• Second DATA2 disk group

13 TB / hour
• Backups of database & non-database files
• Snapshots
• Clones

9 TB / hour*
• Backup of database and non-database files
• Offsite backups
• Vaulting

*Note: Backup rate limited by number of tape drives – 8 x T10000C drives


Oracle-Integrated Backup to Disk and/or Tape Multi-media Strategy: Disk-to-Disk-to-Tape (D2D2T)

Fast Recovery Area

RMAN Disk Backup

Backup to Tape BACKUP RECOVERY AREA;

BACKUP BACKUPSET;

D2D2T Exadata

StorageTek Tape Library

• Fast Recovery Area should reside on Exadata storage – slower storage could degrade production database performance
  • Online redo, archived logs, Flashback logs, controlfile

ZFS Storage Appliance (ZFSSA)
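The second hop of the disk-to-disk-to-tape flow can be sketched as follows. It assumes an `sbt` (tape) channel has already been configured for the media manager in use; the two commands are alternatives, depending on whether everything in the Fast Recovery Area or only the existing disk backup sets should be copied.

```rman
# Copy the recovery files held in the Fast Recovery Area to tape.
BACKUP DEVICE TYPE sbt RECOVERY AREA;

# Alternatively, copy only the backup sets that already exist on disk.
BACKUP DEVICE TYPE sbt BACKUPSET ALL;
```

Because both commands read from the existing disk backups, the production data files do not have to be read a second time for the tape copy.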


Expanding Exadata Environments Connectivity Considerations

[Diagram: Exadata (FRA) → RMAN Disk Backup → ZFS Storage Appliance (ZFSSA), connected via InfiniBand and 10 Gigabit Ethernet]

What happens when a 2nd Exadata is added?
– The two Exadatas MUST be configured with different InfiniBand subnets.

What about a 3rd Exadata?
– The 3rd Exadata would be connected via 10 Gigabit Ethernet.

Refer to the MAA white paper: http://www.oracle.com/technetwork/database/features/availability/maa-wp-dbm-zfs-backup-1593252.pdf


Customer Case Study – Fidelity Investments

Oracle Open World

Exadata Backups

September 24, 2013

Harpreet Singh Vice President, Database Management Fidelity Investments


Transition To Exadata – A Huge Success!

Challenges with traditional infrastructure
• 300TB of storage with over 60% annual growth rate
• Performance challenges
• Cost reduction pressures
• Need to make failover/recovery more robust

Benefits gained with Exadata
• 42x performance gains for reporting & 40% for OLTP
• Reduced storage by 30% using compression
• Consolidated physical servers from 10 to 4
• Reduced direct/indirect chargebacks by 30%
• Significantly improved failover, backup & recovery strategy


Exadata Architecture


Pre-Exadata Backup Challenges

Over 60% annual data growth rate

Business needs growing and becoming more complex

Expensive software/hardware licenses

Costly to keep backups on the disk

Backups hurting database performance

Complicated recovery with “no-logging”

Concerns around non-logical DR software


Fundamental Data Protection Strategy

1st Line of Defense
• Flashback: 48 hours
• data deletion
• logical corruption
• user errors

2nd Line of Defense
• Disk Backup: 24 hours
• application
• system

3rd Line of Defense
• Standby Database (DR)
• Building/site, region
• HW failure

Last Line of Defense
• Tape: 35 days
• Offsite
• multi-site failures


Flashback
• Oracle Flashback Database
• Primary and Standby Sites

Retention Period: 48 Hours
Restore Time: < 1 Hour
Space Used: 300GB

Pros
• Faster recovery
• Data recovery from tables, schema, or entire database
• Roll database back and forth repeatedly within the flashback window for complex data restore

Cons
• Same location as production – No protection from storage failure
• No protection from physical corruption


Disk Backup
• Exadata Fast Recovery Area
• Incrementally Updated

Retention Period: 24 Hours
Backup Rate: 1.2 TB/hour
Restore Rate: 1 TB/hour
Type: RMAN Online Daily Normal Redundancy

Pros
• Protect against physical/logical database corruption
• Faster backup and restore
• Minimal overhead to the production database

Cons
• Shorter protection window (24 hours)
• Same location as production, so no protection in a disaster or catastrophic storage failure


Standby Database
• Data Guard
• Asynchronous
• No Delay Apply
• 48 Hour Flashback Database setup
• 700 miles between Primary and Standby sites

Pros
• Great for any data recovery when combined with Flashback Database
• Complete data protection if primary site is lost
• Protection from physical corruption
• Can be turned into snapshot standby database temporarily and used for QA/Dev database refreshes through RMAN

Cons
• Resources (another set of servers/storage)

Standby Database


Tape Backup

Retention Period: 35 Days (Offsite)
Channels: 2-4
Nodes: 1
Backup Rate: 1TB/hour (2 channels)
Restore Rate: 800GB/hour (2 channels)
RTO: 3 Days
Type: RMAN CommVault
Archived Redo Logs Retention: 3 Days on disk
Archived Redo Logs Backup: Every 30 minutes

Pros
• Longer term offsite retention than disk and standby
• Media is relatively cheap

Cons
• Slower backup and restore than disk
• Media is less reliable


Planning a Comprehensive Backup Strategy

• Consider full backups once a week with daily incremental

Determine disk backup strategy

• Implement Oracle suggested RMAN backup strategy as it is great protection against data loss

Develop tape backup process

• At least annually

Test different restore processes

• Should be centrally managed

Consolidate tape backup system


Implementation Recommendations

Optimal performance
• Configure Exadata backup over InfiniBand for better throughput
• Configure number of channels based on database size and SLAs
• Use one RMAN channel per tape drive for better throughput
• Enable block change tracking for fast RMAN incremental backups

Data protection and disaster recovery
• Back up archived logs every 30 minutes for better data protection
• Encrypt the data before writing to tape for data security
• Set up Flashback on both primary and standby databases
• Utilize Data Guard broker

Monitoring
• Use Oracle Enterprise Manager to monitor:
  • Disk backup
  • Tape backup
  • Data Guard
  • Flashback
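Two of the recommendations above, one channel per tape drive and frequent archived log backups, can be sketched in RMAN as follows. The drive count of two is an assumption, the media manager channel parameters are omitted because they are vendor specific, and the 30-minute interval would come from an external scheduler that is not shown.

```rman
# Assumption: two tape drives are available, so allow two sbt channels
# (one RMAN channel per drive).
CONFIGURE DEVICE TYPE sbt PARALLELISM 2;

# Run every 30 minutes from a scheduler: back up only the archived logs
# that do not yet have a backup on tape.
BACKUP DEVICE TYPE sbt ARCHIVELOG ALL NOT BACKED UP 1 TIMES;
```

The `NOT BACKED UP 1 TIMES` clause keeps each run short, since logs already copied to tape by an earlier run are skipped.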


Summary

– Have clear and well communicated recovery SLAs
– Build your strategy around the business needs
– Revisit a well-documented, multi-level strategy periodically
– Be conservative and prepare for the worst
– Test
– Practice


The New Modern Cloud Paradigm


Oracle Database Backup Logging Recovery Appliance

Please refer to Oracle.com for additional information: http://www.oracle.com/us/corporate/features/database-backup-logging-recovery-appliance/index.html

Announced at Oracle OpenWorld 2013


Summary and Q&A


Oracle Technologies Mitigate Downtime Complexities Are Inherent in IT – Know IT and PLAN for IT!

Oracle Technologies
– Flashback Technologies
– RMAN
– Enterprise Manager
– Active Data Guard
– Oracle Secure Backup

– Validated, reliable backup you know can be recovered
– Oracle Engineered Solutions eliminate interoperability, patching and upgrade risks
– Policy-based data protection management
– Failover, fallback and/or disaster recovery
– Quickly review and/or correct user errors
– System Monitoring


Key Takeaways

Exadata Backup and Recovery

– RMAN backup / recovery on Exadata is the same as on other platforms – just faster!
– Oracle data protection technologies meet diverse RTO / RPO and budget requirements
– Database consolidation and data protection are ideally suited to the Exadata platform

Who Better to Back Up Oracle Than Oracle?


Resources

OTN HA Portal: http://www.oracle.com/goto/availability

Maximum Availability Architecture (MAA): http://www.oracle.com/goto/maa

MAA Blogs: http://blogs.oracle.com/maa

Exadata on OTN: http://www.oracle.com/technetwork/database/exadata/index.html

Oracle HA Customer Success Stories on OTN: http://www.oracle.com/technetwork/database/features/ha-casestudies-098033.html
