38
ETE ZDLRA Business Case Energy Transfer Information Technology Infrastructure Services (ITIS) “Striving to exceed customer expectations via standard solutions” George F Mamvura Sr. Mgr. IT Infrastructure Services Oracle/Unix

ETE ZDLRA Business Case

  • Upload
    others

  • View
    3

  • Download
    1

Embed Size (px)

Citation preview

ETE ZDLRA Business Case

Energy Transfer

Information Technology Infrastructure Services (ITIS)

“Striving to exceed customer expectations via standard solutions”

George F Mamvura Sr. Mgr. IT Infrastructure Services – Oracle/Unix

Key ETE Database Protection Goals

Business Goals • Never lose critical business data

• No impact on business applications

I.T. Goals • Ensure database level recoverability

• Centralized service to protect all databases

2

Current backup solutions fail to achieve these goals!

ETE Backup Technology Evolution

3

Industry focus on Optimization of Storage Capacity, not pure Database Availability

1980’s 1990’s 2015

Tape Libraries …Still in use today

Network Attached Storage

Deduplication/ SAN Replications

Late 2000’s

Oracle Backup Appliances (ZDLRA)

General Backup Solutions Are Not Designed for Database

…most Treat Databases as Just Files to Periodically Copy

4

Daily Backup Window

Large performance impact on production

Data Loss Exposure

Lose all data since last backup

Many Systems to Manage

Scale by deploying more backup appliances

Poor Database Recoverability

Many files are copied but protection state of database is unknown

ETE Data Protection: A Complex Task

• Significant challenges for data protection?

#1: Backing up and managing increasingly large data volumes

5

Backups are too slow

Backups need constant management

Complex Recovery

Production slow down

Coordination of multiple backups

ETE Deploys ZDLRA A Fundamentally Different Approach

6

ETE Benefits achieved with the Recovery Appliance

7

Results Achieved Recovery Appliance Cost Savings Over 5

Years

Storage Consumption for Oracle DB Backups

5x less - 10TB vs about 49TB: Estimated 103TB reduction / 5 year

$176k

Capital Savings Cost Avoidance – Freed up 30TB of Hitachi Tier 1 Storage

$150k

Decreased Backup Time Reduced backup window by 5 hours $350k

Better Utilization of IT Assets - Offloaded most backup processing allowing more resources for data processes - Legacy solution use of production system resources could affect daily processing

$150k

Team Productivity Savings Theoretically, reduced personnel time needed to manage backup, restore and cloning

$575.7k

ETE Benefits achieved with the Recovery Appliance – continued…

8

Results Achieved Recovery Appliance

Extended PITR Recovery Window on Disk

Allowed ETE to have a longer recovery window at lower cost Legacy solution retention period limited due to high storage costs

Improved Development and Project Efficiency

Centralized and efficient project database builds/backups/restore replaced previous cumbersome process -- Substantial DBA time savings

Centralized Archival Offsite data archive for compliance and DR now accessed from a single repository and Replicated significantly reducing negative affect on production systems

ETE Platform as a Service Improved platforms recoverability standards to a Diamond service level - See Data Loss Calculator

ETE Benefits of a ZDLRA Solution- continued…

Backup Storage Analysis

9

0

20

40

60

80

100

120

140

1 2 3 4 5 6

TB

Year

DB Backup Storage Growth

Total Current (TB)

ZDLRA (TB)

ETE Benefits of a ZDLRA Solution- continued…

Five Year Cost Analysis

10

$0.00

$100,000.00

$200,000.00

$300,000.00

$400,000.00

$500,000.00

$600,000.00

$700,000.00

$800,000.00

Cost Status Quo Cost ZDLRA

Capital Cost

Cost Status Quo

Cost ZDLRA

ETE Benefits of a ZDLRA Solution- continued …

Decrease Backup Window Time

11

0

1

2

3

4

5

6

7

8

9

DB Backup Window (Tape) hrs DB Backup Window (ZDLRA) hrs

DB Backup Window

DB Backup Window (Tape) hrs

DB Backup Window (ZDLRA) hrs

ETE Benefits of a ZDLRA Solution- continued …

Team Productivity Improvement Analysis

12

Productivity Investment Year 1 Year 2 Year 3 Year 4 Year 5 Totals

Status Quo

Cloning $0.00 $16,380.00 $16,380.00 $16,380.00 $16,380.00 $16,380.00 $81,900.00

Backup Window $0.00 $98,280.00 $98,280.00 $98,280.00 $98,280.00 $98,280.00 $491,400.00

Restore Time $0.00 $45,000.00 $45,000.00 $45,000.00 $45,000.00 $45,000.00 $225,000.00

With ZDLRA

Cloning $0.00 $4,680.00 $4,680.00 $4,680.00 $4,680.00 $4,680.00 $23,400.00

Backup Window $0.00 $21,840.00 $21,840.00 $21,840.00 $21,840.00 $21,840.00 $109,200.00

Restore Time $0.00 $18,000.00 $18,000.00 $18,000.00 $18,000.00 $18,000.00 $90,000.00

Productivity Savings $0.00 $115,140.00 $115,140.00 $115,140.00 $115,140.00 $115,140.00 $575,700.00

ETE Benefits of a ZDLRA Solution- continued…

Recovery Appliance Unique Benefits for Business and I.T.

13

Minimal Impact Backups

Production databases only send changes. All backup and tape processing offloaded

Eliminate Data Loss

Real-time redo transport provides instant protection of ongoing transactions

Scale Protection

Easily protect all databases in the data center using massively scalable service

Database Level Recoverability

End-to-end reliability, visibility, and control of databases - not disjoint files

ETE Benefits of a ZDLRA Solution- continued …

Minimal Impact Backups: Boost Production Server Performance

• Offload Backup Processing, Eliminate Expensive Backup Agents, Reduce Network Load

14

Prior Database Server architect

Performance degrades with backups

Disk / Tape / Dedupe Backup Agents

Backup Operations: Merge, Compress, Validate, Delete,

Full/Tape Backups

Performance improves with backup offload

Database Server with Recovery Appliance

Disk / Tape / Dedupe Backup Agents

Backup Operations: Merge, Compress, Validate, Delete,

Full/Tape Backups

Delta Push

ETE Benefits of a ZDLRA Solution- continued …

Recovery Appliance: End-to-End Data Protection Visibility

Policy-based Management: Application-oriented Data Protection

15

1 Application-oriented

Protection Policy

Well Defined Oracle Blocks

RMAN Backup Set

End-to-end Oracle Block Validation

2

Recovery Appliance

3 EM Cloud Control: Integrated

Management

George F. Mamvura Sr. Mgr. IT Infrastructure Services – Oracle/Unix

Thank You

Accelerating Database Backup

and Recovery with Zero Data Loss

Recovery Appliance Kevin Prendergast

Lead Specialist

Javier Ruiz

Technical Team Lead

Agenda • Configuration

• Tape Hardware

• Network Setup

• Backup Configuration

• Restore and Clones

• Tape Backup Configuration

• Replication Configuration

• Troubleshooting

• Monitoring

ZDLRA Setup

Upstream

Tape Library 1

Downstream

Tape Library 2

Tape Library 1

RMAN: • Nightly incremental Backups • Archivelog Backups • Real-time Redo Shipping

Protected Databases

Replication: • Replicate only Gold and

Platinum

OSB Tape Library 1 : • Nightly incremental • Weekly full • Archivelog

OSB Tape Library 2 : • Archivelog • ZDLRA File System

WAN

OSB Tape Library 1 : • Nightly incremental • Weekly full • Archivelog

Tape Hardware

EMC Library NDMA

Oracle Secure Backup (OSB) 12.1

Direct fiber

Media manager setup configured in OEM recovery appliance

Network Configuration

New Cisco 2xxx Nexus switch infrastructure providing 40GB uplink to core.

Switches installed in each rack with 10GB to each port.

Compute Node Network Trunking

• NIC LACP(Link Aggregation Control Protocol) bonding is configured to provide redundancy and performance.

• The bonding trunks the interfaces together to provide a single network.

• Two(2) x 10GB for 20GB bandwidth using Dynamic Link Aggregation(mode 4).

• IEEE 802.3ad Dynamic link aggregation. Creates aggregation groups that share the same speed and duplex settings. Utilizes all slaves in the active aggregator according to the 802.3ad specification.

• Slave selection for outgoing traffic is done according to the transmit hash policy, which may be changed from the default simple XOR policy via the xmit_hash_policy option.

• Current configuration utilizing teamed active/passive 10GB interfaces due ETP network topology limitations and upgrade timelines.

• Future progression to trunked interfaces for 40GB uplink to core network and switch configuration, when Data Center is fully modernized.

Issues with Legacy Solution

• Daily backup window • Amount of DBs and size causing long RMAN backups • Batch processing and app jobs conflict with resources with backup

• Data Loss Exposure • From the last full backups archivelogs are not backed up still in ARCHDG • Manual process to restore files from tape and make sure correct files are

restored

• Space constraints with holding backups

• Scheduled maintenance requires special backups that take more time

• Tape backup with no understanding of RMAN backups pieces.

• Clones take more time and manual work to transfer RMAN backups

Energy Transfer ZDLRA

Benefits • Significant performance improvement

(reduction in time):

• Backups and restores of databases using the Recovery Appliance.

• Cloning of new databases from backups on the Recovery Appliance.

• Incremental forever architecture reduced backup window

• Offload backup processing from production servers:

• Backup compression, merge of incrementals, validation, deletion and backup to tape.

• Integrated and automated multi-tiered deployment (Replica and tape)

• RMAN automatically restores from tape if backup isn’t on the appliance

• Free up local /backup SAN space about 22T

• Non-prod environments can access backup files in ZDLRA producing faster refreshes

• Real-time redo shipping from memory allowing up to the second point in time recovery

• 10GB Nexus network connectivity in Active/Passive mode 1

• Online backup and restore store via RMAN fast access to data.

• Reduces backup overhead on large filesystem cache utilization.

Backup Module

• SBT library is bundled with ZDLRA database

• Protected databases can download module from OTN or install from ZDLRA $ORACLE_HOME/lib

• All protected database will have module installed • $ORACLE_HOME/lib

• Central path for lib

• Oracle wallet created to store the credentials needed to authenticate the protected database with ZDLRA. • Recommend creating a strong 12+ character password

ZDLRA User Accounts and Catalog

• Virtual private catalog (VPC) users are created on the appliance • DBAs can only access metadata for protected databases they manage

• Authentication credentials stored in Oracle wallet on the protected database host.

• Recovery Appliance catalog is managed by the appliance • Eliminates the need to maintain and manage a separate RMAN catalog

• Existing catalog metadata can be imported using RMAN IMPORT

DB Backup and Restore

• Nightly incremental backups

• First backup is incremental level 0, then level 1 thereafter

• Catalog reports a virtual full (VB$) available to the point of the incremental

• Settings on protected databases

• Enable block change tracking for fast incremental

• When database is added to Recovery Appliance management, Enterprise Manager (EM) automatically updates RMAN backup setting to use Recovery Appliance SBT_TAPE library

• Recovery Window for disk and tape backups configured via Recovery Appliance Protection Policy

• Backups beyond recovery window automatically purged by the appliance

• Eliminates need to perform RMAN DELETE OBSOLETE operation on protected databases

• Backup, restore and duplicate operations must connect to catalog

• Restore any virtual full backup eliminating overhead and time needed to merge / apply incremental

Protection Policies

• Central mechanism • Recovery window goals for disk backups

• Recovery window goals for tape backups

• Info about replication of backups

• Disk recovery window 4 days

• Maximum disk retention 15 days

• Media Manager 28 days

• Unprotected data window

• Policies • Bronze • Silver • Gold • Platinum

Improved Backup Performance

• Before ZDLRA full compressed

backup where taking between 8 to 12 hours with 4 channels

• Backup windows - 5pm to 5am • ZDLRA nightly incremental

backups completed between 2 to 3 minutes depending on the amount of block changes.

• Now all backups complete before 11pm

0

2

4

6

8

10

12

14

DB1 DB2 DB3 DB4

RMAN Backup Times

Before ZDLRA ZDLRA Level 0 After ZDLRA Level 1

Backups

• Offloading

• Compression

• Backup validation

• CPU usage

• Power usage

• Network usage

Load on the server from RMAN

0

10

20

30

40

50

60

Node 1 Node 2 Node 3

CPU Usage

Before ZDRLA After ZDLRA

0

10

20

30

40

50

60

Node 1 Node 2 Node 3

Power Usage

Before ZDRLA After ZDLRA

Restores Are 3x Faster

• Before ZDLRA restores could take up to 10 hour

• ZDLRA delivers a 75% improvement on restore times.

• This sped up our clones of non-prod databases.

0

2

4

6

8

10

12

DB1 DB2 DB3 DB4

RMAN Restore Times

Before ZDLRA With ZDLRA

Redo Shipping

• Support for 11g and 12c

• Data guard redo transport

• Incoming redo blocks directly from SGA on protected database

• Writes logs to redo staging location

• Converts to compressed archive log backup and written to delta store

• Preserves data loss with archived redo log

• Run archivelog backup and delete

• Archivelog deletion policy apply on standby

• Requires restart of database

log_archive_config dg_config=(zdlra,standby) redo_transport_user=RAVPC1 log_archive_dest_2='VALID_FOR=(ALL_LOGFILES, ALL_ROLES) ASYNC DB_UNIQUE_NAME=zdlra', 'SERVICE=“<NODE>ingest-scan:1521/zdlra:dedicated"'

Parameter Changes on Protected DB

OSB Tape Backup Configuration

• OSB 12.1

• Upstream • SL500

• SL150

• Downstream • SL500

• Configure copy to tape jobs in OEM • To add second library to upstream required custom setup • Custom OSB storage selector for archivelog backup pieces

to second library • ZDLRA file system tape backup • Tape life cycle automated

Replication Configuration

• RA version ra_automation-12.1.1.1.7-23494283.x86_64

• Replicate Gold and Platinum protected policies

Troubleshooting

• Methodology • Doc ID: 2066528.1

• Startup and Shutdown Process • Customer Outage Classifications and Restoration Action Plans Doc ID: 2022047.1

• Make sure you are on the latest patch for RA • Doc ID: 2028931.1

• Protected DBs • 10.2.0.4 and below not supported

• Replication • Transfer issue fixed with ra_automation-12.1.1.1.7-23494283

• Incidents and events • Watching for alerts showing protected DB not meeting recovery window

Monitoring Cloud Control

Monitoring and configuration via Cloud Control • Summary

• Protected DBs

• Data sent and received

• Performance

• Media managements

• Copy to Tape

• Storage locations

• Replication

• Incidents and events

• BI Reports

Monitoring Cloud Control

OSB Monitoring • Monitor

Devices

• Administer Devices

• Administer Library

• Setup file system backup jobs

Thank You Kevin Prendergast

Lead Specialist

Javier Ruiz

Technical Team Lead