Page 1: Adventures in Dataguard

Adventures in Dataguard

Dr. Jason Arneil

Page 2: Adventures in Dataguard

Motivation

Why Dataguard

Page 3: Adventures in Dataguard

AGENDA

• Introduction

• The Motivation

• Dataguard Architecture & Features

• Creating a Physical Standby

• Maintaining your standby

• Using your Standby

• Performing a Switchover

Page 4: Adventures in Dataguard

Introduction

Health Warning

Page 5: Adventures in Dataguard

Introduction

About Me

• Jason Arneil

• System Administrator/DBA

• Using Oracle since 1998

• At Nominet since 2001

Page 6: Adventures in Dataguard

Introduction

About Nominet

• Nominet is the internet registry for .uk domain names

• Nominet has been in existence for over 11 years

• Nominet is run as a not-for-profit company

• Nominet is owned by its members

• There are over 6 Million .uk domain names

Page 7: Adventures in Dataguard

Motivation

Why Dataguard

• Big push on a Nominet Business Continuity Plan

• Dataguard is the Oracle solution for disaster recovery

• Physical Standby was the obvious option

• Maximum Availability Architecture (MAA)

Page 8: Adventures in Dataguard

Motivation

Business Continuity Site

Page 9: Adventures in Dataguard

Architecture & Features

Dataguard Processes

[Diagram: Dataguard processes. Primary database: transactions, LGWR, LNS, online redo logs, ARCH, archived redo logs. Redo travels over Oracle Net. Physical/logical standby database: RFS, standby redo logs, ARCH, archived redo logs, MRP/LSP (redo is transformed to SQL for SQL Apply), FAL. The standby also serves backups and reports.]

Page 10: Adventures in Dataguard

Architecture & Features

Dataguard Features

• Several Protection Modes

– Maximum Protection

– Maximum Availability

– Maximum Performance

• Several Transport Modes

– LGWR SYNC

– LGWR ASYNC

– ARCH
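To see how the protection mode relates to a running configuration, the statements below query the current mode and then raise it. This is a minimal sketch, not taken from the slides: it assumes the standby destination already uses LGWR SYNC transport and that standby redo logs exist. Depending on the release, the primary may need to be mounted rather than open when the mode is raised.

```sql
-- Show the configured protection mode and the level currently in effect.
select protection_mode, protection_level from v$database;

-- Raise the mode on the primary (assumes a SYNC destination is configured).
alter database set standby database to maximize availability;
```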

Page 11: Adventures in Dataguard

Creating a Standby

Prepare Primary & Standby

• Prepare Primary Database

– Enable Force Logging

SQL> alter database force logging;

– Modify initialization parameters

• Prepare Standby Database

– Setup directory structure

– Create spfile with correct parameters

– Start database in nomount
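The preparation steps above can be checked with a couple of statements. This is only a sketch: the query runs on the primary, and the nomount startup assumes the standby spfile and directory structure are already in place.

```sql
-- On the primary: confirm archivelog mode and force logging are in effect.
select log_mode, force_logging from v$database;

-- On the standby host, once the spfile exists (SQL*Plus command):
startup nomount
```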

Page 12: Adventures in Dataguard

Creating a Standby

Log Transport Parameters

• LOG_ARCHIVE_CONFIG='DG_CONFIG=(PRIMARY, STANDBY)'

• LOG_ARCHIVE_DEST_1='LOCATION=/var/oracle/PRIMARY/arch'

• LOG_ARCHIVE_DEST_2='SERVICE=PRIMARC DB_UNIQUE_NAME=PRIMARY'

• LOG_ARCHIVE_DEST_3='SERVICE=STANDBY LGWR ASYNC REOPEN=15 MAX_FAILURE=10 OPTIONAL VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=STANDBY'
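Once the destinations are defined, their state can be checked from the primary. The sketch below assumes the destination numbers used on this slide (1 to 3); enabling the destination state is shown for completeness and may already be the default.

```sql
-- Enable the standby destination defined as LOG_ARCHIVE_DEST_3.
alter system set log_archive_dest_state_3 = enable scope = both;

-- Report destination health; ERROR is empty when transport is working.
select dest_id, status, destination, error
from   v$archive_dest
where  dest_id in (1, 2, 3);
```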

Page 13: Adventures in Dataguard

Creating a Standby

ssh tunnels

• You may not want your redo data sent unencrypted across the internet to your standby. You can use ssh tunnels to avoid this

– ssh -N -L 3333:standby:1521 oracle@standby

• Now the tnsnames entry points to the localhost

STANDBYARC =

(DESCRIPTION =

(SDU = 32767)

(ADDRESS_LIST =

(ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT=3333)))

(CONNECT_DATA =

(SERVICE_NAME = STANDBY)))

Page 14: Adventures in Dataguard

Creating a Standby

Some Other Parameters

• FAL_SERVER

• FAL_CLIENT

• ARCHIVE_LAG_TARGET

• STANDBY_FILE_MANAGEMENT

• DB_FILE_NAME_CONVERT

• LOG_FILE_NAME_CONVERT
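As an illustration of how these parameters might be set, here is a hedged sketch. The net service names PRIMARY and STANDBY, the 1800-second lag target and the directory paths are all assumptions chosen for the example, not values from the talk.

```sql
-- On the standby: where to fetch missing logs from, and who is asking.
-- PRIMARY and STANDBY are assumed net service names.
alter system set fal_server = 'PRIMARY' scope = both;
alter system set fal_client = 'STANDBY' scope = both;

-- On the standby: create or drop datafiles automatically when the primary does.
alter system set standby_file_management = 'AUTO' scope = both;

-- On the primary: force a log switch at least every 1800 seconds (illustrative).
alter system set archive_lag_target = 1800 scope = both;

-- On the standby: static parameters, effective after a restart.
-- The paths are hypothetical.
alter system set db_file_name_convert =
  '/var/oracle/PRIMARY/', '/var/oracle/STANDBY/' scope = spfile;
alter system set log_file_name_convert =
  '/var/oracle/PRIMARY/', '/var/oracle/STANDBY/' scope = spfile;
```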

Page 15: Adventures in Dataguard

Creating a Standby

backup your primary

• Backup primary - rman is good

– rman> backup format '/backup/%U' database plus archivelog;

– rman> backup format '/backup/%U' current controlfile for standby;

• Recover backup on standby node

– I like using rman duplicate to create standby:

• (oracle$) rman target sys/password@PRIMARY auxiliary /

• rman> duplicate target database for standby;

Page 16: Adventures in Dataguard

Creating a Standby

Start applying redo

• Create standby redo log files on both primary and standby:

– sql> alter database add standby logfile thread 2 group 42 ('PATH_TO_DATA/standbyredo01.log') size 512M;

• Now you can start the physical standby recovering logs:

– sql> alter database recover managed standby database disconnect from session;

• Or if you prefer real time apply:

– sql> alter database recover managed standby database using current logfile disconnect from session;
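After adding standby redo logs it is worth confirming they exist and match the online redo logs in size. A minimal sketch using the standard views (the column aliases are just for readability):

```sql
-- Standby redo log groups, their size and whether they are in use.
select group#, thread#, sequence#, bytes / 1024 / 1024 as size_mb, status
from   v$standby_log;

-- Online redo logs, for a size comparison with the standby redo logs.
select group#, thread#, bytes / 1024 / 1024 as size_mb
from   v$log;
```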

Page 17: Adventures in Dataguard

Maintaining your standby

Monitoring the Standby

• You have to ensure your standby is keeping up with your primary

• You can check which log was last applied to your standby:

– sql> SELECT MAX(SEQUENCE#), THREAD# FROM V$ARCHIVED_LOG where APPLIED='YES' GROUP BY THREAD#;

MAX(SEQUENCE#)    THREAD#
-------------- ----------
          2976          1
          1888          2
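A variation on the query above, run on the standby, puts the last received and last applied sequence side by side and checks for gaps. This is a sketch rather than the query used in the talk:

```sql
-- Per thread: highest sequence received versus highest sequence applied.
select thread#,
       max(sequence#) as last_received,
       max(case when applied = 'YES' then sequence# end) as last_applied
from   v$archived_log
group  by thread#;

-- Any rows here mean the standby is missing archived logs in that range.
select thread#, low_sequence#, high_sequence#
from   v$archive_gap;
```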

Page 18: Adventures in Dataguard

Maintaining your standby

Monitoring Standby Progress

• A good way of checking what the background processes of your standby are up to is using v$managed_standby

– SQL> select process, sequence#, status from v$managed_standby;

PROCESS   SEQUENCE# STATUS
-------- ---------- ------------
ARCH           2967 CLOSING
ARCH           2974 CLOSING
RFS            2977 IDLE
MRP0           1889 APPLYING_LOG
RFS            1889 IDLE
RFS            2977 IDLE

Page 19: Adventures in Dataguard

Maintaining your standby

Monitoring Your Standby

• You have to ensure your standby is keeping up with your primary

• V$DATAGUARD_STATS provides useful information

– SQL> select name, value from v$dataguard_stats;

NAME                             VALUE
-------------------------------- ------------------------------------
apply finish time                +00 00:00:00
apply lag                        +00 00:00:11
estimated startup time           41
standby has been open            N
transport lag                    +00 00:00:03
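For scripted monitoring it can help to pull only the two lag figures, along with the time they were computed. A small sketch against the same view:

```sql
-- Only the lag figures, with their unit and the time they were calculated.
select name, value, unit, time_computed
from   v$dataguard_stats
where  name in ('apply lag', 'transport lag');
```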

Page 20: Adventures in Dataguard

Maintaining your standby

Monitoring Your Standby

• A way of finding out what has been happening to your standby over a period of time is to look at the v$dataguard_status view

– Log Apply Services 01-AUG-07 Media Recovery Waiting for thread 1 sequence 2977 (in transit)

– Log Apply Services 01-AUG-07 Media Recovery Waiting for thread 1 sequence 2977 (in transit)

– Log Apply Services 01-AUG-07 Media Recovery Waiting for thread 2 sequence 1889 (in transit)

– Remote File Server 01-AUG-07 Primary database is in MAXIMUM PERFORMANCE mode

– Remote File Server 01-AUG-07 RFS[53]: Successfully opened standby log 14: '+DATA2/standby/standbyredo02.log'
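Output like the above could come from a query along these lines. This is a sketch: the date format and ordering are assumptions, and the slide does not show the statement actually used.

```sql
-- Data Guard messages in the order they were logged.
select facility,
       to_char(timestamp, 'DD-MON-YY HH24:MI:SS') as logged_at,
       message
from   v$dataguard_status
order  by message_num;
```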

Page 21: Adventures in Dataguard

Maintaining your standby

Oracle can’t divide by 0

• Standby was happily working away

– ORA-07445: exception encountered: core dump [kcrarmb()+152] [SIGFPE] [Integer divide by zero] [0x00085C300

• MRP process crashes

– No redo gets applied from this point

• Logs after the one that caused the ORA-07445 are still being shipped

• A simple restart of the managed recovery process does a FAL (fetch archive log) request and the standby is back up to date
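A crashed MRP can be spotted and restarted from the standby. The sketch below shows the real time apply variant; drop the "using current logfile" clause for plain managed recovery.

```sql
-- No MRP0 row means managed recovery is not running.
select process, status, thread#, sequence#
from   v$managed_standby
where  process like 'MRP%';

-- Restart managed recovery (real time apply variant shown).
alter database recover managed standby database
  using current logfile disconnect from session;
```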

Page 22: Adventures in Dataguard

Maintaining your standby

kcrfr_resize2

• Lots of problems after upgrade to 10.2.0.3

– Recovery of Online Redo Log: Thread 2 Group 23 Seq 999 Reading mem 0

Mem# 0: +DATA3/standby/standbyredo11.log

ORA-00600: internal error code, arguments: [kcrfr_resize2], [652614828032], [268423168], [], [], [], [], []

• Perhaps caused by the following:

– Bug 3306010 OERI[kcrfr_resize2] possible in MEDIA recovery

Media recovery may fail with ORA-600 [kcrfr_resize2] when the number of redo strands is set to a high value using log_parallelism.

Page 23: Adventures in Dataguard

Maintaining your standby

kcrfr_resize2

• This issue has recently been published as Note:453259.1

– Triggered by having a large log_buffer

• This bug affects 10.2.0.3 and potentially 9.2.0.8

• It is related to the size of the log_buffer parameter

• Fix is included in 10.2.0.4
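To judge whether a system might be exposed, check the running release and the log buffer size. This is only a sketch; what counts as a "large" log_buffer is not quantified in the slides.

```sql
-- Release actually running; the slide names 10.2.0.3 as affected.
select banner from v$version;

-- Current redo log buffer size in bytes.
select name, value from v$parameter where name = 'log_buffer';
```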

Page 24: Adventures in Dataguard

Maintaining your standby

kcrrupirfs

• ARC processes died on primary:

ORA-00600: [kcrrupirfs.20] [4] [368]

• Trace file showed the following:

Corrupt redo block 479421 detected: bad block number

Flag: 0x0 Format: 0x0 Block: 0x00000000 Seq: 0x00000000 Beg: 0x0 Cks:0x0 <<<<<<<--

----- Dump of Corrupt Redo Buffer -----

000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

Page 25: Adventures in Dataguard

Maintaining your standby

kcrrupirfs

• Oracle initially thought this ORA-600 error was hardware related

– There are NO indications of any hardware fault - the primary keeps running

• After a couple of weeks it was decided this was a “bug situation”

– This was bug 4767278 which talked about FAL not being able to read from multiple mirror sides when encountering invalid/stale redo in a file. Apparently required for ASM configurations because ASM does not guarantee all mirror sides contain same data after writing.

– We were using ASM, but external redundancy

– Oracle then said “The ASM group is not 100% sure if the patch 4767278 will fix the problem”

Page 26: Adventures in Dataguard

Maintaining your standby

log corruption

• The Managed Recovery process crashed complaining about log corruption

MRP0: Background Media Recovery terminated with error 355

ORA-00355: change numbers out of order

ORA-00353: log corruption near block 2 change 1273622545 time 03/06/2007 08:32:46

ORA-00312: online log 13 thread 1: '+DATA2/standby/standbyredo01.log'

• Oracle blame the upgrade process at first. They suggest rebuilding the standby

• Then I notice that trying managed recovery rather than real time apply seems to allow the standby to progress

Page 27: Adventures in Dataguard

Maintaining your standby

log corruption

• At this point Oracle say “it looks like a bug”

• Lots of time spent diagnosing the issue

– ALTER SYSTEM DUMP LOGFILE '+DATA2/nom/standby33.log' scn min 865465290 scn max 865465300;

• Eventually Oracle produced a patch 5746174

– MRP HANGS WITH ASYNC LNS AND PARALLEL ARCHIVAL

Page 28: Adventures in Dataguard

Using Your Standby

Utilize those cpu cycles

• A Standby can be considered an insurance policy

• Several ways to utilize your standby

– Run your backups from your standby

– Open your standby read only for reporting

– Flashback standby to look at old data

– Open your standby read write for testing purposes

Page 29: Adventures in Dataguard

Using Your Standby

Open for Reports

• You need to cancel managed recovery

– sql> alter database recover managed standby database cancel;

• Then simply open the standby

– sql> alter database open;

• Redo is still transported to your standby

• To transition back to applying redo, shut down the open standby, start it up in mount mode and restart the recovery process
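Put together, the reporting round trip might look like the sketch below. It assumes a 10g physical standby and writes the open as an explicit "read only", which is what opening a physical standby amounts to.

```sql
-- Stop redo apply and open the standby for reporting.
alter database recover managed standby database cancel;
alter database open read only;

-- ... run reports ...

-- Return to applying redo (SQL*Plus commands, then SQL).
shutdown immediate
startup mount
alter database recover managed standby database disconnect from session;
```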

Page 30: Adventures in Dataguard

Using Your Standby

Open for read write

• You must have flashback database enabled for this

• Stop redo apply on standby

• Create a restore point

• Activate the Standby & perform read/write testing

• Flashback to restore point

• Restart redo apply on the Standby

Page 31: Adventures in Dataguard

Using Your Standby

Open for read write

[Diagram: a restore point is created on the physical standby, the standby is activated and opened read write, and Flashback Database to the restore point returns it to a physical standby]

Page 32: Adventures in Dataguard

Using Your Standby

Flashback Database in a Nutshell

• Set up Flashback Database

– alter system set db_recovery_file_dest_size = 8G;

– alter system set db_recovery_file_dest = 'your flashback destination';

– alter system set db_flashback_retention_target = 1440 ;

– alter database flashback on;

• Once you have cancelled the standby recovery create a guaranteed restore point

– create guaranteed restore point before_activate;

Page 33: Adventures in Dataguard

Using Your Standby

Open for read write

• Activate your Standby

– SQL> ALTER DATABASE ACTIVATE STANDBY DATABASE;

• You can open the Standby for business

– SQL> ALTER DATABASE OPEN;

• To become a Standby again, shut down and start up in mount

– SQL> FLASHBACK DATABASE TO RESTORE POINT BEFORE_ACTIVATE;

– SQL> ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
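Two steps around this procedure are easy to forget: checking the restore point before activating, and resuming apply afterwards. The following is a sketch under the assumption of a 10.2 standby, where the instance generally needs to be remounted after the conversion; the restore point name follows the earlier slide.

```sql
-- Before activating: confirm the guaranteed restore point exists.
select name, scn, guarantee_flashback_database
from   v$restore_point;

-- After flashing back and converting: remount, resume apply, clean up.
shutdown immediate
startup mount
alter database recover managed standby database disconnect from session;
drop restore point before_activate;
```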

Page 34: Adventures in Dataguard

Using Your Standby

Open for read write

• However things never go according to plan

– ORA-00600: internal error code, arguments: [3705], [1], [8], [3], [8], [], []

• This was bug 4479323 which is a bug with recovery (not standby specific) and only occurs in a RAC environment

• This is fixed in 10.2.0.3

Page 35: Adventures in Dataguard

Doing a Switchover

It’s good to test

• A business continuity plan is no good unless it’s been tested

• It’s not all about the database

• Good to think in terms of services

Page 36: Adventures in Dataguard

Doing a Switchover

Database Switchover

• Make sure your standby is up-to-date

• Check your primary database switchover status:

– primary> SELECT SWITCHOVER_STATUS FROM V$DATABASE;

• Switchover primary database

– primary> ALTER DATABASE COMMIT TO SWITCHOVER TO PHYSICAL STANDBY with session shutdown;

• Switchover the standby

– standby> ALTER DATABASE COMMIT TO SWITCHOVER TO PRIMARY with session shutdown;
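The steps that usually follow the two switchover commands can be sketched as below. This assumes a 10g configuration. If the old standby had been opened read only since it was last started, it would need a restart rather than a plain open.

```sql
-- On either database: current role and switchover readiness.
select database_role, switchover_status from v$database;

-- Old primary, now a standby: remount and start redo apply.
shutdown immediate
startup mount
alter database recover managed standby database disconnect from session;

-- New primary: open for business.
alter database open;
```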

Page 37: Adventures in Dataguard

Doing a switchover

DNS Primer

• DNS allows translation from hostname to IP address

– example.co.uk IN A 162.0.0.1

• Our principle is all services are accessed through a CNAME

– anexample.co.uk 5M IN CNAME example.co.uk

• Relocation of the service is just a case of changing where the CNAME points

Page 38: Adventures in Dataguard

Conclusion

Conclusion

• Dataguard is an efficient DR solution for your primary database

• Dataguard is mostly reliable but is not without its blips

• There are opportunities for gaining added value from your standby

• You can’t test your business continuity plan enough

Page 39: Adventures in Dataguard

Questions?

Adventures in Dataguard

Contact:


• http://blog.nominet.org.uk