Adventures in Dataguard

1. Adventures in Dataguard

  • Dr. Jason Arneil

2. Why Dataguard Motivation

3. AGENDA

  • Introduction
  • The Motivation
  • Dataguard Architecture & Features
  • Creating a Physical Standby
  • Maintaining your standby
  • Using your Standby
  • Performing a Switchover

4. Health Warning Introduction

5. About Me Introduction

  • Jason Arneil
  • System Administrator/DBA
  • Using Oracle since 1998
  • At Nominet since 2001

6. About Nominet Introduction

  • Nominet is the internet registry for .uk domain names
  • Nominet has been in existence for over 11 years
  • Nominet is run as a not-for-profit company
  • Nominet is owned by its members
  • There are over 6 million .uk domain names

7. Why Dataguard Motivation

  • Big push on a Nominet Business Continuity Plan
  • Dataguard is the Oracle solution for disaster recovery
  • Physical Standby was the obvious option
  • Maximum Availability Architecture (MAA)

8. Business Continuity Site Motivation

9. Dataguard Processes Architecture & Features

  • [Diagram: Primary Database (Transactions, LGWR, LNS, Online Redo Logs, ARCH, Archived Redo Logs) connected over Oracle Net to a Physical/Logical Standby Database (RFS, Standby Redo Logs, ARCH, Archived Redo Logs, FAL, MRP/LSP, Transform Redo to SQL for SQL Apply, Backup / Reports)]

10. Dataguard Features Architecture & Features

  • Several Protection Modes
    • Maximum Protection
    • Maximum Availability
    • Maximum Performance
  • Several Transport Modes
    • LGWR SYNC
    • LGWR ASYNC
    • ARCH
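
To see which protection mode a database is running in, one option is to query V$DATABASE. The sketch below also shows the statement for changing the mode. It assumes a 10g-era primary and a standby destination already configured with a transport mode that suits the target protection mode (for example, LGWR SYNC for Maximum Availability).

```sql
-- Configured mode, and the level currently being achieved
SELECT protection_mode, protection_level, database_role
FROM   v$database;

-- Change the mode on the primary. Pick one of:
--   MAXIMIZE PROTECTION | MAXIMIZE AVAILABILITY | MAXIMIZE PERFORMANCE
ALTER DATABASE SET STANDBY DATABASE TO MAXIMIZE PERFORMANCE;
```

PROTECTION_LEVEL can differ from PROTECTION_MODE, for instance while the standby is unreachable, so it is worth looking at both columns.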

11. Prepare Primary & Standby Creating a Standby

  • Prepare Primary Database
    • Enable Force Logging
    • SQL> alter database force logging;
    • Modify initialization parameters
  • Prepare Standby Database
    • Setup directory structure
    • Create spfile with correct parameters
    • Start database in nomount
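
The preparation steps above can be checked from SQL*Plus. This is a minimal sketch: the pfile path is a hypothetical placeholder, and the commands assume a SYSDBA connection.

```sql
-- On the primary: confirm archiving and force logging are both on.
-- LOG_MODE should be ARCHIVELOG and FORCE_LOGGING should be YES.
SELECT log_mode, force_logging
FROM   v$database;

-- On the standby node, connected to the idle instance:
-- build the spfile from an edited parameter file (hypothetical path),
-- then start the instance without mounting anything.
CREATE SPFILE FROM PFILE='/tmp/initSTANDBY.ora';
STARTUP NOMOUNT
```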

12. Log Transport Parameters Creating a Standby

  • LOG_ARCHIVE_CONFIG='DG_CONFIG=(PRIMARY, STANDBY)'
  • LOG_ARCHIVE_DEST_1='LOCATION=/var/oracle/PRIMARY/arch'
  • LOG_ARCHIVE_DEST_2='SERVICE=PRIMARC DB_UNIQUE_NAME=PRIMARY'
  • LOG_ARCHIVE_DEST_3='SERVICE=STANDBY LGWR ASYNC REOPEN=15 MAX_FAILURE=10 OPTIONAL VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=STANDBY'
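
As a rough illustration, the transport parameters can be set dynamically on the primary and then checked for errors. The values are taken from the slide. SCOPE=BOTH assumes the instance runs from an spfile, and on a RAC primary you would probably add SID='*'.

```sql
-- Declare the members of the Data Guard configuration
ALTER SYSTEM SET log_archive_config='DG_CONFIG=(PRIMARY,STANDBY)' SCOPE=BOTH;

-- Ship redo to the standby asynchronously via LGWR
ALTER SYSTEM SET log_archive_dest_3='SERVICE=STANDBY LGWR ASYNC REOPEN=15 MAX_FAILURE=10 OPTIONAL VALID_FOR=(ONLINE_LOGFILES,PRIMARY_ROLE) DB_UNIQUE_NAME=STANDBY' SCOPE=BOTH;
ALTER SYSTEM SET log_archive_dest_state_3=ENABLE SCOPE=BOTH;

-- Any destination that is failing shows its last error here
SELECT dest_id, status, error
FROM   v$archive_dest
WHERE  status <> 'INACTIVE';
```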

13. ssh tunnels Creating a Standby

  • You may not want your redo data sent unencrypted across the internet to your standby. You can use ssh tunnels to avoid this
    • ssh -N -L 3333:standby:1521 oracle@standby
  • Now the tnsnames entry points to localhost:
    • STANDBYARC =
    • (DESCRIPTION =
    • (SDU = 32767)
    • (ADDRESS_LIST =
    • (ADDRESS = (PROTOCOL = TCP)(HOST = localhost)(PORT=3333)))
    • (CONNECT_DATA =
    • (SERVICE_NAME = STANDBY)))

14. Some Other Parameters Creating a Standby

  • FAL_SERVER
  • FAL_CLIENT
  • ARCHIVE_LAG_TARGET
  • STANDBY_FILE_MANAGEMENT
  • DB_FILE_NAME_CONVERT
  • LOG_FILE_NAME_CONVERT

15. backup your primary Creating a Standby

  • Back up the primary - rman is good
    • rman> backup format '/backup/%U' database plus archivelog;
    • rman> backup format '/backup/%U' current controlfile for standby;
  • Recover the backup on the standby node
    • I like using rman duplicate to create the standby:
    • (oracle$) rman target sys/password@PRIMARY auxiliary /
    • rman> duplicate target database for standby;

16. Start applying redo Creating a Standby

  • Create standby redo log files on both primary and standby:
    • sql> alter database add standby logfile thread 2 group 42 ('PATH_TO_DATA/standbyredo01.log') size 512M;
  • Now you can start the physical standby recovering logs:
    • sql> alter database recover managed standby database disconnect from session;
  • Or if you prefer real time apply:
    • sql> alter database recover managed standby database using current logfile disconnect from session;
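
Before starting apply, it can help to confirm that the standby redo logs match the online redo logs. Oracle's documentation recommends the same size, with at least one more standby group per thread than there are online groups. The queries below are a minimal sketch of that check, followed by the statement that stops managed recovery.

```sql
-- Online redo log groups and sizes, per thread
SELECT thread#, group#, bytes/1024/1024 AS size_mb
FROM   v$log
ORDER  BY thread#, group#;

-- Standby redo log groups and sizes, per thread
SELECT thread#, group#, bytes/1024/1024 AS size_mb, status
FROM   v$standby_log
ORDER  BY thread#, group#;

-- Stop redo apply on the standby when you need to
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;
```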

17. Monitoring the Standby Maintaining your standby

  • You have to ensure your standby is keeping up with your primary
  • You can check the last log applied to your standby with:
    • sql> SELECT MAX(SEQUENCE#), THREAD#
      • FROM V$ARCHIVED_LOG
      • WHERE APPLIED='YES'
      • GROUP BY THREAD#;
  • MAX(SEQUENCE#)    THREAD#
  • --------------    -------
  •           2976          1
  •           1888          2
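
A variation on the query above, run on the standby, puts the last log received next to the last log applied for each thread, so any gap is visible at a glance. This is a sketch and ignores complications such as multiple incarnations after a resetlogs.

```sql
SELECT thread#,
       MAX(sequence#)                                    AS last_received,
       MAX(CASE WHEN applied = 'YES' THEN sequence# END) AS last_applied
FROM   v$archived_log
GROUP  BY thread#
ORDER  BY thread#;
```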

18. Monitoring Standby Progress Maintaining your standby

  • A good way of checking what the background processes of your standby are up to is using v$managed_standby
    • SQL> select process, sequence#, status
    • from v$managed_standby;
  • PROCESS  SEQUENCE#  STATUS
  • -------  ---------  ------------
  • ARCH          2967  CLOSING
  • ARCH          2974  CLOSING
  • RFS           2977  IDLE
  • MRP0          1889  APPLYING_LOG
  • RFS           1889  IDLE
  • RFS           2977  IDLE
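
The same view carries a few more columns that make the output easier to read. The sketch below adds the thread, the block currently being processed, and the primary-side process each RFS is serving.

```sql
SELECT process, client_process, status, thread#, sequence#, block#
FROM   v$managed_standby
ORDER  BY process, thread#;
```

Running it twice a few seconds apart and comparing BLOCK# on the MRP0 row is a quick way to see whether apply is actually moving.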

19. Monitoring Your Standby Maintaining your standby

  • You have to ensure your standby is keeping up with your primary
  • V$DATAGUARD_STATS provides useful information
    • SQL> select name, value from v$dataguard_stats;
  • NAME                      VALUE
  • ------------------------  ------------
  • apply finish time         +00 00:00:00
  • apply lag                 +00 00:00:11
  • estimated startup time    41
  • standby has been open     N
  • transport lag             +00 00:00:03
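
For alerting, the two lag figures are usually the interesting ones. The sketch below selects just those rows, then converts the value to seconds. The conversion assumes the value is a day-to-second interval string in the form shown on the slide (for example +00 00:00:11).

```sql
-- Just the lag rows, with the time they were computed
SELECT name, value, unit, time_computed
FROM   v$dataguard_stats
WHERE  name IN ('apply lag', 'transport lag');

-- The same lags expressed in seconds
SELECT name,
       EXTRACT(DAY    FROM TO_DSINTERVAL(value)) * 86400
     + EXTRACT(HOUR   FROM TO_DSINTERVAL(value)) * 3600
     + EXTRACT(MINUTE FROM TO_DSINTERVAL(value)) * 60
     + EXTRACT(SECOND FROM TO_DSINTERVAL(value)) AS lag_seconds
FROM   v$dataguard_stats
WHERE  name IN ('apply lag', 'transport lag');
```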

20. Monitoring Your Standby Maintaining your standby

  • A way of finding out what has been happening to your standby over a period of time is to look at the v$dataguard_status view
    • Log Apply Services  01-AUG-07  Media Recovery Waiting for thread 1 sequence 2977 (in transit)
    • Log Apply Services  01-AUG-07  Media Recovery Waiting for thread 1 sequence 2977 (in transit)
    • Log Apply Services  01-AUG-07  Media Recovery Waiting for thread 2 sequence 1889 (in transit)
    • Remote File Server  01-AUG-07  Primary database is in MAXIMUM PERFORMANCE mode
    • Remote File Server  01-AUG-07  RFS[53]: Successfully opened standby log 14: '+DATA2/standby/standbyredo02.log'
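
Output like the above could come from a query along these lines. The one-day window and the column selection are assumptions for illustration, not necessarily what was run for the slide.

```sql
SELECT facility,
       TO_CHAR(timestamp, 'DD-MON-YY HH24:MI:SS') AS logged_at,
       message
FROM   v$dataguard_status
WHERE  timestamp > SYSDATE - 1      -- last 24 hours
ORDER  BY timestamp, message_num;
```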

21. Oracle can't divide by 0 Maintaining your standby

  • Standby was happily working away
    • ORA-07445: exception encountered: core dump [kcrarmb()+152] [SIGFPE] [Integer divide by zero] [0x00085C300
  • MRP process crashes
    • No redo gets applied from this point
  • Logs after the one that caused the ORA-07445 are still being shipped
  • A simple restart of the managed recovery process does a FAL and the standby is back up to date
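
The recovery described above might look like this on the standby. It is a sketch that assumes the MRP process has already died, so there is nothing to cancel first.

```sql
-- No MRP0 row means managed recovery is not running
SELECT process, status, thread#, sequence#
FROM   v$managed_standby
WHERE  process LIKE 'MRP%';

-- Restart managed recovery in the background
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

-- No rows here means there is no archive gap left to resolve
SELECT thread#, low_sequence#, high_sequence#
FROM   v$archive_gap;
```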

22. kcrfr_resize2 Maintaining your standby

  • Lots of problems after upgrade to 10.2.0.3
    • Recovery of Online Redo Log: Thread 2 Group 23 Seq 999 Reading mem 0
    • Mem# 0: +DATA3/standby/standbyredo11.log
    • ORA-00600: internal error code, arguments: [kcrfr_resize2], [652614828032], [268423168], [], [], [], [], []
  • Perhaps caused by the following:
    • Bug 3306010 OERI[kcrfr_resize2] possible in MEDIA recovery
    • Media recovery may fail with ORA-600 [kcrfr_resize2] when the number of redo strands is set to a high value using log_parallelism.

23. kcrfr_resize2 Maintaining your standby

  • This issue has recently been published as Note:453259.1
    • Triggered by having a large log_buffer
  • This bug affects 10.2.0.3 and potentially 9.2.0.8
  • It is related to the size of the log_buffer parameter
  • Fix is included in 10.2.0.4
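
To judge whether a database might be exposed, a reasonable first step is to check the version and the current log_buffer setting. This sketch only reports the values. The note does not give a safe threshold here, so none is assumed.

```sql
-- Database version and patch level
SELECT banner
FROM   v$version;

-- Current log buffer size in bytes
SELECT name, value
FROM   v$parameter
WHERE  name = 'log_buffer';
```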

24. kcrrupirfs Maintaining your standby

  • ARC processes died on primary: