Oracle Maximum Availability Architecture Best Practices for Oracle Exadata
Hector Pujol, Oracle MAA Team
Mike Smith, Oracle MAA Team
Curt Lukenbill, Paychex DB Operations
Copyright © 2013, Oracle and/or its affiliates. All rights reserved. 2
Program Agenda
Exadata and Oracle Maximum Availability Architecture
High Availability Out of the Box
Oracle MAA Configuration Best Practices
Reference Configurations
Paychex Exadata/MAA Experiences
Exadata Built-In Hardware Redundancy
• Redundant Database Servers
– Active-Active highly available clustered servers
– Hot-swappable power supplies and fans
– Redundant power distribution units
• Redundant Storage Grid
– Data mirrored across storage servers
– Redundant, non-blocking I/O paths
• Redundant Network
– Redundant 40 Gb/s InfiniBand connections and switches
– Client access using HA bonded networks
Maximum Availability Architecture (MAA)
Integrated, Active, High Return on Investment
[Diagram: Production Site and Active Replica, with the following components]
– RAC: scalability, server HA
– ASM: volume management
– Flashback: human error correction
– RMAN & Fast Recovery Area: on-disk backups
– Oracle Secure Backup: backup to tape / cloud
– Active Data Guard: data protection, DR, query offload
– GoldenGate: active-active, heterogeneous, migrations and upgrades
– Online Redefinition, Edition-based Redefinition, Data Guard, GoldenGate: minimal downtime maintenance, upgrades, and migrations
High Availability Out of the Box
Configuration
Automated installation and configuration
Uses Exadata/MAA best practices for:
– Grid Infrastructure, Oracle Storage Grid and Oracle Database
– Operating system (Linux or Solaris x86)
– Network configuration (client and admin access, GigE, InfiniBand)
– Initial monitoring setup (SNMP alerts, Oracle Configuration Manager, Automatic Service Request, Grid Control Agents)
– DBCA template for future usage
Within days of arrival, the Exadata System and Oracle Database are ready for use
Oracle OneCommand
Storage
Read and repair corruption from mirror with no application impact
– Most mirroring solutions read from the mirror copy of a block on I/O error or failed storage checksum
– Exadata does this, performs additional validation, and also reads from the mirror if a block is internally corrupt
Highly available storage grid configured out of the box
– Creating a disk group automatically creates the associated failure groups
– Disk group attributes preconfigured to give optimal uptime
– Disk group placement on disk for optimal scalability
Preconfigured Protection
InfiniBand Network
Network configuration
– Exhaustive testing has reduced brownout during InfiniBand failures
– BONDING_OPTS="mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
– Switch and port failures are handled efficiently and transparently
Preconfigured Low Brownout and High Bandwidth
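The bonding options quoted on this slide are a space-separated list of key=value pairs. As a minimal sketch (the helper name `parse_bonding_opts` is hypothetical, not part of any Oracle tool), the string can be split and sanity-checked. The check relies on the Linux bonding driver convention that `downdelay` and `updelay` are given in milliseconds and should be multiples of `miimon`.

```python
def parse_bonding_opts(opts: str) -> dict[str, str]:
    """Split a BONDING_OPTS value into a dict of option name -> value."""
    parsed = {}
    for token in opts.split():
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {token!r}")
        parsed[key] = value
    return parsed


# Value quoted on the slide.
SLIDE_BONDING_OPTS = (
    "mode=active-backup miimon=100 downdelay=5000 updelay=5000 num_grat_arp=100"
)

opts = parse_bonding_opts(SLIDE_BONDING_OPTS)

# The bonding driver expects both delays to be multiples of the miimon interval.
delays_ok = all(
    int(opts[name]) % int(opts["miimon"]) == 0 for name in ("downdelay", "updelay")
)
print(opts["mode"], delays_ok)
```

A check like this could run against the interface configuration files on each database server before and after maintenance; where those files live is left as an assumption about the local setup.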
Compute Nodes
DBCA templates with HA best practices built in
– Intelligent file redundancy configurations (e.g. control file mirroring)
– Parameter settings based on best practices
– SGA / PGA configuration
Performance optimizations that also prevent outages
– Efficient memory management using hugepages
Preconfigured High Availability
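To make the hugepages point concrete, here is a rough sizing sketch. It assumes a 2 MiB huge page size (common on Linux x86-64) and a single SGA. The function name and the 16 GiB figure are illustrative assumptions, not values from the slides. A real system would also account for every instance on the node.

```python
def hugepages_needed(sga_bytes: int, hugepage_bytes: int = 2 * 1024 * 1024) -> int:
    """Return the number of huge pages needed to hold an SGA, rounded up."""
    if sga_bytes <= 0 or hugepage_bytes <= 0:
        raise ValueError("sizes must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return (sga_bytes + hugepage_bytes - 1) // hugepage_bytes


sga = 16 * 1024**3  # hypothetical 16 GiB SGA
pages = hugepages_needed(sga)
print(pages)  # 8192
```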
Automated Exadata Health Check
Comprehensive configuration check of Exadata software and hardware
Reports any variance from MAA best practices
Detects problems before they impact production
Run monthly
Run pre/post maintenance
Download: My Oracle Support Note 1070954.1
Exachk
Exachk Report
Exachk Sample Output
Recommendation and Repair
Types of Best Practices Examined
– Software Maintenance
– Database, Computer, and Storage Failure Prevention
– Data and Logical Corruption Prevention
– Site and Network Failure Prevention
– Client Failover
– RMAN
– Consolidation
– General Operational
Oracle MAA Configuration Best Practices
MAA for Storage Servers
Single ASM storage grid, three disk groups
– DATA (data files), RECO (recovery files), DBFS (file system data)
ASM redundancy protects against disk failure
– Failure groups eliminate single point of failure
– Intelligent corruption handling and automatic repair
ASM high redundancy (triple mirroring) for best data protection
– Alternative: ASM normal redundancy (double mirroring) if also using Data Guard
Automatic Storage Management (ASM)
ASM Disk Group Configuration
Prevent loss of cluster and disk group due to dual storage failures
Tolerate storage failure during Exadata planned maintenance
If no standby, always use at least one High Redundancy disk group
– If DATA is HIGH, application remains available
– If RECO is HIGH, database can be restored with zero data loss
– Select the disk group configuration option during deployment
High Redundancy
ASM Disk Group Configuration
“It’s really easy to say use high redundancy disk groups, but I can’t take the hit on usable space! Plus, won’t I have an extra I/O which could result in a performance hit?”
High Redundancy
ASM Disk Group Configuration
Usable space
– Use high capacity disks
– High capacity disks with high redundancy = 168TB usable
– High performance disks with normal redundancy = 56TB usable
Performance
– Write Back Flash Cache removes the performance hit on small I/Os with high capacity disks
– Write Back Flash Cache masks extra I/O with high redundancy
– Read IOPS 1.5 million and write IOPS 1 million for either HC or HP disks
High Redundancy
MAA for Compute Servers
Accelerate instance recovery
– Tune FAST_START_MTTR_TARGET to meet your SLAs
Configure client connections to take advantage of automatic node failover
– Fast Application Notification (FAN)
– Transparent Application Failover (TAF)
Application Continuity / Transaction Guard
– Seamless failover of application connections
– Replays transactions without major application rewrite
Oracle Real Application Clusters
Application Continuity Example: Java App Using Universal Connection Pool (UCP)
– Using service configured for FAN / replay and replay driver
– Using service for FAN and non-replay driver
Application Continuity: Seamless failover of application connections
– App using Application Continuity: no interruption
– App NOT using Application Continuity: interruption
Use Oracle Resource Management
Use hugepages for optimal memory management
– My Oracle Support Note 361323.1
Instance Caging - limit the amount of CPU used by an Oracle instance
Database Resource Manager - allocate CPU resources across multiple services that share the same database
I/O Resource Manager - allocate I/O bandwidth among databases
– IORM is unique to Exadata storage
Reliable Service & Optimal Performance in Consolidated Environments
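Instance Caging caps each instance through its CPU_COUNT setting. A quick way to reason about a consolidated node is to compare the sum of those caps with the CPU threads on the host. The sketch below is illustrative only: the instance names, CPU_COUNT values, and thread counts are hypothetical, and the labels "partitioned" and "over-subscribed" describe the two common ways of sizing the caps.

```python
def caging_mode(cpu_counts: dict[str, int], host_cpu_threads: int) -> str:
    """Classify an Instance Caging layout by comparing summed caps to host threads."""
    if host_cpu_threads <= 0:
        raise ValueError("host_cpu_threads must be positive")
    total = sum(cpu_counts.values())
    # Caps that fit within the host give each instance a guaranteed share.
    return "partitioned" if total <= host_cpu_threads else "over-subscribed"


# Hypothetical CPU_COUNT settings for three instances sharing one node.
layout = {"sales": 8, "hr": 4, "reporting": 8}

print(caging_mode(layout, 24))  # partitioned
print(caging_mode(layout, 16))  # over-subscribed
```

Partitioning favors predictable service for critical databases. Over-subscribing trades some of that predictability for better use of idle CPU.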
Oracle Data Protection
Capability | Physical Block Corruption | Logical Block Corruption
Manual checks
Dbverify, Analyze | Physical block checks | Logical intra-block and inter-object consistency
RMAN | Physical block checks during backup and restore | Intra-block logical checks
Runtime checks
Active Data Guard | Continuous physical block checking at standby; strong isolation eliminates single point of failure; automatic repair of physical corruptions; automatic failover | Detect lost write corruption, auto shutdown and failover; intra-block logical checks at standby
Database | In-memory block and redo checksum | In-memory intra-block checks
ASM | Automatic corruption detection and repair using extent pairs |
Exadata | HARD checks on write | HARD checks on write
Real Time Data Protection
Prevent, Detect, and Repair Data Corruptions
DB_BLOCK_CHECKSUM=FULL
– Detect physical corruption, auto-repair corruptions detected in memory
DB_BLOCK_CHECKING=MEDIUM | FULL
– Detect logical corruptions, auto-repair corruptions detected in memory
DB_LOST_WRITE_PROTECT=TYPICAL
– Detects silent corruption due to lost or misdirected writes
Active Data Guard auto-block repair of corruptions detected on disk
Identical settings on primary and standby databases
My Oracle Support Note 1302539.1
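The settings above lend themselves to a simple audit that also enforces the "identical on primary and standby" rule. The following is a sketch under assumptions: the parameter values are passed in as plain dictionaries (how they are collected, for example by querying each database, is left out), and the `audit` helper is hypothetical.

```python
# Recommended values taken from the slide.
RECOMMENDED = {
    "db_block_checksum": {"FULL"},
    "db_block_checking": {"MEDIUM", "FULL"},
    "db_lost_write_protect": {"TYPICAL"},
}


def audit(primary: dict[str, str], standby: dict[str, str]) -> list[str]:
    """Return findings for values that miss the recommendation or differ by role."""
    findings = []
    for name, allowed in RECOMMENDED.items():
        for role, params in (("primary", primary), ("standby", standby)):
            value = params.get(name, "").upper()
            if value not in allowed:
                wanted = " or ".join(sorted(allowed))
                findings.append(f"{role}: {name}={value or 'unset'} (want {wanted})")
        if primary.get(name, "").upper() != standby.get(name, "").upper():
            findings.append(f"{name} differs between primary and standby")
    return findings


# Hypothetical settings: the standby was left with block checking disabled.
primary = {
    "db_block_checksum": "FULL",
    "db_block_checking": "MEDIUM",
    "db_lost_write_protect": "TYPICAL",
}
standby = dict(primary, db_block_checking="FALSE")

for finding in audit(primary, standby):
    print(finding)
```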
Fast Recovery from Corruption
Flashback operates on changed data only
Correction time is reduced from hours to minutes
Correction time = error time + f(DB_SIZE)
Rebuild of standby = Minutes + (DB_SIZE x network bandwidth)
Oracle Flashback Technologies
Enable Flashback Database
– Minimal impact to OLTP workloads
– Minimal impact to DW loads if operational practices and recommended patches are in place (MOS 565535.1)
  Use local extent managed tablespaces
  Recreate objects instead of truncating tables prior to direct load
  Pre-allocate flashback logs
– Size the fast recovery area to at least redo rate x DB_FLASHBACK_RETENTION_TARGET
Fast Recovery From Corruptions Oracle Flashback Technologies
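The sizing rule (redo rate multiplied by DB_FLASHBACK_RETENTION_TARGET) can be worked through as follows. This is a sketch with hypothetical inputs: a sustained redo rate of 10 MiB/s and a retention target of 1440 minutes. The retention target is expressed in minutes, so it is converted to seconds first. The result is only a lower bound for flashback logs, since the fast recovery area usually holds other files as well.

```python
def min_flashback_space_bytes(redo_rate_bytes_per_sec: int,
                              retention_target_minutes: int) -> int:
    """Lower bound for flashback log space: redo rate x retention window."""
    if redo_rate_bytes_per_sec < 0 or retention_target_minutes < 0:
        raise ValueError("inputs must be non-negative")
    return redo_rate_bytes_per_sec * retention_target_minutes * 60


redo_rate = 10 * 1024**2   # hypothetical: 10 MiB/s of redo
retention_minutes = 1440   # hypothetical DB_FLASHBACK_RETENTION_TARGET (24 hours)

size_gib = min_flashback_space_bytes(redo_rate, retention_minutes) / 1024**3
print(f"{size_gib:.2f} GiB")  # 843.75 GiB
```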
Backups
Backup Software
– Recovery Manager (RMAN)
  On-disk backups in the fast recovery area (FRA)
  Backup once, incremental forever
– Oracle Secure Backup (OSB)
  Manages the location and life cycle of backups
Choice of backup destinations
– Exadata storage
– Non-Exadata disk storage: Oracle or third party products
– Tape: Oracle or third party products
Two Aspects to Exadata Backup: Software and Destination
Exadata Backup Destination Options
– Storage Expansion Rack: fastest backup and restore, ILM historical archive, second DATA2 disk group, expansion of DATA
– ZFS Storage Appliance: backups of database and non-database files, snapshots, clones
– Tape library: offsite backups, vaulting
– Oracle Secure Backup media servers and admin server
[Diagram connectivity labels: InfiniBand network, 10GigE or InfiniBand network, Fiber Channel SAN, Ethernet]
Disaster Protection
Oracle Active Data Guard – Oracle Aware Data Protection
– Production Database: production workload
– Data Guard: continuous redo shipment and apply
– Active Standby Database: queries, read-only reporting offloaded
– Management: Data Guard Broker, Enterprise Manager Grid Control
Data Guard Best Practices
Configure network for Data Guard transport
– Set Oracle Net RECV_BUF_SIZE and SEND_BUF_SIZE and maximum TCP socket buffer sizes >= 3 x BDP (bandwidth-delay product)
– Place standby redo log groups on fastest portion of disk
Tune Active Data Guard apply performance if necessary
– Assess apply performance using standby statspack
– Tune based on top wait events (coordinator / recovery slaves)
– Monitor real-time query performance using Active Session History
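The buffer rule of at least three times the bandwidth-delay product (BDP) is simple arithmetic: BDP in bytes is link bandwidth in bits per second times round-trip time, divided by 8. The sketch below uses a hypothetical 1 Gb/s link with a 25 ms round trip. The function names are illustrative, and the result would be the floor for RECV_BUF_SIZE, SEND_BUF_SIZE, and the OS socket buffer maximums.

```python
def bdp_bytes(bandwidth_bits_per_sec: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes (integer math, rounded down)."""
    if bandwidth_bits_per_sec < 0 or rtt_ms < 0:
        raise ValueError("inputs must be non-negative")
    # bits/s * ms / 1000 gives bits in flight; / 8 converts to bytes.
    return bandwidth_bits_per_sec * rtt_ms // 8000


def min_socket_buffer_bytes(bandwidth_bits_per_sec: int, rtt_ms: int,
                            factor: int = 3) -> int:
    """Minimum socket buffer size per the '>= 3 x BDP' guideline."""
    return factor * bdp_bytes(bandwidth_bits_per_sec, rtt_ms)


# Hypothetical link between primary and standby sites.
link_bps = 1_000_000_000  # 1 Gb/s
rtt_ms = 25

print(bdp_bytes(link_bps, rtt_ms))                # 3125000
print(min_socket_buffer_bytes(link_bps, rtt_ms))  # 9375000
```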
Data Guard Best Practices
Hybrid columnar compression (HCC) conserves bandwidth
• 78% reduction in redo volume and network consumption
• 4% reduction in elapsed time required to complete load with HCC enabled
[Chart: Data Load Redo Volume, in megabytes of data, Uncompressed vs. HCC]
For all best practices, refer to:
– Best Practices for Disaster Recovery for Exadata Database Machine
Integrated, Automatic Client Failover
Use SRVCTL to configure Clusterware managed services
Data Guard Broker is required for complete automation
– CRS starts/stops services appropriate for database role
– FAN compliant clients are automatically notified
srvctl add service -d <db_unique_name> -s <service_name> [-l [PRIMARY][,PHYSICAL_STANDBY][,LOGICAL_STANDBY] [,SNAPSHOT_STANDBY]] [-y {AUTOMATIC | MANUAL}][-r <instance1,instance2…>]
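As an illustration of how the syntax above maps to a concrete command, the sketch below assembles the argument list for a role-based service. Only the flags shown on the slide (-d, -s, -l, -y, -r) are used. The helper name, database name, and instance names are hypothetical, and the command is only built here, not executed.

```python
VALID_ROLES = {"PRIMARY", "PHYSICAL_STANDBY", "LOGICAL_STANDBY", "SNAPSHOT_STANDBY"}
VALID_POLICIES = {"AUTOMATIC", "MANUAL"}


def srvctl_add_service(db_unique_name, service_name, roles=(), policy=None,
                       instances=()):
    """Build the 'srvctl add service' argument list from the slide's syntax."""
    args = ["srvctl", "add", "service", "-d", db_unique_name, "-s", service_name]
    if roles:
        unknown = set(roles) - VALID_ROLES
        if unknown:
            raise ValueError(f"unknown role(s): {sorted(unknown)}")
        args += ["-l", ",".join(roles)]
    if policy is not None:
        if policy not in VALID_POLICIES:
            raise ValueError(f"unknown management policy: {policy}")
        args += ["-y", policy]
    if instances:
        args += ["-r", ",".join(instances)]
    return args


# Hypothetical database and instance names; the service name matches the
# Oracle Net example in this deck.
cmd = srvctl_add_service("austin", "OrderEntry", roles=["PRIMARY"],
                         policy="AUTOMATIC", instances=["austin1", "austin2"])
print(" ".join(cmd))
```

Defining the same service with the PRIMARY role on both the primary and the standby cluster is what lets Clusterware start it only where the database currently holds that role.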
Integrated, Automatic Client Failover
Connection should specify both primary and standby SCAN hostnames
Oracle Net Alias – An Example
SALES=
  (DESCRIPTION_LIST=
    (LOAD_BALANCE=off)(FAILOVER=on)
    (DESCRIPTION=
      (LOAD_BALANCE=on)(CONNECT_TIMEOUT=10)(RETRY_COUNT=3)
      (ADDRESS_LIST=
        (ADDRESS=(PROTOCOL=TCP)(HOST=Austin-scan)(PORT=1521)))
      (CONNECT_DATA=(SERVICE_NAME=OrderEntry)))
    (DESCRIPTION=
      (LOAD_BALANCE=on)(CONNECT_TIMEOUT=10)(RETRY_COUNT=3)
      (ADDRESS_LIST=
        (ADDRESS=(PROTOCOL=TCP)(HOST=Houston-scan)(PORT=1521)))
      (CONNECT_DATA=(SERVICE_NAME=OrderEntry))))
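Because the alias repeats one DESCRIPTION per site, it can be generated from a list of SCAN hostnames, which keeps the two halves consistent. The sketch below reproduces the structure of the example above. The function names are hypothetical, and the output is a single-line string that would still need to be placed in the client's Oracle Net configuration.

```python
def description(scan_host: str, service_name: str, port: int = 1521,
                connect_timeout: int = 10, retry_count: int = 3) -> str:
    """Build one DESCRIPTION block for a single site's SCAN hostname."""
    return (
        "(DESCRIPTION="
        f"(LOAD_BALANCE=on)(CONNECT_TIMEOUT={connect_timeout})(RETRY_COUNT={retry_count})"
        f"(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST={scan_host})(PORT={port})))"
        f"(CONNECT_DATA=(SERVICE_NAME={service_name})))"
    )


def failover_alias(alias: str, scan_hosts: list[str], service_name: str) -> str:
    """Build a DESCRIPTION_LIST that tries each site in order."""
    if not scan_hosts:
        raise ValueError("at least one SCAN hostname is required")
    body = "".join(description(host, service_name) for host in scan_hosts)
    return f"{alias}=(DESCRIPTION_LIST=(LOAD_BALANCE=off)(FAILOVER=on){body})"


# Hostnames and service name taken from the example above.
entry = failover_alias("SALES", ["Austin-scan", "Houston-scan"], "OrderEntry")
print(entry)
```

With LOAD_BALANCE=off and FAILOVER=on at the list level, sites are tried in the order given, so the first hostname should be the site that normally runs the primary.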
Oracle MAA Reference Configurations
Exadata MAA Configuration Options
HA engineered into the Exadata system
Second Exadata system deployed for remote DR
– Asynchronous redo transport, Data Guard Maximum Performance
– Active Data Guard: offload read-only reporting
Remote Disaster Recovery with Maximum Performance
[Diagram: Primary to Remote Standby, asynchronous transport]
Exadata MAA Configuration Options
HA engineered into the Exadata system
Second Exadata system deployed for local DR (within 200 miles)
– Synchronous redo transport, Data Guard Maximum Availability
– Active Data Guard: offload read-only reporting
Local Disaster Recovery with Zero Data Loss
[Diagram: Primary to Local Standby, SYNC transport]
Exadata MAA Configuration Options
Dual standby configuration
– Local standby is primary failover target with zero data loss
– Remote standby is failover of last resort
– Either is used to offload read-only workload, backups, rolling upgrades, test
Multi-Standby: Local HA Failover plus Geographic Protection
[Diagram: Primary to Local Standby (SYNC); Primary to Remote Standby (asynchronous)]
Exachk for Planned Maintenance
Qualify maintenance readiness
– Comprehensive configuration checks
– Best practice adherence
Simplify software planning
– Critical Issue exposure report
– Version recommendations
Automated Exadata Health Check – MOS 1070954.1
Software Maintenance Suggested Schedule
Frequency | Planned Maintenance – Software Updates
3-12 months | Database / Grid Infrastructure quarterly patch (QDPE); Exadata
1-2 years | Database / Grid Infrastructure patch set (by Error Correction Support end date); InfiniBand switch
2-4 years | Database / Grid Infrastructure release (e.g. 11.1 to 11.2, 11 to 12)
Paychex EXADATA & MAA
Curt Lukenbill Manager, Database Operations
Oracle OpenWorld September 2013
46 Copyright 2013, Paychex, Inc.
About Paychex, Inc.
• Paychex is a leading provider of payroll, human resource, insurance, and benefits outsourcing solutions
• Founded in 1971 - Serves approximately 570,000 small and medium-sized businesses
• Based in Rochester, NY – 100+ offices in U.S. and Germany
• 12,000+ Employees
• Fiscal 2013 highlights (ended May 31, 2013):
  • $2.3 billion total service revenue
  • $569 million net income
• Computerworld list of “Top 100 Best Places to Work in IT”
• Information Week list of “Top 250 Technology Innovators”
Paychex Database Operations & Services
• Located in Webster, NY - 15 Miles NE of Rochester
• 3 Managers – 2 Oracle & 1 SQL Server
• 27 DBAs – 18 Oracle, 9 SQL Server
• Database Security Analyst
• Lead Oracle Database Architect
• Team Responsibilities:
  • Design and Standards, Project Participation, Database Security
  • Installation, Maintenance, Patching, DB Performance Management
  • 365x24x7 On-Call Support
  • Exadata Platform
  • Active Data Guard
  • GoldenGate
  • Oracle Enterprise Manager
ENTERPRISE REPORTING ENVIRONMENT
• EXADATA – ENABLER FOR A NEW OPPORTUNITY
  • Enterprise Reporting Warehouse
• PAYROLL Transactional DBs
  • Nine 2-Node RAC Clusters, HP DL785, RH LINUX, 11.2.0.2 DB, 7 TB
• Oracle GoldenGate sends transactions to EXADATA
  • Loads to Staging Tables
• Oracle Data Integrator transforms
  • Loads Enterprise Reporting Database
• Oracle Business Intelligence used to create reports
• Active Data Guard used for Disaster Recovery
PAYCHEX EXADATA ENVIRONMENT
MAA ASSESSMENT
• RECOVERY POINT OBJECTIVE – 0 DATA LOSS
• RECOVERY TIME OBJECTIVE – 2 HOURS
• Fault Protection “Out of the Box”
  • Cell Server, Disk, Network, Database
• Review of Data Guard Configuration
  • Active Data Guard used to update Physical Standby DB
  • Currently using ASYNC mode (SYNC used for transactional DBs)
• Use of Test and Performance Testing Platform
  • Support Testing through use of Real Application Testing tool
  • Validate Operational and Recovery Best Practices, Test Patches & SW changes
MAA ASSESSMENT
• Monitoring/Alerting
  • Validation of Auto Service Request (ASR)
  • Review Oracle Enterprise Manager integration
• Patching – Minimum once/6 months
  • Working with ACS to begin Platinum Services
  • 2-4 Patching events / year
• Implement FLASHBACK
  • Logical corruption protection
• Enhance Backup/Recovery
  • ZFS Storage Appliance
• Implement health check process
  • Run and review exachk weekly
  • Standard DB Health Checks
7420 ZFS STORAGE FOR BACKUPS
• One in each data center
  • Backing up both Prod and Standby
• RMAN Nightly Incremental with Image Copies
  • Using RMAN Catalog
• Moving GG files (Trail files) off Exadata to ZFS
• We are getting 9TB/Hour backup rates!
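To put the quoted 9TB/hour rate in perspective, the sketch below converts it to a per-second figure and estimates a backup window. The 27 TB database size is a hypothetical input, decimal units (1 TB = 1000 GB) are assumed, and the estimate presumes the rate is sustained for the whole run.

```python
def rate_gb_per_sec(rate_tb_per_hour: float) -> float:
    """Convert TB/hour to GB/s using decimal units."""
    return rate_tb_per_hour * 1000 / 3600


def hours_to_back_up(db_size_tb: float, rate_tb_per_hour: float) -> float:
    """Estimated hours for a full backup at a sustained rate."""
    if rate_tb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return db_size_tb / rate_tb_per_hour


QUOTED_RATE_TB_PER_HOUR = 9.0  # rate reported on the slide

print(rate_gb_per_sec(QUOTED_RATE_TB_PER_HOUR))         # 2.5
print(hours_to_back_up(27.0, QUOTED_RATE_TB_PER_HOUR))  # 3.0
```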
TO-DO LIST
• EXTEND ZFS STORAGE FOOTPRINT – Oct 2013
• SCHEDULE PATCHING WITH ACS – Nov 2013
• UPGRADE OEM TO 12c – Feb 2014
  • Evaluating use of the ODA for our OEM platform
• CONFIGURE USE OF DATA GUARD BROKER - Dec 2013
• UPGRADE GG TO 11.2 or 12 - 2014
  • Replace GG Monitor with 12c EM Plug-In
• DETERMINE TIMELINE FOR ORACLE DATABASE 12c UPGRADE - 2014
PARTING THOUGHTS, LESSONS LEARNED…
• Platform and Application are evolving together
  • 20% client base to date, big push in 2014
• Dedicated Team within DBA Organization
  • But don’t leave IT OPS Partners out of the discussion
• Work with Oracle – They built it, they are the experts!
  • End-End Single Vendor Support
  • Unified Monitoring of all components
• 3x Disk Mirroring – Not using it, paid the price once
• Practice Fail-Over and other recovery scenarios
• Standard Oracle DB – Same MAA Best Practices Apply!
• THANK YOU and have a GREAT week!
Paychex EXADATA & MAA
Curt Lukenbill Contact : [email protected] or on LinkedIn
Oracle OpenWorld September 2013
Resources
OTN HA Portal: http://www.oracle.com/goto/availability
Maximum Availability Architecture (MAA): http://www.oracle.com/goto/maa
Exadata on OTN: http://www.oracle.com/technetwork/database/exadata/index.html
Exadata MAA Video: http://vimeo.com/62754145