Two Node Architecture, Unprotected
Two Node Architecture, Protected Apps tier and unprotected DB tier
Two Node Architecture, Protected Apps tier and DB tier
Failover Cluster
• Detecting failure by monitoring the heartbeat and checking the status of resources
• Reorganizing cluster membership in the cluster manager
• Transferring disk ownership from the primary node to the secondary node
• Mounting the file system on the secondary node
• Starting the DB instance
• Recovering the database and rolling back uncommitted data
• Reestablishing client connections to the failover node
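The takeover sequence above can be sketched as the steps a cluster agent runs on the secondary node. This is an illustrative, non-runnable sketch: the volume group, mount point, and SID names (vg01, /u02, PROD) are assumptions, and commercial frameworks such as those listed below drive these steps automatically.

```shell
# Illustrative only: manual equivalent of a failover, run on the secondary
# node. vg01, /u02, and ORACLE_SID=PROD are assumed names.
vgchange -a y vg01               # take ownership of the shared disk group
fsck -y /dev/vg01/lvol1          # check the file system after the crash
mount /dev/vg01/lvol1 /u02      # mount the FS on the secondary node

# STARTUP performs crash recovery: committed redo is rolled forward and
# uncommitted transactions are rolled back automatically.
export ORACLE_SID=PROD
sqlplus "/ as sysdba" <<'EOF'
STARTUP
EOF
# Clients then reconnect via a virtual IP or a TNS entry listing both nodes.
```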
FAILOVER CLUSTER OFFERINGS
• Veritas cluster server
• HP Service Guard
• Microsoft Cluster Service with Oracle Failsafe
• RedHat Linux Advanced Server 2.1
• Sun Cluster Oracle Agent
• Compaq, now HP, Segregated Cluster
• HACMP
RAC
Scalable RAC
Real Application Clusters
• Many instances of Oracle running on many nodes
• Multiple instances share a single physical database
• All instances have common data, control, and initialization files
• Each instance has its own redo log files (on shared storage) and rollback segments or undo tablespaces
• All instances can simultaneously execute transactions against the single database
• Caches are synchronized using Oracle’s Global Cache Management technology (Cache Fusion)
RAC Building Blocks
• Instance and Database files
• Shared storage with OCFS, CFS or raw devices
• Redundant HBA cards per host
• Redundant NIC cards per host: one for the cluster interconnect and one for LAN connectivity
• Local RAID-protected drives for ORACLE_HOMEs (OCFS does not support ORACLE_HOME installs)
CLUSTER INTERCONNECT FUNCTION
• Monitoring health, status, and message synchronization
• Transporting Distributed Lock Manager messages
• Accessing remote file systems
• Moving application-specific traffic
• Providing cluster alias routing

Interconnect Requirements
• Low latency for short messages
• High speed and sustained data rates for large messages
• Low host CPU utilization
• Flow control, error control, and heartbeat continuity monitoring
• Switched networks that scale well
INTERCONNECT PRODUCTS
• Memory Channel
• SMP Bus
• Myrinet
• Sun SCI
• Gigabit Ethernet
• InfiniBand Interconnect
INTERCONNECT PROTOCOL
• TCP/IP
• UDP
• VIA
• RDG
• HMP
IO CHANNEL HBA Products
• Adaptec
• DPT
• LSI Logic
• Interphase
• Qlogic
• Emulex
• JNI
FABRIC SWITCHES
• McDATA
• EMC
• QLOGIC
• BROCADE
CLUSTER NODES
NUMA / SMP
• Shared system bus and I/O
• Expensive, with scalability problems
• Adding more CPUs can require upgrading other architecture components
• Vendors: Dell and HP-Compaq
Blade Servers
• BladeFrame system from Egenera
• Egenera: 24 two-way and four-way SMP processing resources
• Egenera: redundant central controllers, redundant high-speed interconnects, PAN Manager
• Egenera: PAN Manager handles external storage mapping and virtualization
• Egenera: PAN Manager handles I/O and network traffic to and from individual servers
Oracle’s High Availability (HA) Solution Stack

Unplanned Downtime
• System Failure → Real Application Clusters: continuous availability for all applications
• Data Failure & Disaster → Data Guard: zero data loss
• Human Error → Flashback Query: enables users to correct their mistakes

Planned Downtime
• System Maintenance → Dynamic Reconfiguration: capacity on demand without interruption
• Data Maintenance → Online Redefinition: adapt to change online
Shared Storage Options
• NFS-mounted storage (NetApp)
• SCSI shared storage with OCFS, CFS, or raw devices
• Fibre Channel storage with a fabric architecture
11i Steps - 1
• Install Red Hat Advanced Server 2.1 on all nodes
• Install 11i as a single-node install on the Apps tier
• Attach shared storage and install drivers for the HBAs
11i Steps -2 ( install OS Patches)
• rpm -Uv tar-1.13.25-9.i386.rpm
• This provides an updated version of tar
• Allows a user to tar files from a running database on OCFS
• Example:
• tar --o_direct -cvf /tmp/backup.tar *
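The patched tar can be exercised end to end with a short script. A hedged sketch: it probes for the --o_direct option and falls back to a plain archive on stock tar, and the source directory is a temporary stand-in for an OCFS path such as /ocfs/oracle/proddata.

```shell
#!/bin/sh
# Hedged sketch: --o_direct exists only in Oracle's updated tar RPM; on a
# stock tar the option probe fails and we fall back to a normal archive.
# SRC stands in for an OCFS data directory; the archive path is illustrative.
set -e
SRC=$(mktemp -d)
echo data > "$SRC/system01.dbf"
ARCHIVE=/tmp/backup.$$.tar

if tar --o_direct --version >/dev/null 2>&1; then
    tar --o_direct -cvf "$ARCHIVE" -C "$SRC" .   # O_DIRECT reads, safe on OCFS
else
    tar -cvf "$ARCHIVE" -C "$SRC" .              # stock tar fallback
fi
echo "archive written to $ARCHIVE"
```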
11i Steps -2 (install OS Patches, continued)
• rpm -Uv fileutils-4.1-4.2.i386.rpm
• This provides updated versions of cp and dd
• Allows a user to copy files from a running database on OCFS
• Examples:
  cp --o_direct /ocfs/quorum.dbf /tmp/backup/quorum.dbf
  dd o_direct=yes if=/ocfs/quorum.dbf of=/tmp/backup/quorum.dbf
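The same pattern works for the patched cp and dd. A hedged sketch: stock GNU cp rejects --o_direct, so the script falls back to a plain dd copy; the files are temporary stand-ins for /ocfs/quorum.dbf and its backup.

```shell
#!/bin/sh
# Hedged sketch: cp --o_direct and dd o_direct=yes exist only in Oracle's
# patched fileutils RPM; stock GNU cp rejects the flag, so we fall back to
# a plain dd copy. SRC/DST stand in for real OCFS paths.
set -e
SRC=$(mktemp)
DST=$(mktemp)
echo "quorum data" > "$SRC"

if cp --o_direct "$SRC" "$DST" 2>/dev/null; then
    echo "copied with O_DIRECT"
else
    dd if="$SRC" of="$DST" 2>/dev/null           # stock fallback, no O_DIRECT
fi
cmp -s "$SRC" "$DST" && echo "copy verified"
```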
11i Steps -3: Install Oracle-provided RPMs
• ocfs-support-1.0.9-11.i686.rpm
• ocfs-tools-1.0.9-11.i686.rpm
• j2sdk-1_3_1_09-linux-i586.rpm.bin
• unzip-5.50-30.i386.rpm
• zip-2.3-10.i386.rpm
• wu-ftpd-2.6.1-21.i386.rpm
• hangcheck-timer-2.4.9-e.10-0.4.0-2.i686.rpm
• hangcheck-timer-2.4.9-e.10-enterprise-0.4.0-2.i686.rpm
11i Steps -3 (interconnect)
• ifconfig eth0:0 192.168.2.100
• route add -host 192.168.2.100 dev eth0:0
• Do this on each node
• Create the watchdog file (the Oracle installer checks for this to install the cluster option):
  # touch /dev/watchdog
• Set up the hangcheck-timer module:
  # vi /etc/modules.conf
  options hangcheck-timer hangcheck_tick=30 hangcheck_margin=180
  # modprobe hangcheck-timer
11i Steps -5 (OCFS.conf)
• # ocfstool (from X Windows)
• OCFS config: ensure this file exists in /etc
  node_name = linux3.home.com
  node_number =
  ip_address = 192.168.1.100
  ip_port = 7000
  comm_voting = 1
  guid = 9D3B77AF2FF26E92E25D00E04CA44B58
11i Steps -6 install OCFS
• mkfs.ocfs -F -b 128 -L /s01 -m /s01 -u 500 -g 500 -p 0755 /dev/sda1
• srvconfig_loc=/s01/oragsd-config (touch this file)
11i Steps -7 (OCM)
• Configure and start the Cluster Manager:
• $ cd $HOME/product/9.2/oracm/admin
• $ ls
• If cmcfg.ora exists:
• $ cp cmcfg.ora cmcfg.ora.original
• If cmcfg.ora does not exist:
• $ cp cmcfg.ora.tmp cmcfg.ora
• $ echo HostName=dc1node3inter >> cmcfg.ora
• $ vi cmcfg.ora
• [comment out WatchdogSafetyMargin and WatchdogTimerMargin]
• PrivateNodeNames=linux22 linux33
• PublicNodeNames=linux2 linux3
• MissCount=210
• KernelModuleName=hangcheck-timer
• CmDiskFile=/u02/oracm-qourum
• $ vi ocmargs.ora
• [comment out the first line, which contains the word “watchdogd”]
• $ cd ../bin
• $ cp ocmstart.sh ocmstart.sh.original
• $ vi ocmstart.sh
• [remove the words “watchdog and” from the line containing “Sample startup script...”]
• [remove every line containing “watchdogd”, uppercase or lowercase; if it is inside an if/then/fi, remove the whole if/then/fi]
• $ su - root
• # export ORACLE_HOME=/d02/oracle/proddb/9.2.0
• # /d02/oracle/proddb/9.2.0/oracm/bin/ocmstart.sh
11i Steps -4 (cp/dd DB files to shared storage)
• cp --o_direct /d03/oracle/proddata/* /s01/oracle/proddata/
• Recreate the controlfile
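Recreating the controlfile after the copy is also the point to raise MAXINSTANCES so a second redo thread fits. A non-runnable sketch; the file names, sizes, and limits are assumptions, not from the slides:

```shell
# Illustrative only; run as SYSDBA on node 1. All paths and limits below
# are assumed, not taken from the original steps.
sqlplus "/ as sysdba" <<'EOF'
STARTUP NOMOUNT
CREATE CONTROLFILE REUSE DATABASE "PROD" RESETLOGS NOARCHIVELOG
    MAXINSTANCES 4
    MAXLOGFILES 32
    LOGFILE GROUP 1 ('/s01/oracle/proddata/redo01.log') SIZE 100M,
            GROUP 2 ('/s01/oracle/proddata/redo02.log') SIZE 100M
    DATAFILE '/s01/oracle/proddata/system01.dbf';
ALTER DATABASE OPEN RESETLOGS;
EOF
```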
11i steps 8 – init.ora / spfile
• Create UNDO TBS for each instance
• Enable (and disable) the redo thread for instance 2 from instance 1, and vice versa
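The two bullets above map to DDL like the following. A non-runnable sketch: the tablespace name, file paths, and sizes are assumptions; each instance's init.ora/spfile then points undo_tablespace at its own undo tablespace.

```shell
# Illustrative only; run as SYSDBA from instance 1. Names, paths, and
# sizes are assumed, not from the slides.
sqlplus "/ as sysdba" <<'EOF'
CREATE UNDO TABLESPACE UNDOTBS2
    DATAFILE '/s01/oracle/proddata/undotbs02.dbf' SIZE 500M;
ALTER DATABASE ADD LOGFILE THREAD 2
    GROUP 3 ('/s01/oracle/proddata/redo03.log') SIZE 100M,
    GROUP 4 ('/s01/oracle/proddata/redo04.log') SIZE 100M;
ALTER DATABASE ENABLE PUBLIC THREAD 2;
-- to take the thread offline again: ALTER DATABASE DISABLE THREAD 2;
EOF
```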
11i steps 9 – instance 1
• # RAC-specific parameters
• cluster_database = true
• cluster_database_instances = 2
• thread = 1
• instance_number = 1
• instance_name = PRODi1
• service_names = PROD
• local_listener = PRODi1
• remote_listener = PRODi2
11i steps 10 – instance 2
• cluster_database = true
• cluster_database_instances = 2
• thread = 2
• instance_number = 2
• instance_name = PRODi2
• service_names = PROD
• local_listener = PRODi2
• remote_listener = PRODi1
11i Apps tier – 806/iAS tnsnames.ora
PROD = (DESCRIPTION=
  (ADDRESS_LIST=
    (ADDRESS=(PROTOCOL=tcp)(HOST=linux1)(PORT=1521))
    (ADDRESS=(PROTOCOL=tcp)(HOST=linux2)(PORT=1521))
  )
  (CONNECT_DATA=(SERVICE_NAME=PROD)(SERVER=DEDICATED))
)

PRODi1 = (DESCRIPTION=
  (ADDRESS=(PROTOCOL=tcp)(HOST=linux1)(PORT=1521))
  (CONNECT_DATA=(INSTANCE_NAME=PRODi1)(SERVICE_NAME=PROD))
)

PRODi2 = (DESCRIPTION=
  (ADDRESS=(PROTOCOL=tcp)(HOST=linux2)(PORT=1521))
  (CONNECT_DATA=(INSTANCE_NAME=PRODi2)(SERVICE_NAME=PROD))
)
Modify DBC file for Failover
• APPS_JDBC_DRIVER_TYPE=THIN
• FND_MAX_JDBC_CONNECTIONS=100
• # Setup at Apps tier
• APPS_JDBC_URL=jdbc:oracle:thin:@(DESCRIPTION=
    (ADDRESS_LIST=(LOAD_BALANCE=ON)
      (ADDRESS=(PROTOCOL=TCP)(HOST=linux1)(PORT=1521))
      (ADDRESS=(PROTOCOL=TCP)(HOST=linux2)(PORT=1521)))
    (CONNECT_DATA=(SERVICE_NAME=prod)))
What Can and Cannot Fail Over
• SQL*Plus connections will fail over using TAF
• JDBC connections will fail over
• Forms runtime connections will not; users will have to reconnect
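For the SQL*Plus failover to work, the PROD alias needs a FAILOVER_MODE clause that the tnsnames.ora shown earlier omits. A hedged fragment: the BASIC/SELECT settings and retry values are typical TAF choices, not taken from the slides.

```
PROD = (DESCRIPTION=
  (ADDRESS_LIST=
    (LOAD_BALANCE=ON)
    (FAILOVER=ON)
    (ADDRESS=(PROTOCOL=tcp)(HOST=linux1)(PORT=1521))
    (ADDRESS=(PROTOCOL=tcp)(HOST=linux2)(PORT=1521))
  )
  (CONNECT_DATA=
    (SERVICE_NAME=PROD)
    (FAILOVER_MODE=(TYPE=SELECT)(METHOD=BASIC)(RETRIES=20)(DELAY=5))
  )
)
```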