Oracle High Availability in application development Aleksandr Tokarev


Plan

• Basic terms
• Aspects of HA
• Oracle approach to HA
• HA for application development
• Q&A session
• Load balancing (optional)

Availability

The ability to obtain a service in order to submit new work, update or alter existing work, or collect the results of previous work.

If a customer cannot access the system, the system is unavailable, even if all of its components are up.

High Availability

The characteristic of a system, which aims to ensure an agreed level of operational availability for a higher than normal period.
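
An "agreed level" of availability is usually quoted as a percentage, which translates directly into a downtime budget. The sketch below is only an illustration of that arithmetic, assuming a 365-day year; the class and method names (`AvailabilityBudget`, `allowedDowntimePerYear`) are hypothetical and not part of any Oracle API.

```java
import java.time.Duration;

/** Converts an availability target into a yearly downtime budget (365-day year assumed). */
final class AvailabilityBudget {
    private static final Duration YEAR = Duration.ofDays(365);

    /** Returns the downtime allowed per year for a target such as 99.9 (percent). */
    static Duration allowedDowntimePerYear(double availabilityPercent) {
        if (availabilityPercent < 0.0 || availabilityPercent > 100.0) {
            throw new IllegalArgumentException("availability must be within 0..100");
        }
        double unavailableFraction = (100.0 - availabilityPercent) / 100.0;
        // Round to whole seconds to hide floating-point noise.
        long seconds = Math.round(YEAR.getSeconds() * unavailableFraction);
        return Duration.ofSeconds(seconds);
    }

    public static void main(String[] args) {
        for (double target : new double[] {99.0, 99.9, 99.99}) {
            Duration budget = allowedDowntimePerYear(target);
            System.out.println(target + "% -> " + budget.toMinutes() + " minutes per year");
        }
    }
}
```

Each additional "nine" cuts the budget by a factor of ten, which is why planned maintenance alone can exhaust it.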

Principles

1. Elimination of single points of failure
2. Reliable crossover
3. Early failure detection

Real world challenges

• Growing number of users
• Users located in different time zones
• Stretching maintenance windows
• Failures becoming more complicated

Oracle HA stack

1. RAC
2. At least one of:
   • Data Guard/Active Data Guard
   • GoldenGate or a similar replication tool
3. Editioned objects
4. WebLogic application server + Universal Connection Pool (JDBC/ODP)

HA terms

ONS (Oracle Notification Service):
• Publish/subscribe service for clusterware events
• Can be consumed locally or remotely
• Automatically installed and configured during Oracle Clusterware installation

FAN (Fast Application Notification):
• Subset of ONS
• Notifies clients about service changes (what, when, where)
• Can be issued by RAC (RAC One Node) and DG (fast-start failover)
• 2 types: HA events and load balancing events
• 3 events can be used by applications: UP, DOWN, LBA
• Integrated into OCI, UCP, JDBC, ODP.NET

HA terms

FCF (Fast Connection Failover): Client-side feature to receive and process FAN events. It works at the UCP level.

TAF (Transparent Application Failover): Client-side feature to restore connections and resume SELECT statements. It works at the OCI level.

HA terms

TG (Transaction Guard): A way to provide at-most-once execution of transactions. Preserves the commit outcome.

AC (Application Continuity): An approach to replaying failed transactions, i.e. presenting a node failure to the client as a delay, with minimal changes to application code.

TAF

• OCI feature (JDBC OCI, ODP.NET)
• Can be configured on the client side (TNS) or the server side (dbms_service)
• Reconnects automatically to a preconfigured instance
• Does not need FAN
• Works with RAC/RAC One Node, DG physical standby, Fast Restart single instance
• Based on callbacks (OracleOCIFailover interface)
• Can continue a SELECT statement from the point of failure
• Session state and package variables should be initialized in callbacks

TAF client side

RACDB_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux1-vip)(PORT = 1521))
    (ADDRESS = (PROTOCOL = TCP)(HOST = linux2-vip)(PORT = 1521))
    (LOAD_BALANCE = yes)
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = racdb_taf.idevelopment.info)
      (FAILOVER_MODE =
        (TYPE = SELECT)
        (METHOD = BASIC)
        (RETRIES = 180)
        (DELAY = 5)
      )
    )
  )

TYPE can also be SESSION; METHOD can also be PRECONNECT.

TAF server side

srvctl add service -d orcl -s TAF_TEST -r "orcl1,orcl2,orcl3,orcl4"
srvctl start service -s TAF_TEST -d orcl

begin
  dbms_service.modify_service(
    service_name        => 'TAF_TEST',
    aq_ha_notifications => false,
    failover_method     => dbms_service.failover_method_basic,
    failover_type       => dbms_service.failover_type_session,
    failover_retries    => 60,
    failover_delay      => 3);
end;
/

TAF ODP.NET example

using System;
using System.Threading;
using Oracle.DataAccess.Client;

class TafCallbackSample
{
    static void Main(string[] args)
    {
        // ConObj is an OracleConnection that has already been opened elsewhere.
        // Register the callback function OnFailOver.
        ConObj.Failover += new OracleFailoverEventHandler(OnFailOver);
        // Here let's establish connections and do whatever we want.
    }

    // Failover callback function
    public static FailoverReturnCode OnFailOver(object sender, OracleFailoverEventArgs eventArgs)
    {
        switch (eventArgs.FailoverEvent)
        {
            case FailoverEvent.Begin:
                Console.WriteLine(" \nFailover Begin - Failing Over ... Please stand by \n");
                Console.WriteLine(" Failover type was found to be " + eventArgs.FailoverType);
                break;
            case FailoverEvent.Abort:
                Console.WriteLine(" Failover aborted.\n");
                break;
            case FailoverEvent.End:
                Console.WriteLine(" Failover ended ... resuming services\n");
                break;
            case FailoverEvent.Error:
                // Wait a little and ask the driver to try again.
                Console.WriteLine(" Failover error. Sleeping...\n");
                Thread.Sleep(3000);
                return FailoverReturnCode.Retry;
        }
        return FailoverReturnCode.Success;
    }
}

TAF Java

public interface OracleOCIFailover {

    // Possible failover types
    public static final int FO_SESSION = 1;
    public static final int FO_SELECT = 2;
    public static final int FO_NONE = 3;

    // Possible failover events registered with the callback
    public static final int FO_BEGIN = 1;
    public static final int FO_END = 2;
    public static final int FO_ABORT = 3;
    public static final int FO_REAUTH = 4;
    public static final int FO_ERROR = 5;
    public static final int FO_RETRY = 6;
    public static final int FO_EVENT_UNKNOWN = 7;

    public int callbackFn(Connection conn,
                          Object ctxt, // Anything the user wants to save
                          int type,    // One of the possible failover types
                          int event);  // One of the possible failover events
}

FCF

• Asynchronously processes UP/DOWN HA FAN events
• Removes affected connections
• Works under Universal Connection Pool (RAC/RAC One Node, DG, Fast Restart)
• ONS should be enabled:

srvctl add ons
srvctl enable ons
srvctl start ons
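
To make "removes affected connections" concrete, here is a toy model of a pool reacting to UP/DOWN events. It is a sketch under stated assumptions, not how UCP is implemented: `FanAwarePoolSketch`, `EventType`, `open` and `onEvent` are hypothetical names, connections are plain integer ids, and events are delivered by direct method calls rather than through ONS.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy pool that reacts to FAN-like UP/DOWN events; not the UCP implementation. */
final class FanAwarePoolSketch {
    enum EventType { UP, DOWN }

    // Instance name -> ids of pooled connections opened against that instance.
    private final Map<String, List<Integer>> pooledByInstance = new HashMap<>();
    private int nextId = 1;

    /** Adds a pooled connection for the given instance and returns its id. */
    int open(String instance) {
        int id = nextId++;
        pooledByInstance.computeIfAbsent(instance, k -> new ArrayList<>()).add(id);
        return id;
    }

    /** Handles one event and returns the number of connections evicted. */
    int onEvent(EventType type, String instance) {
        if (type == EventType.DOWN) {
            // Drop every connection that pointed at the failed instance.
            List<Integer> dead = pooledByInstance.remove(instance);
            return dead == null ? 0 : dead.size();
        }
        // UP: register the instance so new connections may be opened against it.
        pooledByInstance.computeIfAbsent(instance, k -> new ArrayList<>());
        return 0;
    }

    /** Total number of connections still held by the pool. */
    int size() {
        int total = 0;
        for (List<Integer> ids : pooledByInstance.values()) {
            total += ids.size();
        }
        return total;
    }

    public static void main(String[] args) {
        FanAwarePoolSketch pool = new FanAwarePoolSketch();
        pool.open("racnode1");
        pool.open("racnode1");
        pool.open("racnode2");
        int evicted = pool.onEvent(EventType.DOWN, "racnode1");
        System.out.println("evicted=" + evicted + ", remaining=" + pool.size());
    }
}
```

The point of the model is that cleanup is driven by the event, so the application does not have to discover dead connections through timeouts.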

FCF Java example

import oracle.ucp.jdbc.PoolDataSource;
import oracle.ucp.jdbc.PoolDataSourceFactory;

PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();

pds.setConnectionPoolName("FCFSamplePool");
pds.setFastConnectionFailoverEnabled(true);
pds.setONSConfiguration("nodes=racnode1:4200,racnode2:4200\nwalletfile=/oracle11/onswalletfile");
pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource");
pds.setURL("jdbc:oracle:thin:@(DESCRIPTION=" +
    "(LOAD_BALANCE=on)" +
    "(ADDRESS=(PROTOCOL=TCP)(HOST=racnode1)(PORT=1521))" +
    "(ADDRESS=(PROTOCOL=TCP)(HOST=racnode2)(PORT=1521))" +
    "(CONNECT_DATA=(SERVICE_NAME=service_name)))");

Ensure ons.jar is on the application CLASSPATH.

FCF Java example

Connection conn = null;
boolean retry = true;
while (retry) {
    try {
        // Get a RAC connection from the pool
        conn = pds.getConnection();
        // Execute a query on the connection
        try (Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("select user from dual")) {
            rs.next();
            System.out.println("\nConnected as : " + rs.getString(1));
        }
        // Set retry to false to exit the loop
        retry = false;
    } catch (SQLException eSQL) {
        System.out.println("\nSQLException: " + eSQL);
        // Check connection usability after a RAC-down event triggers UCP FCF actions
        if (conn == null || !((ValidConnection) conn).isValid()) {
            if (conn != null) {
                try {
                    // Close the unusable connection
                    conn.close();
                } catch (SQLException eClose) {
                    System.out.println("\nException arose when closing connection: " + eClose);
                }
            }
            // Try again with a fresh connection
            retry = true;
        } else {
            // The connection is still valid, so this is not a failover: do not loop forever
            throw eSQL;
        }
    }
    if (retry) {
        Thread.sleep(1000);
    }
}

FCF ODP.NET example

using System;
using Oracle.DataAccess.Client;

class HAEventEnablingSample
{
    static void Main()
    {
        OracleConnection con = new OracleConnection();
        // Open a connection using ConnectionString attributes
        // Also, enable HA events
        con.ConnectionString =
            "User Id=user_name;Password=password;Data Source=oracle;" +
            "Min Pool Size=10;Connection Lifetime=120;Connection Timeout=60;" +
            "HA Events=true;Incr Pool Size=5;Decr Pool Size=2";
        con.Open();
        // Create more connections and perform work against the database here.
        con.Dispose();
    }
}

Before Oracle 12 the .config file should be changed:

<onsConfig mode="remote">
  <ons database="db1">
    <add name="nodeList" value="racnode1:4100, racnode2:4200" />
  </ons>
  <ons database="db2">
    <add name="nodeList" value="racnode3:4100, racnode4:4200" />
  </ons>
</onsConfig>

FCF ODP.NET example

If you need enhanced error handling, use the OracleHAEventArgs class:

public delegate void OracleHAEventHandler(object sender, OracleHAEventArgs eventArgs);

public static void OnFANEventHandler(OracleHAEventArgs eventArgs)
{
    lock (typeof(FANCallBackSample))
    {
        if (eventArgs.Status == OracleHAEventStatus.Down)
        {
            // Your event handling
        }
        ...
    }
}

TG

API under JDBC, OCI and ODP.NET to resolve the possible "double-execution" problem.

• Creates a globally unique LTXID
• Stores it in the DB and in the client driver

TG use case after outage

• Takes the LTXID from the failed session handle and requests its outcome from the DB (DBMS_APP_CONT.GET_LTXID_OUTCOME)
• Gets the commit state before the failure
• If committed, returns control to the app
• If uncommitted, asks the app about next actions

TG can be enabled by DBMS_SERVICE.CREATE_SERVICE or

srvctl modify service -db DBNAME -s MYSERVICENAME -commit_outcome TRUE
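
The decision flow above can be illustrated with an in-memory simulation. This is a sketch only: `AtMostOnceSketch` and `FakeDatabase` are hypothetical stand-ins, the "LTXID" is a random UUID string, and the same id is reused across retries for brevity, whereas a real driver obtains the LTXID from the session. No Oracle API is called.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

/** Simulates at-most-once execution: resubmit only when the outcome is "not committed". */
final class AtMostOnceSketch {
    /** In-memory stand-in for the database side. */
    static final class FakeDatabase {
        private final Map<String, Boolean> committedByLtxid = new HashMap<>();
        private int balance = 0;
        private boolean failBeforeCommit = false;
        private boolean loseReplyAfterCommit = false;

        void failNextBeforeCommit() { failBeforeCommit = true; }
        void loseNextReplyAfterCommit() { loseReplyAfterCommit = true; }

        /** Applies and commits a deposit; may fail before or after the commit. */
        void deposit(String ltxid, int amount) {
            if (failBeforeCommit) {
                failBeforeCommit = false;
                throw new IllegalStateException("outage before commit");
            }
            balance += amount;
            committedByLtxid.put(ltxid, Boolean.TRUE);
            if (loseReplyAfterCommit) {
                loseReplyAfterCommit = false;
                throw new IllegalStateException("outage after commit, reply lost");
            }
        }

        /** Plays the role of the commit outcome lookup. */
        boolean isCommitted(String ltxid) {
            return committedByLtxid.getOrDefault(ltxid, Boolean.FALSE);
        }

        int balance() { return balance; }
    }

    /** Deposits exactly once and returns the number of submissions it took. */
    static int depositOnce(FakeDatabase db, int amount) {
        String ltxid = UUID.randomUUID().toString();
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                db.deposit(ltxid, amount);
                return attempts;
            } catch (IllegalStateException outage) {
                if (db.isCommitted(ltxid)) {
                    return attempts; // already committed: do not resubmit
                }
                // Not committed: safe to loop and resubmit.
            }
        }
    }

    public static void main(String[] args) {
        FakeDatabase db = new FakeDatabase();
        db.loseNextReplyAfterCommit();
        int attempts = depositOnce(db, 100);
        System.out.println("attempts=" + attempts + ", balance=" + db.balance());
    }
}
```

Without the outcome check, the lost reply in `main` would lead to a second deposit, which is exactly the double execution TG is meant to prevent.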

TG supported COMMITs

• Local transactions
• Data definition language (DDL) transactions
• Distributed transactions
• Commit on success (auto-commit)
• PL/SQL with embedded commit

TG ODP.NET example

txn = con.BeginTransaction();
<your processing>
try
{
    txn.Commit();
}
catch (OracleException ex)
{
    if (ex.IsRecoverable && ex.OracleLogicalTransaction != null && !ex.OracleLogicalTransaction.Committed)
    {
        // safe to re-submit work
    }
    else
    {
        // do not re-submit work
        if (ex.OracleLogicalTransaction != null)
        {
            if (ex.OracleLogicalTransaction.UserCallCompleted)
            {
                // return commit success to application to continue
            }
            else
            {
                // transaction committed, but states such as row count or
                // out parameters may be lost if the application needs them;
                // return commit success and warn
            }
        }
    }
}

Don’t forget to GRANT EXECUTE ON DBMS_APP_CONT TO <Ora user under .NET>, because ODP.NET invokes it implicitly when an exception occurs.

TG Java example

private static final String GET_LTXID_OUTCOME_WRAPPER =
    "DECLARE PROCEDURE GET_LTXID_OUTCOME_WRAPPER(" +
    "  ltxid IN RAW," +
    "  is_committed OUT NUMBER ) " +
    "IS " +
    "  call_completed BOOLEAN; " +
    "  committed BOOLEAN; " +
    "BEGIN " +
    "  DBMS_APP_CONT.GET_LTXID_OUTCOME(ltxid, committed, call_completed); " +
    "  if committed then is_committed := 1; else is_committed := 0; end if; " +
    "END; " +
    "BEGIN GET_LTXID_OUTCOME_WRAPPER(?,?); END;";

boolean getTransactionOutcome(Connection conn, LogicalTransactionId ltxid) throws SQLException {
    boolean committed = false;
    CallableStatement cstmt = null;
    try {
        cstmt = conn.prepareCall(GET_LTXID_OUTCOME_WRAPPER);
        cstmt.setObject(1, ltxid); // use this starting in 12.1.0.2
        cstmt.registerOutParameter(2, OracleTypes.BIT);
        cstmt.execute();
        committed = cstmt.getBoolean(2);
    } catch (SQLException sqlexc) {
        throw sqlexc;
    } finally {
        if (cstmt != null) cstmt.close();
    }
    return committed;
}

TG Java example

Connection jdbcConnection = getConnection();
boolean isJobDone = false;

while (!isJobDone) {
    try {
        // apply the raise (DML + commit):
        RaiseToAllEmployees(jdbcConnection, 5);
        // no exception, the procedure completed:
        isJobDone = true;
    } catch (SQLRecoverableException recoverableException) {
        // Retry only if the error was recoverable.
        try {
            jdbcConnection.close(); // close the old connection
        } catch (Exception ex) {
            // ignore other exceptions while closing
        }
        Connection newJDBCConnection = getConnection(); // reconnect to allow retry
        // Use Transaction Guard to determine the outcome of the last request: committed or uncommitted
        LogicalTransactionId ltxid = ((OracleConnection) jdbcConnection).getLogicalTransactionId();
        isJobDone = getTransactionOutcome(newJDBCConnection, ltxid);
        jdbcConnection = newJDBCConnection;
    }
}

AC

• Oracle 12 technology for JDBC OCI and JDBC thin. It works with UCP, JDBC, WebLogic + RAC/RAC One Node or DG
• Intended to hide failures from the client by replaying the workload using TG data
• Rebuilds transactional and non-transactional state
• Requires only small changes to code

AC workflow

• Client issues a request to UCP
• AC retains each call
• Failure occurs and FAN is sent to UCP
• AC reconnects when it is possible
• TG checks transaction states and identifies the last successful statement using the LTXID
• AC restores non-transactional state (package variables, temporary tables etc.) in accordance with settings
• AC replays saved calls in accordance with the given boundaries and commits
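
The record-and-replay idea behind this workflow can be shown with a small in-memory model. It is a deliberately simplified sketch, not Oracle's replay driver: `ReplaySketch` and `Session` are hypothetical names, a "call" is just a `Consumer<Session>`, and the TG outcome check and state restoration steps are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

/** Toy model of request boundaries with call recording and replay after a failover. */
final class ReplaySketch {
    /** Stand-in for a database session holding uncommitted work. */
    static final class Session {
        final List<String> applied = new ArrayList<>();
    }

    private final List<Consumer<Session>> recorded = new ArrayList<>();
    private Session session = new Session();
    private boolean inRequest = false;

    /** Starts a request: from here on, calls are retained for possible replay. */
    void beginRequest() {
        recorded.clear();
        inRequest = true;
    }

    /** Ends a request: retained calls are no longer needed. */
    void endRequest() {
        recorded.clear();
        inRequest = false;
    }

    /** Runs a call against the current session and retains it while inside a request. */
    void execute(Consumer<Session> call) {
        if (inRequest) {
            recorded.add(call);
        }
        call.accept(session);
    }

    /** Simulates losing the session: reconnect, then replay the retained calls in order. */
    void failOverAndReplay() {
        session = new Session();
        for (Consumer<Session> call : recorded) {
            call.accept(session);
        }
    }

    List<String> appliedInCurrentSession() {
        return session.applied;
    }

    public static void main(String[] args) {
        ReplaySketch driver = new ReplaySketch();
        driver.beginRequest();
        driver.execute(s -> s.applied.add("update 1"));
        driver.execute(s -> s.applied.add("update 2"));
        driver.failOverAndReplay(); // the client only sees a delay
        driver.execute(s -> s.applied.add("update 3"));
        driver.endRequest();
        System.out.println(driver.appliedInCurrentSession()
                .equals(Arrays.asList("update 1", "update 2", "update 3")));
    }
}
```

The model also hints at the memory cost on the client side: every call inside a request has to be kept until the request ends.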

AC configuration

• Ask your DBA to configure the AC service either by srvctl or dbms_service:

declare
  params dbms_service.svc_parameter_array;
begin
  params('FAILOVER_TYPE') := 'TRANSACTION';
  params('REPLAY_INITIATION_TIMEOUT') := 300;
  params('FAILOVER_DELAY') := 3;
  params('FAILOVER_RETRIES') := 30;
  params('commit_outcome') := 'true';
  dbms_service.modify_service('[your service]', params);
end;
/

• Don’t forget to grant execute on dbms_app_cont
• Set KEEP properties for your sequences, GUIDs and timestamps:

GRANT [KEEP DATE TIME | KEEP SYSGUID].. [to USER]
GRANT KEEP SEQUENCE.. [to USER] on [sequence object];
ALTER SEQUENCE.. [sequence object] [KEEP];

AC Java example

private static void dbAction(Connection c, int numValue) throws SQLException {
    String updsql = "UPDATE hr.employees " +
                    "SET job_id=job_id " +
                    "WHERE employee_id=?";
    PreparedStatement pstmt = null;
    /* some non-transactional actions, for instance sending an email about
       the start of employee processing using utl_mail */

    /* let’s set request boundaries */
    ((oracle.jdbc.replay.ReplayableConnection) c).beginRequest();
    pstmt = c.prepareStatement(updsql);
    c.setAutoCommit(false);
    for (int i = 0; i < numValue; i++) {
        pstmt.setInt(1, i);
        pstmt.executeUpdate();
    }
    c.commit();
    pstmt.close();
    // End of the request
    ((oracle.jdbc.replay.ReplayableConnection) c).endRequest();
}

AC Java example

import java.sql.*;
import oracle.jdbc.pool.*;
import oracle.jdbc.*;
import oracle.jdbc.replay.*;

public static void main(String args[]) throws SQLException {
    Connection conn = null;

    // Replay-capable data source
    OracleDataSourceImpl ocpds = new OracleDataSourceImpl();
    ocpds.setURL("jdbc:oracle:thin:@rac-scan:1521/app");
    ocpds.setUser("user");
    ocpds.setPassword("passw");
    conn = ocpds.getConnection();

    dbAction(conn, 100000);

    conn.close();
}

AC cons

• Doesn’t work with GoldenGate, Active Data Guard
• No .NET support
• AC works "automagically", so it should be tested thoroughly
• Memory consumption on the JDBC side
• The application should be AC-aware if its logic uses non-transactional features

Summary: client failover

Technology   Connection cleanup   Automatic reconnection   Replay
TAF          not intended         +                        long-running queries only
FCF          +                    +                        hand-coded
TG           not intended         -                        sometimes hand-coded
AC           not intended         +                        sometimes need to specify boundaries

Conclusions

If you wish to achieve true HA it is possible, but you should consider:

1. Infrastructure costs
2. Licensing burden
3. Expenses for software development to support various types of replay

We are happy to help our customers! Are they ready?

Q&A

Contacts

Feel free to ask: [email protected]

Reasons for smart load balancing

The capacity of an HA architecture changes over time:

• Not all open connections are the best connections: they could be connected to slow nodes or to nodes under maintenance
• New connections shouldn’t even try to use dead, slow or currently maintained nodes

Oracle load balancing types

• Client-side
  – Uses TNS names for connections to different nodes (load_balance = on)
  – Random connection distribution
  – Works on connection attempt
• Server-side
  – Relies on server metrics
  – Connects based on policies and configuration
  – Works continuously
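
The difference between the two types can be sketched as uniform random choice versus a choice weighted by advised percentages. This is an illustration only: `LoadBalanceSketch`, `pickRandom` and `pickWeighted` are hypothetical names, and the percentages are made-up inputs standing in for whatever the server would advise.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

/** Contrasts client-side random selection with selection driven by advised percentages. */
final class LoadBalanceSketch {
    /** Client-side style: every node is equally likely, regardless of its load. */
    static String pickRandom(String[] nodes, Random rnd) {
        return nodes[rnd.nextInt(nodes.length)];
    }

    /** Server-side style: pick in proportion to advised percentages; zero means "do not use". */
    static String pickWeighted(Map<String, Integer> advisedPercent, Random rnd) {
        int total = 0;
        for (int weight : advisedPercent.values()) {
            if (weight < 0) {
                throw new IllegalArgumentException("negative weight");
            }
            total += weight;
        }
        if (total == 0) {
            throw new IllegalStateException("no node is accepting work");
        }
        // Draw a ticket in [0, total) and walk the nodes until it is used up.
        int ticket = rnd.nextInt(total);
        for (Map.Entry<String, Integer> entry : advisedPercent.entrySet()) {
            ticket -= entry.getValue();
            if (ticket < 0) {
                return entry.getKey();
            }
        }
        throw new AssertionError("unreachable: ticket is always below the total weight");
    }

    public static void main(String[] args) {
        Map<String, Integer> advice = new LinkedHashMap<>();
        advice.put("racnode1", 0);   // e.g. under maintenance
        advice.put("racnode2", 70);
        advice.put("racnode3", 30);
        Random rnd = new Random();
        System.out.println("random:   " + pickRandom(new String[] {"racnode1", "racnode2", "racnode3"}, rnd));
        System.out.println("weighted: " + pickWeighted(advice, rnd));
    }
}
```

With random selection a node under maintenance is still chosen about a third of the time, while a weight of zero removes it from the draw entirely.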

What’s inside

What for

1. Adjust work distribution on working nodes based on defined goals:
  – throughput
  – service time
  – CPU utilization (NONE)
2. React as fast as possible to cluster reconfiguration:
  – New nodes
  – Defunct nodes

How to enable LB

Ask your DBA to:

– Install Oracle RAC
– Set up ONS/FAN daemons properly
– Configure LBA by OEM or DBMS_SERVICE

Example:

EXECUTE DBMS_SERVICE.MODIFY_SERVICE (service_name => 'sjob', goal => DBMS_SERVICE.GOAL_SERVICE_TIME, clb_goal => DBMS_SERVICE.CLB_GOAL_SHORT);

How to enable LB Java

Properties prop = new Properties();
prop.put(oracle.net.ns.SQLnetDef.TCP_CONNTIMEOUT_STR, "" + (1 * 1000)); // 1 second

PoolDataSource pds = PoolDataSourceFactory.getPoolDataSource();
pds.setConnectionPoolName("FCFSamplePool");
pds.setConnectionProperties(prop);
pds.setFastConnectionFailoverEnabled(true);
pds.setONSConfiguration("nodes=racnode1:4200,racnode2:4200\nwalletfile=/oracle11/onswalletfile");
pds.setConnectionFactoryClassName("oracle.jdbc.pool.OracleDataSource");
pds.setURL("jdbc:oracle:thin:@(DESCRIPTION=" +
    "(LOAD_BALANCE=on)" +
    "(ADDRESS=(PROTOCOL=TCP)(HOST=racnode1)(PORT=1521))" +
    "(ADDRESS=(PROTOCOL=TCP)(HOST=racnode2)(PORT=1521))" +
    "(CONNECT_DATA=(SERVICE_NAME=service_name)))");

How to enable LB ODP.NET

using System;
using Oracle.DataAccess.Client;

class ConnectionPoolingSample
{
    static void Main()
    {
        OracleConnection con = new OracleConnection();
        // Open a connection using ConnectionString attributes
        // related to connection pooling.
        con.ConnectionString =
            "User Id=scott;Password=tiger;Data Source=oracle;" +
            "Min Pool Size=10;Connection Lifetime=120;Connection Timeout=60;" +
            "Incr Pool Size=5;Decr Pool Size=2;" +
            "HA events=true;load balancing=true;pooling=true";
        con.Open();
        Console.WriteLine("Connection pool successfully created");
        // Close and Dispose OracleConnection object
        con.Close();
        con.Dispose();
        Console.WriteLine("Connection is placed back into the pool.");
    }
}

Conclusion

If you don’t want to, or don’t have time to, implement your own load balancing, you should definitely consider Oracle real-time load balancing.

It works out of the box, provided FCF is configured properly, and is easily managed using services.
