Real World Experience Running GoldenGate on Exadata January 20, 2013 Presented by: Alex Fatkulin Senior Consultant

Page 1: Fatkulin presentation

Real World Experience Running

GoldenGate on Exadata

January 20, 2013

Presented by: Alex Fatkulin, Senior Consultant

Page 2: Fatkulin presentation

Who am I ?

Senior Technical Consultant at Enkitec

11 years using Oracle

Clustered and HA solutions

Database Development and Design

Technical Reviewer

Blog at http://afatkulin.blogspot.com


Page 3: Fatkulin presentation

My Replication Experience

Materialized View Replication – since 8i

Oracle Streams – since 9iR2

Oracle GoldenGate – since 10.4 (2009)


Page 4: Fatkulin presentation

GoldenGate + Exadata

Gaining a lot of market momentum

Common scenarios
  • Zero Downtime Migrations and Upgrades
  • ETL Data Feeds
  • Data Replication

Solution effectiveness depends on in-depth technical knowledge

Standard documentation is often not enough


Page 5: Fatkulin presentation

Agenda

General configuration

Tips & Tricks
  • Manager
  • Extract
  • DataPump
  • Replicat

DBFS

Grid Infrastructure Integration


Page 6: Fatkulin presentation

General Configuration


Page 7: Fatkulin presentation

General Configuration

GoldenGate binaries local on each compute node

DBFS
  • Trail files
  • Parameter files
  • Checkpoint files
  • Bounded recovery files
  • Report files (optional)

DB accounts
  • GGEXT – Extract
  • GGREP – Replicat, GGSCHEMA

Page 8: Fatkulin presentation

Manager


Page 9: Fatkulin presentation

Manager

PURGEOLDEXTRACTS to delete old trail files
    purgeoldextracts ./dirdat/aa, usecheckpoints, minkeephours 8, maxkeephours 8

PURGEDDLHISTORY to clean up DDL history tables
    purgeddlhistory minkeepdays 7, maxkeepdays 7

PURGEMARKERHISTORY to clean up the Marker table
    purgemarkerhistory minkeepdays 7, maxkeepdays 7

Start other processes when Manager starts
    AUTOSTART ER *
  • Required if using Oracle's Grid Infrastructure integration scripts

Page 10: Fatkulin presentation

Extract


Page 11: Fatkulin presentation

Redo Access

Redo is located on ASM

Archived logs usually located on ASM

Extract redo access options
  • ASM Instance
  • DBLOGREADER
  • Integrated Capture

Page 12: Fatkulin presentation

Redo Access - ASM Instance

TRANLOGOPTIONS ASMUSER, ASMPASSWORD

Works through ASM instance calls
  • dbms_diskgroup.getfileattr
  • dbms_diskgroup.open
  • dbms_diskgroup.read

Not very efficient

Legacy


Page 13: Fatkulin presentation

Redo Access - DBLOGREADER

TRANLOGOPTIONS DBLOGREADER

Works through OCI calls
  • OCIPOGGRedoLogOpen
  • OCIPOGGRedoLogRead
  • OCIPOGGRedoLogClose

SELECT ANY TRANSACTION privilege required

Available since GoldenGate 11.1 and Oracle 10.2.0.5


Page 14: Fatkulin presentation

Redo Access - Integrated Capture

Oracle Streams Capture front end

Extract becomes an XStream client
  • Receives LCRs and transforms these to trail files
  • Oracle Streams complexity is hidden by ggsci

Allows access to all Oracle Streams Capture features

Available since GoldenGate 11.2

Latest BP recommended (Streams Capture bugs)


Page 15: Fatkulin presentation

Extract – SCN token

Capture SCN for every operation in the trail file
    table user1.*, tokens(SCN=@getenv("oratransaction","scn"));

Logdump 10 >open ./dirdat/aa000002
Current LogTrail is /u01/app/oracle/dbfs_mount/dbfs/ggs/dirdat/aa000002
Logdump 11 >usertoken detail
Logdump 12 >ggstoken detail
Logdump 15 >n

2013/01/26 15:00:18.000.000 Insert Len 9 RBA 1092
Name: SRC1.T
After Image: Partition 4 GU s
 0000 0005 0000 0001 32 | ........2

User tokens: 12 bytes
SCN : 9352124

GGS tokens:
TokenID x52 'R' ORAROWID Info x00 Length 20
 4141 414f 7261 4141 4641 4144 4141 5441 4142 0001 | AAAOraAAFAADAATAAB..
TokenID x4c 'L' LOGCSN Info x00 Length 7
 3933 3532 3132 34 | 9352124
TokenID x36 '6' TRANID Info x00 Length 8
 3130 2e36 2e37 3639 | 10.6.769
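The hex columns in the Logdump record above can be checked by hand. The following is a minimal Python sketch, not part of the original deck, that decodes them back to text; the helper name `decode_logdump_hex` is made up for illustration. It shows that the LOGCSN GGS token carries the same value as the SCN user token.

```python
def decode_logdump_hex(hex_words: str) -> str:
    """Decode the hex column of a Logdump token line into text.

    Non-printable bytes are rendered as '.', the same way the right-hand
    column of the Logdump output shows them.
    """
    raw = bytes.fromhex(hex_words.replace(" ", ""))
    return "".join(chr(b) if 32 <= b < 127 else "." for b in raw)


# Hex columns copied from the GGS tokens of the record shown above.
logcsn = decode_logdump_hex("3933 3532 3132 34")
tranid = decode_logdump_hex("3130 2e36 2e37 3639")
rowid = decode_logdump_hex("4141 414f 7261 4141 4641 4144 4141 5441 4142 0001")

print("LOGCSN  :", logcsn)
print("TRANID  :", tranid)
print("ORAROWID:", rowid)

# The user token named SCN in the same record.
user_token_scn = "9352124"
print("LOGCSN matches SCN user token:", logcsn == user_token_scn)
```

Having the SCN as a plain user token saves this decoding step when searching a trail for a specific change.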

Page 16: Fatkulin presentation

Extract – Compressed Tables

Extract will ABEND if not using Integrated Capture

ERROR OGG-01028 Object with object number 60573 is compressed. Table compression is not supported.

Space Advisor is often the cause
  • DBMS_TABCOMP_TEMP_CMP

Table may no longer exist (dropped)
  • Looking up in DBA_OBJECTS will produce zero rows

Page 17: Fatkulin presentation

Extract – Compressed Tables

SQL> select owner, object_name from dba_objects where object_id=60573;

no rows selected

SQL> select objectowner, objectname, optime
       from ggrep.ggs_ddl_hist
      where objectid = 60573 and fragmentno=1;

OBJECTOWNER     OBJECTNAME      OPTIME
--------------- --------------- -------------------
SRC1            COMP_TABLE      2013-01-26 16:09:43

SQL> begin
       dbms_logmnr.start_logmnr(
         startTime => to_date('2013-01-26 16:09:00', 'yyyy-mm-dd hh24:mi:ss'),
         endTime => to_date('2013-01-26 16:10:00', 'yyyy-mm-dd hh24:mi:ss'),
         Options => dbms_logmnr.DICT_FROM_ONLINE_CATALOG+dbms_logmnr.CONTINUOUS_MINE
       );
     end;
     /

PL/SQL procedure successfully completed

SQL> select seg_owner, seg_name, to_char(timestamp, 'yyyy-mm-dd hh24:mi:ss') dt
       from v$logmnr_contents
      where data_obj#=60573 and operation='DDL' and rownum=1;

SEG_OWNER       SEG_NAME        DT
--------------- --------------- -------------------
SRC1            COMP_TABLE      2013-01-26 16:09:45

Page 18: Fatkulin presentation

Extract – Down Instances

Down instances may prevent Extract from starting
  • Instances kept offline in the cluster
  • Instances that crashed

For each thread, Extract checks V$LOG for the latest SEQUENCE# with FIRST_TIME earlier than Extract's begin time

If ARCHIVED = 'YES', it will look up that SEQUENCE# in V$ARCHIVED_LOG

If the archived log has been deleted, Extract will ABEND
  • Commonly happens if an instance has been down for a long time

Page 19: Fatkulin presentation

Extract – Down Instances

SELECT sequence#, DECODE(archived, 'YES', 1, 0)
  FROM v$log
 WHERE thread# = 2
   AND sequence# = (select max(sequence#)
                      from v$log
                     where first_time < TO_DATE('2013-01-26 20:56:05', 'YYYY-MM-DD HH24:MI:SS')
                       AND thread# = 2);
-- returns sequence#=34, archived='YES'

SELECT name
  FROM v$archived_log
 WHERE sequence# = 34 AND thread# = 2 AND resetlogs_id = 786746958
   AND archived = 'YES' AND deleted = 'NO' AND standby_dest = 'NO'
 order by name DESC
-- no rows!

ERROR OGG-00446 Could not find archived log for sequence 34 thread 2 under default destinations
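The startup check behind this error can be modelled in a few lines. This is a simplified Python sketch of the logic as described on these slides, not GoldenGate's actual code. `LogRow`, `check_thread`, and the FIRST_TIME values are hypothetical stand-ins; thread 2 and sequence 34 come from the queries above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class LogRow:
    """Simplified stand-in for one row of V$LOG."""
    thread: int
    sequence: int
    archived: bool
    first_time: datetime


def check_thread(v_log, archived_logs, thread, begin_time):
    """Return (can_start, sequence) for one redo thread.

    archived_logs is a set of (thread, sequence) pairs that still have a
    non-deleted entry in V$ARCHIVED_LOG.
    """
    older = [r for r in v_log if r.thread == thread and r.first_time < begin_time]
    if not older:
        return True, None  # simplification: nothing to look up in this model
    latest = max(older, key=lambda r: r.sequence)
    if not latest.archived:
        return True, latest.sequence  # ARCHIVED = 'NO': no archived log lookup
    return (thread, latest.sequence) in archived_logs, latest.sequence


begin = datetime(2013, 1, 26, 20, 56, 5)  # Extract begin time from the query

# Thread 2 belongs to an instance that has been down; dates are made up.
v_log = [
    LogRow(2, 33, True, datetime(2013, 1, 20, 9, 0)),
    LogRow(2, 34, True, datetime(2013, 1, 20, 10, 0)),
]

# Archived log for sequence 34 already deleted: the OGG-00446 case.
print(check_thread(v_log, set(), 2, begin))

# Archived log still registered: Extract can position itself.
print(check_thread(v_log, {(2, 34)}, 2, begin))
```

In this model, any change that makes the thread report ARCHIVED = 'NO' skips the V$ARCHIVED_LOG lookup entirely.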

Page 20: Fatkulin presentation

Extract – Down Instances

create or replace view ggext.v$log as
  select group#, thread#, sequence#, bytes, blocksize, members,
         case thread# when 2 then 'NO' else archived end archived,
         status, first_change#, first_time, next_change#, next_time
    from sys.v_$log;

Temporary workaround (hack)

Extract will no longer try to lookup archived log and will be able to start

Page 21: Fatkulin presentation

Extract – Cache Manager

CACHEMGR virtual memory values (may have been adjusted)
CACHESIZE:                            64G
CACHEPAGEOUTSIZE (normal):            8M
PROCESS VM AVAIL FROM OS (min):       128G
CACHESIZEMAX (strict force to disk):  96G

Defaults might be set too high

Large transactions will cause Extract to consume up to CACHESIZE
  • Might result in excessive swapping and memory usage on the compute nodes

Adjust using CACHEMGR CACHESIZE 4G (example)
  • Insufficient cache will impact large transaction performance due to excessive page-outs

Page 22: Fatkulin presentation

Extract – Bounded Recovery

Allows Extract to save the state of in-flight transactions

Located in GGS_HOME/BR directory

Done every 4 hours by default
  • Perform now: SEND <GROUP> BR BRCHECKPOINT IMMEDIATE

Make these available to each node in case of a failover

If bounded recovery files become corrupted, Extract can still be started with BRRESET

Page 23: Fatkulin presentation

Extract – Bounded Recovery

info EXA_EXT, showch
...
Recovery Checkpoint (position of oldest unprocessed transaction in the data source):
  Thread #: 1
  Sequence #: 84
  RBA: 62266896
  Timestamp: 2013-01-27 12:32:58.000000
  SCN: 0.10578483 (10578483)
  Redo File: +DATA/dbm/onlinelog/group_2.258.786746973
...
BR Begin Recovery Checkpoint:
  Thread #: 2
  Sequence #: 49
  RBA: 340992
  Timestamp: 2013-01-27 12:50:01.000000
  SCN: 0.10600667 (10600667)
  Redo File:

Check bounded recovery info
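The SCN lines in the showch output print each value twice, for example `0.10578483 (10578483)`. Assuming the first form is the usual Oracle wrap.base notation (SCN = wrap * 2^32 + base), a small Python sketch can convert it and compare the two checkpoints. The function name `scn_from_wrap_base` is made up for illustration.

```python
def scn_from_wrap_base(text: str) -> int:
    """Convert an SCN printed as 'wrap.base' into a single integer.

    Assumes the conventional Oracle layout: a wrap counter in the high
    part and a 32-bit base in the low part.
    """
    wrap, base = (int(part) for part in text.split("."))
    return wrap * 2**32 + base


# Values copied from the showch output above.
recovery_scn = scn_from_wrap_base("0.10578483")
br_begin_scn = scn_from_wrap_base("0.10600667")

print("Recovery checkpoint SCN:", recovery_scn)
print("BR begin recovery SCN  :", br_begin_scn)
print("SCNs between the two   :", br_begin_scn - recovery_scn)
```

With a wrap of 0 the conversion is trivial, which is why both printed forms agree in this output.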

Page 24: Fatkulin presentation

DataPump


Page 25: Fatkulin presentation

DataPump – General Config

Use PASSTHRU to skip data dictionary lookups

Specify GoldenGate VIP in RMTHOST
  • If using Grid Infrastructure Integration

Use TCPFLUSHBYTES to allow larger writes on the Collector side

Use different names for source and destination trails
  • Avoids trail file purge bugs

Page 26: Fatkulin presentation

DataPump – Network Compression

Trail files generally compress well
  • Everything passed as strings
  • Fully qualified object names for each row changed

Use COMPRESS option (RMTHOST) to compress trails sent over the network

GGSCI (exa1.test.com) 37> send exa_dp tcpstats
...
Data compression is enabled
Compress CPU Time 0:00:00.000000
Compress time 0:00:00.581401, Threshold 1000
Uncompressed bytes 77449138
Compressed bytes 6291347, 133211222 bytes/second
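To put the tcpstats figures in perspective, here is a short Python sketch that derives the compression ratio from the numbers above. It also suggests that the reported bytes/second figure corresponds to uncompressed bytes divided by the compress time. That reading is an inference from the arithmetic, not something the output states.

```python
# Figures copied from the "send exa_dp tcpstats" output above.
uncompressed_bytes = 77449138
compressed_bytes = 6291347
compress_seconds = 0.581401          # "Compress time 0:00:00.581401"
reported_bytes_per_second = 133211222

ratio = uncompressed_bytes / compressed_bytes
saved_pct = 100 * (1 - compressed_bytes / uncompressed_bytes)
throughput = uncompressed_bytes / compress_seconds

print(f"compression ratio : {ratio:.2f}:1")
print(f"network bytes saved: {saved_pct:.1f}%")
print(f"derived throughput : {throughput:,.0f} bytes/second")
print(f"reported throughput: {reported_bytes_per_second:,} bytes/second")
```

A ratio above 12:1 is consistent with the point that trail data is string-heavy and repeats fully qualified object names.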

Page 27: Fatkulin presentation

DataPump – Trail not Available

Process will get stuck on positioning if trail [sequence] is not available

GGSCI (exa1.test.com) 4> add extract exa_dp, exttrailsource ./dirdat/aa
EXTRACT added.
GGSCI (exa1.test.com) 2> info EXA_DP

EXTRACT EXA_DP Last Started 2013-01-26 19:51 Status RUNNING
Checkpoint Lag 00:00:00 (updated 00:00:03 ago)
Log Read Checkpoint File ./dirdat/aa000000
                    First Record RBA 0

...
open("./dirdat/aa000000", O_RDONLY) = -1 ENOENT (No such file or directory)
nanosleep({1, 0}, NULL) = 0
open("./dirdat/aa000000", O_RDONLY) = -1 ENOENT (No such file or directory)
nanosleep({1, 0}, NULL) = 0
...

GGSCI (exa1.test.com) 7> alter EXA_DP, extseqno 2
EXTRACT altered.

Page 28: Fatkulin presentation

Replicat


Page 29: Fatkulin presentation

Replicat – General Configuration

Use BATCHSQL where appropriate

Capturing SCNs as tokens on Extract side greatly helps in troubleshooting

Use multiple Replicats and Service Names to direct the workload
  • Segregate workload by instance affinity if you can

srvctl add service -d dbm -s ogg_rep1 -r dbm1 -a dbm2,dbm3,dbm4 ...
srvctl add service -d dbm -s ogg_rep2 -r dbm2 -a dbm1,dbm3,dbm4 ...
...

Page 30: Fatkulin presentation

Replicat - Sequences

Sequence replication algorithm is not very efficient
  • No bind variables in replicateSequence calls

Larger sequence cache on source helps somewhat

BEGIN
  ggext.replicateSequence (TO_NUMBER(2), TO_NUMBER(20), TO_NUMBER(1), 'REP1',
    TO_NUMBER(0), 'S1', UPPER('ggrep'), TO_NUMBER(1), TO_NUMBER(0), '');
END;

Sequence values increment one-by-one and in nocache mode
  • SYS.SEQ$ might become a point of contention

Can result in a significant drag on highly active DBs

Page 31: Fatkulin presentation

Replicat – Transient PK Updates

In the past transient PK updates were problematic

SQL> select * from src1.t;

 N V
-- -
 1 a
 2 a
 3 a

SQL> update src1.t set n=n+1;

3 rows updated

SQL> commit;

Commit complete

Page 32: Fatkulin presentation

Replicat – Transient PK Updates

Handled transparently since 11.2.0.2

SQL> update src1.t set n=2 where n=1;
update src1.t set n=2 where n=1
ORA-00001: unique constraint (SRC1.SYS_C004692) violated

SQL> exec dbms_xstream_gg.enable_tdup_workspace;
PL/SQL procedure successfully completed

SQL> update src1.t set n=2 where n=1;
1 row updated
...
SQL> exec dbms_xstream_gg.disable_tdup_workspace;
PL/SQL procedure successfully completed

SQL> commit;
Commit complete

Page 33: Fatkulin presentation

Replicat – GGS_STICK table

Temporary table used by DDLREPLICATION package

Any session which performed DDL will hold a TO enqueue on GGS_STICK
  • Temporary Table Object Enqueue

Will prevent GGSCHEMA user drop

SQL> drop table ggrep.ggs_stick;
drop table ggrep.ggs_stick
ORA-14452: attempt to create, alter or drop an index on temporary table already in use

Page 34: Fatkulin presentation

DBFS


Page 35: Fatkulin presentation

DBFS

Create non-partitioned file system

Mount on all nodes

Use Oracle Grid Infrastructure to control where GoldenGate is running
  • Avoids accidental trail corruption

Page 36: Fatkulin presentation

DBFS Performance

Understanding the I/O profile
  • Extract: 4KB writes into the trail
  • DataPump: 1MB reads from the trail
  • Collector: 24KB (and smaller) writes into the trail (default)
      Use DataPump's RMTHOST TCPFLUSHBYTES to tune
  • Replicat: 1MB reads from the trail

AIO not utilized by GoldenGate
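To see why the I/O size matters on DBFS, the sketch below counts how many I/O calls it takes to move 1 GB of trail data at each size listed above. This is illustrative Python, not from the original deck, and it assumes every call transfers the full nominal size, which overstates the Collector's efficiency since its writes are "24KB and smaller".

```python
KB = 1024
MB = 1024 * KB
GB = 1024 * MB

# Nominal I/O sizes from the profile above.
io_sizes = {
    "Extract write": 4 * KB,
    "Collector write (default)": 24 * KB,
    "DataPump / Replicat read": 1 * MB,
}


def calls_per_gb(io_size: int) -> int:
    """Number of I/O calls needed to move 1 GB, rounded up."""
    return -(-GB // io_size)


for name, size in io_sizes.items():
    print(f"{name:<28} {size // KB:>5} KB per call  {calls_per_gb(size):>8} calls per GB")
```

Since each call goes through the DBFS code path into a SecureFile segment, fewer and larger writes are generally the cheaper option, which is the motivation for tuning TCPFLUSHBYTES.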


Page 37: Fatkulin presentation

DBFS Performance

All I/O ends up in a SecureFile segment inside a DB
  • Relatively long code path
  • Favors throughput vs latency

Set SecureFiles segments to cache
    alter table dbfs.t_dbfs modify lob (filedata) (cache)

Put segments into recycle pool (if configured)
    alter table dbfs.t_dbfs modify lob (filedata) (storage (buffer_pool recycle))

Page 38: Fatkulin presentation

Grid Infrastructure Integration


Page 39: Fatkulin presentation

Grid Infrastructure Integration

Note 1313703.1 Oracle GoldenGate high availability using Oracle Clusterware
  • Relies on Manager process to control everything else
  • GoldenGate checkpoint files manipulations (copy/delete)

Use Oracle Grid Infrastructure Bundled Agents
  • Relies on Manager process as well

Write your own scripts

Page 40: Fatkulin presentation

Grid Infrastructure Bundled Agents

Download from Oracle Clusterware web page
  • http://oracle.com/goto/Clusterware

Unzip into temporary location and install

./xagsetup.sh --install --directory /u01/app/oracle/xag --nodes exa2,exa3,exa4

Page 41: Fatkulin presentation

Grid Infrastructure Bundled Agents

Make sure CRS_HOME environment variable is set
  • Script relies on CRS_HOME to find crsctl executable

./agctl.pl add goldengate ogg1 \
  --gg_home /u01/app/oracle/ggs \
  --instance_type both \
  --oracle_home /u01/app/oracle/product/11.2.0/db_1 \
  --db_services dbm.ogg_rep1 \
  --databases dbm \
  --monitor_extracts exa_ext \
  --monitor_replicats exa_rep \
  --vip_name ora.dbm1.vip

[oracle@exa1 ~]$ crsctl status res xag.ogg1.goldengate
NAME=xag.ogg1.goldengate
TYPE=xag.goldengate.type
TARGET=OFFLINE
STATE=OFFLINE

[oracle@exa1 ~]$ crsctl start res xag.ogg1.goldengate
CRS-2672: Attempting to start 'xag.ogg1.goldengate' on 'exa1'
CRS-2676: Start of 'xag.ogg1.goldengate' on 'exa1' succeeded

Page 42: Fatkulin presentation

Write your own scripts

Not as hard as you might imagine

Create separate resource scripts
  • Manager
  • Extract
  • Replicat
  • DataPump

Add resource example

crsctl add resource $RESNAME \
  -type local_resource \
  -attr "ACTION_SCRIPT=$ACTION_SCRIPT,\
CHECK_INTERVAL=30,RESTART_ATTEMPTS=10,\
START_DEPENDENCIES='hard(ora.dbm.db,dbfs_mount,intermediate:ora.dbm1.vip)pullup(ora.dbm.db,dbfs_mount,intermediate:ora.dbm1.vip)',\
STOP_DEPENDENCIES='hard(ora.dbm.db,dbfs_mount,intermediate:ora.dbm1.vip)',\
SCRIPT_TIMEOUT=300"
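The ACTION_SCRIPT referenced above is any executable that Clusterware calls with one argument (start, stop, check or clean) and that reports success with exit code 0. Below is a minimal Python sketch of such a script for the Manager resource only, not the presenter's implementation. It assumes ggsci accepts commands on stdin, that `info mgr` output contains "Manager is running" when Manager is up, and that the Oracle environment (ORACLE_HOME, library path) is already set. `GG_HOME` reuses the gg_home path from the agctl example; `run_ggsci` and `main` are hypothetical names.

```python
import subprocess
import sys

GG_HOME = "/u01/app/oracle/ggs"  # gg_home from the agctl example


def run_ggsci(command: str) -> str:
    """Feed a single command to ggsci on stdin and return what it printed."""
    result = subprocess.run(
        [f"{GG_HOME}/ggsci"],
        input=command + "\n",
        capture_output=True,
        text=True,
        cwd=GG_HOME,
        timeout=120,
    )
    return result.stdout


def main(argv, run=run_ggsci) -> int:
    """Dispatch a Clusterware action; return 0 on success, 1 otherwise."""
    action = argv[1] if len(argv) > 1 else ""
    if action == "check":
        # Assumed marker text in the "info mgr" output.
        return 0 if "Manager is running" in run("info mgr") else 1
    if action == "start":
        run("start mgr")
        return main([argv[0], "check"], run)
    if action in ("stop", "clean"):
        run("stop mgr!")  # "!" skips the confirmation prompt
        return 0
    return 1  # unknown or missing action


# Dry run with a fake ggsci so the dispatch logic can be tried anywhere.
def fake_ggsci(command: str) -> str:
    return "Manager is running (IP port exa1.7809)."


print("check ->", main(["mgr_action.py", "check"], run=fake_ggsci))

# As a real action script, the file would end with:
#     sys.exit(main(sys.argv))
```

Separate scripts for Extract, Replicat and DataPump would follow the same pattern with different ggsci commands, with their resources depending on the Manager resource.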

Page 43: Fatkulin presentation

Q & A

Email: [email protected]

Blog: http://afatkulin.blogspot.com
