Eric Grancher, CERN IT department, eric.grancher@cern.ch

(documents available at https://indico.cern.ch/conferenceDisplay.py?confId=276758)

Agenda


• A few words on CERN and the computing challenges, Oracle at CERN

• Consolidation challenge

• Oracle multitenant database

• Real Application Testing / capture and replay

• Conclusions (your turn!)

• (demos, experience and tips)

Jürgen Knobloch, CERN IT

CERN

27 km circumference

Staff members: about 2500

Research community: 10,000 scientists

Large Hadron Collider - LHC

The most complex machine on earth

• The world's biggest particle accelerator

• 600 million collisions / second


• Fundamental physics

• Why do fundamental particles weigh the amount they do?

• What is 96% of the Universe made of?

• Where did the antimatter go?

• What was the universe like just after the « Big Bang »?

• Are there extra dimensions of space?

ATLAS/CMS, March 1st 2013


• “Having analysed two and a half times more data than was available for the discovery announcement in July, they find that the new particle is looking more and more like a Higgs boson, the particle linked to the mechanism that gives mass to elementary particles. It remains an open question, however, whether this is the Higgs boson of the Standard Model of particle physics, or possibly the lightest of several bosons predicted in some theories that go beyond the Standard Model. Finding the answer to this question will take time.

• Whether or not it is a Higgs boson is demonstrated by how it interacts with other particles, and its quantum properties. For example, a Higgs boson is postulated to have no spin, and in the Standard Model its parity – a measure of how its mirror image behaves – should be positive.”

• http://home.web.cern.ch/fr/about/updates/2013/03/new-results-indicate-new-particle-higgs-boson

Computing and storage needs


• Data volume

• 25 PB per year (in files)

• > 5.25 * 10^12 rows in an Oracle table (IOT, compression, partition) in one of the databases

• Computing and storage capacity, distributed world-wide

• > 150 sites (grid computing)

• > 260 000 CPU cores

• > 269 PB disk capacity

• > 210 PB tape capacity

• Distributed analysis with costs spread across the different sites (« LHC Computing Grid »)

Oracle at CERN


• 1982: start with Oracle at CERN (accelerator control)

Credit: N. Segura Chinchilla


Credit: M. Piorkowski

Consolidation, not easy! (a priori)


• Change version, parameters, statistics gathering, hardware!

• Errors (ORA-600, ORA-7445), different execution plans, different results?

• Does it fit on one system? (average / peak!)

• Does one workload impact the others, take all resources at some point?

• Multi instance, schema, virtualisation consolidation, etc.
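
The average / peak question above can be made concrete with a small calculation. The sketch below is illustrative only: the function name, the input format, the sample CPU figures and the 16-core capacity are assumptions, not CERN measurements. It adds up the per-database usage sample by sample, because peaks that coincide are what can saturate a consolidated host.

```python
from statistics import mean

def consolidation_fit(workloads, capacity):
    """Compare the combined load of several databases with one host's capacity.

    workloads: dict of name -> list of CPU-core usage samples (same length,
               same sampling times); hypothetical input format.
    capacity:  CPU cores available on the consolidated system.
    """
    series = list(workloads.values())
    # Sum sample by sample: coincident peaks are what can saturate the host.
    combined = [sum(samples) for samples in zip(*series)]
    return {
        "average": mean(combined),
        "peak": max(combined),
        "fits_on_average": mean(combined) <= capacity,
        "fits_at_peak": max(combined) <= capacity,
    }

# Made-up samples: both databases peak during the same interval.
usage = {
    "db_a": [2, 3, 10, 3],
    "db_b": [1, 2, 9, 2],
}
result = consolidation_fit(usage, capacity=16)
print(result)
```

With these made-up numbers the combined workload fits comfortably on average but not at peak, which is the case the slide warns about.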

Oracle Multitenant Database


• Introduced in Oracle DB 12.1

• Ideal for consolidation, like virtualisation for database

• … but also additional features (cloning, rapid provisioning, regression testing, faster upgrades, move from one database to another -same storage-)

• SQL level compatibility, tablespace, users, PL/SQL, application unchanged

• Any difference can be reported as a bug

• But is this the case for your application?

Oracle Multitenant Database

Non CDB: list of users / roles, user PL/SQL software, user tables / indexes, Oracle foreground processes, database and instance parameters, SYS PL/SQL software, Oracle background processes, all in one database

CDB - 1 PDB: database and instance parameters, SYS PL/SQL software and Oracle background processes belong to the CDB; the PDB has its own list of users / roles, user PL/SQL software, user tables / indexes and Oracle foreground processes

CDB - 2 PDBs: same CDB-level components, shared by both PDBs; each PDB has its own list of users / roles, user PL/SQL software, user tables / indexes and Oracle foreground processes

Demo 1


• Create a pluggable database

• Create a tablespace, one user, two tables

• Clone a pluggable database
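
The demo itself is not reproduced in the slides. As a rough sketch of what the three steps could look like, the Python helper below only builds the SQL text and does not connect to any database. All identifiers (pdb1, pdb2, demo_ts, demo_user, pdbadmin, t1, t2) and the password are hypothetical placeholders, and the statements assume Oracle Managed Files so that no data file names are needed.

```python
def demo1_statements(pdb="pdb1", clone="pdb2", user="demo_user",
                     tablespace="demo_ts", password="change_me"):
    """Return the SQL for the demo steps, in order, as plain strings.

    All identifiers and the password are hypothetical placeholders. The
    statements assume Oracle Managed Files (db_create_file_dest set), so no
    data file names or FILE_NAME_CONVERT clauses are needed.
    """
    return [
        # 1. Create and open the pluggable database (run from the CDB root).
        f"CREATE PLUGGABLE DATABASE {pdb} ADMIN USER pdbadmin IDENTIFIED BY {password}",
        f"ALTER PLUGGABLE DATABASE {pdb} OPEN",
        # 2. Inside the PDB: one tablespace, one user, two tables.
        f"ALTER SESSION SET CONTAINER = {pdb}",
        f"CREATE TABLESPACE {tablespace}",
        f"CREATE USER {user} IDENTIFIED BY {password} "
        f"DEFAULT TABLESPACE {tablespace} QUOTA UNLIMITED ON {tablespace}",
        f"GRANT CREATE SESSION, CREATE TABLE TO {user}",
        f"CREATE TABLE {user}.t1 (id NUMBER PRIMARY KEY, payload VARCHAR2(100))",
        f"CREATE TABLE {user}.t2 (id NUMBER PRIMARY KEY, t1_id NUMBER REFERENCES {user}.t1)",
        # 3. Clone: in 12.1 the source PDB has to be open read-only.
        "ALTER SESSION SET CONTAINER = CDB$ROOT",
        f"ALTER PLUGGABLE DATABASE {pdb} CLOSE IMMEDIATE",
        f"ALTER PLUGGABLE DATABASE {pdb} OPEN READ ONLY",
        f"CREATE PLUGGABLE DATABASE {clone} FROM {pdb}",
        f"ALTER PLUGGABLE DATABASE {clone} OPEN",
    ]

for statement in demo1_statements():
    print(statement + ";")
```

The printed statements are meant to be reviewed and run by hand from a privileged session in the CDB root; the script deliberately stops at generating text.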

Real Application Testing Capture and Replay

• What if you could capture the whole workload at the database level (better than at the client level): select, insert, delete, update, PL/SQL calls, everything?

• Real Application Testing Capture and Replay

• Used at CERN for capture as of 10.2, replay on 11.1, 11.2 and 12.1

• Was a key component for our successful migration from 10.2 to 11.2

Your objective for the testing


• New hardware -> time matters

• New version -> execution plans and LIO matter

• Difference in results

• Resource management or parameters impact…

• All lead to different tests and observations

Capture and Replay

• Capture

• Upgrade to 12.1 and non-CDB to PDB

• Replay

• Copy of the database: RMAN, DG, expdp flashback_scn=nnn

Open sessions

• In principle can create errors/issues at replay

• In our experience, not much of an issue, marginal differences

In-flight transactions

• Recommendation is to stop the database instances, then start the instance(s) in restricted mode and then enable capture. Not possible in most cases

• It means that it can incur errors for dependent transactions

• Not an issue if errors are a negligible percentage of the workload

(Diagram: Transaction A, Transaction B; possible cascading effect on some other transactions)

Capture files (1/2)

• One file created per server process (one per session with dedicated server processes)

• Sequential, buffered writing per session

access("/…/wcr_7jya5h0000009.rec", F_OK) = -1 ENOENT (No such file or directory)

open("/…/wcr_7jya5h0000009.rec", O_RDWR|O_CREAT|O_TRUNC, 0666) = 10

Capture files (2/2)

$ ls -lrt /proc/7576/fd/

lrwx------ 1 oracle ci 64 Sep 20 22:06 9 -> […]wcr_7jwjrh0000002.rec

$ strace -tt -T -p 7937 2>&1 | grep "write(9"

22:24:30.745968 write(9, "..."..., 4096) = 4096 <0.000028>
22:24:30.746050 write(9, "..."..., 45056) = 45056 <0.000044>
22:24:30.746149 write(9, "..."..., 684) = 684 <0.000016>
22:24:31.111495 write(9, "..."..., 4096) = 4096 <0.000026>
22:24:31.111584 write(9, "..."..., 45056) = 45056 <0.000038>
22:24:31.111675 write(9, "..."..., 713) = 713 <0.000017>
22:24:31.474120 write(9, "..."..., 4096) = 4096 <0.000027>
22:24:31.474193 write(9, "..."..., 45056) = 45056 <0.000040>
22:24:31.474282 write(9, "..."..., 712) = 712 <0.000019>
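
The buffered writing is visible in the trace above: three write() calls arrive almost together, then nothing for about a third of a second. A minimal sketch of how such a trace could be summarised is shown below. The 0.1 second gap used to separate bursts is an assumption chosen for this sample, and reading each burst as one flush of the capture buffer is an interpretation, not something the trace states.

```python
import re

# Matches lines as printed by `strace -tt -T`, e.g.
# 22:24:30.745968 write(9, "..."..., 4096) = 4096 <0.000028>
WRITE_LINE = re.compile(
    r"^(\d+):(\d+):(\d+\.\d+) write\(\d+, .*\) = (\d+) <[\d.]+>$"
)

def flush_sizes(lines, gap=0.1):
    """Group write() calls separated by less than `gap` seconds into bursts
    and return the number of bytes written in each burst."""
    bursts = []
    last_time = None
    for line in lines:
        match = WRITE_LINE.match(line.strip())
        if match is None:
            continue  # not a completed write() call
        hours, minutes, seconds, written = match.groups()
        now = int(hours) * 3600 + int(minutes) * 60 + float(seconds)
        if last_time is None or now - last_time > gap:
            bursts.append(0)  # a new burst starts
        bursts[-1] += int(written)
        last_time = now
    return bursts

trace = """\
22:24:30.745968 write(9, "..."..., 4096) = 4096 <0.000028>
22:24:30.746050 write(9, "..."..., 45056) = 45056 <0.000044>
22:24:30.746149 write(9, "..."..., 684) = 684 <0.000016>
22:24:31.111495 write(9, "..."..., 4096) = 4096 <0.000026>
22:24:31.111584 write(9, "..."..., 45056) = 45056 <0.000038>
22:24:31.111675 write(9, "..."..., 713) = 713 <0.000017>
22:24:31.474120 write(9, "..."..., 4096) = 4096 <0.000027>
22:24:31.474193 write(9, "..."..., 45056) = 45056 <0.000040>
22:24:31.474282 write(9, "..."..., 712) = 712 <0.000019>
""".splitlines()

sizes = flush_sizes(trace)
print(sizes)
```

For this sample each burst is just under 50 KB, which is consistent with sequential, buffered writes per session.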


Synchronisation


• SCN: the COMMIT order in the captured workload will be preserved during replay and all replay actions will be executed only after all dependent COMMIT actions have completed

• OBJECT_ID: all replay actions will be executed only after all relevant COMMIT actions have completed

• OFF: no dependency (if independent transaction?)

sysdate and sequence

• Latest replay patch bundle (16086826, see references) captures sysdate and sequence calls so that they can be used at replay

Divergence

• DBA_WORKLOAD_REPLAY_DIVERGENCE

• GET_DIVERGING_STATEMENT procedure in DBMS_WORKLOAD_REPLAY

• Replay report provides a summary

SQL tuning set

• capture_sts => TRUE is not supported in RAC

• Tuning set can be used to compare SQL executions

Demo 2


• Capture on one database

• Check the capture files

• Replay

• Check the replay report

Methodology matters! (1/2)

• Reproducible tests (scripted): reload database, upgrade, set parameters, disable some of the jobs and the time-based resource manager settings

• Gather statistics, logs and reports

Methodology matters! (2/2: caching)

• Buffer cache and shared pool (globally less impact for long replays)

Multiple strategies:

1. Take the AWR reports after a first period of replay

2. Do not compare execution time but LIO, execution plans, etc.

3. Pre-warm (hint: use Capture and Replay!)

(Diagram labels: warming cache, measure)

Several replays

• It is advisable to compare between replays when making multiple changes, as well as replay versus the capture (same version: measure only the differences due to the changes, not the capture/replay differences)

(Diagram: Capture, Replay 1 and Replay 2, with the labels "new platform, new version, capture->replay" and "new platform, new version, capture->replay, changes")

Demo 3


• Methodology, example CASTORNS

Multitenant Database – Capture/Replay

Non CDB, SQL ordered by Gets

CDB, SQL ordered by Gets
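
Comparing the two "SQL ordered by Gets" listings by eye does not scale to many statements. The sketch below shows one possible way to automate the comparison of logical I/O per execution between two replays. The input format (values copied from the two AWR reports), the SQL identifiers, the figures and the 1.2 threshold are all assumptions made for illustration.

```python
def compare_gets(baseline, candidate, threshold=1.2):
    """Return SQL_IDs whose buffer gets per execution grew by more than
    `threshold` times between two replays.

    baseline, candidate: dict of sql_id -> (buffer_gets, executions), e.g.
    values copied by hand from the "SQL ordered by Gets" section of two AWR
    reports. The input format and the 1.2 threshold are assumptions.
    """
    regressions = {}
    for sql_id, (gets, execs) in candidate.items():
        if sql_id not in baseline or execs == 0:
            continue  # new statement or no executions: nothing to compare
        base_gets, base_execs = baseline[sql_id]
        if base_execs == 0 or base_gets == 0:
            continue
        ratio = (gets / execs) / (base_gets / base_execs)
        if ratio > threshold:
            regressions[sql_id] = round(ratio, 2)
    return regressions

# Made-up figures for illustration only.
non_cdb = {"sqlid_a": (1_000_000, 1000), "sqlid_b": (50_000, 500)}
cdb = {"sqlid_a": (1_010_000, 1000), "sqlid_b": (150_000, 500)}
regressed = compare_gets(non_cdb, cdb)
print(regressed)
```

Working on gets per execution rather than elapsed time keeps the comparison independent of caching and hardware speed, in line with the methodology slides.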

Consolidated replay

• In 11.2.0.3 apply patch 16086826

• In 11.2.0.4 and 12.1, no patch required

• My Oracle Support note 1453789.1, “Real Application Testing: Consolidated Database Replay Feature”

1. Copy data (plug)

2. Copy and process capture files

3. Configure and initialize replay

EXEC DBMS_WORKLOAD_REPLAY.BEGIN_REPLAY_SCHEDULE ('CONS_SCHEDULE');

SELECT DBMS_WORKLOAD_REPLAY.ADD_CAPTURE ('DBA') FROM dual;

SELECT DBMS_WORKLOAD_REPLAY.ADD_CAPTURE ('DBB') FROM dual;

EXEC DBMS_WORKLOAD_REPLAY.END_REPLAY_SCHEDULE;

EXEC DBMS_WORKLOAD_REPLAY.INITIALIZE_CONSOLIDATED_REPLAY ('CONS_REPLAY','CONS_SCHEDULE');

4. Remap connections

EXEC DBMS_WORKLOAD_REPLAY.REMAP_CONNECTION (schedule_cap_id => 1, connection_id => 1, replay_connection => 'db121ol5/pdba');

5. Prepare, launch wrc, start replay

EXEC DBMS_WORKLOAD_REPLAY.PREPARE_CONSOLIDATED_REPLAY (synchronization => 'OBJECT_ID');

wrc

EXEC DBMS_WORKLOAD_REPLAY.START_CONSOLIDATED_REPLAY;

Demo 4

• Replay multiple workloads into a pluggable database

Resource manager


• 16 4.496583 select /*+ parallel(3) */

• 16 7.048278 select /*+ parallel(3) */

• 16 3.281641 select /*+ parallel(3) */

Resource management

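• Resource management is critical for consolidation

• Example:

BEGIN
  -- Run inside a resource manager pending area, for a CDB plan that already exists.
  DBMS_RESOURCE_MANAGER.CREATE_CDB_PLAN_DIRECTIVE(
    plan                  => 'newcdb_plan',
    pluggable_database    => 'salespdb',
    shares                => 3,    -- relative weight among the PDBs
    utilization_limit     => 100,  -- percent, 100 = no cap
    parallel_server_limit => 100); -- percent, 100 = no cap
END;
/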
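
What the shares value means in practice depends on the other PDBs in the plan. As a small worked example, assuming the usual reading of shares (under contention each PDB is guaranteed its shares divided by the total number of shares), the sketch below computes the guaranteed CPU percentage. Only 'salespdb' with 3 shares comes from the example above; the other two PDBs and their shares are hypothetical.

```python
def guaranteed_cpu_percent(shares):
    """shares: dict of PDB name -> shares in the CDB plan.

    Returns the CPU percentage each PDB is guaranteed under contention,
    i.e. its shares divided by the total number of shares.
    """
    total = sum(shares.values())
    return {pdb: 100 * value / total for pdb, value in shares.items()}

# 'salespdb' with 3 shares is taken from the directive above;
# 'hrpdb' and 'testpdb' are hypothetical additions.
plan = {"salespdb": 3, "hrpdb": 1, "testpdb": 1}
percentages = guaranteed_cpu_percent(plan)
print(percentages)
```

Because utilization_limit is 100 in the directive, 'salespdb' is not capped and can use more than its guaranteed part whenever the other PDBs are idle.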

If start_capture hangs

• From Szymon Skorupinski: dbms_workload_capture.start_capture can hang, see “ADDM Jobs are in Status Executing or Running for a Long Time” (Doc ID 1557550.1). The workaround of disabling automatic ADDM runs after snapshots are taken works:

alter system set "_addm_auto_enable"=false scope=both sid='*';

Library

• We have built a library of {source DB, captured workload}

• Very useful for testing new version, new OS, new platform

Conclusion


• Oracle Database 12c Multitenant database for consolidation

• Replay with your applications is the only way to prepare

• Use Real Application Testing not only for major upgrades, but also for patching and/or parameter changes.

• It is integrated with Multitenant

• Capture and Replay to measure the differences if any

• Methodology, reproducibility

• Your turn!

References

• Oracle 12c testing guide, http://docs.oracle.com/cd/E24628_01/server.121/e20852/part2.htm#CHDGFGCC

• Oracle Database Replay, http://www.vldb.org/pvldb/2/vldb09-588.pdf

• Consistent Synchronization Schemes for Workload Replay, http://www.vldb.org/pvldb/vol4/p1225-morfonios.pdf

• Master Note for Real Application Testing Option (MOS Doc ID 1464274.1)

• Real Application Testing: Consolidated Database Replay Feature (MOS Doc ID 1453789.1)

• Pre and Post Installation Readme for Patch 16086826 DBREPLAY Patch Bundle 2 and Database Replay Workload Consolidation Feature (MOS Doc ID 1565663.1)

• Scripts To Debug Slow Replay (MOS Doc ID 760402.1)

• Julian Dyke presentations on Database Replay, http://www.juliandyke.com/Presentations/Presentations.html
