A Closer Look inside Oracle ASM
UKOUG Conference 2007
Luca Canali, CERN IT
CERN - IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it

Outline
Investigation of ASM internals
ASM and VLDB
ASM
Oracle Automatic Storage Management
Provides the functionality of a volume manager and a filesystem for Oracle database files
Works with RAC
Works together with Oracle Managed Files and the Flash Recovery Area
An implementation of the S.A.M.E. (Stripe And Mirror Everything) methodology
Goal: increase performance and reduce cost
ASM for a Clustered Architecture
Oracle architecture of redundant low-cost components
Servers
SAN
Storage
This is the architecture deployed at CERN for the Physics DBs, more on https://twiki.cern.ch/twiki/pub/PSSGroup/HAandPerf/Architecture_description.pdf
ASM Disk Groups
Example: HW = 4 disk arrays with 8 disks each
An ASM diskgroup is created using all available disks
The end result is similar to a file system on RAID 1+0
ASM allows mirroring across storage arrays
Oracle RDBMS processes directly access the storage
RAW disk access
Files, Extents, and Failure Groups
ASM Is not a Black Box
ASM is implemented as an Oracle instance
Familiar operations for the DBA
Configured with SQL commands
Info in V$ views
Some ‘secret’ details are hidden in X$ tables and ‘underscore’ parameters
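For example, a diskgroup can be created and then inspected with plain SQL from the ASM instance; a minimal sketch (the diskgroup name and disk paths are illustrative):

create diskgroup DATA1 normal redundancy
  failgroup fg1 disk '/dev/mpath/itstorA_*'
  failgroup fg2 disk '/dev/mpath/itstorB_*';

select name, type, total_mb, free_mb from v$asm_diskgroup;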
Selected V$ Views and X$ Tables
A complete list in https://twiki.cern.ch/twiki/bin/view/PSSGroup/ASM_Internals
View Name X$ Table name Description
V$ASM_DISKGROUP X$KFGRP performs disk discovery and lists diskgroups
V$ASM_DISKGROUP_STAT X$KFGRP_STAT diskgroup stats without disk discovery
V$ASM_DISK X$KFDSK, X$KFKID performs disk discovery, lists disks and their usage metrics
V$ASM_DISK_STAT X$KFDSK_STAT, X$KFKID lists disks and their usage metrics
V$ASM_FILE X$KFFIL lists ASM files, including metadata/asmdisk files
V$ASM_ALIAS X$KFALS lists ASM aliases, files and directories
V$ASM_TEMPLATE X$KFTMTA lists the available templates and their properties
V$ASM_CLIENT X$KFNCL lists DB instances connected to ASM
V$ASM_OPERATION X$KFGMG lists rebalancing operations
N.A. X$KFKLIB available libraries, includes asmlib path
N.A. X$KFDPARTNER lists disk-to-partner relationships
N.A. X$KFFXP extent map table for all ASM files
N.A. X$KFDAT extent list for all ASM disks
N.A. X$KFBH describes the ASM cache (the ASM buffer cache, in blocks of 4 KB, see _asm_blksize)
N.A. X$KFCCE a linked list of ASM blocks, to be further investigated
Additional in 11g:
V$ASM_ATTRIBUTE X$KFENV ASM attributes; the X$ table also shows 'hidden' attributes
V$ASM_DISK_IOSTAT X$KFNSDSKIOST I/O statistics
ASM Parameters
*.instance_type='asm'
*.remote_login_passwordfile='exclusive'
More ASM Parameters
Exception: _asm_ausize and _asm_stripesize
New in 11g: diskgroup attributes, exposed in V$ASM_ATTRIBUTE
Query to display underscore parameters
select a.ksppinm "Parameter", c.ksppstvl "Instance Value"
from x$ksppi a, x$ksppcv b, x$ksppsv c
where a.indx = b.indx and a.indx = c.indx
and ksppinm like '%asm%'
order by a.ksppinm;
ASM Storage Internals
Allocation Unit (AU): default size 1 MB (_asm_ausize)
Tunable diskgroup attribute in 11g
ASM files are built as a series of extents
Extents are mapped to AUs using a file extent map
When using ‘normal redundancy’, two mirrored extents are allocated, each in a different failgroup
RDBMS read operations access only the primary extent of a mirrored pair (unless there is an I/O error)
In 10g the ASM extent size = AU size
11g implements variable extent size for extent#>20000 (similar to the automatic extent size management in the DB)
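The AU size in use can be checked per diskgroup with a simple query, for example:

select group_number, name, allocation_unit_size from v$asm_diskgroup;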
ASM Metadata Walkthrough
Three examples follow of how to read data directly from ASM.
Motivations:
Build confidence in the technology, i.e. ‘get a feeling’ of how ASM works
It may turn out to be useful one day for troubleshooting a production issue.
Example 1: Direct File Access 1/2
Goal: Reading ASM files with OS tools, using metadata information from X$ tables
Example: find the 2 mirrored extents of the RDBMS spfile
sys@+ASM1> select GROUP_KFFXP Group#, DISK_KFFXP Disk#, AU_KFFXP AU#, XNUM_KFFXP Extent#
           from X$KFFXP
           where NUMBER_KFFXP = (select file_number from v$asm_alias where name='spfiletest1.ora');
GROUP# DISK# AU# EXTENT#
Example 1: Direct File Access 2/2
Find the disk path:
select disk_number, path from v$asm_disk
where GROUP_NUMBER=1 and disk_number in (16,4);
DISK_NUMBER PATH
dd if=/dev/mpath/itstor419_6p1 bs=1024k count=1 skip=17528 | strings
Device mapper multipathing is used under Linux
Block devices are given a name /dev/mpath/itstor.. to reflect the storage array name
The name is a symbolic link (handled by device mapper) to /dev/dm-xxx (the device mapper block device)
Device names and device mapper configurations are set up with /etc/multipath.conf
X$KFFXP
‘File extent pointer’ X$ table. This is the author's interpretation of the table, based on investigation of actual instances.
X$KFFXP Column Name Description
ADDR x$ table address/identifier
INDX row unique identifier
INST_ID instance number (RAC)
COMPOUND_KFFXP File identifier. Join with compound_index in v$asm_file
INCARN_KFFXP File incarnation id. Join with incarnation in v$asm_file
PXN_KFFXP Progressive file extent number
XNUM_KFFXP ASM file extent number (mirrored AU have the same extent value)
GROUP_KFFXP ASM disk group number. Join with v$asm_disk and v$asm_diskgroup
DISK_KFFXP Disk number where the extent is allocated. Join with v$asm_disk
AU_KFFXP Relative position of the allocation unit from the beginning of the disk. The allocation unit size (default 1 MB) is in v$asm_diskgroup
LXN_KFFXP 0->primary extent, 1->mirror extent, 2->2nd mirror copy (high redundancy and metadata)
FLAGS_KFFXP N.K.
CHK_KFFXP N.K.
Example 2: A Different Way
A different metadata table to reach the same goal of reading ASM files directly from OS:
sys@+ASM1> select GROUP_KFDAT Group#, NUMBER_KFDAT Disk#, AUNUM_KFDAT AU#
           from X$KFDAT
           where FNUM_KFDAT = (select file_number from v$asm_alias where name='spfiletest1.ora');
GROUP# DISK# AU#
1 4 14838
1 16 17528
This is much slower than querying X$KFFXP, since it scans the allocation tables of all the disks
X$KFDAT
‘Disk allocation table’ X$ table. This is the author's interpretation of the table, based on investigation of actual instances.
X$KFDAT Column Name Description
ADDR x$ table address/identifier
INDX row unique identifier
INST_ID instance number (RAC)
GROUP_KFDAT diskgroup number, join with v$asm_diskgroup
NUMBER_KFDAT disk number, join with v$asm_disk
COMPOUND_KFDAT disk compound_index, join with v$asm_disk
AUNUM_KFDAT Disk allocation unit (relative position from the beginning of the disk), join with x$kffxp.au_kffxp
V_KFDAT V=this Allocation Unit is used; F=AU is free
FNUM_KFDAT file number, join with v$asm_file
XNUM_KFDAT progressive file extent number, join with x$kffxp.pxn_kffxp
I_KFDAT N.K.
RAW_KFDAT raw format encoding of the disk and file extent information
Example 3: Yet Another Way
Using the internal package dbms_diskgroup (only the beginning of the PL/SQL block is shown; it opens the ASM file and reads it into a raw buffer)
declare
data_buf raw(4096);
DBMS_DISKGROUP
It’s an Oracle internal package
Does not require an RDBMS instance
11g’s asmcmd cp command uses dbms_diskgroup
Procedure names:
dbms_diskgroup.read
dbms_diskgroup.close(:hdl)
dbms_diskgroup.commitfile(:handle)
dbms_diskgroup.resizefile(:handle, :fsz)
dbms_diskgroup.remap
File Transfer Between OS and ASM
The supported tools (10g): RMAN, DBMS_FILE_TRANSFER, and FTP (via the XML DB repository)
In 11g, all the above plus the asmcmd cp command, which works directly with the ASM instance
FTP into ASM diskgroups can be used together with transportable tablespaces.
CERN's experience: migration and consolidation of 10 DBs from 9i to a multi-TB RAC DB on 10g
11g's asmcmd is improved over the 10g version: look in $OH/bin and $OH/rdbms/lib
(find $ORACLE_HOME -name '*asmcmd*')
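As an illustration, a file can be copied from the OS into ASM with DBMS_FILE_TRANSFER; a minimal sketch, where the directory objects, paths and file names are hypothetical:

create directory src_dir as '/tmp/stage';
create directory dst_dir as '+DATA1/testdb/datafile';

begin
  -- the source file size must be a multiple of 512 bytes
  dbms_file_transfer.copy_file(
    source_directory_object      => 'SRC_DIR',
    source_file_name             => 'users01.dbf',
    destination_directory_object => 'DST_DIR',
    destination_file_name        => 'users01.dbf');
end;
/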
Strace and ASM 1/3
Example:
pread64(15, "#33\0@\"..., 8192, 473128960) = 8192
This is a read of 8 KB from file descriptor 15 at offset 473128960
What is the segment name, type, file# and block# ?
Strace and ASM 2/3
From /proc/<pid>/fd I find that FD=15 is
/dev/mpath/itstor420_1p1
Note: offset 473128960 = 451 MB + 27 * 8 KB, i.e. AU number 451 on the disk, block 27 inside the AU
sys@+ASM1> select NUMBER_KFFXP, XNUM_KFFXP from x$kffxp
           where group_kffxp=1 and disk_kffxp=20 and au_kffxp=451;
NUMBER_KFFXP XNUM_KFFXP
Strace and ASM 3/3
From v$asm_alias I find the file alias for file 268: USERS.268.612033477
From v$datafile view I find the RDBMS file#: 9
From dba_extents I finally find the owner and segment name relative to the original I/O operation (the file block# = extent# * 1MB/8KB + the block offset inside the AU):
sys@TEST1> select owner, segment_name, segment_type from dba_extents
           where FILE_ID=9 and 27 + 17*1024*1024/8192 between block_id and block_id + blocks - 1;
OWNER SEGMENT_NAME SEGMENT_TYPE
Investigation of Fine Striping
An application: finding the layout of fine-striped files
Explored using strace of an oracle session executing ‘alter system dump logfile ..’
Result: round-robin distribution over 8 x 1 MB extents (AU = 1 MB)
Metadata Files
Metadata files are not listed in V$ASM_FILE (they have file# < 256)
Details are available in X$KFFIL
In addition, the first two AUs of each disk are marked as file#=0 in X$KFDAT
Example (10g): metadata files have 3-way mirroring
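Their allocation can be inspected from X$KFDAT; for example, a query of this kind counts the AUs allocated to each metadata file (a minimal sketch; the 3x factor from triple mirroring shows up in the counts):

select group_kfdat diskgroup#, fnum_kfdat file#, count(*) allocated_aus
from x$kfdat
where v_kfdat = 'V' and fnum_kfdat > 0 and fnum_kfdat < 256
group by group_kfdat, fnum_kfdat
order by 1, 2;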
ASM Metadata 1/2
File#0, AU=0: disk header (disk name, etc), Allocation Table (AT) and Free Space Table (FST)
File#0, AU=1: Partner Status Table (PST)
File#1: File Directory (files and their extent pointers)
File#2: Disk Directory
File#3: Active Change Directory (ACD)
The ACD is analogous to a redo log, where changes to the metadata are logged.
Size=42MB * number of instances
Source: Oracle Automatic Storage Management, Oracle Press Nov 2007, N. Vengurlekar, M. Vallath, R.Long
A note on File#1 and file extent pointers:
The extent information available in X$KFFXP is
In File#1 for the first 60 extents (short files)
Extents beyond #60 make use of indirect pointers
Indirect pointers are allocated in files
Direct investigation:
create tablespace test1 datafile size 30712k;
60 extents (normal redundancy)
65 extents (normal redundancy)
ASM Metadata 2/2
File#4: Continuing Operation Directory (COD).
The COD is analogous to an undo tablespace. It maintains the state of active ASM operations such as disk or datafile drop/add. The COD log record is either committed or rolled back based on the success of the operation.
File#5: Template directory
File#6: Alias directory
11g, File#9: Attribute Directory
11g, File#12: Staleness registry, created when needed to track offline disks
ASM Rebalancing
Goal: balanced space allocation across disks
Not based on performance or utilization metrics
ASM spreads every file across all disks in a diskgroup
ASM instances are in charge of rebalancing
Changes to extent pointers are communicated to the RDBMS
The RDBMS's ASMB process keeps an open connection to ASM
This can be observed by running strace against ASMB
In RAC, extra messages are passed between the cluster's ASM instances
LMD0 processes of the ASM instances are very active during rebalancing
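A rebalance can be triggered and followed with standard SQL from the ASM instance (diskgroup name and power level are illustrative):

alter diskgroup DATA1 rebalance power 8;

select group_number, operation, state, power, sofar, est_work, est_rate, est_minutes
from v$asm_operation;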
ASM Rebalancing and VLDB
RBAL coordinates the rebalancing operations
ARBx processes pick up ‘chunks’ of work. By default they log their activity in udump
Does it scale?
Even at maximum speed rebalancing is not always I/O bound
GMON is also involved at the beginning and at the end of the rebalancing operation, to take and release the diskgroup lock
ASM Rebalancing Performance
oradebug setospid …
oradebug event 10046 trace name context forever, level 12
Process the trace files (in bdump) with orasrp (tkprof will not work)
Main wait events from my tests with RAC (6 nodes):
DFS lock handle
Buffer busy wait
ASM Single Instance Rebalancing
Single-instance rebalance: faster in RAC if you can rebalance with only one node up (observed: 20% to 100% speed improvement)
Buffer busy wait can be the main event
It seems to depend on the number of files in the diskgroup.
Diskgroups with a small number of (large) files have more contention (ARBx processes operate concurrently on the same file)
Only seen in tests with 10g
11g has improvements regarding rebalancing contention
Rebalancing, an Example
Data: D.Wojcik, CERN IT
Rebalancing speed is measured in MB/minute to conform to V$ASM_OPERATION units
Test conditions may vary the results (OS, storage, Oracle version, number of ASM files, etc)
It’s a good idea to repeat the measurements when several parameters of the environment change to get meaningful results.
Rebalancing Workload
Rebalancing operations can move more data than expected
Example:
A disk is replaced (diskgroup rebalance)
The total IO workload is 1.6 TB (8x the disk size!)
How to see this: query v$asm_operation, the column EST_WORK keeps growing during rebalance
The issue: excessive repartnering
Rebalancing in RAC fails over when an instance crashes, but
It does not restart if all instances are down (typical of single instance)
There is no obvious way to tell if a diskgroup has a pending rebalance operation
A partial workaround is to query V$ASM_DISK for disk space imbalances (compare total_mb and free_mb, see the query sketch below)
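For example (a minimal sketch; an uneven percentage of used space across the disks of a diskgroup suggests an unfinished rebalance):

select group_number, disk_number, name, total_mb, free_mb,
       round(100*(total_mb-free_mb)/total_mb, 1) pct_used
from v$asm_disk
where group_number > 0
order by group_number, pct_used;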
ASM Disk Partners
Two copies of each extent are written to different failgroups
Two ASM disks are partners when they have at least one extent set in common (they are the two sides of a mirror for some data)
Each ASM disk has a limited number of partners
Typically 10 disk partners, see X$KFDPARTNER
This helps to reduce the risk associated with two simultaneous disk failures
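The partnership can be inspected directly; a sketch of this kind, assuming the commonly reported X$KFDPARTNER columns GRP, DISK and NUMBER_KFDPARTNER, counts the partners of each disk:

select grp diskgroup#, disk disk#, count(number_kfdpartner) partner_count
from x$kfdpartner
group by grp, disk
order by 1, 2;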
Free and Usable Space
When ASM mirroring is used, not all of the free space should be used up
V$ASM_DISKGROUP.USABLE_FILE_MB:
Amount of free space that can be safely utilized taking mirroring into account, while still being able to restore redundancy after a disk failure
It is calculated for the worst-case scenario; as a best practice it should not be allowed to go negative (it can)
This can be a problem when deploying a small number of large LUNs and/or failgroups
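A quick check (a minimal query sketch):

select name, total_mb, free_mb, required_mirror_free_mb, usable_file_mb
from v$asm_diskgroup;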
Fast Mirror Resync
ASM 10g with normal redundancy does not allow part of the storage to be taken offline
A transient error in a storage array can cause several hours of rebalancing to drop and then re-add the disks
It is a limiting factor for scheduled maintenance
11g has a new feature, 'fast mirror resync'
Redundant storage can be put offline for maintenance
Changes are accumulated in the staleness registry (file#12)
Changes are applied when the storage is back online
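In 11g this is driven by a diskgroup attribute and an offline/online command pair, along these lines (diskgroup, failgroup and repair time are illustrative; requires 11g diskgroup compatibility):

alter diskgroup DATA1 set attribute 'disk_repair_time' = '4h';
alter diskgroup DATA1 offline disks in failgroup fg1;
-- ... storage maintenance ...
alter diskgroup DATA1 online disks in failgroup fg1;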
Read Performance, Random I/O
~130 IOPS per disk
Destroking: only the outer part of the disks is used
8675 IOPS
From other tests with less ad-hoc tuning and from application deployments: ~100 random read IOPS per disk (SATA, 7200 rpm)
More info here: https://twiki.cern.ch/twiki/bin/view/PSSGroup/HAandPerf
Read Performance, Sequential I/O
Sequential I/O throughput, 80 GB probe table, 64 SATA HDs (4 arrays and 4 RAC nodes), measured with parallel query
[Chart: throughput in MB/sec rises from ~35 MB/sec to a plateau of roughly 700-745 MB/sec as the degree of parallelism increases]
Implementation Details
Block devices, JBOD configuration
HW: FC (QLogic), 4 Gb switch and HBAs (2 Gb in older HW)
Servers: 2 CPUs and 4 GB RAM each, Oracle 10.2.0.3 on RHEL4, RAC clusters of 4 to 8 nodes
Conclusions
CERN deploys RAC and ASM on Linux on commodity HW
2.5 years of production, 110 Oracle 10g RAC nodes and 300 TB of raw disk space (Dec 2007)
ASM metadata: knowledge of some ASM internals helps troubleshooting
ASM on VLDB
Q&A