CDF Taking Stock 2005-2006, by Anil Kumar (CD/CSS/DSG), June 22, 2005


06/22/2005 CDF Taking Stock 2005-2006

CDF Taking Stock

By Anil Kumar

CD/CSS/DSG

June 22, 2005


Current Infrastructure

Machine     Usage         Type           O/S                Oracle
bzora1      Prd           Sun V440       Solaris 2.9, 32b   9.2.0.6, 32 bit
b0dau36     Dev/Int       Sun E450       Solaris 2.9, 32b   9.2.0.6, 32 bit
fcdfora4    Prd           Sun V880       Solaris 2.9, 64b   9.2.0.6, 64 bit
fcdfora1    Dev/Int       Sun E4500      Solaris 2.9, 64b   9.2.0.6, 64 bit
fcdfora6    Prd/Replica   Dell PE 6650   RHAS 3, 32b        9.2.0.6 lx86, 32 bit
fcdfora05   Testbed       Dell 2650      RHAS 3, 32b**      9.2.0.6, 32 bit

Decommissioning: fcdflnx1, fcdfora3, b0dau35 (replaced by bzora1)


Current Infrastructure

Application   Machine    Capacity   Used   RMAN Capacity   RMAN Used
CDF Online    bzora1     595G       256G   535G            461G (2 copies)
CDF Offline   fcdfora4   1.3T       192G   270G            25G*
CDF Replica   fcdfora6   1.2T       171G   no RMAN implementation


CDF Capacity All Applications

[Chart: CDF Offline Space Usage, size (MB) vs. date, 4/29/2001 to 4/29/2005]

[Chart: CDF On-line Space Usage, size (MB) vs. date, 12/19/2000 to 12/19/2004]

• CDF Offline DB growth*: 43 (online) + 5G (offline) per year

* Slow Control is not in Offline

• CDF Online DB growth: 50G per year
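As a rough check on what these growth rates mean for capacity, a linear projection divides the remaining headroom by the yearly growth. The sketch below uses the CDF Online figures from the capacity table (595G capacity, 256G used; RMAN area 535G allocated, 461G used for two copies) and the 50G/year rate above. The linear-growth model, and the assumption that each of the two on-disk RMAN copies grows at the database's rate, are simplifications for illustration and are not stated in the slides.

```python
def years_until_full(capacity_gb: float, used_gb: float, growth_gb_per_year: float) -> float:
    """Linear projection: years until used space reaches capacity."""
    if growth_gb_per_year <= 0:
        raise ValueError("growth must be positive for a finite projection")
    headroom_gb = max(capacity_gb - used_gb, 0.0)
    return headroom_gb / growth_gb_per_year


# CDF Online on bzora1: 595G capacity, 256G used, ~50G/year growth.
db_years = years_until_full(595, 256, 50)

# RMAN area: 535G allocated, 461G used for two copies.
# Assumption: each of the two copies grows by the DB's 50G/year.
rman_years = years_until_full(535, 461, 2 * 50)

print(f"database area: {db_years:.1f} years, RMAN area: {rman_years:.1f} years")
```

Under these assumptions the RMAN area, not the database area, runs out first.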


CDF Online Applications


CDF Offline Applications


Monitoring And Data Modeling Tools

Monitoring Tools:

• dbatool/toolman: monitors space usage, users, SQL and temp space; snipes inactive sessions; auto-starts the listener; IA; estimates table/index stats

• OEM (Oracle Enterprise Manager): DB monitoring tool; monthly charts posted on the web

DB Performance Charts: http://www-cdserver.fnal.gov/cd_public/css/dsg/db_stats/data/db_stats.html

The URL for the Ganglia charts (monitoring tools) is:

http://fcdfmon2.fnal.gov/

Data Modeling Tool : Oracle Designer is used for Data Modeling and initial space estimates for applications.
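The slides do not describe how dbatool/toolman implements its space-usage check. As a hypothetical sketch only, a threshold alert compares used space against allocated space for each tablespace. In Oracle the two inputs would typically come from the DBA_DATA_FILES and DBA_FREE_SPACE views. The tablespace names, sizes, and the 90% threshold below are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Tablespace:
    name: str
    allocated_mb: float
    used_mb: float

    @property
    def pct_used(self) -> float:
        return 100.0 * self.used_mb / self.allocated_mb


def over_threshold(spaces: Iterable[Tablespace], threshold_pct: float = 90.0) -> List[str]:
    """Names of tablespaces whose usage meets or exceeds the threshold."""
    return [ts.name for ts in spaces if ts.pct_used >= threshold_pct]


# Hypothetical tablespaces and sizes.
spaces = [
    Tablespace("USERS_DATA", 10_000, 9_500),  # 95% used
    Tablespace("INDEX_DATA", 8_000, 2_000),   # 25% used
]
print(over_threshold(spaces))  # ['USERS_DATA']
```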


Uptimes

• cdfonprd: 100%

• cdfofprd: 99.4356% (1776 minutes of unscheduled downtime since 11/11/2004)

• CDF Replica: 100%
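Uptime here is the share of wall-clock time in the measurement window not lost to unscheduled downtime. The following is a minimal sketch of that arithmetic. It assumes the cdfofprd window runs from 11/11/2004 to the date of this talk (06/22/2005), which the slide does not state.

```python
from datetime import datetime


def uptime_pct(downtime_minutes: float, start: datetime, end: datetime) -> float:
    """Percentage of the window [start, end) not lost to unscheduled downtime."""
    window_minutes = (end - start).total_seconds() / 60
    if window_minutes <= 0:
        raise ValueError("end must be after start")
    return 100.0 * (1.0 - downtime_minutes / window_minutes)


# cdfofprd: 1776 minutes of unscheduled downtime since 11/11/2004.
# Assumption: the window ends on the date of this talk.
pct = uptime_pct(1776, datetime(2004, 11, 11), datetime(2005, 6, 22))
print(f"cdfofprd uptime: {pct:.4f}%")
```

With that window the result is about 99.45%. The slide's 99.4356% implies a somewhat shorter window, whose exact end date is not given.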


Accomplishments

• Upgraded CDF databases to 9.2.0.6.

• Kept quarterly database security up to date.

• Tuned and regression-tested Streams replication against current API usage.

• Deployed bzora1 for cdfonprd. Very smooth transition, with no interruption to data taking! http://www-css.fnal.gov/dsg/external/cdfdbmtgs/all_other_documentation/bzora1.pdf

• Decommissioned b0dau35.

• Oracle backups for cdfonprd to DCache/Enstore.

• Deployed the long and eagerly awaited Streams replication across CDF databases. The hard work of css-dsg, spanning more than 2 years, is finally in production. All issues encountered were addressed in a timely manner.

• Smooth transition to fcdfora6 with Streams replication.

• Decommissioned fcdflnx1.

• Implemented capture of long transactions in the DB.


Replication Tool

• Streams Replication tool “strmrep”

• Production deployment of Streams replication encountered two issues:

a) Replication of two packages, RUNDB and HDWDB, caused Streams to halt. We worked very hard to address the issue and deploy the workaround. A permanent bug fix was released by Oracle on Thu 06/16/05. This bug was not encountered in integration testing.

b) SAM can't be replicated using Streams, since the SAM application has variable-length CLOBs and a function-based index. There was not enough time for regression testing, and there was no use case.

• One more error after production deployment was causing one of the Streams processes to halt. We deployed the workaround. Oracle found the root cause, and the bug fix will be available in 1-2 weeks.

• CDF Streams status on-line (courtesy Randy Herber):

http://dbb.fnal.gov:8520/cdfr2/databases?type=ora-strms&fsrc=cdfofpr2&nsrc=cdfofpr2&gsrc=cdfofpr2&dcbk=FILECATALOG

• Documentation http://www-css.fnal.gov/dsg/internal/ora_repl/


Freeware Db Support

• MySQL/Postgres prototype: proof of product with CDF data

– Mechanism for population is on demand; it does not support updates

– Successfully tested with CDF code (Karlsruhe)

• DSG has begun to provide consulting for freeware databases:

– actively maintaining new versions of MySQL & Postgres in KITS and working towards a more robust environment

– actively maintaining documentation for MySQL & Postgres in our freeware area. Reference URL: http://www-css.fnal.gov/dsg/external/freeware

– actively assisting users with questions, upgrades, testing, etc. for freeware products.
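The prototype's population mechanism is described only as on demand and read-only. Below is a hypothetical sketch of that pattern. Rows are copied from the source database the first time they are requested, and writes are rejected, so the source remains the only place updates happen. The class and method names, and the use of a plain dict as a stand-in for the Oracle source, are assumptions for illustration. They are not the prototype's actual design.

```python
from typing import Any, Callable, Dict, Hashable


class OnDemandReplica:
    """Read-only copy that is filled lazily from a source database."""

    def __init__(self, fetch_from_source: Callable[[Hashable], Any]) -> None:
        self._fetch = fetch_from_source
        self._rows: Dict[Hashable, Any] = {}
        self.source_reads = 0  # how many times the source was queried

    def get(self, key: Hashable) -> Any:
        if key not in self._rows:
            # Populate on first access only.
            self._rows[key] = self._fetch(key)
            self.source_reads += 1
        return self._rows[key]

    def put(self, key: Hashable, value: Any) -> None:
        # Updates are not supported: the source database stays authoritative.
        raise NotImplementedError("read-only replica; update the source database instead")


# A dict stands in for the Oracle source in this sketch (hypothetical keys).
source = {"run_1001": {"status": "good"}, "run_1002": {"status": "bad"}}
replica = OnDemandReplica(source.__getitem__)
replica.get("run_1001")
replica.get("run_1001")  # served from the replica, no second source read
print(replica.source_reads)
```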


Back-up

• CDF ONLINE DATABASES

– cdfonprd: daily, 7 days of archives, two backup copies always on disk. Allocated 535GB, used 461GB (2 copies). Backup time 1 hr 23 min vs. 2 hrs 30 min on the old hardware.

– CDF on-line backup to DCache/Enstore: daily

– cdfondev: daily, 14 days of archives, one copy always on disk. Backup time 2 hrs 30 min.

– cdfonint: daily, 30 days of archives, one copy always on disk. Allocated 356GB, used 219GB. Backup time 2 hrs 15 min.

• CDF OFFLINE DATABASES

– cdfofprd (DFC+SAM): daily, 8 days of archives, plus export; one copy always on disk. Allocated 270GB, used 25GB. Backup time 1 hr 24 min.

– cdfstrm1: replica of on-line and DFC; no RMAN/tape backup.

– cdfofdev: daily, 7 days of archives. Backup time 3 hrs 20 min.

– cdfofint: twice a week, 7 days of archives. Allocated 67GB, used 36GB (2 copies). Backup time 2:30.
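Each retention rule above (7, 8, 14 or 30 days) comes down to a cutoff date. RMAN's own mechanism for this is a retention policy (CONFIGURE RETENTION POLICY TO RECOVERY WINDOW OF n DAYS, then DELETE OBSOLETE). Under that policy RMAN also keeps the newest backup taken before the window, because a recovery to the start of the window has to begin from it. The sketch below is a simplified model of that rule for full backups only. The backup times are hypothetical, and archived logs are not modeled.

```python
from datetime import datetime, timedelta


def obsolete(backup_times, now, window_days):
    """Full backups no longer needed to recover to any point in the window.

    Simplified recovery-window model: keep every backup inside the window,
    plus the newest backup taken at or before the window start.
    """
    cutoff = now - timedelta(days=window_days)
    before_window = sorted(t for t in backup_times if t <= cutoff)
    return before_window[:-1]  # everything except the newest pre-window backup


now = datetime(2005, 6, 22, 6, 0)
# Hypothetical: ten daily backups taken at 06:00, newest first.
daily = [now - timedelta(days=d) for d in range(10)]
for t in obsolete(daily, now, window_days=7):
    print("obsolete:", t.date())
```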


Oracle Backup for cdfonprd to DCache/Enstore

• RMAN to DCache/Enstore is working fine, but needs fine-tuning to fit our (DSG) standard, firewall-independent backup mode.

• Working reliably. Fully automated for dailies.

• Data integrity tested twice during recovery.

• Data integrity tested 4 times via md5sum.

• Not currently using the weekly or monthly PNFS directory structure.

• Intend to send weeklies on Sunday and monthlies on the 1st.

• No archives being sent yet.

• PNFS metadata maintenance being done manually.
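Below is a minimal sketch of an md5sum-style integrity check. It assumes both the local backup piece and the copy read back from DCache/Enstore are reachable as ordinary files. How the copy is retrieved is not covered here. Files are hashed in chunks, so multi-gigabyte backup pieces do not need to fit in memory. The file names are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a file, read in chunks (the same digest md5sum prints)."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def copy_is_intact(original: Path, copy: Path) -> bool:
    """True when both files have identical MD5 digests."""
    return md5_of(original) == md5_of(copy)


# Demo with throwaway files standing in for a backup piece and its retrieved copy.
with tempfile.TemporaryDirectory() as tmp:
    piece = Path(tmp) / "backup_piece.bkp"
    fetched = Path(tmp) / "fetched_copy.bkp"
    piece.write_bytes(b"rman backup piece contents")
    fetched.write_bytes(b"rman backup piece contents")
    print(copy_is_intact(piece, fetched))
```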


RMAN Backup on SAN

• Inexpensive, large disk array can accommodate growing RMAN backups

• Fast & reliable backup and recovery

• 24 x 7 and 8 x 5 support tiers available

• Can serve various O/S platforms

• The briefing on database backup/recovery standardization on June 16 discussed SAN testing in more detail.

http://www-css.fnal.gov/dsg/internal/briefings_and_projects/briefings/standardizing_database_backups.ppt

• Multiplexing of archives to local disk and SAN
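In Oracle, archive multiplexing is configured in the database itself through multiple LOG_ARCHIVE_DEST_n destinations. The sketch below only illustrates the idea outside the database. Every archived log is copied in full to each destination, here a hypothetical local directory and a hypothetical SAN directory. The operation fails loudly if any copy comes up short.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Iterable, List


def multiplex(archive: Path, destinations: Iterable[Path]) -> List[Path]:
    """Copy one archived log to every destination; raise if any copy is short."""
    copies = []
    for dest in destinations:
        dest.mkdir(parents=True, exist_ok=True)
        target = dest / archive.name
        shutil.copy2(archive, target)
        if target.stat().st_size != archive.stat().st_size:
            raise OSError(f"incomplete copy: {target}")
        copies.append(target)
    return copies


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    log = root / "arch_1_1234.arc"  # hypothetical archived log name
    log.write_bytes(b"redo data")
    # Hypothetical destinations standing in for local disk and a SAN mount.
    made = multiplex(log, [root / "local_arch", root / "san_arch"])
    print([p.parent.name for p in made])
```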


RMAN to SAN Experience

• d0ofdev1 RMANs to SAN since Nov. ’04

• Two 1TB SAN mount points available

• Keep 2 alternating days of RMANs on SAN, once/week to local backup disk

• RMAN validation to determine backup file integrity

• One validation failure since Nov. ’04

• Recoveries from SAN were all successful
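The d0ofdev1 rotation can be sketched as a destination chooser. It uses two SAN mount points, keeps two alternating days on SAN, and makes a weekly copy to local backup disk. Alternating by day means each night's backup overwrites the one from two days earlier, so the SAN always holds the two most recent days. The mount-point paths and the choice of Sunday for the weekly copy are hypothetical. The slides do not give either.

```python
from datetime import date, timedelta
from typing import List

# Hypothetical paths; the real mount points are not named in the slides.
SAN_MOUNTS = ("/san/rman_a", "/san/rman_b")
LOCAL_BACKUP_DISK = "/backup/rman"
WEEKLY_DAY = 6  # date.weekday(): Monday=0 ... Sunday=6 (assumed Sunday)


def backup_destinations(day: date) -> List[str]:
    """Destinations for one night's RMAN backup under the alternating scheme."""
    # Alternate mounts by day number, so day N overwrites day N-2.
    dests = [SAN_MOUNTS[day.toordinal() % 2]]
    if day.weekday() == WEEKLY_DAY:
        dests.append(LOCAL_BACKUP_DISK)
    return dests


start = date(2005, 6, 19)  # a Sunday
for offset in range(3):
    d = start + timedelta(days=offset)
    print(d, backup_destinations(d))
```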


SAN issues

• The current SAN does not have 24 x 7 support

• IDE disks are not as reliable as other, more expensive disks

• Purchasing a 24 x 7 SAN requires licensing and O/S changes to be able to use it

• Firewall issues (CDF & D0 online)

• We will be extremely careful in implementing SAN for bzora1.

On bzora1:

a) A PCI card has been installed.

b) Fiber between CDF and FCC has been identified for use; we are waiting for additional SAN hardware for bzora1.


SAM Schema

• Production deployments:

– Autodestination sub-system of the SAM schema

– Indexes on param values deployed in production

– Data types correction cut

– Indexes for volumes

• Work in progress: Request sub-system of the SAM schema; cut in Mini-SAM.

• Upgrades to Mini-SAM as the SAM schema evolved. This lets individual developers have a copy of the SAM metadata and seed data available for a server software rewrite if needed.

• Mini-SAM in Postgres: an initiative to move towards freeware databases for SAM. Proof of product is not complete; it requires testing with a dbserver from the SAM development team.


What’s Next?

• Deploy the SAN/Enstore backup and recovery plan (testing of SAN on D0 offline is work in progress). Backup to DCache/Enstore is already in place for CDF on-line.

• Re-allocate the Winchester disk array from fcdflnx1 to fcdfora1, so that there is enough space to reconfigure the Streams integration setup.

• Reconfigure the Streams test environment: cdfonint -> cdfofint -> cdfrep23

• SAM Request sub-system schema deployment

• Patch the CDF database for replication of the RUNDB and HDWDB packages (patch released by Oracle on Thu 06/16/05).

• Convert cdfonline to 64 bit. Testing will be a challenge.

• O/S upgrade (reinstall) to 2.9 on b0dau36. Decommission Veritas.

• Performance tuning on fcdfora4: SGA > 2GB to allocate more memory to Streams.

• Migrate Slow Control to Linux.

• Rewrite dbatools/toolman for enhanced monitoring features and 10g support.

• Upgrade OEM to 10g (work in progress).

• Possible upgrade to 10g for incremental database backups and the enhanced features of Streams replication.

• Testing of Postgres Mini-SAM for proof of product.


Concerns

• Replication of SAM depends on the stress test results on fcdfora4.

• Simulation of applications, as we have for CALIB; a robust test suite is needed.

• Single point of failure for SAM and DFC.

• Migration of DFC to SAM: plan and schedule?

• Close-out for Data Guard/Standby is still pending.

• Move Slow Control off of bzora1:

– Requires 3 instances

– OS Linux? If Linux, then not a 24x7 machine.

• The data models of some CDF applications are not in Designer.

• What is CDF's direction, if any, with respect to freeware?

• Any more Streams replicas?

• Deputy CDF database liaison?

• TNSNAMES deployment for CDF was a nightmare. The experience should be documented.

• Special clean-up jobs should be coordinated with css-dsg.

• In case of hardware failure on offline, we would have to re-instantiate replication rather than recover, since we have only partial backups of the offline production DB.

• Move VxWorks off b0dau36.