
ibm.com/redbooks

Front cover

Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

Christian Matthys
David Bright
Carol Davis
Fabio Hasegawa
Philippe Jachimczyk
Steve Lockwood
Thomas Marien
Julien Maussion
Stefan Ulrich

Scalability study of SAP NetWeaver Business Intelligence on IBM System p5

Architectural description, test results, lessons learned

Manage the solution using Tivoli products


International Technical Support Organization

Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse

March 2007

SG24-7289-00


© Copyright International Business Machines Corporation 2007. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

First Edition (March 2007)

This edition describes tests done in an AIX 5.2, DB2 V8, and SAP NetWeaver BI 3.5 environment with IBM System p5, using Tivoli Storage Manager and Tivoli Data Protection products for storage management.

Note: Before using this information and the product it supports, read the information in “Notices” on page vii.


Contents

Notices . . . . . . . . . . vii
Trademarks . . . . . . . . . . viii

Preface . . . . . . . . . . ix
The team that wrote this book . . . . . . . . . . x
Become a published author . . . . . . . . . . xii
Comments welcome . . . . . . . . . . xii

Chapter 1. Project summary . . . . . . . . . . 1
1.1 The project proposal . . . . . . . . . . 2

1.1.1 The test objectives . . . . . . . . . . 2
1.1.2 The overall benchmark methodology . . . . . . . . . . 3
1.1.3 The required test scopes . . . . . . . . . . 5
1.1.4 The initial configuration . . . . . . . . . . 6
1.1.5 The proposed monitoring tools . . . . . . . . . . 11
1.1.6 The proposed infrastructure . . . . . . . . . . 12
1.1.7 The Key Performance Indicators test requests . . . . . . . . . . 15

1.2 The execution of the project - a technical summary . . . . . . . . . . 26
1.2.1 The methodology in practice . . . . . . . . . . 27
1.2.2 Overview of the combined load tests . . . . . . . . . . 29
1.2.3 The System p5 configurations used . . . . . . . . . . 34
1.2.4 Online test results summary . . . . . . . . . . 39
1.2.5 Resource requirements . . . . . . . . . . 42
1.2.6 Optimization and tuning options . . . . . . . . . . 47
1.2.7 The monitoring tool developed . . . . . . . . . . 56
1.2.8 Using the System p5 virtualization features: thoughts for the next steps . . . . . . . . . . 59

Chapter 2. The SAP NetWeaver BI perspective . . . . . . . . . . 65
2.1 SAP NetWeaver BI overview . . . . . . . . . . 66

2.1.1 The SAP NetWeaver BI information model . . . . . . . . . . 67
2.1.2 The SAP NetWeaver BI functions and technologies . . . . . . . . . . 69
2.1.3 The SAP NetWeaver BI architecture summary . . . . . . . . . . 71

2.2 SAP NetWeaver BI solution configuration: the logical views . . . . . . . . . . 72
2.3 SAP NetWeaver BI database configuration . . . . . . . . . . 75


2.4 DB2 partitions and LPAR evolution . . . . . . . . . . 77
2.5 The profile of InfoCubes and the population process . . . . . . . . . . 81
2.6 SAP NetWeaver BI processes . . . . . . . . . . 84

2.6.1 The upload process . . . . . . . . . . 84
2.6.2 The query process . . . . . . . . . . 94
2.6.3 The rollup process . . . . . . . . . . 98

2.7 Load distribution methods . . . . . . . . . . 104
2.7.1 The best load distribution case . . . . . . . . . . 104
2.7.2 The round robin load distribution case . . . . . . . . . . 106

2.8 Maximizing the upload scenario . . . . . . . . . . 108
2.8.1 Understand the upload processes . . . . . . . . . . 108
2.8.2 Tests to select the right amount of data . . . . . . . . . . 110
2.8.3 Tests to select the right parameters for the upload scenario . . . . . . . . . . 112

Chapter 3. The DB2 perspective . . . . . . . . . . 119
3.1 DB2 overview . . . . . . . . . . 120

3.1.1 Instances (database manager) . . . . . . . . . . 120
3.1.2 Database considerations . . . . . . . . . . 122

3.2 The major DB2 processes used for our test . . . . . . . . . . 129
3.3 Database Partitioning Feature . . . . . . . . . . 131
3.4 Monitoring tools and scripts . . . . . . . . . . 133

3.4.1 IBM DB2 Performance Expert (DB2 PE) . . . . . . . . . . 133
3.4.2 NMON . . . . . . . . . . 136
3.4.3 DB2-specific monitoring scripts . . . . . . . . . . 140
3.4.4 DB2 checking scripts . . . . . . . . . . 144

3.5 The process for redistributing the DB2 partitions . . . . . . . . . . 145
3.5.1 Step 1 - Restoring the initial image in our environment . . . . . . . . . . 146
3.5.2 Step 2 - Adding new LPARs and creating 32 new DB partitions . . . . . . . . . . 147
3.5.3 Step 3 - Executing the data redistribution . . . . . . . . . . 149

3.6 Balancing processing and disk usage . . . . . . . . . . 155
3.6.1 InfoCube aggregate table balancing . . . . . . . . . . 158
3.6.2 InfoCube fact table balancing . . . . . . . . . . 161

3.7 The DB2 configuration . . . . . . . . . . 164
3.7.1 The DB2 instance . . . . . . . . . . 164

Chapter 4. The storage physical environment . . . . . . . . . . 175
4.1 Storage design description . . . . . . . . . . 176
4.2 The storage and AIX file systems layout . . . . . . . . . . 180
4.3 The SAN design . . . . . . . . . . 185
4.4 The backup and FlashCopy design and implementation . . . . . . . . . . 186

4.4.1 Backup . . . . . . . . . . 186
4.4.2 FlashCopy . . . . . . . . . . 188

4.5 DS8300 internal addressing and total capacity consideration . . . . . . . . . . . . . . . . . . 190

Chapter 5. Using IBM Tivoli Storage Manager to manage the storage environment . . . . . . . . . . 193
5.1 Introducing IBM Tivoli Storage Management . . . . . . . . . . 194

5.1.1 Backup techniques for databases . . . . . . . . . . 197
5.1.2 Backing up SAP using FlashCopy . . . . . . . . . . 201

5.2 DB2 backup coordination across 32 partitions and multiple LPARs . . . . . . . . . . 206
5.2.1 The process description . . . . . . . . . . 207
5.2.2 Influencing factors . . . . . . . . . . 211

Chapter 6. Test results . . . . . . . . . . 215
6.1 Online tests: the KPI-A results . . . . . . . . . . 216


6.1.1 KPI-A 7 TB results . . . . . . . . . . 217
6.1.2 KPI-A 20 TB results . . . . . . . . . . 219
6.1.3 KPI-A53 20 TB results . . . . . . . . . . 220

6.2 Infrastructure test results . . . . . . . . . . 221
6.2.1 KPI-1 results: flashback and roll forward . . . . . . . . . . 221
6.2.2 KPI-2 results: database restore and roll forward . . . . . . . . . . 223
6.2.3 KPI-3a results: FlashCopy with no online workload . . . . . . . . . . 225
6.2.4 KPI-3b results: FlashCopy with online workload . . . . . . . . . . 228
6.2.5 KPI-3c results: tape backup with online workload . . . . . . . . . . 231
6.2.6 KPI-4 results: creation of indexes . . . . . . . . . . 237

Chapter 7. Proposal for more scalability . . . . . . . . . . 241
7.1 Options to scale from 20 TB to 60 TB . . . . . . . . . . 242

7.1.1 Lessons learned from the 20 TB tests . . . . . . . . . . 243
7.1.2 Improving performance in a BI environment . . . . . . . . . . 243

7.2 The proposal . . . . . . . . . . 244
7.2.1 Options selected . . . . . . . . . . 245
7.2.2 The logical architecture . . . . . . . . . . 245
7.2.3 The SAP architecture and the data model . . . . . . . . . . 246
7.2.4 The DB2 environment . . . . . . . . . . 248
7.2.5 The System p5 environment . . . . . . . . . . 249
7.2.6 The storage environment . . . . . . . . . . 251
7.2.7 Tivoli Storage Manager and Tivoli Data Protection environment . . . . . . . . . . 251
7.2.8 The physical architecture . . . . . . . . . . 252

Appendix A. The DB2 scripts . . . . . . . . . . 255
DB2 monitoring scripts . . . . . . . . . . 256

Script A: Upload monitoring script . . . . . . . . . . 256
Script B: Rollup monitoring script . . . . . . . . . . 260

DB2 checking scripts . . . . . . . . . . 264
DB2 checking - Script C . . . . . . . . . . 264
Output - Script D . . . . . . . . . . 270

Appendix B. Query variables . . . . . . . . . . 287
Query 1 . . . . . . . . . . 288
Query 2 . . . . . . . . . . 288
Query 3 . . . . . . . . . . 289
Query 4 . . . . . . . . . . 289
Query 5 . . . . . . . . . . 290
Query 6 . . . . . . . . . . 290
Query 7 . . . . . . . . . . 291
Query 8 . . . . . . . . . . 291
Query 9 . . . . . . . . . . 292
Query 10 . . . . . . . . . . 293

Appendix C. Scripts for storage . . . . . . . . . . 295
Do backup script . . . . . . . . . . 296
Backup node0 script . . . . . . . . . . 297
Backup sys3DB0 script . . . . . . . . . . 298
Backup sys3db1p script . . . . . . . . . . 301
Restore . . . . . . . . . . 306

Abbreviations and acronyms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 309


Related publications . . . . . . . . . . 317
IBM Redbooks . . . . . . . . . . 317
Other publications . . . . . . . . . . 317
Online resources . . . . . . . . . . 317
How to get IBM Redbooks . . . . . . . . . . 318
Help from IBM . . . . . . . . . . 318

Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 319


Notices

This information was developed for products and services offered in the U.S.A.

IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service.

IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not give you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A.

The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you.

This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice.

Any references in this information to non-IBM Web sites are provided for convenience only and do not in any manner serve as an endorsement of those Web sites. The materials at those Web sites are not part of the materials for this IBM product and use of those Web sites is at your own risk.

IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you.

Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products.

This information contains examples of data and reports used in daily business operations. To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental.

COPYRIGHT LICENSE:

This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.


Trademarks

The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both:

AIX 5L™, AIX®, Domino®, DB2 Universal Database™, DB2®, DS6000™, DS8000™, Enterprise Storage Server®, FlashCopy®, Footprint®, HACMP™, Informix®, IBM®, Lotus®, Micro-Partitioning™, POWER™, POWER5™, POWER5+™, Redbooks™, Redbooks (logo)™, RS/6000®, System p™, System p5™, System Storage™, SysBack™, Tivoli®, TotalStorage®, WebSphere®, xSeries®, z/OS®, 1350™

The following terms are trademarks of other companies:

BAPI, ABAP, SAP NetWeaver, SAP R/3, SAP, and SAP logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries.

Oracle, JD Edwards, PeopleSoft, Siebel, and TopLink are registered trademarks of Oracle Corporation and/or its affiliates.

Snapshot and the Network Appliance logo are trademarks or registered trademarks of Network Appliance, Inc. in the U.S. and other countries.

DataStage and Ascential are trademarks or registered trademarks of Ascential Software Corporation in the United States, other countries, or both.

Java, JDBC, J2EE, Solaris, and all Java-based trademarks are trademarks of Sun Microsystems, Inc. in the United States, other countries, or both.

Excel, Microsoft, MS-DOS, Visual Basic, Windows NT, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Intel, Pentium, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States, other countries, or both.

UNIX is a registered trademark of The Open Group in the United States and other countries.

Linux is a trademark of Linus Torvalds in the United States, other countries, or both.

Other company, product, or service names may be trademarks or service marks of others.


Preface

In order to improve the performance and operational efficiency of its businesses worldwide, a customer using SAP® wanted to establish a global business program to define and implement a standardized, group-wide business process architecture and associated master data for the parameterization of the group software tools.

However, the expected growth in the number of users and in the size of the database would be at a level never before reached by other customers, so IBM® was asked to undertake the following:

• Test the application to be sure it could sustain such growth.
• Prove the manageability of the solution.
• Provide recommendations to optimize the infrastructure architecture.

This IBM Redbooks™ publication describes the testing that was done in terms of performance and manageability in a SAP NetWeaver® BI and DB2® environment on IBM System p™ when scaling a client’s solution to a data warehouse of 20 terabytes (TB). It also provides recommendations for an architecture to support a potential 60 TB data warehouse. The book resulted from a joint cooperative effort that included the PSSC, the IBM/SAP International Competency Center, the DB2-SAP Center of Excellence, SAP AG, and a customer.

This project involved multiple technical skills and multiple products, as described here:

• Chapter 1, “Project summary” on page 1 summarizes the entire project, starting from the proposal before any validation tests were made, through the description of the environment and options used, to the results achieved, in terms of performance and resources. This chapter can be viewed as an executive summary from a System p IT Specialist perspective.

• Chapter 2, “The SAP NetWeaver BI perspective” on page 65 and Chapter 3, “The DB2 perspective” on page 119, provide detailed views of the project from the perspectives of SAP specialists and DB2 specialists.

• Chapter 4, “The storage physical environment” on page 175 and Chapter 5, “Using IBM Tivoli Storage Manager to manage the storage environment” on page 193, describe the storage environment and the manageability issues in such a large environment.

• Chapter 6, “Test results” on page 215, provides the results of the testing in terms of performance and infrastructure management.


• Chapter 7, “Proposal for more scalability” on page 241, describes the options available to scale further when needed. It documents part of an ongoing project to demonstrate the scalability of this solution for a 60 TB warehouse. Here we cover the first step, Phase 1, and describe how to scale a data warehouse of 20 TB. Phase 2, scaling from 20 TB to 60 TB, will be undertaken in the future.

• Finally, several appendixes provide the tools, including DB2 scripts and storage scripts, that we needed to develop for this project.

The team that wrote this book

This book was produced by a team of specialists from around the world working at the Products and Solutions Support Center (PSSC), Montpellier, France, with the support of the International Technical Support Organization (ITSO), Poughkeepsie Center. The PSSC is an IBM European technical center; its mission includes executing benchmarks and proofs of concept, as well as helping to design the architecture for complex solutions.

Christian Matthys is a Consulting IBM IT Specialist who has spent more than 25 years with IBM as a System Engineer working with large, mainframe-oriented customers in France. He spent three years as an ITSO project leader on assignment in Poughkeepsie, NY. In 2000, he joined the EMEA Design Center for On Demand Business in the PSSC, helping customer projects exploit leading edge technologies. Christian works as a project leader for the Poughkeepsie ITSO Center, leading and supporting residencies from his base in the PSSC.

David Bright has worked in the IT industry for 18 years, and joined IBM in 2002. For the last six years he has been part of the SAP technical team that implemented and now supports a very large SAP Business Suite installation for a major UK retailer. David’s primary technical responsibility is supporting the client's 6 TB SAP NetWeaver BI system, which runs on a partitioned DB2 database and AIX®.

Carol Davis is a Certified IT Specialist with the technical enablement team at the IBM SAP International Competency Centre. She has worked for IBM for more than 20 years and has experience in solutions design and development, architecture and performance, and high availability. For the past 10 years she has specialized in SAP solutions on UNIX®. She has worked with AIX since 1991, in the banking and stock market sector and later in the SAP environment. Carol has been heavily involved in SAP benchmarking, solution stress testing and scalability proofs of concept for various SAP solutions. She is the author of many white papers dealing with SAP implementation, best practices, and the use of IBM new technologies in SAP landscapes.

Fabio Hasegawa is a Senior IT Specialist who leads the DBA Distributed Services Center, IBM Application Services, Brazil. He has extensive experience working with IBM DB2, WebSphere® Application Server, WebSphere Message Broker, and WebSphere Information Integration. During his 10 years in the IT industry, Fabio has worked on projects helping Brazilian clients in the Telecom, Finance, Government and Banking segments. His areas of expertise include business intelligence solutions, infrastructure sizing, performance management, and designing and improving high availability solutions focused on information management services.

Notes:

• The tests and results provided in this book correspond to a specific application in a specific environment with specific criteria. Using the performance numbers to design another environment for another application with other criteria would be inappropriate.

• This book describes one project involving a series of tests using multiple products; it does not detail all the parameters of every product. Only the parameters capable of influencing these specific tests are discussed.

For detailed explanations about each product, refer to the appropriate product documentation.

Philippe Jachimczyk is a Certified IT Specialist who has 20 years of experience with IBM in both development and technical sales support in server, networking, and storage organizations. After spending 10 years as a technical sales engineer in charge of defining comprehensive architectures and recommendations for customers evolving from traditional server-centric to distributed data infrastructures, Philippe is now the technical leader in the Storage Benchmark Center in the IBM PSSC Customer Center, Montpellier. His responsibilities cover the complete range of technologies for designing and implementing optimized SAN information systems, such as Tivoli® SAN and storage management, servers, business applications, networking, storage, and virtualization products.

Steve Lockwood has 22 years of IT experience, including roles as IT Architect, Consultant, Application Development Analyst, and Project Manager. He brings a diverse understanding of all elements of a project, from project management to specific technical skills to testing strategies. His core competencies lie in the development of very large scale business intelligence solutions, with emphasis on the use of SAP NetWeaver BI. Steve is currently the Senior Lead Architect with the Information Management brand, UK Software Group Services.

Thomas Marien has been with IBM for 10 years. He is an SAP Solution Architect with the UK e-Business Technical Solutions team, providing end-to-end, platform-independent solutions around SAP. In this customer-facing role, he has worked with some of the largest automotive and consumer product goods companies. Previously Thomas spent two years at the IBM SAP International Competency Center, developing sizing and architecture methodologies. He works out of Bedfont, UK.

Julien Maussion has worked on the SAN pre-sales support team in the Product and Solution Support Center (PSSC) in IBM France at Montpellier as part of the Advanced Technical Support organization for EMEA since 2003. He now works with the South West EMEA IOT TechWorks team providing Storage Virtualization Technical Support. Julien’s areas of expertise include IBM TotalStorage® Productivity Center (IBM TPC), mid-range and high-end storage solutions (IBM DS4000/ESS/DS8000™), virtualization (SAN Volume Controller), SAN and interconnected product solutions.

Stefan Ulrich is an IBM DB2 Certified Advanced Technical Expert and SAP Certified Technology Consultant, SAP NetWeaver. Stefan has been with IBM for 16 years, and has 6 years of experience in DB2 UDB and 2 years of experience in SAP projects; his area of expertise is SAP R/3® Systems with DB2 UDB.

Acknowledgements

We express our appreciation to the many people who contributed their time and skills to this project.

Thomas Aiche PSSC, IBM France

Franck Almarcha PSSC, IBM France

Thomas Becker SAP AG, Germany

Francois Briant PSSC, IBM France

Guiyun Cao IBM USA

Jean-Philippe Durney PSSC, IBM France

Edmund Haefele IBM SAP Competence Center, IBM Germany


Michael Junges SAP AG, Germany

Carlo Marini PSSC, IBM France

Steffen Mueller SAP AG, Germany

Antoine Naudet PSSC, IBM France

Robert Nicolas PSSC, IBM France

Thomas Rech IBM Germany

Marc Rodier IBM France

Herve Sabrie PSSC, IBM France

Become a published author

Join us for a two- to six-week residency program! Help write an IBM Redbooks document dealing with specific products or solutions, while getting hands-on experience with leading-edge technologies. You'll team with IBM technical professionals, Business Partners and/or customers.

Your efforts will help increase product acceptance and customer satisfaction. As a bonus, you'll develop a network of contacts in IBM development labs, and increase your productivity and marketability.

Find out more about the residency program, browse the residency index, and apply online at:

ibm.com/redbooks/residencies.html

Comments welcome

Your comments are important to us!

We want our IBM Redbooks publications to be as helpful as possible. Send us your comments about this or other books in one of the following ways:

- Use the online Contact us review book form found at:

ibm.com/redbooks

- Send your comments in an email to:

[email protected]

- Mail your comments to:

IBM Corporation, International Technical Support Organization
Dept. HYTD, Mail Station P099
2455 South Road
Poughkeepsie, NY 12601-5400


Chapter 1. Project summary

This chapter can be read as an overview or as an executive summary; the technical details are provided in the following chapters.

We recommend reading this chapter before delving into the details, because it summarizes the entire project and conveys the overall scope of the effort. The chapter covers the following topics:

- The initial objectives and settings of the project

- The technical options, and the results relevant to these technical choices


© Copyright IBM Corp. 2007. All rights reserved.


1.1 The project proposal

This section describes the initial proposal from IBM to answer a customer request regarding the scalability of a solution; the proposal was created before any formal test was run. The testing subsequently revealed some limitations and new possibilities that changed this initial proposal.

We describe the initial proposal in order to illustrate how a project can evolve from a proposal, based on theory and experience from previous tests, and to demonstrate the value of tests.

1.1.1 The test objectives

Faced with various extensions to its use of SAP NetWeaver BI, a customer asked IBM to set up an SAP NetWeaver BI stress test with three objectives:

1. Prove that the complete SAP NetWeaver BI productive solution (infrastructure, database, and SAP application) is stable with significant levels of users and data: grow from a 7 TB data warehouse to 20 TB, and grow from 100 concurrent users to 300.

2. Cover all aspects of the solution including:

a. The infrastructure

b. The SAP NetWeaver BI administrative activities

c. The impact of the layered and scalable architecture on operational activities

d. The simulation of user activity, including online and batch activities

3. Demonstrate that the infrastructure can be easily managed, thereby improving usability and satisfaction for users.

The IBM European Products and Solutions Support Center (PSSC), located in Montpellier, France, was asked to set up this SAP NetWeaver BI stress test to address the customer expectations in terms of infrastructure scalability and manageability. This was a joint cooperative effort that included the IBM/SAP International Competency Center, the DB2-SAP Center of Excellence, SAP AG, and a customer.

The test consisted of a two-stage plan to prove that SAP NetWeaver BI 3.5 can scale to manage up to 60 TB of data, and that the IBM infrastructure represented by IBM System p5™, DB2, and IBM TotalStorage products could manage the performance, throughput, and management requirements of such an SAP NetWeaver BI system, as defined by the customer.

1. The first stage of the project would test the system to 20 TB on the hardware landscape as defined by the customer, using one p5-595 and a single DS8300.

2. The findings of the first stage would provide the information necessary to design the hardware infrastructure necessary to scale to 60 TB.

Some flexibility existed within the boundary conditions to recommend improvements and tune the systems and methodologies.

As mentioned, the series of tests at 20 TB was designed to provide the basis for a follow-on test of 60 TB. However, the targeted system was so large that, from the infrastructure side, it was necessary to move toward 20 TB in several stages in order to balance the system, as well as to determine at what point in the growth of the system a specific limitation would become evident. Thus, the infrastructure tests focused on three system sizes:

- The baseline (7 TB)

- A midway size (around 14 TB)


2 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse

Page 17: Front cover Infrastructure Solutions - IBM Redbooks · Front cover Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

- The final environment (20 TB)

This project plan was split into three parallel projects that included the 20 TB test and two additional infrastructure risk mitigation scenarios at baseline and midway. This is the project design implemented in this test.

The following infrastructure constraints were prerequisites imposed by the customer:

- The tests would be done on the current customer operating system level of AIX 5L™ Version 5.2.

- The current customer version of DB2, DB2 V8.2, would be used.

- The hardware landscape would be limited to a single POWER5™ server, and a single DS8300.

- The test would be performed with SAP NetWeaver BI 3.5.

IBM High Availability Cluster Multiprocessing (HACMP™) and Logical Volume Manager (LVM), two solutions that can be used on a System p5 running AIX 5L, were beyond the scope of these tests.

1.1.2 The overall benchmark methodology

A project like this, involving multiple parties (IBM, a customer, and a software company), multiple skills (architecture, specialists, and project management) and multiple products (AIX 5L, DB2, SAP NetWeaver BI application, storage, performance, network, and so on) requires a proven methodology and clear project management.

Simply using the best technology does not necessarily mean that the performance obtained will meet the customer's expectations. What is absolutely necessary when conducting performance tests is a benchmark methodology with a formal framework that allows:

- Definition of roles and responsibilities

- Ability to track the project

- External support to help understand the environment quickly

- Reuse of the experience

Benchmarks can be divided into three time frames: qualification, execution, and conclusion, as explained here:

1. In the benchmark qualification step, decisions are made to minimize risk as much as possible.

One of the most important things in this phase is to translate the customer's expectations (for example, “fast response time” or “a system faster than my current infrastructure”) into measurable elements that will be the benchmark's goals.

The qualification phase consists of:

– Understanding customer objectives.

– Defining the benchmark goals (for example, batch execution time, online transaction processing (OLTP) average response time for a certain number of users, elapsed time for a database (DB) load, and so on) and defining how to measure them (for example: “are the application's logs suitable for that purpose or are some other software components needed?”).

Note: The midway size did not produce formal results, so the test results provided in this book are the results of the baseline and the final environment.

Chapter 1. Project summary 3

Page 18: Front cover Infrastructure Solutions - IBM Redbooks · Front cover Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

– Defining the architecture to use. The simpler the model, the simpler the setup, management, and problem determination. The model must include the input part (to simulate users or start batches), the architecture or system under test, and the output part (results and performance indicator collection).

– Defining the time frame needed for benchmark execution and outlining an activities calendar.

– Identifying the required skills, roles, and responsibilities.

2. The benchmark execution phase itself can be divided into two parts:

a. The preparation

This consists of the hardware (HW) installation, operating system installation, network and storage configuration, software (SW) installation, data load/restore and, finally, functional testing. It is highly recommended to build scripts and to document all the steps executed during this phase, in order to be able to reproduce the environment quickly if needed.

b. The benchmark tests

The test phase is the core part of the benchmark, in which performance is measured and tuning actions are performed. During the test phase, data collection should be activated on:

• The hardware and operating system (for example, operating system (OS) logs, Central Processing Unit (CPU), memory, Input/Output (I/O) activity)

• The disk subsystem (either from the operating system point of view or directly through subsystem tools such as IBM Enterprise Storage Server® (ESS) StorWatch Expert)

• The networks (from the operating system, or using network environment tools).

• The middleware (for example, Tivoli Performance Viewer)

• The Relational DataBase Management System (RDBMS) performance tools (for example, StatsPack for Oracle® or the DB2 Control Center for DB2 Universal Database (UDB))

3. The benchmark conclusion phase can be divided into two parts:

a. The analysis part

This is based on human experience, and several tools can be used to aggregate, compare, and correlate results. It is important to match the data collected with the workload in order to have a clear understanding of the behavior of the application under test and to determine which part of the application to tune or investigate (for example, an unexpected I/O-intensive RDBMS operation during a certain period could point to a missing index).

b. The change or decision step

Figure 1-1 on page 5 illustrates the main building blocks of this phase, during which the benchmark manager and team cope with rules, resources, and constraints to reach the goals (for example, to reach the desired response time).
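As a minimal illustration of the qualification idea above, translating vague expectations into measurable goals, the following sketch defines goals with explicit targets and measurement sources. The goal names, thresholds, and sources are invented for the example; they are not the project's actual Key Performance Indicators.

```python
from dataclasses import dataclass

# Illustrative only: names, targets, and sources are invented, not taken
# from the actual project plan.
@dataclass
class BenchmarkGoal:
    name: str      # what is measured
    target: float  # agreed threshold (lower is better here)
    unit: str      # unit of the threshold
    source: str    # where the measurement comes from

goals = [
    BenchmarkGoal("OLTP average response time (300 users)", 1.5, "s", "application logs"),
    BenchmarkGoal("Restore from tape plus roll forward", 18.0, "h", "TSM and DB2 logs"),
    BenchmarkGoal("Batch execution window", 8.0, "h", "SAP job logs"),
]

def is_met(goal: BenchmarkGoal, measured: float) -> bool:
    # For time-based goals, the measured value must not exceed the target.
    return measured <= goal.target

print(is_met(goals[1], 16.0))  # True: a 16-hour restore meets the 18-hour goal
```

Making each goal carry its own measurement source answers the qualification question of whether the application's logs suffice or additional software components are needed.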

4 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse

Page 19: Front cover Infrastructure Solutions - IBM Redbooks · Front cover Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

Figure 1-1 The steps of a benchmark

A result of these steps could be a decision to go back to a new benchmark execution phase, with a new environment (hardware, software or parameters) from which to perform a new analysis.

It is important to understand that there is no “magic formula” for benchmarks, and almost every application is unique. So, for example, it is impossible to be sure in advance that a particular disk placement is the most efficient one for a specific test.

Possible improvements depend on the type and complexity of the application under test. That is, the more complex the application is, the more room there is for improvement; in contrast, it is hardly worthwhile to try to improve a task that already completes in 10 minutes. The general rule is that the most effective improvements are achieved at the higher layers.

- Gains of 80% (or more) are not uncommon if it is possible to change the application's logic (for example, by allowing an improved level of parallelism), or to optimize the code.

- Improvements between 20% and 30% can be obtained by tuning the middleware and the Relational DataBase Management System (RDBMS).

- Improvements of more than 10% are rarely achieved by simply tuning the operating system.
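The kind of analysis that drives such tuning decisions, matching the collected data against the workload timeline to spot anomalies (such as the missing-index example above), can be sketched as follows. The sample rates, phase names, and threshold are invented for illustration.

```python
# Invented sample data: disk I/O rates (MB/s) sampled every 5 minutes
# (as NMON might record them) and the workload phases they overlap.
io_samples = {0: 40, 5: 45, 10: 400, 15: 420, 20: 50}
workload_phases = [
    ("online queries", 0, 10),  # name, start minute, end minute
    ("data load", 10, 20),
]

def phases_exceeding(samples, phases, threshold_mb_s):
    """Return the workload phases whose average I/O rate exceeds the threshold."""
    flagged = []
    for name, start, end in phases:
        rates = [rate for t, rate in samples.items() if start <= t < end]
        if rates and sum(rates) / len(rates) > threshold_mb_s:
            flagged.append(name)
    return flagged

print(phases_exceeding(io_samples, workload_phases, 300))  # ['data load']
```

A phase flagged this way would then be cross-checked against the RDBMS statistics before deciding which layer to tune.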

1.1.3 The required test scopes

Two types of tests were initially required by the customer:

- Reporting/dataload (also named online) tests

- Infrastructure tests

The online tests are described in Table 1-1 on page 6.



Table 1-1 Online tests initially required

Test name  Test description
A          Simulation of 100 users and reproduction of the current dataload (combined). This test represents the current production load of concurrent online queries, running alongside the current production data load; it simulates actual online users.
B          Simulation of 300 reporting users.
C          Simulation of 3 times the combined dataload from KPI-A.
D          Combination of tests B and C.
E          Simulation of 500 reporting users.
F          Simulation of 5 times the current data load.
G          Simulation of 5 times the current data load, and more data.

The infrastructure tests are described in Table 1-2.

Table 1-2 Infrastructure tests initially required

Test name    Test description
Scenario 1   Flashback restore and simultaneous roll forward of 500 gigabytes (GB) of data in less than 8 hours.
Scenario 2   Database restore from tape using an IBM Tivoli Storage Manager (TSM) server, and roll forward of 2 TB of logs, in less than 18 hours.
Scenario 3a  Tape backup of a FlashCopy® in less than 8 hours, independently.
Scenario 3b  FlashCopy in less than 8 hours, in parallel with simulated query and data load activity. Measure query and dataload activity (mixed workload).
Scenario 3c  Online tape backup, in parallel with simulated query and data load activity. Measure query and data load activity first.
Scenario 4   Rebuild indexes for 3 TB of data in less than 2 hours, with no online activity in progress. Mark 3 TB of indexes bad, then rebuild the indexes of those tables. Test across a spread of different tables (Persistent Staging Area (PSA), Operational Data Store (ODS), InfoCubes, aggregates).

For more detailed information about this topic, refer to 1.1.7, “The Key Performance Indicators test requests” on page 15.

Some of these tests were recombined or removed during their execution.

1.1.4 The initial configuration

This section describes the initial configuration and architecture defined when the project started. This configuration evolved during the tests to achieve the results; the final configurations and the reasons for change are explained in Chapter 6, “Test results” on page 215.

The DB2 initial layout

The initial implementation uses six DB2 partitions running within one single logical partition (LPAR) or physical machine. All database objects are either limited to partition 0 (DB2 system catalog, base tables, dimension tables), or spread across all partitions (fact tables, Operational Data Stores (ODS), aggregates, Persistent Staging Area (PSA)).

In principle, the target architecture was supposed to run with 33 DB2 partitions on five different LPARs.

- One LPAR will contain DB2 partition 0, with only the base tables, dimension tables, and the DB2 system catalog.

- The other objects will run on DB2 partitions 6 to 37. These partitions will be spread across four LPARs, each LPAR running eight DB2 partitions.

DB2 partitions 1 to 5 will be empty after the redistribution process and will be dropped.

The parallelized objects will not be spread across all 32 partitions. Depending on the estimated workload and the size, the objects will be partitioned using 8 or 12 partitions. The objects will be assigned to DB2 partitions and LPARs so that the production workload is spread across all LPARs, guaranteeing maximal usage of resources.
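The distribution rule described above, parallelized objects placed on 8 or 12 of the 32 data partitions so that the load spreads across the four LPARs, can be sketched as follows. The rotating selection policy is an illustrative assumption, not the actual assignment used in the project.

```python
# Illustrative assignment policy; the real distribution was planned manually.
DATA_PARTITIONS = list(range(6, 38))  # the 32 data partitions (6 to 37)
LPAR_OF = {p: 1 + (p - 6) // 8 for p in DATA_PARTITIONS}  # 8 partitions per LPAR

def assign(object_index: int, width: int):
    """Pick `width` (8 or 12) partitions, rotating the start so that
    successive objects favor different partitions and LPARs."""
    assert width in (8, 12)
    start = (object_index * width) % len(DATA_PARTITIONS)
    return [DATA_PARTITIONS[(start + i) % len(DATA_PARTITIONS)] for i in range(width)]

parts = assign(0, 12)
print(sorted({LPAR_OF[p] for p in parts}))  # [1, 2]: this object spans two LPARs
```

With a width of 8, successive objects land on successive LPARs, which is one simple way to keep all four DB2 LPARs busy under a mixed workload.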

During the tests, the database will contain two different areas of data:

- One area is made of the tables containing the current SAP NetWeaver BI production data.

- The other area will be the replicated and increased InfoCubes and ODS objects used for the scenarios.

The p5-595 logical partition initial layout

The initial plan is to have six logical partitions (LPARs) on the p5-595.

- Five LPARs are used for DB2:

– One LPAR hosts the unique DB2 partition 0.

– Four LPARs are dedicated to the remaining 32 DB2 partitions to take full advantage of DB2 parallelism.

- One single large LPAR is used for SAP. This LPAR houses three SAP instances:

– The central instance (CI)

– One instance for InfoCube load

– One instance for online query

By separating the load profiles into separate instances, we can better profile the load requirements, and control the resource distribution.

During the lifetime of the proof of concept, this LPAR layout and resource distribution will change in response to knowledge gained from the project load tests. The objective will be to achieve the best design for performance and resource utilization.

For the 20 TB system design, named Sys3, the initial configuration was expected to be:

- 8 CPUs for LPAR-0

- 4 CPUs for each of LPARs 1/2/3/4 (AS1)

- 40 CPUs for the application server, in LPAR-5 (AS2)

The number of CPUs inside each LPAR will be adjusted dynamically during the tests to reach the best configuration, once the load distribution over the components is better understood.
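A quick arithmetic check shows that this initial distribution exactly fills the 64-way machine:

```python
# The initial Sys3 CPU allocation, as listed above.
initial_cpus = {
    "LPAR-0 (DB2 partition 0)": 8,
    "LPAR-1 (DB2)": 4,
    "LPAR-2 (DB2)": 4,
    "LPAR-3 (DB2)": 4,
    "LPAR-4 (DB2)": 4,
    "LPAR-5 (SAP application servers)": 40,
}
total = sum(initial_cpus.values())
print(total)  # 64: the allocation exactly fills the 64-way p5-595
```

Because the total already equals the machine's capacity, any dynamic adjustment during the tests is a reallocation between LPARs rather than an addition.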

Figure 1-2 on page 8 describes the initial LPAR layout.



Figure 1-2 The initial LPAR layout

Initial layout of the disk and file systems

The goal of the design is to implement a DB2 multi-node shared-nothing architecture. That means, ideally, that each DB2 partition will have dedicated CPUs, memory, and disks. This configuration gives the following advantages:

- It offers an architecture that scales nearly linearly with the number of nodes.

- It increases the DB2 parallelism (and potentially the performance).

- It gives a flexible architecture: it is possible to add and/or move nodes according to the needs.

For the 20 TB configuration, we will use 33 DB2 partitions, and one DS8000 with 64 ranks, for a total of 512 disks. Our choice is to dedicate 4 ranks to each DB2 partition. Each DB2 partition will have 3 different file system types: one for data and index tablespaces, one for temporary tablespaces, and one for logging. DB2 will be spread over those 4 logical unit numbers (LUNs); one LUN per rank is created. This disk layout will be applied for every DB2 partition. Figure 1-3 on page 9 describes the DS8000 layout.
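The per-partition layout can be sketched as follows; the mount-point naming is hypothetical and does not reflect the project's actual conventions.

```python
# Hypothetical naming; the project's real mount points are not shown in the text.
FS_TYPES = ("data_index", "temp", "log")  # the three file system types per partition

def layout_for_partition(partition: int, luns: int = 4):
    """One LUN per rank, four LUNs per DB2 partition, with each file
    system type represented on each LUN."""
    return [
        f"/db2/NODE{partition:04d}/lun{lun}/{fs}"
        for lun in range(1, luns + 1)
        for fs in FS_TYPES
    ]

mounts = layout_for_partition(6)
print(len(mounts))  # 12 file systems: 4 LUNs x 3 types
```

Generating the layout programmatically, rather than by hand, keeps all 33 partitions identical, which matters later when partitions are moved between LPARs or when FlashCopy pairs are defined.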



Figure 1-3 The DS8000 initial layout

With respect to DB2, the file system layout has to adhere to several requirements: performance, manageability during normal operation, dependencies on other system components (for example, IBM Tivoli Data Protection (TDP) or the DS8300), and the flexibility to change the DB2 partition layout or to move partitions to different LPARs or to new physical hardware.

The major drivers for the file system layout are the DB2 components to be used on each partition. These are:

- The active log files

- The database containers for the data and the indexes

- The database containers for the temporary data

The IBM Tivoli Storage Manager and IBM Tivoli Data Protection - initial design

One dedicated logical partition will serve as the Tivoli Storage Manager server partition and control access to the tape library and the 16 LTO3 tape drives. The connection from the Tivoli Storage Manager server to the tape library and drives is achieved across 16 Fibre Channel (FC) connections. All FlashCopy target volumes are attached to the Tivoli Storage Manager server using 16 FC connections. Each of the 5 logical partitions hosting the DB2 partitions is connected to its database volumes (FlashCopy source volumes) using four FC connections. In addition, each of the database server partitions has Storage Area Network (SAN) access to the 16 LTO3 tape drives using 4 FC connections; these will be used during LAN-free online backups to, and/or LAN-free restores from, the tapes. We will use 16 FC adapters (each of 2 Gb/sec) to drive the 16 LTO3 drives. Each LTO3 drive is able to handle 80 MB/sec (read or write) without compression; because we hope to achieve a 2:1 compression ratio, we need a minimum transfer rate of 80 x 2 = 160 MB/sec per drive.
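The sizing arithmetic above can be worked through explicitly. The drive and compression figures come from the text; the usable rate assumed for a 2 Gb/sec FC adapter is our rough estimate.

```python
# Figures taken from the text above; only the FC usable-rate estimate is ours.
native_mb_s = 80      # LTO3 rate (read or write) without compression
compression = 2       # hoped-for 2:1 compression ratio
drives = 16
fc_usable_mb_s = 200  # rough usable bandwidth of one 2 Gb/sec FC adapter

per_drive = native_mb_s * compression
print(per_drive)                    # 160 MB/sec needed to keep one drive streaming
print(per_drive * drives)           # 2560 MB/sec aggregate for all 16 drives
print(per_drive <= fc_usable_mb_s)  # True: one FC adapter per drive is enough
```

This is why the design pairs one 2 Gb/sec FC adapter with each LTO3 drive: compressed data must arrive fast enough to keep the drive streaming rather than stop-starting.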

DS8000

DB2 Partition 6 LUN1Log Data+Index

DB2 Partition 6 LUN2Log Data+Index

DB2 Partition 6 LUN3Log Data+Index

DB2 Partition 6 LUN4Log Data+Index

DB2 Partition 7 LUN1Log Data+Index

DB2 Partition 7 LUN2Log Data+Index

DB2 Partition 7 LUN3Log Data+Index

DB2 Partition 7 LUN4Log Data+Index

LUN

LUN

RANK

SAN

LPAR 1

DB2 Partition 6

DB2 Partition 7

Chapter 1. Project summary 9

Page 24: Front cover Infrastructure Solutions - IBM Redbooks · Front cover Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

Figure 1-4 describes the flows for backup and restore mapped on the hardware infrastructure.

Figure 1-4 The storage infrastructure for backup and restore

Several scenarios will be exploited in the test cases (the names of the scenarios are explained in 1.1.7, “The Key Performance Indicators test requests” on page 15):

- Server-free backup (scenario 3a)

This scenario is shown by A1, A2, and A3 in Figure 1-4.

During a server-free backup scenario, the database server will initiate a write suspend to the database. FlashCopy operations will be started afterwards for all database source volumes to their corresponding target volumes (A1). Then the database server resumes normal operation. The Tivoli Storage Manager server then accesses the FlashCopy target volumes (A2), starts the DB2 database, and a backup of the FlashCopy image is done to all 16 tapes in parallel using Tivoli Storage Manager (A3). The overall backup process is controlled using Tivoli Storage Manager for Hardware and Tivoli Data Protection for Enterprise Resource Planning.

- LAN-free backup (scenario 3c)

This scenario is shown by B1 and B2 in Figure 1-4.

In this scenario, the individual database partitions will be backed up on each of the logical server partitions. The storage agent (Tivoli Storage Manager for SAN) is installed on each logical server partition having access to the LTO3 drives. Since there are 5 logical server partitions, 33 database partitions and 16 LTO3 drives available, there are resource constraints and the individual backup sessions have to be scheduled. Each database partition reads from its containers on the database source volumes (B1) and writes the data directly to a tape session (B2) using the DB2 backup command, Tivoli Storage Manager for ERP and Tivoli Storage Manager for SAN.



- LAN-free restore (scenario 2)

This scenario is shown by C1 and C2 in Figure 1-4 on page 10.

In this scenario, the individual database partitions will be restored on each of the logical server partitions. The storage agent (Tivoli Storage Manager for SAN) is installed on each logical server partition having access to the LTO3 drives. As there are 5 logical server partitions, 33 database partitions, and 16 LTO3 drives available, there are resource constraints and the individual restore sessions have to be scheduled. Each database partition reads from a tape session (C1) and writes the data to the database containers (C2) using the DB2 restore command, Tivoli Storage Manager for ERP, and Tivoli Storage Manager for SAN.
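The scheduling constraint shared by the LAN-free scenarios (33 database partitions competing for 16 LTO3 drives) can be illustrated with a simple wave-based schedule. The round-robin policy below is an assumption for illustration, not the project's actual schedule.

```python
# Illustrative wave scheduling; the project's actual schedule is not described.
def schedule(partitions, drives=16):
    """Group sessions into waves of at most `drives` parallel tape sessions."""
    return [partitions[i:i + drives] for i in range(0, len(partitions), drives)]

db2_partitions = [0] + list(range(6, 38))  # 33 database partitions in total
waves = schedule(db2_partitions)
print([len(w) for w in waves])  # [16, 16, 1]: three waves cover all 33 partitions
```

Even this naive schedule shows why the drives are the bottleneck: with 33 partitions and 16 drives, at least three sequential waves are needed, and a real schedule would also balance partition sizes so that no wave is dominated by one slow session.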

1.1.5 The proposed monitoring tools

The following tools are expected to be used during the tests:

- For LPAR monitoring:

NMON V10 [1] for AIX 5L Version 5.2 is used to monitor the LPARs; one NMON instance runs in the background to collect all the relevant information (CPU, RAM, disks, processes, I/O, network, and so on). The information is recorded in a file; every 4 hours the cron daemon restarts this background process. This way, the files generated are reasonably small and easy to analyze.

- For real-time system monitoring:

The XMPERF [2] graphical monitor will be used for real-time system monitoring. Recordings and snapshots from the graphical monitor will also be used to track load distribution across LPARs and over the multiple components on System p5.

- For global processes monitoring:

Workload Manager (WLM) is configured to group the processes of multiple components sharing an LPAR into component groups. Running in passive mode, WLM allows monitoring of the process groups to determine the resource utilization per DB2 partition and per SAP application server instance.

- For storage monitoring:

Two tools will be used: IBM TotalStorage Productivity Center for Disk (TPC) and the Performance Data Collection Utility (PDCU).

- For SAP monitoring:

Throughput information for data loads will be available via InfoPackage monitoring and InfoCube data request content details. This information should also be available in the statistics for the InfoCubes being loaded. SAP needs to ensure that the statistics are activated for each InfoCube.

- For transaction statistics:

An external reporting tool will be used to capture online transaction response times.

� For DB2 monitoring:

DB2 Performance Expert for Multiplatforms3 monitors the DB2 buffer quality, response times, cache overflows, locks, and other values. The DB2 Performance Monitor allows detailed analysis of historical data and will be used as the central DB2 performance repository. The tool allows SQL queries against the repository and thus extensive analysis.

1 The NMON tool was developed to support benchmarks and performance tuning for internal use, though it is available externally. It is a free tool designed for AIX and Linux® performance specialists to use for monitoring and analyzing performance data. More information at http://www-128.ibm.com/developerworks/aix/library/au-analyze_aix/

2 XMPERF is at the same time a graphical monitor, a tool for analyzing system load, and an umbrella for other tools. More information at http://sysmhr1.au.ibm.com/doc_link/en_US/a_doc_lib/perftool/prfusrgd/frame4_toc.htm

3 More information about this product at http://www-306.ibm.com/software/data/db2imstools/db2tools/db2pe/db2pe-mp.html

Chapter 1. Project summary 11

� The Tivoli performance and throughput will be determined from a combination of DB2 backup run-times and size, NMON I/O rates, and Tivoli log data.
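The NMON rotation described earlier (restarting the background collector every 4 hours via cron) could be configured with a crontab entry along the following lines. The flags follow the usual nmon conventions (-f for file output, -s snapshot interval in seconds, -c snapshot count, -m output directory), but the interval, path, and directory are assumptions, not the project's actual configuration.

```shell
# Hypothetical crontab entry: start a fresh background NMON recording
# every 4 hours, so each output file covers one 4-hour window.
# -f write to a file; -s 60 = one snapshot per minute; -c 240 snapshots = 4 h;
# -m sets the output directory (path is a placeholder).
0 0,4,8,12,16,20 * * *  /usr/bin/nmon -f -s 60 -c 240 -m /perf/nmon
```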

1.1.6 The proposed infrastructure

The infrastructure is triplicated for ease of use, and three systems are set up: Sys1, Sys2, and Sys3.

Hardware

The infrastructure, shown in Figure 1-5 on page 13, is divided into three different logical environments:

� The baseline (7 TB case) is made of one p5-595 (POWER5, 1.9 GHz Turbo) Sys2 and the DS8300 Sys2 with 384 disks, with a tape attachment to the IBM 3584 library.

� The midway size (14 TB case) is made of one p5-595 (POWER5, 1.9 GHz Turbo) Sys1 and the associated DS8300 Sys1 with 512 disks, with a tape attachment to the IBM 3584 library.

� The final test configuration (20 TB case) is made of one p5-595 (POWER5, 1.9 GHz Turbo) Sys3 plus one LPAR of the p5-595 Sys2, and the associated DS8300 Sys3 with 512 disks, with a tape attachment to the IBM 3584 library. A copy of the data of the Sys3 environment is available on a second DS8300, Sys3bis, with 512 disks; the copy is maintained by Metro Mirror technology. Metro Mirror is a storage server function that maintains a consistent copy of a logical volume on the same storage server or on another storage server. All modifications that any attached host performs on the primary logical volume are also performed on the secondary logical volume.

A monitoring agent and the SAP graphical user interface (GUI) 6.4 are installed on 10 injectors, while the injector console is installed on a PC in the benchmark room. The injectors are IBM xSeries® x300 servers with 1 GB of RAM.

Note: At the time of the tests, four System p5 595 models are available:

1. A standard model, with a POWER5 technology and a processor clock rate at 1.65 GHz

2. A turbo model, with a POWER5 technology and a processor clock rate at 1.9 GHz

3. A standard model, with a POWER5+™ technology and a processor clock rate at 2.1 GHz

4. A turbo model with a POWER5+ technology and a processor clock rate at 2.3 GHz

Only the turbo models were used in this project. The tests started with the POWER5 technology models, then they were upgraded to the POWER5+ technology. To avoid any confusion the p5-595 models are identified in this book with the technology used.

12 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse


Figure 1-5 The three infrastructure environments: the baseline, the midway and the final configuration

Software

Figure 1-6 describes the main software installed to run this project.

Figure 1-6 The software used for the project

In these figures: the three p5-595 servers (Sys1, 14 TB; Sys2, 7 TB; Sys3, 20 TB) and the TSM partition on Sys2 attach through the SAN to the DS8000 storage units (Sys1, Sys2, and Sys3 with 512 disks each; Sys3bis with 384 disks, kept in sync over 16 Metro Mirror links) and to the LTO tape library with 16 drives.

The software stack is layered as follows: the injectors run SAP GUI 6.4 on Windows XP; the application servers run SAP BW 3.5, SAP kernel 6.40, and the DB2 client 8.2 on AIX 5.2 ML 6; the database server runs DB2 UDB ESE DPF 8.2 with AIX LVM (JFS2) and SDD/MPIO on AIX 5.2 ML 6, together with the TSM client 5.2 and the storage agent; and the TSM server 5.3 on AIX 5.2 ML 6 drives the 16 LTO3 tape drives. The tiers communicate over TCP/IP, while disk and tape I/O goes through Fibre Channel across the SAN.


For each of the environments, the following operating systems and software levels are initially installed:

� AIX 5L Version 5.2 ML 6. (For TDP for FlashCopy, the minimum level required is AIX 5L Version 5.2 (32-bit or 64-bit) with at least ML 5 and PTF U488817 for APAR IY44637, or AIX 5L Version 5.3 (32-bit or 64-bit). Use of AIX virtual I/O is not supported.)

� DB2 8.2.2 ESE DPF (FixPak 12)

– DB21085I Instance db2eb8 uses 64 bits and DB2 code release SQL08025 with level identifier 03030106.

– Informational tokens are DB2 v8.1.1.112, s060429, U807381_15114, and FixPak 12.

FixPak 9 was actually installed at the beginning of the project; we then moved to FixPak 12.

� IBM Tivoli Access Manager (TAM) WebSEAL 5.1

� DS8000 Storage manager 5.1.0.1369

� DS8000 CLI 5.1.0.370

� SAP software level: SAP NetWeaver BI 3.5 SP 12 Kernel Release 640 patch level 129

� Injection front-end

– Windows® XP Professional

– An external tool

The database manager configuration, the database configuration, and additional setup parameters such as the DB2 registry, db2cli.ini, or special SAP/DB64 profile parameters will be set according to SAP implementation recommendations and the configuration of the current customer production database.

� For Tivoli Storage Manager and Tivoli Data Protection (TDP), the following levels are used:

– On all LPARs the following levels are installed:

• TSM4HW: 5.3.1.2 (eGA 17.03.2006)

• TSM4mySAP: 5.3.2.2 (eGA 17.03.2006)

• TSM API: 5.3.0.0

• Pegasus CIM client: 2.3.2.2

– On one LPAR the following components are installed:

• CIM (Common Information Model) agent DS Open API: 5.1.0.47

• We started with DS8300 level: 6.1.0.38

Figure 1-7 on page 15 describes the software stack needed for Tivoli Storage Manager and TDP.

4 DB6 is the SAP short name for the official SAP product name, DB2 for UNIX and Windows. The database server and the application server can run on UNIX (AIX, Linux, Solaris™, HP-UX) or Windows. This is the same as the IBM name: IBM DB2 UDB for UNIX, Windows, and Linux.


Figure 1-7 The Tivoli Storage software stack

Network

All the servers used for the test are connected to a private virtual local area network (VLAN). The IBM and customer teams agreed to set up a virtual private network (VPN) connection with port forwarding to allow the customer team to connect to the systems. An external user can connect to those systems with proper authentication to the VPN concentrator located at the IBM site. Once authenticated through a Web site, the user runs a Cisco applet on the local machine that listens locally on a specific port and redirects the service to the target server with the right port.

1.1.7 The Key Performance Indicators test requests

The Key Performance Indicator (KPI) work units are divided into two categories, those focusing on load and those focusing on infrastructure and maintenance. Several of the infrastructure tests also include load components which will be repeats of load tests, adding possible overhead for infrastructure activities.

� Data load

The current production load monitored is 25 million records per hour for load, and 5 million records per hour for aggregate loading.

The InfoCube load activity will be generated by process chains provided by the customer. The InfoPackages triggered by the process chains will load data into the InfoCubes. We will start with a total of 3 data targets (InfoCubes) per InfoPackage. Data for the InfoCubes will be sourced from the underlying ODS objects.


The upload procedure to the InfoCubes includes the processing of complex update rules (start routines). The load includes the update of the InfoCubes and the rollup of existing aggregates.

� Online query load

According to an analysis of the query statistics, 80% of all queries return fewer than 1,000 rows, and another 17% return fewer than 10,000 rows. This distribution will be reflected in the queries designed for the test (80% returning < 1,000 rows, 20% returning < 10,000 rows). The analysis also showed that about 80% of the queries hit aggregates, while 20% run against fact tables (highly selective).

The online load will be generated via an external tool. Scripts simulate online users; the scripts for queries have been provided by the customer and SAP has built the scripts to execute these queries in a simulated multiuser environment. The scripts simulate HTML queries.

80% of the queries will be targeted against aggregates and 20% of the queries are intended to hit the fact tables.

� Concurrent load and query test

The scenarios include simulation of online users executing queries and simulation of the load activity needed to update the data InfoCubes. The combined profile will simulate both online queries and InfoCube loading. InfoCubes being loaded will never be simultaneously used as the targets for online queries.

The following sections detail the requested tests.

Scenario A

Description: Simulation of 100 users and reproduction of the current data load (combined).

Purpose of this test: This test represents the current customer production load of concurrent online queries, running alongside the current production data load. It simulates actual online users.

Achievement criteria: To meet this test KPI the following must be achieved:

– Data load requirement: 25 million records/hr. for load and 5 million/hr. to aggregate.

– Query load requirement: 50 navigations per minute with an average response time of 20 sec.
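As a back-of-envelope reading of these two targets (a derivation of ours, not stated in the source), 50 navigations per minute spread across 100 users implies that each simulated user completes a navigation every 120 seconds on average, leaving about 100 seconds of think time beyond the 20-second response-time target:

```shell
# Rough pacing implied by the Scenario A targets (illustrative arithmetic):
# 100 users together must produce 50 navigations per minute.
USERS=100
NAV_PER_MIN=50
RESP_TARGET=20                       # average response-time target (s)
cycle=$((USERS * 60 / NAV_PER_MIN))  # seconds between navigations per user
think=$((cycle - RESP_TARGET))       # implied average think time (s)
echo "cycle=${cycle}s think=${think}s"
```

This kind of pacing is what the injector scripts have to reproduce for the simulation to exert the intended load.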

Execution: The application servers will be configured in two groups, with the CI as a standalone service. The two application server groups will each have a different load focus: one group for online queries, and one for data load.

Using the information on load profile, gathered during the calibration and load profiling step, resources will be assigned to two workload application server instances. One instance will be used for Query and one instance for data load. The CI will be a separate instance.

The resources will be redistributed over the components, application servers and the DB servers, to achieve the target joint KPIs.

The base or reference test


Deliverable: This test will provide:

– CPU utilization for the total load, and for each of the separate components

– Load distribution profile

– Configuration and tuning information

Scenario B

Description: Simulation of 300 reporting users.

Purpose of this test: This test represents three times the current production load of concurrent online queries. It provides the opportunity to analyze the expected high-end query load on the current hardware landscape, without additional concurrent load.

Achievement criteria: To meet this test KPI, the following must be achieved: query load requirement of 150 navigations per minute with an average response time of 20 seconds.

Execution: Using the information from the load profile, documented during the calibration and load profiling step, the application servers will be configured and resources will be assigned to the application server instances and the DB2 LPARs. The CI will be a separate instance.

The resources will be redistributed over the components: application servers and the DB servers, to achieve the target KPI.

Deliverable: This test will provide:

– CPU utilization for the total load, and for each of the separate components

– Load distribution profile

– System configuration and tuning information

– SAP instance configuration

Scenario C

Description: Simulation of three times the current production data load.

Purpose of this test: This test represents three times the data load of the current production system. This scenario provides the basis for analyzing the resource requirements of the high-end InfoCube load throughput expected for the current hardware landscape, without additional concurrent load.

Achievement criteria: To meet this test KPI the following must be achieved:

– Data load requirement: 75 million records/hr for load

– 15 million/hr to aggregate

Three times the number of users of the base test

Three times the combined load (queries, load, aggregate) of the base test


Execution: The challenge of this test is to balance the parallelization of the data load jobs and effectively distribute the system resources to application server and DB server nodes.

The starting point will be based on the data load profile done in the calibration tests, and validated in test A. The configuration will then be adjusted at all levels (storage, server, DB, application) until the KPI, or the best possible results are achieved.

Deliverable: This test will provide:

– CPU utilization for the total load, and for each of the separate components

– Load distribution profile

– Configuration and tuning information

– SAP configuration including level of parallelism and number of instances

– Number of SAP work processes (WP) utilized and SAP work processes available

Scenario D

Description: Combination of scenarios B and C: 300 users (three times the current online load) with three times the current data load.

Purpose of this test: This test represents the expected high-end load for the hardware configuration being tested.

Achievement criteria: To meet this test KPI the following must be achieved:

– Data load requirement: 75 million records/hr for load and 15 million/hr to aggregate.

– Query load requirement: 150 navigations per minute with an average response time of 20 seconds.

– The load test scenario is expected to run up to 12 hours.

Execution: This is the extreme challenge for the load scenarios. The objective is to balance the two KPI requirements against each other. The InfoCube load activity, in which one batch process spawns multiple dialog work processes, is expected to be extremely aggressive due to the complete lack of wait time. This batch work will have to be controlled to keep it from degrading the query response times, while at the same time ensuring enough capacity is available to achieve its targeted throughput. The query load will be the most sensitive, as the KPI is based on average response times, which precludes the extreme load oscillation that can result from interference from the InfoCube loading.

As in scenario A, the application servers will be configured in two groups, with the CI as a standalone service. The two application server groups will each have a different load focus: one group for online queries, and one for data load. In this way, resource allocation to the different workloads can be better controlled.

The starting position will be based on information gathered from the previous tests, and on the current knowledge of the load profiles.

During reiterative concurrent runs, the system will be modified until either the joint KPIs are achieved, or the best balance possible identified.

Three times the combined load with three times the number of users


Deliverable: This test will provide:

– CPU utilization for the total load, and for each of the separate components

– Load distribution profile

– Configuration and tuning information

– SAP Instances, WP available, WP utilized

Scenario D53

Description: Repeat of scenario D after an upgrade from AIX 5L Version 5.2 to AIX 5L Version 5.3.

Purpose of this test: The purpose of this test is to quantify the contribution of AIX 5L Version 5.3 and SMT to the high-end load expectation for this hardware landscape in regard to the load scenarios.

Achievement criteria: Same as scenario D: query load requirement of 150 navigations per minute with an average response time of 20 seconds.

Execution: Same as scenario D.

Deliverable: This test will provide:

– CPU utilization for the total load, and for each of the separate components

– Load distribution profile

– Configuration and tuning information

– SAP Instances, WP available, WP utilized

Scenario 1

Description: Flashback and simultaneous roll forward of 500 GB of data in less than 8 hours.

Purpose of this test: This test measures the FlashCopy restore and roll forward of the DB2 database, simulating a recovery within the scope of a single production day.

Achievement criteria: To meet this test KPI the following must be achieved:

– Flashback of FC Backup to Test environment

– Roll forward through 500 GB of log files

– Successful start of SAP System

– Completion timestamp is the first SAP transaction

Scenario D with a new AIX 5L version

Infrastructure management test: Flashback and roll forward


Execution: The test requires a large number of log files available for the database. The test will run for a long period of time, and therefore resets and multiple runs are time-consuming. The possible methods are:

– Take a backup of database from FlashCopy using TDP for hardware.

– Generate 500 GB of log files by changing the table entries in the database by using the load scenario or with direct SQL DML statements against the database.

– Restore the database by using the FlashBack to the system.

– Roll the database forward through 500 GB of logs and optimize the process by adjusting the DB parallelism for the roll forward processes.

– Stop the roll forward and activate each partition manually.

– Start the SAP system and perform transactions.

Deliverable: This test will provide:

– CPU, memory and disk utilization for the test

– DB2 and OS configuration and tuning information

– Output of DB2 utilities gathered with “list utilities,” “roll forward query status” and information in the db2diag.log

– Total runtime of the test

– Roll forward query status for proof of log sequence

Scenario 2

Description: Database restore from tape using Tivoli Storage Manager and roll forward of 2 TB of logs in less than 18 hours.

Purpose of this test: This test measures the tape restore and roll forward of the DB2 database and simulates the roll forward of 3 days’ redo logs (disaster recovery). The focus of this test is on parallel restore time from tapes.

Achievement criteria: To meet this test KPI the following must be achieved:

– Restore the database from tape and roll forward the database through 2 TB logs

– End of roll forward

– Successful start of SAP system

– Total KPI time = restore time + roll forward time <= 18 hours
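A rough budget for this KPI can be sketched as follows. The 20 TB database size and the 16 drives come from the source; the LTO3 native rate of about 80 MB/s per drive is a general figure, and the neglect of compression, streaming gaps, and partition skew are assumptions, so this is only an order-of-magnitude check.

```shell
# Rough feasibility check for the 18-hour KPI (illustrative assumptions):
# restore 20 TB over 16 LTO3 drives at ~80 MB/s native per drive.
DB_GB=20480
DRIVES=16
MBS_PER_DRIVE=80
agg=$((DRIVES * MBS_PER_DRIVE))           # aggregate MB/s across all drives
restore_h=$((DB_GB * 1024 / agg / 3600))  # whole hours (actual ~4.5 h)
echo "aggregate=${agg} MB/s, restore=~${restore_h} h of the 18 h budget"
```

Under these assumptions the restore itself takes roughly 4 to 5 hours, leaving most of the 18-hour window for rolling forward the 2 TB of logs.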

Execution: The test requires a large number of log files available for the database, and a well-defined, reliable, performance-optimized setup to retrieve and apply the log files. Massively parallel restore requires access to multiple parallel tape devices. The possible methods are:

– Tape backup of database from FlashCopy using TDP for hardware

Infrastructure management test: DB restoration


– Generate 2 TB log files by changing the table entries in the database by using the load scenario, or with direct SQL statements against the database.

– Drop the existing database to simulate a disaster recovery on a clean system.

– Restore the database to the clean system with fast container allocation (DB2_USE_FAST_PREALLOCATION)

– The restore will be done directly to the database server using a “LAN-free” restore.

– Roll forward the database through 2 TB of logs and optimize the process by adjusting the database and the parallelism of the roll forward processes.

– Stop the roll forward and activate each partition manually.

– Start the SAP system and perform transactions.

Deliverable: This test will provide:

– CPU, memory and disk utilization for the test

– DB2 and OS configuration and tuning information

– Output of DB2 utilities gathered with “list utilities,” “roll forward query status” and information in the db2diag.log

– Total runtime of the test.

Scenario 3a

Description: Tape backup of a FlashCopy in less than 8 hours (no online workload).

Purpose of this test: The purpose of this test is to show the feasibility of backing up a FlashCopy image of the 20 TB database to tape via Tivoli Storage Manager in a reasonable time. Achieving a high throughput rate avoids the need for multiple FlashCopy target sets within the DS8000 storage server: the backup will be archived completely to tape before the next backup cycle starts on the FlashCopy volumes. The test simulates a backup cycle window of 8 hours.

Achievement criteria: To meet this test KPI the following must be achieved:

– Successful backup of a 20 TB database (located on FlashCopy volumes) to tape in less than 8 hours.

The FlashCopy background copy does not need to be completed at the end of the 8 hours.

Execution: The management of both the FlashCopy process and the database backup to the 16 LTO3 tapes will be handled by Tivoli Storage Manager for Hardware, eventually with additional scripts: after the database I/O has been suspended, FlashCopy has been invoked, and all database partitions have resumed productive operation, the LUNs containing the FlashCopy of the database images will be accessed by another single server or logical partition.

The backup of the database will be started as a “LAN-free” backup in multiple “waves”: the DB2 backup command will be invoked in parallel on multiple database partitions, making it possible to access the 16 LTO3 tape drives in parallel. Assuming a total of 32 database partitions and 16 LTO3 drives, either a large number of database partitions will be backed up using a low number of sessions, or vice versa.

Infrastructure management test: DB backup of a FlashCopy

At minimum, two waves of 16 parallel DB2 backups of a database partition will be started sequentially (each of them using one single session to Tivoli Storage Manager on one LTO3 drive).

The FlashCopy source is on a separate rank within the storage system from its FlashCopy target.

Deliverable: This test will provide a description of the backup infrastructure, including:

– Tivoli Storage Manager setup and device definitions

– DB2 backup command specification (number of sessions used, level of parallelism for each database partition) and description of the backup scheduling process out of Tivoli Storage Manager for Hardware

– Tivoli Storage Manager server and storage agent sizing recommendations for the CPU utilization and the Fibre Channel adapter throughput

Scenario 3b

Description: FlashCopy in less than 8 hours, in parallel with simulated query and data load activity.

Purpose of this test: This test measures the performance implications of a FlashCopy backup during production. The background copy process of the FlashCopy has to complete in less than 8 hours.

Achievement criteria: To meet this test KPI the following must be achieved:

– Incremental FlashCopy backup of the database in the production system within an average 8-hour window per day (assuming 200 million records = 500 GB log).

– Online query load requirement: 25 navigations per minute with a targeted average response time of 20 sec. Measurement of the percentage of degradation due to the FlashCopy activity will be reported as part of the deliverable of this KPI.

– Data load requirement: 25 million records/hr.

Execution: This test is extremely difficult to simulate in a repeatable manner. The FlashCopy, running in the background, will destroy the normal means of resetting the system for a restart. The timing is also critical, because the system must first be put in a state where it differs from the current FlashCopy at approximately the same ratio the production system would differ after 8 hours of production. This is assumed to be the equivalent of 200 million records (25 million/hr * 8 hrs).

Once this state is achieved, the incremental FlashCopy will be started, and then the load scenarios. Measurement and tuning will take place on a system in a constant state of change.
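Combining the figures above with the 500 GB log estimate used in this scenario's achievement criteria gives a rough per-record log footprint; the 500 GB and 200 million record figures come from the source, while the division into bytes per record is our own illustrative arithmetic (it needs 64-bit shell arithmetic, which modern shells provide):

```shell
# Sanity check of the assumed 8-hour production delta:
# 25 million records/hour for 8 hours, ~500 GB of log in total.
RECORDS_PER_HR=25000000
HOURS=8
LOG_GB=500
records=$((RECORDS_PER_HR * HOURS))
bytes_per_rec=$((LOG_GB * 1024 * 1024 * 1024 / records))
echo "delta=${records} records, ~${bytes_per_rec} bytes of log per record"
```

The result, roughly 2.7 KB of log per record, is the kind of figure that determines how quickly the DB2 load tool can recreate the delta between test repetitions.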

To repeat this test, a backup either from tape or PPRC must be used to restore the status and then the 8-hour production difference re-established, or a means of resetting via data removal must be identified. Creating an 8-hour delta via InfoCube-load activity will also be very time-consuming. An alternative data load is also desirable to reduce the preparation time for this test scenario.

Infrastructure management test: FlashCopy of a database

– Take an offline backup of the database to tape (Scenario 3, if FlashCopy is taken offline).

– Create temporary target tables that are striped across all DB2 partitions, all LPARs, and all DS8300 ranks.

– Take a full offline FlashCopy and then activate incremental recording.

– Use DB2 load to insert 200 million records (expected throughput of production InfoCube-load activity in 8 hrs) to create delta.

– After the delta is established, take the incremental FlashCopy and measure the impact on production. The incremental FlashCopy must finish within 8 hours (KPI measurement).

– Repeat of test

• Option 1: Wait for end of incremental FlashCopy, continue with the next steps. The database will not be identical to the one when starting the test, but the behavior should be comparable.

• Option 2: Drop and rebuild temporary table, use customized jobs to reset the InfoCube load data, and continue with the next steps.

Using the DB2 load tool to create the change delta between the FlashCopy backup and the online database will speed up the preparation time for the test. Using a temporary target table and customized InfoCube-load delete jobs should allow the system to be reset to the original status more quickly than a restore from tape.

Deliverable: This test will provide:

– CPU utilization for the total load, and for each of the separate components

– Load distribution profile

– Configuration and tuning information

– SAP Instances, WP available, WP utilized

– Response time and throughput comparison data to normal production

Scenario 3c

Description: Online tape backup, in parallel with simulated query and data load activity. Query and data load activity are measured first.

Purpose of this test: This test measures the performance implications of an online system backup to tape. The degradation will be compared between a test run with online backup and a previous test with the same load specification without backup activity.

Achievement criteria: To meet this test KPI the following must be achieved:

– Online query load requirement: 25 navigations per minute with an average response time of 20 sec

– Data load requirement: 25 million records/hr

Infrastructure management test: tape backup when in production


– Measurement of the degradation percentage due to the backup activity will be reported as part of the deliverable of this KPI.

Execution: The backup of the database will be started as LAN-free backups on each logical server partition in multiple waves. The DB2 backup command will be invoked in parallel on multiple database partitions, allowing access to 16 LTO3 tape drives in parallel. Assuming a total of 32 database partitions and 16 LTO3 drives, either a large number of database partitions will be backed up using a low number of sessions, or vice versa. At minimum, two waves of 16 parallel DB2 backups of a database partition will be started sequentially (each using a single Tivoli Storage Manager session per LTO3 drive). The impact of the online database backup on the workload has to be limited by allowing DB2 to throttle its utilities. The performance impact has to be measured.

The DB2 registry variable DB2_OBJECT_TABLE_ENTRIES also has to be set to an optimal value. The possible methods are:

– Configure DB2 to allow throttling of utilities.

– Start an online backup of the database directly to tape using TDP for ERP.

– Adjust the performance impact of the backup by changing the impact priority during backup runtime.
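The three steps above might translate into DB2 8.2 commands along the following lines. UTIL_IMPACT_LIM, the UTIL_IMPACT_PRIORITY backup option, and SET UTIL_IMPACT_PRIORITY are standard DB2 throttling controls, but the database name, the impact values, the session count, and the utility ID below are placeholders, and the sketch only echoes the commands rather than executing them.

```shell
# Dry-run sketch of DB2 8.2 utility throttling for an online backup.
# Values (10%, priority 50/25, database name SID, utility ID 7) are
# placeholders, not the project's actual settings.
run() { echo "WOULD RUN: $*"; }

# 1. Cap throttled utilities at ~10% impact on the production workload.
run db2 update dbm cfg using UTIL_IMPACT_LIM 10
# 2. Start the online backup throttled from the outset.
run db2 backup db SID online use TSM open 2 sessions util_impact_priority 50
# 3. Find the running utility ID, then retune its priority mid-flight.
run db2 list utilities show detail
run db2 set util_impact_priority for 7 to 25
```

Lowering the priority during the run is what "adjusting the performance impact during backup runtime" amounts to in practice.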

Deliverable: This test will provide:

– CPU utilization for the total load, and for each of the separate components

– Load distribution profile

– Configuration and tuning information

– SAP Instances, WP available, WP utilized

– Response time and throughput comparison data to normal production

– StorageAgent throughput, dependent on the backup throttling

– Measurement of the throughput trade-off between production activity and backup activity

Scenario 4

Description: Rebuild indexes for 3 TB of data in less than 2 hours, with no online activity in progress. Mark the indexes of 3 TB of tables as bad, and then rebuild the indexes of those tables. Test across a spread of different tables (PSA, ODS, InfoCubes, aggregates).

Purpose of this test: This test measures parallel index creation, for example after an I/O error, or after a roll forward scenario and first access to the database, which may result in massive index creation when indexes were created, recreated, or reorganized during the roll forward.

Achievement criteria
To meet this test KPI, the following must be achieved:

– Re-creation of the indexes of 3 TB of data. The data volume will be accumulated over tables, beginning with the largest tables in the database.

Infrastructure management test: DB index rebuilding

24 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse


Execution
This test requires optimal performance and stability of the whole system environment. The test will stress storage, operating system, and database software.

The test will start with marking indexes as bad using db2dart.

After a DB2 restart, the indexes are re-created either at first access or at database activation.

During the test, no workload will run against the database, and configuration changes will be made to the database for optimal performance.

The possible methods are:

– Identification of tables for the test

• Begin with largest tables down to smaller tables until 3 TB data volume is reached.

• Amount of data will be measured based on the SAP table "db6tsize".

– Mark indexes of identified tables as bad

– Reconfigure database for optimal performance; for example:

• Decrease bufferpools to get more sort memory.

• Increase sort memory to avoid spilling to disk.

• Use additional temp space to spread I/O workload.

• Use various DB2 parameters to improve performance, for example INDEXSORT and INTRA_PARALLEL.

– Optimize operating system settings for optimal performance (optional).
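At the command line, the database reconfiguration steps above could look roughly as follows. This is a sketch only, not the project's actual settings: the database name BWP, the buffer pool name, and all sizes are illustrative assumptions.

```shell
# Sketch only: database name BWP and all sizes are illustrative.
# Enable intra-partition parallelism so one index build can use many CPUs:
db2 UPDATE DBM CFG USING INTRA_PARALLEL YES

# Raise the sort memory limits to avoid spilling index-key sorts to disk:
db2 UPDATE DBM CFG USING SHEAPTHRES 400000
db2 UPDATE DB CFG FOR BWP USING SORTHEAP 100000

# Shrink a buffer pool to free memory for the larger sort heap:
db2 CONNECT TO BWP
db2 "ALTER BUFFERPOOL IBMDEFAULTBP SIZE 50000"
```

After the test, such values would be reverted for normal production operation.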

Deliverable
This test will provide:

– CPU, memory and disk utilization for the test

– DB2 and OS configuration and tuning information

– DB2 snapshots performed during the test

– Total runtime of the test

Summary of the tests
Table 1-3 offers a final summary of the tests.

Table 1-3 Tests effectively run

On-line KPIs

KPI-A 7TB
Test description:
– Simulation of 100 users.
– Test combination of queries, record loads, and aggregate loading.
– AIX 5L Version 5.2.
Objective:
– Load of 25 million records per hour.
– Aggregate loads of 5 million per hour.
– 50 navigations per minute for queries.
– 20-second average response time for queries.

Chapter 1. Project summary 25


KPI-A 20TB
Test description:
– Simulation of 100 users.
– Test combination of queries, record loads, and aggregate loading.
– AIX 5L Version 5.2.
Objective:
– Load of 25 million records per hour.
– Aggregate loads of 5 million per hour.
– 50 navigations per minute for queries.
– 20-second average response time for queries.

KPI-A53 20TB
Test description:
– Simulation of 100 users.
– Test combination of queries, record loads, and aggregate loading.
– AIX 5L Version 5.3.
Objective:
– Load of 25 million records per hour.
– Aggregate loads of 5 million per hour.
– 50 navigations per minute for queries.
– 20-second average response time for queries.

Infrastructure KPIs

KPI-1
Test description: flashback and roll forward of 500 GB of data.
Objective: less than 8 hours.

KPI-2-1
Test description: restore 2 TB of data from tapes.
Objective: restore time plus roll-forward time less than 18 hours.

KPI-2-2
Test description: roll forward of 2 TB of log simultaneously with KPI-2-1.

KPI-3a
Test description: tape backup of a FlashCopy.
Objective: less than 8 hours.

KPI-3b
Test description: FlashCopy with simultaneous queries and loads.
Objective: tape backup in less than 8 hours; observe the query and load response times.

KPI-3c
Test description: online tape backup with online activities.
Objective: observe online activity response times.

KPI-4
Test description: rebuild 3 TB of indexes.
Objective: less than 2 hours.

Note: 1.2, “The execution of the project - a technical summary” on page 26 presents a few results for the KPI-D; these results are not detailed in Chapter 6, “Test results” on page 215. The KPI-D tests were not entirely successful, and they will be repeated with another architecture in Phase 2 of the project, outside the scope of this book. However, we provide the partial results in the summary section because they help in understanding the full process of the project.

1.2 The execution of the project - a technical summary

This section focuses on the SAP application servers used for the combined load and on the System p5 systems hosting both the application servers and the database. It describes the approach taken to achieve maximum efficiency in system utilization and the means of measurement, and it summarizes resource utilization. This section follows the System p5 servers through the landscape changes required to achieve the necessary combined load KPIs and to take advantage of technology changes beneficial to performance.


1.2.1 The methodology in practice

In this test, the starting point was the current hardware and software infrastructure in production at a customer site. The hardware and software levels provided the baseline at the start of the project, and determined the progress through the infrastructure migration. The full project, described in Figure 1-8, consists of two phases: Phase 1 is built with a SAP NetWeaver BI data warehouse of 20 TB; Phase 2 will be built with a SAP NetWeaver BI data warehouse of 60 TB. This book covers only Phase 1.

Figure 1-8 The two phases of the project

– In Phase 1 of the project, covered by this document, the System p5 servers were limited to LPAR configurations and to a static distribution of system resources. During the life cycle of the project it was of course possible to reconfigure the server environment for better load balancing; however, it was neither feasible nor desirable to attempt any dynamic reconfiguration during a test run. In cases of a non-optimal CPU distribution over LPARs, or of a memory overcommitment resulting in poor performance, the configuration was modified for the subsequent runs. As in a typical production environment, the selected system configuration needed to cope with the varying load profiles of the production day.

In the baseline phase of the project, the load balancing was done without the use of simultaneous multi-threading (SMT). The configuration of the SAP environment was tailored to reflect the underlying hardware infrastructure, such that during the migration to AIX 5L Version 5.3 and SMT, the SAP instances were also rebalanced to improve scalability.

– From experience gained in Phase 1 of the project, a proposal for a viable micro-partition implementation was designed, to allow for dynamic processor sharing during the highest of the load tests. Phase 2 will be based on micro-partitioning. This is detailed in 1.2.8, “Using the System p5 virtualization features: thoughts for the next steps” on page 59.

In addition to these changes over the landscape, the infrastructure changes described in Figure 1-9 were validated.

Important: The following sections assume familiarity with the SAP NetWeaver BI information model and terminology. If you are not familiar with them, we recommend reading 2.1, “SAP NetWeaver BI overview” on page 66 before you read these sections.

This book covers the tests of Phase 1. (Figure 1-8 shows the progression: Phase 1 grows the database from 7 TB to 20 TB, moving from AIX 5L V5.2 with DB2 V8, to AIX 5L V5.3 with DB2 V8, to AIX 5L V5.3 with DB2 V9; Phase 2 grows it to 60 TB on AIX 5L V5.3 with micro-partitions and DB2 V9.)


Figure 1-9 Infrastructure evolution during Phase 1 execution

1. The initial hardware for Phase 1 was a System p5 595 (POWER5, 1.9 GHz Turbo) with 256 GB of memory. This proved to be a limitation for the database, and the memory for the database LPARs was increased from 256 GB to 512 GB.

2. The storage server was upgraded to the new DS8000 Turbo technology, increasing the speed of the FlashCopy and increasing the bandwidth. This played more of a role in the infrastructure tests than in the load tests, because the original storage server was not near its limits; the additional capacity therefore had little effect on throughput or run times of the combined load tests.

3. The database LPARs were then upgraded to the new p5-595 (POWER5+, 2.3 GHz Turbo) technology, a speed bump from 1.9 to 2.3 GHz. This improvement was visible in both database-heavy components of the combined load: query and aggregation.

4. Tests were done with AIX 5L V5.3 to measure the value of upgrading the operating system.

5. In preparation for the move to the new architecture in Phase 2, DB2 was upgraded to Version 9 and the database was tested on the new hardware for verification.

The methodology for moving across the landscape was to maintain a reference point for each move. A change in test methodology, in OS settings or configuration parameters, or in middleware settings or versions was done with a back-link to the previous runs, as shown in Figure 1-10. The intention is to ensure that run data can be compared across the test scenarios by means of either direct comparability or extrapolation.

Figure 1-10 How to compare results between runs



1.2.2 Overview of the combined load tests

The combined load test scenario consisted of three different load types, each with a very different profile and a different KPI requirement. They represent the online report generation and the InfoCube maintenance necessary to bring new data online for reporting.

– The online users are simulated by queries initiated by the injector tool.

– InfoCube maintenance includes two activities:

  • Data load (or upload of data)

  • Data aggregation

Both are initiated by SAP Job-Chains.

Upload of data from the ODS into the InfoCubes
The objective is to transform new data into the format defined for the reporting objects and to load the data into the target objects. For this scenario we use a pseudo-batch: a batch driver spawning massively parallel dialog tasks.

In this load, the data is extracted from the source repository, in this case an ODS, using selection criteria (DB2 select), processed through complex translation rules (CPU intensive), and then written into the target InfoCube (DB2 insert). This load allows for a wide variety of configuration options; the most significant are the level of parallelism, the size of data packets, the number of target cubes per extractor, and the load balancing mechanism. The translation rules for this scenario were extremely complex and represented a worst case in a customer situation.
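The extractor/dialog-task pattern just described can be mimicked in a few lines of shell. This is a toy sketch only: background shell jobs stand in for SAP's asynchronous dialog tasks, and the block count and degree of parallelism are invented for illustration.

```shell
# Toy sketch of the pseudo-batch pattern: a driver loop (the extractor)
# hands data blocks to asynchronous workers in waves. Background shell
# jobs stand in for SAP dialog tasks; counts are illustrative.
process_block() {
  # Stands in for: apply translation rules, insert into the target InfoCube.
  echo "processed block $1"
}
run_extractor() {
  blocks=$1
  maxpar=$2
  i=1
  while [ "$i" -le "$blocks" ]; do
    j=0
    while [ "$j" -lt "$maxpar" ] && [ "$i" -le "$blocks" ]; do
      process_block "$i" &   # asynchronous "dialog task"
      i=$((i + 1))
      j=$((j + 1))
    done
    wait  # the extractor waits for the wave before selecting more blocks
  done
}
run_extractor 8 4
```

The parallelism parameter plays the role of the configuration options named above: it bounds how many dialog tasks process blocks concurrently.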

Figure 1-11 shows the upload process and components. The batch extractor selects the data in data blocks from the source and initiates an asynchronous dialog task to take over the job of processing the block through the translation rules and updating the target InfoCube(s). The dialog tasks can run locally or on another application server in the system.

Figure 1-11 Data load scenario

Aggregation load
The objective was to aggregate the new data according to rules defined to improve the access efficiency of known queries. For this scenario we used a batch job.

The aggregation of the loaded data is primarily database intensive. The roll-up in these tests was done sequentially over 10 aggregates per InfoCube. There is not much configuration or tuning possibility for the aggregate load. The three options available in this scenario were:

1. The type of InfoCube: profitability analysis or sales statistics. This has an effect on the weight of the translation rules defined for the InfoCube type.

2. The number of InfoCubes to use: this was based on the number required to get the KPI throughput. There was not that much in the way of fine tuning.


Chapter 1. Project summary 29

Page 44: Front cover Infrastructure Solutions - IBM Redbooks · Front cover Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

3. The block size used: there was little guidance available on this option, so the initial setting was used.

Queries
The objective is to simulate end-user reporting. For this scenario we used online transactions.

The query load consists of 10 different query types with variants that cause them to range over 50 different InfoCubes used for reporting. The queries are designed such that some of them use the OLAP cache in the application servers, some use aggregates, and others go directly to the fact tables. This behavior is detailed in 2.6.2, “The query process” on page 94.

The query load is database focused and database sensitive. Competition for database resources is immediately translated into poor response times. The Query-KPI is the most delicate of the combined load types. Tuning of the queries was restricted to database methods to improve access paths and database buffer quality.

Combined load
Figure 1-12 shows the load distribution of the different load profiles, in terms of the ratio of physical CPUs utilized on the application servers to physical CPUs utilized on the database servers. This graph was created using calibration tests of the load types in isolation, and then of all the load scenarios together (KPI-A).

Figure 1-12 Combined load distribution

This graph shows that the query load consumes about twice as much CPU on the database server as on the application servers, that the upload process consumes about six times as much CPU on the application servers as on the database server, and that the aggregate load again consumes about twice as much CPU on the database server as on the application servers. For the combined KPI-A load, these processes together consume 2.25 application server CPU units for each database server CPU unit.

The combined load scenarios in the tests were to scale from a complex workload to five times this workload (expressed in number of end users).

– In Phase 1, the tests were based on the KPI-A (one System p5 server and one DS8300 storage server for the database) and partially on the KPI-D (where the database will be distributed across multiple System p5 servers using micro-partitioning, and multiple storage servers).

– In Phase 2, the tests are based on the KPI-D (increase from 20 TB to 60 TB) and another KPI, named KPI-G.

Figure 1-13 shows the increased complexity of the combined load for the full project, Phase 1 (KPI-A) and Phase 2 (KPI-D and KPI-G).

Figure 1-13 The test targets for the full project (Phase 1 and Phase 2)

One important point to note in this test is that the scenario is based on a real-life situation, with several independent geographies and markets to simulate: reporting is a daytime activity, maintenance is a nighttime activity. The objective is never to maintain InfoCubes that are currently active for reporting. Therefore, in the scenarios, there are usage categories for the InfoCubes: those active for reporting (the target of the queries), those being loaded, and those being aggregated. In the real-life scenario, InfoCubes would be maintained during off-hours: loaded, aggregated, statistics updated, and then released for reporting. The throughput requirements for this test represent the windows of time required to do this maintenance.

The limitations of this test
Unlike SAP ERP systems (R/3), SAP NetWeaver BI systems are very flexible with regard to customizing data models, data transformation rules, and the design of reporting queries. In R/3, mostly standard transactions are used, whose logic yields essentially the same workload in different customer installations because they are customizable only to a certain degree; the processes in SAP NetWeaver BI, by contrast, are highly adaptable to the customers' specific requirements. This applies to:

– The data model of the InfoCubes (and thus the star schemas) and the ODS objects

– The data flow for staging the data within SAP NetWeaver BI (using PSA, staging in (EDW)-ODS)

Combined Load Requirements

5

251525

125

75

2.08

1.25

0.8

0

20

40

60

80

100

120

140

160

KPIA KPID KPIG

Mil

Rec/

Hr

0

0,5

1

1,5

2

2,5

Tnx/

Sec

Aggregation

Load Requirements

Query/Sec

SAP NetWeaver BI installations vary in many aspects


– The data transformation rules for modifying, enriching, and adjusting the data for the InfoProviders

– The design of the queries with regard to:

  • The structure of the data that is to be displayed
  • The complexity of calculated key figures
  • The amount of data that is to be read
  • The number of InfoProviders that are accessed (MultiProvider)

– The connectivity to other systems

There is no norm as to how a SAP NetWeaver BI system should be built, and customers' systems look quite different even if similar processes (for example SCM analytics, human resources reporting) are implemented.

There is a great variety of business reporting requirements across different companies, making it necessary to adjust the SAP NetWeaver BI system to the customer's needs in many different areas. Hence, customers' SAP NetWeaver BI installations vary greatly with regard to the layout of the data model, the data flow, and so on.

The system set up for this specific test therefore does not allow a one-to-one transfer of the results to other SAP NetWeaver BI installations, for example with respect to data throughput or number of navigation steps per hour. The performance values achieved in the different areas (reporting, data upload, and aggregate rollup) depend heavily on the data model and the implemented business logic. Some factors that affect these key figures are specified in Table 1-4.

Table 1-4 Influencing factors that may affect test results

Reporting:
– InfoProvider structure (single InfoCube, homogeneous/heterogeneous MultiProvider), types of InfoProviders used (InfoCubes, ODS, InfoObjects, and so on)
– InfoCube design (number and type of dimensions (line item), number of characteristics)
– Aggregates for InfoCubes
– InfoCube compression
– OLAP features used (calculated key figures, external hierarchies, complex structures)
– Amount of data in the InfoProviders, amount of data displayed in the query

Data loading:
– Complexity of transfer and update rules (start routines, ABAP™, formulas)
– InfoProvider design (number of characteristics, InfoCube dimensions)
– ODS settings (for example, the BEx flag)
– Data flow design (staging via PSA, ODS, intermediate InfoCubes)
– Customizing of the extractors and InfoPackages (package sizes, maxprocs, processing within SAP NetWeaver BI)
– Type and number of source systems used for extraction (SAP, FlatFile, DBConnect, third-party)

Aggregate rollup:
– Aggregate definition (number of characteristics, and thus aggregation ratio)
– InfoCube design (number of characteristics, dimensions)
– Aggregate complexity (attributes, hierarchy nodes, fixed values)
– Structure of the aggregate hierarchy (with regard to basis aggregates that support the rollup process)


The setup used in this test for producing the data load is intended to simulate a comparatively heavy workload in the three areas: reporting (queries), data loading, and aggregate rollup.

– All InfoCubes are updated using comparatively complex (CPU-intensive) update rules; we expect the average update logic in typical SAP NetWeaver BI installations to be less complex and to produce less workload. The throughput numbers (records per hour) denote the number of records written to the fact table of the target InfoCube; the ratio between records read and records written was approximately 3:2, which means one third of the records read from the source are deleted in the update rules. The start routine performs lookups from various master data tables, ODS active data tables, and DDIC tables (see footnote 5), and can be considered more expensive than average, producing additional DB accesses. Additionally, only DataMarts are used in the data load scenario. This also means that the whole extractor workload is produced within SAP NetWeaver BI, which is not usually the case in customer installations (where data is extracted in other systems). All these aspects should be factored in when transferring the throughput figures to other installations.

– The reporting scenario comprises a mixture of OLAP cache and aggregate use which, from our experience, can be considered representative of a typical customer installation: 50% OLAP cache usage, 80% aggregates. However, there are no general statistics on these figures across different customer systems and, of course, these ratios vary considerably in other installations, depending on the degree of optimization for such a scenario. All queries run on MultiProviders using the characteristic 0INFOPROV.

– The use of MultiProviders is state of the art and is recommended for performance improvement. Instead of running a query on a large InfoCube containing data from a long time period, it is better to split the data, with respect to a time characteristic, across multiple InfoCubes and to run the query on a MultiProvider built on these InfoCubes. This allows parallelization of the InfoCube accesses, which is in general faster than access to one very large InfoCube fact table (also check SAP Note 629541; see footnote 6).

– The aggregates are structured in a relatively flat aggregate hierarchy (two levels), and the InfoCubes used for the rollup process have 10 and 21 aggregates, respectively. Certainly, the rollup performance depends heavily on the aggregate hierarchy, that is, on the size and structure of the source tables (fact tables of the InfoCube, or basis aggregates). Basis aggregates are mainly used to support the rollup process by serving as a data source: aggregates can be built out of other aggregates (of higher granularity) to reduce the amount of data to be read and, hence, to improve the rollup performance. The aggregate hierarchy is determined automatically. Because of these dependencies, the throughput figures for the rollup process may be transferable to other scenarios only to a certain extent.

5 The SAP R/3 system includes two special users in the default installation: DDIC and SAP*. These users are created in clients 000 and 001 with standard names and passwords. The user DDIC (from "data dictionary") is the maintenance user for the ABAP dictionary and for software logistics. It is the user required to perform special functions in system upgrades, and it has special privileges.

6 To search for SAP OSS notes, SAP NetWeaver BI patches, and so on, visit http://service.sap.com/notes


1.2.3 The System p5 configurations used

The initial proposal defined a single p5-595 (POWER5, 1.9 GHz Turbo) for Phase 1 of the project: one LPAR for the application server components and five LPARs for the database server activity. The ratio of application server to DB server requirements for the data load throughput targets forced the addition of further application servers. Figure 1-2 on page 8 shows this logical implementation of the baseline installation. This remained the central configuration throughout Phase 1, although additional application servers (CPU capacity) were added to support the application server requirements of the data loading. The configurations differ between the KPI-A and KPI-D tests.

KPI-A considerations
For the KPI-A tests:

– The 33 database partitions were distributed over five physical LPARs. One LPAR was dedicated to DB2 partition 0, the focus of the client activity. Each of the additional four LPARs housed eight DB2 partitions. This is detailed in “The DB2 perspective” on page 119.

– A single LPAR was dedicated to the application servers. When the project began, it was not clear what the load distribution would be and, with micro-partitioning not available, the best option was to combine the three instances in a single LPAR. The load types were separated into dedicated application servers so that the resource utilization and behavior could be tracked: CI administration and aggregation (named sysxci, where x is the system number), InfoCube load (named sysxbtc), and online reporting or queries (named sysxonl).

Throughout Phase 1, the application server load was divided by using different dedicated instances and network aliases. This allowed the instances to appear as if they were installed on separate servers, while allowing the flexibility of CPU sharing within the LPAR. Once the load distribution was known, it was possible to separate the instances onto different servers without any change in the job chains or monitoring overviews. Figure 1-14 shows the concept, implemented on a single Ethernet adapter for each of the two networks: the front-end network used for online access, and the backbone server network connecting the application servers to the database.

DVEBMGS00, D01, and D02 represent three SAP instances: the CI (instance 00), the online instance (01), and the data loading instance (02). Each of these has its own network address, which is implemented as an alias on the actual server address. This was done to make the configuration more flexible and to track communication paths.
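On AIX 5L, such per-instance aliases can be added to an existing interface with ifconfig. The following is a sketch only: the addresses are taken from Figure 1-14, while the netmask is an assumption.

```shell
# Sketch only: the netmask is an assumption; the .52 address is the basis
# address already configured on the backbone interface en1.
# Add aliases for the online (D01) and data loading (D02) instances:
ifconfig en1 10.10.10.59 netmask 255.255.255.0 alias
ifconfig en1 10.10.10.50 netmask 255.255.255.0 alias
```

Because each SAP instance is reached through its own alias, an instance can later be moved to another server by moving its alias, without changing job chains or client configurations.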

Important: Keeping in mind that customer SAP NetWeaver BI installations vary in many aspects, the different KPI figures achieved in this test can be taken as performance indicators that can be transferred to other installations to a limited extent, if the complexity of the tested scenarios is taken into account. Unlike the SAP Application Performance Standard values achieved by benchmarking SD applications in SAP ERP, the throughput values obtained in this test cannot be considered standardized figures, since they are specific to the implemented scenarios.

The purpose of this test is mainly to show that the SAP NetWeaver BI application together with the IBM infrastructure can handle heavy online application activity in combination with infrastructure workload for a large (up to 20 TB) database, still providing stability and manageability of the solution.


Figure 1-14 Network configuration to connect the application servers to the database

Figure 1-15 shows the configuration used for the combined load tests, KPI-A, from 7 TB to 20 TB. Due to the requirements of the data loading scenario, the application server configuration was expanded by adding Ether channels and CPUs.

Figure 1-15 The SAP server and the database server implementation for KPI-A

(Figure 1-14 shows the en0 front-end user network, 10.3.13.x, and the en1 backbone server network, 10.10.10.x: the DVEBMGS00 central instance (admin activity, sys3ci), the D01 online query instance (sys3onl), and the D02 data loading instance (sys3btc) each have their own address on both networks, implemented as aliases; the .52 address is the basis network address, and .59 and .50 are the aliases for the other instances.

Figure 1-15 shows the SAP servers and the DB servers. On the SAP side, one LPAR with 11 CPUs and 30 GB of memory hosts sys3ci, sys3onl, and sys3btc, and two LPARs with 32 CPUs and 94 GB of memory each host sys3as03/sys3as05 and sys3as04/sys3as06. On the DB side, sys3db0 has 12 CPUs and 25 GB of memory, and sys3db1 through sys3db4 have 10 CPUs and 47 GB of memory each.)


Ether channel considerations
The blue arrows in Figure 1-15 represent the communication focus. The application server clients connect to DB2 partition 0 only. All client-oriented traffic is between the application servers and DB2 partition 0, which has a unique role in the DPF environment and functions as a type of master coordinator for the other instances. All communication between the DB2 partitions is between DB2 partition 0 and the other nodes; the additional instances do not communicate among themselves. Therefore, DB2 partition 0 was implemented first with a 2-card Ether channel (see footnote 7), and then eventually a 4-card Ether channel, to handle this communication load. Although it would have been possible to implement a virtual Ethernet channel as a backbone network between the DB LPARs after the introduction of AIX 5L Version 5.3, this was purposely not done, so as not to introduce a DB dependency on a single server. This decision was made with respect to the new design requirements for the Phase 2 hardware.
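Creating such a 4-card Ether channel on AIX 5L could look as follows. This is a sketch only: the physical adapter names, the resulting adapter name, and the IP address are assumptions, not the project's actual values.

```shell
# Sketch only: adapter names ent0..ent3 and the address are assumptions.
# Aggregate four physical adapters into one EtherChannel pseudo-adapter:
mkdev -c adapter -s pseudo -t ibm_ech -a adapter_names='ent0,ent1,ent2,ent3'

# mkdev reports the new adapter name (for example ent4); bring up the
# backbone address on the matching interface:
ifconfig en4 10.10.10.51 netmask 255.255.255.0 up
```

Traffic to DB2 partition 0 is then spread across the member adapters, which is why the partition 0 LPAR, the communication hub of the DPF cluster, received the widest channel.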

CPU consideration
For KPI-A, to handle the InfoCube load requirement of 25 million records per hour, 64 CPUs were added to the baseline configuration for the application servers. The data translation rules in effect between the ODS and the target InfoCubes, as defined by customer requirements, are complex and CPU intensive. The CPUs used for the load were split into two physical LPARs on separate System p5 servers. Each LPAR housed two SAP instances during KPI-A. This was done to improve the SAP load balancing functionality: using round-robin load balancing, each instance participated equally in the distribution, and a more equal distribution over the physical LPARs was achieved. This comes at the cost of redundant memory for the SAP buffer pools and the instance-related memory structures required for the second instance.
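As a toy illustration of why round-robin dispatch over four instances (two per LPAR) evens out the distribution across the two physical LPARs, the following sketch dispatches tasks over the instance names from the text. The dispatch loop is a stand-in for illustration, not SAP's actual load balancing algorithm, and the task count is invented.

```shell
# Toy sketch: round-robin dispatch of N tasks over four SAP instances.
# sys3as03/sys3as05 share one LPAR, sys3as04/sys3as06 the other, so a
# round-robin over all four gives each LPAR half of the tasks.
dispatch() {
  tasks=$1
  set -- sys3as03 sys3as04 sys3as05 sys3as06
  n=$#
  i=0
  while [ "$i" -lt "$tasks" ]; do
    eval inst=\${$((i % n + 1))}   # pick the next instance in rotation
    echo "task $i -> $inst"
    i=$((i + 1))
  done
}
dispatch 8
```

With 8 tasks, each instance receives 2, so each physical LPAR receives 4; with only one instance per LPAR, a skew in task cost would be harder to smooth out.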

Seven SAP instances were used:

- The sys3ci SAP server hosted three SAP instances: sys3ci was reserved for administration, the batch instance (sysxbtc) was used for aggregation and load triggering, and the online instance (sysxonl) was dedicated to the query load.

- The sys3as03 SAP server hosted two SAP instances: sys3as03 and sys3as05.

- The sys3as04 SAP server hosted two SAP instances: sys3as04 and sys3as06.

The main effort of the combined load scenario was focused on the InfoCube load design, because this scenario had the highest throughput requirements and the greatest flexibility of load.

The objective was to achieve the most throughput possible with the least load on the database, the database being limited by the specification to a single System p5 in Phase 1. This was to be done while maintaining an approach that could be implemented in a production system.

The first decision was to use dedicated InfoCube load application servers that could be driven to capacity. It would normally not be possible or practical to reserve such hardware capacity in a production system. However, with the possibility of using virtualization in Phase 2, reserving these CPU resources would no longer be necessary in the final configuration.

Figure 1-16 summarizes the evolution from the initial proposal to the infrastructure actually installed for the KPI-A tests.

7 Ether Channel is the AIX 5L term for aggregating multiple Ethernet interfaces (channels) into a single logical Ethernet interface.

About DB2 partition 0 role and the Ether channel cards configuration issue

Four application servers into two p-595 LPARs

36 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse


Figure 1-16 KPI-A infrastructure setup

In practice we used three System p5 595 servers (128 CPUs were requested in total): one hosting the DB servers in 5 LPARs plus one LPAR with three SAP components (sysxci, sysxonl, and sysxbtc); a second with one LPAR hosting two application servers; and a third with one LPAR hosting two application servers. Two p595s would have been enough in terms of the number of requested CPUs; we used three boxes for facility reasons.

KPI-D considerations

To run KPI-D with a database of 20 TB, two application servers for data load were added, and the number of Ether channel adapters in the backbone network was increased. The LPAR hosting DB2 partition 0 was running with a 4 Gb Ether channel, and the large application servers and the other DB2 LPARs were each using 2 Gb Ether channels, as shown in Figure 1-17.

KPI-A required 8 LPARs: 5 DB servers and 4 application servers. KPI-D required more System p5 servers and more application servers.

[Figure 1-16: the initial proposal (one IBM p595, 64 CPU 1.9 GHz, 256 GB, with DB2 LPAR 0 at 12 CPUs plus SAP instances SYS3CI, SYS3AS01, and SYS3AS02) versus the KPI-A infrastructure actually installed: one 512 GB p595 hosting DB2 LPAR 0 (12 CPUs), DB2 LPARs 1-4 (10 CPUs each, covering DB2 partitions 6-37), and an 11 CPU / 28 GB LPAR for SYS3CI; plus two 256 GB p595s, each with a 32 CPU / 92 GB LPAR hosting two application server instances (SYS3AS03/SYS3AS05 and SYS3AS04/SYS3AS06).]

Chapter 1. Project summary 37


Figure 1-17 The SAP server and the database server implementation for KPI-D

During the KPI-D tests, the overall memory allocation was modified:

- The load application servers (sys2as3-as6) were set up with 92 GB.

- DB2 partition 0 was set up with 24 GB.

- DB2 partitions 1 to 4 were set up with 116 GB.

The System p5 servers needed to be upgraded from 256 GB to 512 GB.

Figure 1-18 on page 39 shows the hardware evolution from the KPI-A to KPI-D.

[Figure 1-17: SAP servers sys2ci (hosting sys2ci, sys2onl, and sys2btc) and sys2as03 through sys2as06, each with 32 CPUs and 92 GB, connect over the 10.3.13.x client network and the 10.10.10.x backbone network to the DB servers sys2db0 (12 CPUs / 24 GB) and sys2db1 through sys2db4 (10 CPUs / 116 GB each).]

Figure 1-18 KPI-D infrastructure setup

1.2.4 Online test results summary

This section summarizes the results of the query, data load, and aggregation scenarios.

Query results

The graphs in Figure 1-19 on page 40 and Figure 1-20 on page 40 depict the goals and achievements at 20 TB. The query throughput was intentionally over-achieved: the granularity of the load was plus or minus one virtual user in the load tool, and it was felt better to remain conservative. The response time criterion was to achieve the throughput with average response times under 20 seconds; both KPI-A and KPI-D achieved this with large margins. Between KPI-A and KPI-D there was a change in the requirements for the queries, which increased the weight of several queries that directly access the tables. It was necessary to implement a limited number of DB2 statistical views (statviews) to achieve the KPI-D results.

[Figure 1-18: the KPI-A infrastructure (one 512 GB p595 hosting DB2 LPARs 0-4 and an 11 CPU / 28 GB SYS3CI LPAR, plus two 256 GB p595s each with a 32 CPU / 92 GB application server LPAR) evolved into the KPI-D infrastructure: one 512 GB p595 hosting DB2 LPAR 0 (12 CPUs), DB2 LPARs 1-4 (10 CPUs each), and a 32 CPU / 80 GB LPAR for SYS3CI, SYS3AS01, and SYS3AS02; plus additional 512 GB p595s hosting 32 CPU / 128 GB LPARs for SYS3AS03 through SYS3AS06.]

Online tests were achieved for KPI-A; another design was required to succeed with KPI-D.


Figure 1-19 Query throughput

Figure 1-20 displays the query response time.

Figure 1-20 Query response time

Data load results

Figure 1-21 on page 41 depicts the data load targets and achievement at 20 TB on the single server hardware (database on one System p5 and one DS8300 storage server). After an intensive study of the scalability factors that affect the upload design, scaling this load became a question of the number and size of the application servers handling the translation rules.

The KPI-A landscape had 64 CPUs available, and the KPI-D landscape had 128 CPUs available.

[Figure 1-19/1-20 data: query throughput, required versus achieved: KPI-A 0.8 versus 1.29 transactions/second; KPI-D 1.25 versus 1.46. Query response time: KPI-A 16.1 seconds and KPI-D 11.5 seconds, well under the SoW limit of 20 seconds.]

Figure 1-21 Data load throughput

Aggregation results

The aggregation presented the greatest problems in scalability. In the case of KPI-D, the target was not fully achieved; see Figure 1-22. Part of this was due to the way the tests were designed.

In KPI-A and KPI-D, the aggregation of the concurrent cubes was triggered at the same time. Each of these 2 to 8 InfoCubes (depending on the KPI) had the same layout. They began with a serial aggregation of three very large and complex aggregates, and then finished with seven much smaller and simpler aggregates.

Figure 1-22 Aggregation throughput

A means of parallelizing the aggregation within an InfoCube (there is no restricting hierarchy) would possibly have allowed the lighter aggregates to overlap with the complex aggregates and improved the overall throughput. However, this method was not implemented in Phase 1.
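The potential of the overlap idea above can be sketched numerically. The per-aggregate durations below are purely assumed figures for illustration; the point is that hiding the light aggregates under the heavy ones moves the elapsed time from the serial sum toward the heavy path alone.

```python
# Sketch of overlapping light aggregates with heavy ones. If no hierarchy
# forces ordering, the seven small aggregates could run while the three
# large ones are still building.
heavy = [60, 55, 50]  # assumed minutes for the large, complex aggregates
light = [5] * 7       # assumed minutes for the small, simple aggregates

serial = sum(heavy) + sum(light)          # current serial behaviour
overlapped = max(sum(heavy), sum(light))  # ideal: light work hidden under heavy
print(serial, overlapped)                 # 200 165
```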

With the introduction of DB2 V9, and System p5 with POWER5+ technology for the database server, a major improvement in the aggregation was achieved as shown in Figure 1-23 on page 42. These changes doubled the throughput for aggregation.

[Figure 1-21/1-22 data: data load, required versus achieved: KPI-A 25 versus 27.5 million records/hour; KPI-D 75 versus 77. Aggregation, required versus achieved: KPI-A 5 versus 6.9 million records/hour; KPI-D 15 versus 14.]

Figure 1-23 Aggregation throughput improvement

1.2.5 Resource requirements

This section summarizes the resource requirements: CPU and memory usage.

Physical CPU

While moving through the changes in the landscape, a price/performance chart was maintained to follow the trend in throughput per cost of physical CPU utilized end to end. This chart, shown in Figure 1-24 on page 43, is based on the upload throughput because this consumes the most CPU, but the data comes from a full combined load.

The yellow trend line shows the number of million records loaded per CPU consumed. An increasing trend is proof of improving efficiency. The bars depict the overall throughput for data load achieved during the run. Note the jump achieved in KPI-A, resulting from the simple move from AIX 5L Version 5.2 to AIX 5L Version 5.3 and SMT.

[Figure 1-23 data: aggregation achieved versus requirement: KPI-A 6.9 against a requirement of 5 million records/hour; against the KPI-D requirement of 15, the achieved values were 14 with DB2 V8, 16.7 on the POWER5+ p595, and 28.6 with DB2 V9 on POWER5+.]

About 101 p5-595 CPUs were consumed for KPI-A and 174 for KPI-D.

Figure 1-24 Price performance on CPU utilization

Figure 1-25 shows the growth of the CPU usage landscape from the baseline, to KPI-A, and then to KPI-D.

Figure 1-25 Number of CPUs used

[Figure 1-24: load throughput (million records/hour, bars) versus million records/hour per CPU (trend line) across the 7 TB, 20 TB, AIX53, KPID, Turbo, and p595+ runs. Figure 1-25: the 64-CPU baseline ran DB2 LPARs 0-4 (partition 0 and partitions 6-37) and one SAP LPAR (CI 00, APP 01 batch, APP 02 online) on a single IBM p595. For KPI-A the applications were moved off and the DB extended to 64 CPUs, with two new 32-CPU LPARs added (AS01 hosting sysxas03/sysxas05 and AS02 hosting sysxas04/sysxas06), for 128 CPUs. KPI-D added two new LPARs for two more application servers (AS03 and AS04) and one new LPAR for the SAP CI, for 224 CPUs; these are the CPUs available in the configuration, not the CPUs consumed for the tests.]

Figure 1-26 shows the physical CPU resources consumed per KPI achieved. This is the sum of the physical CPUs consumed over all application servers and all DB LPARs. This is based on the peak load because the KPI can only be achieved, with the documented throughput and runtime, by having covered the peak load requirements.

Note that "shrink wrapping" had not yet been done; that is, resources had not been reduced to a high average for price/performance, with the strict KPI requirements rerun on the limited resources to determine the smallest system that could achieve the KPIs. Because the target is still a larger KPI (KPI-G), this step had not yet been taken.

KPI-A (AIX 5L Version 5.3) used 101 physical 1.9 GHz System p5 CPUs, and KPI-D used 174 of the same CPU type.

Figure 1-26 Physical CPUs consumed (1.9 GHz System p5)

Memory requirements

For the purpose of sizing, the maximum memory utilization must be taken into account, because a memory over-commitment would result in paging and change the response time and throughput behavior considerably. The memory utilization for KPI-A is shown in Table 1-5.

Table 1-5 Memory utilization for KPI-A

Component             Average 7 TB (MB)   Average 20 TB (MB)   Max 7 TB (MB)   Max 20 TB (MB)
Database              186,849             203,262              192,870         206,546
Application servers   123,527             82,098               141,241         99,826
Total                 310,106             285,361              334,111         306,372

For KPI-A, the system utilized between 306 GB and 334 GB of application working storage: at 20 TB, about 200 GB for the database and 80 GB for the application servers.

[Figure 1-26 data: physical processors consumed: about 101 for KPI-A (31.18 plus 70.27 across the DB and application tiers) and about 174 for KPI-D (34.5 plus 140).]

Using NMON, the average and maximum memory utilization was captured. In this case we are looking only at the working storage, not including the client or the file system cache. The reason for this is that the working storage is the application footprint; non-computational storage is volatile and will often expand to fill any remaining capacity. However, the working storage does include the operating system computational requirements.
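The sizing rule above (size for maximum working storage, never over-commit) can be expressed as a small check. The working-storage maxima are the 20 TB values from Table 1-5; the configured-memory figures and the headroom factor are illustrative assumptions.

```python
# Hedged sizing sketch: size against *maximum* working-storage utilization,
# because over-committing memory causes paging and degrades response times.
MAX_WORKING_STORAGE_MB = {  # 20 TB maxima from Table 1-5
    "database": 206_546,
    "application servers": 99_826,
}

def fits(configured_mb, headroom=0.05):
    """True if configured memory covers peak working storage plus headroom."""
    peak = sum(MAX_WORKING_STORAGE_MB.values())  # ~306 GB
    return configured_mb >= peak * (1 + headroom)

print(fits(400_000))  # ample room above the ~306 GB peak
print(fits(300_000))  # below the peak -> risk of paging
```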

The application server memory requirement is primarily driven by the data load process which is running in massive parallel mode, and therefore has many parallel user contexts. The size of the data block being processed by each of the parallel processes has a significant effect on the size of the individual user contexts and therefore on the total memory requirement.

In the 7 TB test, a block size of 160,000 rows was selected. For the 20 TB tests, a block size of 80,000 rows was used. This is reflected in the increased memory requirements for the application servers in the baseline (7 TB) statistics.

For the KPI-D achievements, Figure 1-27 shows the amount of real memory configured and that utilized by working storage over all LPARs. The database (green) is using slightly less than 500 GB, and the application servers (blue) are using 380 GB.

Figure 1-27 Memory used for KPI-D

Figure 1-28 on page 46 shows the working storage memory requirement across 16 various KPI-D runs.


Figure 1-28 Working storage memory used for KPI-D

In the last 10 runs, the memory footprint for both the database and the application servers stabilized at just under 800 GB.

The throughput is affected by the balance of the components (parallelization), the speed of the processors, and by other factors in the load design. The memory on the database is a result of buffer pool settings and the number of active connections.

The application server memory is influenced by the number of instances on the LPAR and by the level of parallelization. It was discovered, for example, that each parallel dialog process, using a block size of 80,000 records, can consume nearly 1 GB of memory for its SAP user context.

Maintaining the same constant setting for the database, the memory requirement will increase with the further parallelization expected for KPI-G in Phase 2. This will be primarily in the application servers. To further parallelize the load, additional application server resources will be necessary, thus resulting in more client connections to DB2. To utilize the additional resources, more parallel dialog tasks will be started, thus increasing the memory requirement for user context storage.
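The memory projection described above can be sketched from the observation that each parallel dialog process (with 80,000-record blocks) consumes close to 1 GB for its SAP user context. The per-instance overhead figure below is an assumption added for illustration, not a measured value.

```python
# Rough projection of application-server LPAR memory from parallelization:
# per-task user contexts (~1 GB each per the text) plus assumed per-instance
# overhead for SAP buffer pools and instance structures.
def app_server_memory_gb(parallel_dialog_tasks, context_gb=1.0,
                         instance_overhead_gb=8.0, instances=2):
    """Estimate LPAR memory: task contexts plus per-instance overhead."""
    return parallel_dialog_tasks * context_gb + instances * instance_overhead_gb

print(app_server_memory_gb(64))  # 80.0 GB -- near the 92 GB load LPARs
```

More parallel dialog tasks for KPI-G would push this estimate up roughly linearly, which is why the additional parallelization is expected to cost memory primarily on the application servers.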

SAP component balance for parallel load: AIX 5L Version 5.2 versus Version 5.3

With AIX 5L Version 5.2, where no hardware multi-threading is available, the best balance was determined with the following process configuration:

- Parallel Dia-Process to SAP Dialog Process: 1.1:1

- Dialog Process to Physical CPU: 1:1

With AIX 5L Version 5.3, using hardware multi-threading, there are two logical processors for each physical CPU. The best parallel throughput takes full advantage of the following process configuration:

- Parallel Dia-Process to SAP Dialog Process: 1.1:1

- Dialog Process to Physical CPU: 2:1 (one process per SMT thread)
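The two process-balance rules above can be turned into a small calculator. It simply applies the stated ratios (one dialog process per physical CPU without SMT, one per logical processor with SMT, and 1.1 parallel dia-tasks per dialog process); the 32-CPU example matches the load LPARs used in the study.

```python
# Sketch of the AIX 5.2 vs. 5.3 process-balance rules described above.
def process_balance(physical_cpus, smt=False):
    """Return (dialog processes, parallel dia-tasks) for an LPAR."""
    dialog_procs = physical_cpus * (2 if smt else 1)  # 2:1 with SMT, else 1:1
    parallel_dia_tasks = round(dialog_procs * 1.1)    # 1.1:1 dia-task ratio
    return dialog_procs, parallel_dia_tasks

print(process_balance(32, smt=False))  # (32, 35)
print(process_balance(32, smt=True))   # (64, 70)
```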


Throughput per CPU increases with AIX 5L Version 5.3 (versus Version 5.2)


The increase in the number of parallel dialog tasks will be reflected in the throughput, as well as in the memory utilization. The more parallel dialog tasks there are, the more user contexts will be active simultaneously.

Figure 1-29 shows the load throughput achieved using the same load configuration and hardware for AIX 5L Version 5.2 and AIX 5L Version 5.3. For each, the optimal balance of components was used.

For AIX 5L Version 5.3, the number of dialog processes and the parallelism could be increased. However, attempting the same parallelism on AIX 5L Version 5.2 proved to be counterproductive because the optimal balance could not be achieved.

This comparison shows that the total throughput significantly increased, and that the ratio of throughput per CPU also significantly improved. On AIX 5L Version 5.2, 108.5 physical CPUs were used. On AIX 5L Version 5.3, only 99.8 physical CPUs were used.
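The per-CPU comparison in this paragraph works out as follows, using the CPU counts stated above and the total-throughput figures reported for the two runs.

```python
# Throughput-per-CPU comparison for the AIX 5.2 vs. 5.3 runs, using the
# record totals and physical-CPU consumption quoted in the text.
runs = {
    "AIX 5.2": (32_277_692, 108.5),  # records loaded, physical CPUs used
    "AIX 5.3": (43_277_500, 99.8),
}
for name, (records, cpus) in runs.items():
    print(f"{name}: {records / cpus:,.0f} records per physical CPU")
# AIX 5.3 delivers ~34% more total throughput on ~8% fewer CPUs.
```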

Figure 1-29 AIX 5L Version 5.3 benefits compared to AIX 5L Version 5.2

1.2.6 Optimization and tuning options

In this section we discuss parameters that influence the performance of each workload of the combined load. For a discussion of other considerations to keep in mind when tuning the upload process, refer to “Number of packets in a request” on page 49.

Characteristics and tuning options of InfoCube load

The following factors affect the InfoCube load behavior:

- The number of extractors

An extractor process is a batch job. Extractors are also called "info packets" because they define the relationship between the source ODS, the target cubes, the selection criteria, and the processing method.

This batch job reads <packet-size> records from the ODS and uses the qRFC method to launch an asynchronous dialog task to process the packet, while it returns to reading the next packet.

[Figure 1-29 data: total load throughput of 32,277,692 records on AIX 5L Version 5.2 versus 43,277,500 on AIX 5L Version 5.3, with throughput per CPU also markedly higher on Version 5.3.]

- The number of target InfoCubes

A single packet read from the ODS can have multiple destination cubes. The dialog task handling this packet processes it through the applicable translation rules for each target InfoCube.

- The size of the data packets

The size of the data packets affects the speed at which the extractor can “spin off” the dialog tasks, as well as the speed at which the dialog tasks finish their processing tasks (turnaround time).

The size of the packets also affects the memory requirements on the application server handling the dialog tasks. The larger the packet, the larger the user context is for each of the tasks—and therefore the combined application-server memory footprint. If the packets are too small, there is an overhead in housekeeping and monitoring which can lead to serialization of the tasks.

- The number of dialog processes spawned per extractor

The target servers to perform the dialog tasks are defined in a logon group, along with the control quotas for the number of existing dialog processes in each server that is available for asynchronous work. These quotas are intended to protect a server from being overrun by this batch-related work, and reserve available capacity for other online activities.

In our case, as shown in Figure 1-30, the application servers were dedicated to this workload and the objective was to achieve, and sustain, scalability over the entire available CPU resource.

Figure 1-30 Multiple instances per LPAR

[Figure 1-30 annotation: to avoid slowdowns due to gateway or dispatcher congestion, and to smooth out the bursts in dialog task distribution, two SAP instances were installed on each application server. Extractor batch jobs run in the sys3ci batch instance (initiating gateway sapgw02) and dispatch, through the logon group members and quotas, to sys3as03/sys3as05 and sys3as04/sys3as06 (gateways sapgw03 through sapgw06) over 1 Gbit links.]

Number of dialog tasks

The number of dialog tasks spawned depends on the number of dialog processes that are available for qRFC in the logon group, and on the limitation defined for each of the source ODS objects. In the target logon group, each application server is entered and a quota for the available dialog processes is configured. Each ODS object has a configured limit for the number of parallel processes allowed.

It was expected that the extractor would spawn dialog tasks either to the limit of the available dialog processes, or to the limit of allowed parallel processes for the ODS (a restriction based on a percentage of available resources is more flexible and more secure than a fixed value that would only take effect at the beginning of the load).

In fact, however, during these tests it was discovered that the dialog tasks are limited only by the ODS-relevant settings. If there are too few dialog processes available at the participating login group servers, the requests are queued at the target server until a dialog process becomes available. The user contexts related to these qRFC requests consume significant memory on the target servers, and each initiated qRFC requires a gateway session, from the initiating GW to the target GW.

The result of this, in the tests, was a memory over-commitment causing paging on the application server, an overrun of the dispatcher queue and dispatcher errors at the target server, and an eventual collapse of the initiating GW in reaction to the general instability and time-outs.

The workaround is to carefully balance the ODS parallelization with the available resources on the application servers. This has implications for a real-life scenario in that a loss of an application server can have unexpected results stemming from an unplanned imbalance of parallelization to resource. In addition, there can be no dynamic use of new resources made available after the load has started.

Number of packets in a request

The number of packets in an upload request is the number of data packets read from the source ODS multiplied by the number of target InfoCubes. The smaller the packet size, and the more target InfoCubes there are, the greater the number of packets in a load request. In this version of SAP NetWeaver BI, the status monitoring tools that keep track of the health of a load request must manage each of these packets.

For very large requests, the throughput begins to degrade after a given number of packets. The overhead of managing the increasing request size cannot keep up with the potential throughput and each packet takes longer and longer to complete.

The recommendation is to configure fewer than 1,500 packets per request.

In order to achieve maximum throughput to a limited number of target cubes, a dual extractor was used. This is simply two info-packets with the same source and destinations, but with different data selection criteria. In this way it is possible to reduce the request size as well, because each extractor is tracked as a separate request.
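The packet-count rule and the dual-extractor trick above can be checked with a few lines of arithmetic. The record volumes used in the calls are illustrative; the 80,000-record packet size and the 1,500-packet ceiling come from the text.

```python
# Packet-count check for an upload request: packets in a request equal the
# packets read from the ODS times the number of target InfoCubes, and should
# stay under 1,500. A dual extractor splits the records across two requests.
def packets_per_request(records, packet_size, target_cubes, extractors=1):
    packets = -(-records // (packet_size * extractors))  # ceiling division
    return packets * target_cubes

print(packets_per_request(25_000_000, 80_000, 4))                # 1252: OK
print(packets_per_request(50_000_000, 80_000, 4))                # 2500: too big
print(packets_per_request(50_000_000, 80_000, 4, extractors=2))  # 1252 each
```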

Number of target InfoCubes

In the combined load scenario, any means of reducing load against the database is beneficial, because the database will ultimately be the contention point. In regard to the design of the upload scenario, it is possible to avoid reading the same data multiple times for different target cubes by using a "one-to-many" configuration.

In a one-to-many design, a single extractor (or info-packet) is defined with multiple targets. Each block that is read and given to a dialog-task for processing is processed and written for each of the target cubes.

This has two benefits in our scenario (and, possibly, in a production environment): it reduces the read requirement against the database, and it reduces the scheduling overhead of initiating a dialog task per block per target InfoCube. In our case it also worked positively on behalf of the load balancing.

Figure 1-31 shows the read/write behavior of the 1:4 configuration used in KPI-A. In this case the input is read once (depicted in the column Read/hour) and written four times to four separate info-cubes (depicted in #Update & # Written).
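The read/write behavior of the one-to-many design can be sketched as a simple counting loop. This is a toy model, not the SAP upload path: it just shows that N target cubes cost one read and N writes per block, rather than N reads and N writes.

```python
# Minimal sketch of one-to-many loading: each block read from the ODS is
# processed through the translation rules once per target cube, so the
# database sees a single read and multiple writes per block.
def load_one_to_many(blocks, target_cubes):
    reads = writes = 0
    for _block in blocks:
        reads += 1                  # one extractor read per block
        for _cube in target_cubes:
            writes += 1             # translated and written once per target
    return reads, writes

print(load_one_to_many(range(100), ["cube1", "cube2", "cube3", "cube4"]))
# (100, 400): read once, write four times, as in the 1:4 KPI-A configuration
```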

Figure 1-31 Read once, write many

The charts in this section summarize the calibration tests around the design for the data load. This was the first step in the job-chain design for this requirement. A number of recommendations existed regarding the ratio of ODS to target cubes, and dialog tasks per target (parallelization), but these had never been verified in an environment of this size and capacity. We therefore performed a trend analysis in order to quickly establish a direction to follow and to verify the previous recommendations, as described here.

1. We had to answer the following questions:

– The current environment uses block sizes of 100 K records. Is this optimal? Would larger or smaller block sizes benefit the overall efficiency of the system or the throughput?

– The recommendation for degree of parallelism suggested a limitation of 8 dialog tasks. This represents a challenge to scalability and there was no clear statement of justification for this limit. What would limit additional parallelism?

– Given that there is a benefit in reusing a data block from the ODS to load multiple target cubes, what is the optimal ODS-to-target cube relationship?

To answer these questions we built the graph shown in Figure 1-32 on page 51, which has two vertical axes. The columns, read against the left axis, show the throughput achieved per dialog task. The number of active dialog tasks in parallel (read against the right axis) was taken from the job logs of the batch extractor job. This represents the number of dialog requests the extractor actually spawned, but does not really show how many were able to obtain dialog process resources simultaneously. The graph also spans two different block sizes, starting at 160 K records and moving to 80 K.

[Figure 1-31 annotations: reads once (per extractor), writes many (per target). Conclusion: this reduces the load on the extractor and the database.]

Figure 1-32 Trend toward optimal parallelism

2. Next, we tried to find the right relationship between ODS and InfoCubes.

From the point of view of application server load, ignoring the less efficient DB utilization, the best price/performance is a 1:2 model: one ODS for two target InfoCubes.

This can be seen in the second point in Figure 1-32: a high throughput per dialog task. This configuration, however, would increase the database load and reduce parallelization, because we are not able to go beyond 24 to 26 parallel dialog tasks when the extractor has to read a block for each two written (refer to the right axis to see the maximum parallelization achieved).

The next best trend point for throughput was a 1:4 model, because we are able to achieve a parallelism of over 60 dialog processes with a single extractor, with good throughput per dialog task.

An improvement on this is the 1:4 model using 80 K-record block sizes. Here we see a drop in parallelism but an increase in throughput per dialog process.

The 1:7 InfoCube model has an even better degree of parallelization, achieving about 64 parallel dialog tasks, and with an even higher throughput per dialog task. (Attempting parallelism beyond this point is counterproductive because there are only 64 SAP dialog processes.)

The trend shows a benefit in smaller block sizes and a high number of target InfoCubes.

3. KPI-A selected the 1:4 model as the best possible design for the data layout in Phase 1. In this case we were limited to the number of target InfoCubes actually available for data load.

The recommended limitation of eight parallel dialog processes per target InfoCube did not appear to be a limit of the target InfoCube, but rather of the number of SAP dialog processes available. In the case of four target InfoCubes, the ratio is 16 per InfoCube. In the case of seven target InfoCubes, the ratio is nine per InfoCube.

Figure 1-33 on page 52 is similar to Figure 1-32, except that it shows the total data load over all the dialog tasks. If we consider that the blue line represents the number of active dialog processes, and is therefore an indication of CPU resource utilization on the application server, we can make a somewhat imprecise cost/performance comparison. In this case the blue line is cost, and the column is performance. The trend shows the 1:4 model at 80 K-record blocks to be a good performer, and the 1:7 model to be a better one.

[Figure 1-32 chart: throughput per dialog process (records per dialog process, left axis) and number of dialog processes running in parallel (right axis), plotted against the number of target cubes (1, 2, 4, 5, 7, 4, 7, 7).]

Chapter 1. Project summary 51


Figure 1-33 Price/performance verification

The parallelization ratio for AIX 5L Version 5.2 is 1:1:1 (that is, one dialog-task per SAP dialog process, and one SAP dialog process per CPU).

The trends indicate that the more target cubes there are per extractor, the better the throughput and price/performance. At least this held true up to the seven target cubes tested.

4. Finally, we wanted to verify which scenario would perform best: a single extractor, or a dual extractor?

Having determined a trend for the design of the job-chains for data loading, it was necessary to verify the cost effectiveness of this design end-to-end by including the database server utilization as well.

The two scenarios shown in Figure 1-34 on page 53 were tested using the same parallelization capability: the total maxproc setting for the ODS sources. This setting, which controls the number of parallel dialog processes that can be spawned, was set such that in both cases the maximum could not exceed 116 dialog-tasks.

The number of SAP dialog processes in both cases was 160, to ensure there would not be any contention for SAP dialog process capacity.

The cost/performance was determined by the number of records loaded per physical CPU consumed in total on the system end-to-end.

[Figure 1-33 chart: total throughput (inserts per hour, left axis) and number of active dialog processes (right axis), plotted against the number of target cubes (1, 2, 4, 5, 7, 4, 7, 7).]


Figure 1-34 Scenarios with single extractor and dual extractor

Table 1-6 summarizes the results of our tests.

Table 1-6   Scenarios with single extractor and dual extractor - test results

  Configuration      PHYC   Total throughput        Recs per Hr/CPU
  Single extractor   29.8   65.6 million per hour   454 000
  Dual extractor     41     74.8 million per hour   548 000

In conclusion, the dual extractor is the best performer overall, and the most efficient end-to-end.
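The comparison rests on two numbers per scenario: the total parallelism, which was held equal, and the records loaded per physical CPU consumed. A minimal sketch of that bookkeeping follows; the scenario settings are taken from Figure 1-34, while the efficiency example at the end uses round, invented numbers.

```python
# Both scenarios cap total parallelism at 116 dialog tasks; only the split of
# maxproc across extractors differs (settings as in Figure 1-34).
single_extractor = {"extractors": 4, "maxproc_each": 29}  # one extractor per ODS
dual_extractor = {"extractors": 2, "maxproc_each": 58}    # one ODS, two extractors

for scenario in (single_extractor, dual_extractor):
    scenario["total_parallelism"] = scenario["extractors"] * scenario["maxproc_each"]
assert single_extractor["total_parallelism"] == dual_extractor["total_parallelism"] == 116

def recs_per_hour_per_cpu(total_recs_per_hour, physical_cpus_consumed):
    """End-to-end cost/performance: records loaded per physical CPU consumed."""
    return total_recs_per_hour / physical_cpus_consumed

print(recs_per_hour_per_cpu(10_000_000, 20))  # 500000.0, with round example numbers
```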

[Figure 1-34 diagram: "Single ODS with dual extractor" - one ODS source feeding two extractors (1 source, 2 batch extractors), each with maxproc 58, each loading 4 target InfoCubes; "Single extractor per ODS" - four ODS sources, each feeding one extractor (1 source, 1 batch extractor) with maxproc 29, each loading 4 target InfoCubes.]


Characteristics and tuning of query

The online reporting load is generated by the injector tool. The injector scripts that simulate online users were designed by SAP to meet the specifications of the customer. The scripts simulate HTML queries.

An analysis of the query statistics on the production system, performed by the customer, showed that 80% of all queries return fewer than 1,000 rows, and another 17% return fewer than 10,000 rows.

This distribution was reflected in the query design for the test. It was agreed that for the KPI tests, up to 60% of the queries could be satisfied either directly from the OLAP cache in the application server, or from the aggregates maintained in the database aggregate buffer pool. These queries represent typical reports for which aggregates or bookmarks have been prepared. Highly selective reporting runs less efficiently, directly against the database fact tables; 40% of the queries were required to simulate this behavior.
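The effect of this mix on the average response time can be sketched as a weighted mean. The per-path times below are placeholder assumptions; only the 60/40 split comes from the test criteria (measured non-cached times ranged from 6 to 14 seconds, as discussed later).

```python
# Expected mean response time for the agreed query mix. The per-path response
# times are assumptions for illustration; only the 60/40 split is from the KPI.
query_mix = [
    # (fraction of queries, assumed mean response time in seconds)
    (0.60, 1.0),   # served from the OLAP cache or the database aggregates
    (0.40, 10.0),  # highly selective, directly against the fact tables
]

expected_rt = sum(fraction * rt for fraction, rt in query_mix)
print(f"Expected mean response time: {expected_rt:.1f} s")  # 0.6*1 + 0.4*10 = 4.6 s
```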

One of the objectives of the customer in selecting this criterion was to duplicate the behavior of long-running queries in production, so that a solution found in the test could also help alleviate these problems in production.

Figure 1-35 depicts the test reporting criteria, with access to the SAP OLAP cache being the shortest path, the database aggregate cache the next shortest, and access to the database fact tables the longest.

Figure 1-35 Reporting criteria

The two snapshots shown in Figure 1-36 on page 55 were taken after considerable tuning for the ad hoc queries. Nevertheless, the effect of the caching is very evident: the non-cached queries have much more variance in their response times (6 to 14 seconds) and spend up to 70% of the response time in the database.



Figure 1-36 Cached and non-cached queries

The only tuning allowed for the queries was the use of DB2 statviews, detailed in Chapter 3, “The DB2 perspective” on page 119. A statview (statistical view) is a DB2 mechanism that improves the optimizer's choice of access path and join strategy for cross-table selects. Its effect is shown in Figure 1-37.

Figure 1-37 Effect of DB2 statviews on long-running queries
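As a sketch of what this tuning involves, the statements below show the general shape of the DB2 DDL for a statistical view: create a view over the join of interest, mark it as usable by the optimizer, and collect distribution statistics on it. All schema, table, and view names here are invented; the actual views used are described in Chapter 3.

```python
# General shape of the DB2 statistical view DDL (all object names invented).
statview_ddl = [
    # 1. A view over the join that the star-schema queries perform
    "CREATE VIEW sapr3.sv_fact_time AS "
    "SELECT f.*, d.* FROM sapr3.e_fact f, sapr3.dim_time d "
    "WHERE f.key_time = d.dimid",
    # 2. Make the view's statistics available to the optimizer
    "ALTER VIEW sapr3.sv_fact_time ENABLE QUERY OPTIMIZATION",
    # 3. Collect distribution statistics on the view
    "RUNSTATS ON TABLE sapr3.sv_fact_time WITH DISTRIBUTION",
]
for statement in statview_ddl:
    print(statement)
```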

[Figure 1-36/1-37 annotations: queries using the SAP OLAP cache show no DB time; compared to them, the ad hoc queries have erratic response times. With statviews activated, response time drops for queries against the fact tables.]


Characteristics and tuning of aggregation and job-chain design

Aggregates are defined on InfoCubes to improve performance for critical query navigations. The requirements of these navigations are known or predefined, to allow an effective aggregate to be built. As new data is added to the InfoCube, the aggregates must be modified to reflect this new data. In general, the aggregates should be much smaller than the data target, and normally contain a compressed version of the InfoCube data, in that similar data is consolidated.

A typical customer scenario would take the following steps:

1. New data is loaded into the target InfoCube.

2. A rollup of the new data would be done into the existing aggregates.

3. Data in the aggregates would be compressed to reduce the number of records of similar kind in the aggregate. The smaller the aggregate, the better chance it has of being maintained in memory in the database aggregate buffer pool, thereby improving the response time of queries to this aggregate.

From an SQL perspective, compression results in both INSERT and UPDATE statements (depending on the number of records with the same characteristics combination), compared to INSERT statements only when the aggregates are not compressed. Not compressing aggregates only makes sense if single requests are frequently deleted out of the cube.
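The INSERT-versus-UPDATE behavior can be sketched with a toy rollup that keys records on their characteristics combination. The data and key names below are invented for the example.

```python
def rollup_sql_profile(records, compress):
    """Count the (inserts, updates) a rollup into an empty aggregate would issue.

    records: iterable of (characteristics_key, key_figure) pairs."""
    aggregate = {}
    inserts = updates = 0
    for key, value in records:
        if compress and key in aggregate:
            aggregate[key] += value  # consolidate similar data -> SQL UPDATE
            updates += 1
        else:
            if compress:
                aggregate[key] = value
            inserts += 1             # new characteristics combination -> SQL INSERT
    return inserts, updates

data = [("region=EU/month=01", 10), ("region=EU/month=01", 5), ("region=US/month=01", 7)]
print(rollup_sql_profile(data, compress=True))   # (2, 1): two INSERTs, one UPDATE
print(rollup_sql_profile(data, compress=False))  # (3, 0): INSERTs only
```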

For the purpose of our test, this scenario deviates slightly. This is necessary to prepare a repeatable scenario with (as near as possible) identical load; the time needed to reset to the starting point for a rerun must be feasible. In the case of our test, it is not possible to simulate the entire data cycle, nor was it a prerequisite of the KPI definitions.

The test deviations from the typical cycle are as follows:

1. The data loading is separated from aggregate rollup. It is not intended to simulate a complete data cycle.

2. Our goal is to have a reproducible initial state for all aggregates. So after a complete deletion of the aggregate data, an initial fill of the aggregates is used instead of a rollup into existing data.

3. A single request is then loaded into 10 different aggregates per InfoCube. Because all used aggregates are empty, there is technically no difference between this rollup and reconstruction. Because no compression of the aggregates or merge with existing aggregate data is done, no updates to the F-fact table data are done, only inserts.

1.2.7 The monitoring tool developed

In order to monitor the resource consumption and load distribution over multiple LPARs, and to be able to compare these metrics to those in future configurations, a special tool was developed. This tool takes NMON statistics from multiple LPARs, combines them, and then summarizes selected metrics as shown in Figure 1-38 on page 57.

Because NMON is “Workload Manager-aware”, components sharing an LPAR were separated into WLM classes, so it was possible to monitor the memory and CPU resources of each class. For example, within each DB2 LPAR there are eight DB2 partitions (with the exception of DB partition 0). The load distribution and resource utilization are monitored for each partition.
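The combining step of the tool can be sketched as follows. We assume the per-LPAR NMON statistics have already been exported to CSV rows of the form interval, wlm_class, cpu_physc, memory_mb; this file layout is an assumption for the example, not the native NMON recording format.

```python
import csv
from collections import defaultdict

def summarize(csv_paths):
    """Combine per-LPAR NMON exports into totals per WLM class."""
    totals = defaultdict(lambda: {"physc": 0.0, "memory_mb": 0.0, "samples": 0})
    for path in csv_paths:
        with open(path, newline="") as f:
            for row in csv.DictReader(f):
                wlm_class = totals[row["wlm_class"]]
                wlm_class["physc"] += float(row["cpu_physc"])
                wlm_class["memory_mb"] += float(row["memory_mb"])
                wlm_class["samples"] += 1
    return totals
```

Feeding this one file per LPAR yields, per WLM class, the combined CPU and memory consumption from which the summary charts are built.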


Figure 1-38 The monitoring tool developed for the test

In order to be able to compare CPU utilization for KPI runs over a changing server landscape, it was necessary to have a common denominator. The CPU utilization is collected in units of physical CPUs consumed, rather than percentage of CPU utilization. This allows comparisons of different server sizes and also the future comparison to a shared processor pool implementation. The combined information is also summarized for a number of defined metrics, with the objective of showing a change in pattern.
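For a dedicated LPAR, this normalization is a simple product, as the sketch below shows:

```python
# Normalize utilization to physical CPUs consumed, so that runs on differently
# sized LPARs (or, later, on a shared processor pool) can be compared directly.
def physical_cpus_consumed(percent_busy, lpar_cpus):
    return (percent_busy / 100.0) * lpar_cpus

# Equal utilization percentages can hide very different footprints:
print(physical_cpus_consumed(75, 32))  # 24.0 physical CPUs on a 32-way LPAR
print(physical_cpus_consumed(75, 8))   # 6.0 physical CPUs on an 8-way LPAR
```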

Figure 1-38 shows the SAN adapter utilization; depicting the activity on the SAN during a KPI run helps to recognize a change in pattern. This type of summary data is collected for the SAN interfaces and the backbone network, as well as CPU utilization and memory.

Real time monitoring

The AIX XMPERF graphical monitor is used to monitor the system behavior during the runs. This allows a deviation in any component of the landscape to be recognized quickly, so that analysis of the situation can begin while it is happening.

This tool also allows the recording and playback of runs that take place during the night.

Figure 1-39 on page 58 is an example of an XMPERF monitor used during KPI-D. There are five individual monitors tracking a number of different components.

[Figure 1-38 annotations: load distribution over the individual DB2 nodes in an LPAR (% CPU utilization); PhyC for each LPAR in the system; summary of resource utilization.]


Figure 1-39 XMPERF Monitor example

In Figure 1-39, the numbered areas correspond to the following explanations:

• Area 1: This shows the ramp-up of the cube load over the four dedicated application servers, how evenly the load is distributed, and the level of the loading. In this case, the four application servers are quite evenly loaded and just entering the high-load phase.

• Area 2: This instrument tracks the traffic on the backbone network between the database servers and database partition 0.

• Area 3: These meters show any paging activity on the DB servers.

• Area 4: This shows the CPU utilization and distribution on the database servers. Green represents user time, red represents I/O wait, and yellow represents kernel overhead; the blue background is idle.

• Area 5: These pie charts track the load distribution over the eight database partitions in each of the four LPARs. The first instrument shows database partition 0, which is alone in its LPAR.

Throughput analyses for KPI-A

Figure 1-40 on page 59 shows the output of a tool used for analysis. The figure depicts the overall throughput over the three load categories for KPI-A. The left y-axis is the records-per-hour throughput of data load and aggregates on a per-cube basis. The right y-axis is the average response time of the query load.



Figure 1-40 Throughput analysis

This tool was very successful for the baseline calibration and KPI-A, but became difficult to manage for KPI-D and beyond, as the number of load and aggregation requests increased by a factor of 4. For each request, the SAP statistics had to be extracted and updated separately.

This tool also uses a combination of SAP and injector statistics for the graph. Note that the injector statistics include front-end network time, which was not part of the KPI and was also not ideally configured.

1.2.8 Using the System p5 virtualization features: thoughts for the next steps

During Phase 1, load profiling was done for all three of the different loads to determine their CPU utilization and load distribution over the database and the application servers. Figure 1-41 on page 60, from the NMON reports, shows the combined utilization over all the LPARs in the system.

[Figure 1-40 chart: rollup and upload throughput (records per hour, left axis) and query response time (right axis) from 14:25 to 18:07. Series: Rollup 1-2 (2 cube aggregations, throughput from ST03), Upload 1-8 (8 load requests, throughput from ST03), and query response times from the load injector.]


Figure 1-41 Data load profile

The load profile in this example is the data load. It is very application server-heavy and exhibits very constant behavior. This load type can be designed to use all available application server capacity. Its requirements are easy to predict, and are controlled by the job design and settings.

Figure 1-42 on page 61 shows the profile of a very heavy query load, performed during the calibration tests. (Note that there were a number of differing attempts to profile the query load; this simply shows an example of one attempt.) In general, the load represents a 2:1 database utilization-to-application server load, but it is highly erratic.

[Figure 1-41 annotations: application servers at full capacity (32 processors); DB servers at 10 to 12 physical CPUs.]


Figure 1-42 Heavy query load profile

The aggregation load, shown in Figure 1-43 on page 62, is also very erratic. The load on database partition 0 in the beginning is thought to be the effect of the heavy aggregation rules for the first three aggregates. The load tapers off for the smaller, lighter aggregates.

The batch load on the application server represents about one-third of the overall load. The aggregation method has the effect of function shipping: the jobs trigger complex SQL requests toward the database, and then wait for the database to return the results. This makes the load somewhat difficult to control, because the major activity takes place in the database, where it is anonymous to any kind of workload control tool. However, the profile of the job overall shows peaks and troughs in CPU activity.

[Figure 1-42 annotations: application servers for online; DB servers (not stacked).]


Figure 1-43 Aggregation load profile

The combined load looks like Figure 1-44 on page 63. In this figure, the individual dedicated LPARs are placed side by side; no sharing of physical resources is done.

However, you can get an idea of what sharing could be possible in a micro-partitioning environment. The load peaks are shown here in relatively large measurement intervals. Processor sharing takes place at a 10 ms interval. So, there would appear to be significant potential to improve the cost/performance ratio by resource sharing in this combined profile. For stage 2 of the tests, micro-partitioning will be introduced to investigate this theory.

[Figure 1-43 annotations: application server for batch; DB0; DB load (not stacked).]


Figure 1-44 Combined load (for KPI-D)

For stage 2, we will have five System p5 servers, each a 64-CPU POWER5+ system, for the database and the application servers. Additional LPARs will be used in order to better separate the various load profiles, and the CPU virtualization capability of the POWER5 systems will be used for dynamic resource sharing. The CPU power will be distributed to the different LPARs by means of a defined resource policy.

The first System p5 will have the following configuration:

• One LPAR for DB2 partition 0 and the SAP Central Instance (CI)
• One LPAR for the online activities for the queries
• One LPAR for the SAP batch, dedicated to aggregates
• One LPAR for the SAP batch, dedicated to the data load extractors
• One LPAR for storage agent use by TDP for Advanced Copy Services

The intention is to give resource priority to DB2 partition 0 and the SAP CI. Currently, the intention is to place these on the same System p5. Both DB2 partition 0 and the SAP CI provide global functionality, which affects their sub-components system-wide.

Therefore, the reaction time of these critical resources has a general effect, even though these two components themselves are not normally the high-load focus. Initially, the data load extractors are placed in an LPAR on the same system as the aggregate batch. This is the only way to prioritize load versus aggregation.

The remaining four System p5 servers will have the following configuration:

• One LPAR for DB2 with eight DB2 partitions
• One LPAR for the online activities for queries
• One LPAR for the SAP batches, dedicated to data load
• One LPAR for storage agent use by TDP for Advanced Copy Services

[Figure 1-44 annotations: 4 loading application servers; combined application servers for online and batch; DB servers (not stacked).]


On each p5-595, a similar policy for managing the CPU capacity will be implemented. The initial idea is to give priority to DB2 and the online activities (query) to guarantee good and constant response time. The aggregates drive the database, which has unlimited priority, but are restricted by the number of actual batch jobs running aggregation.

The data load has two load profiles: on the extractor side, it is batch-oriented; on the load side, it is massively parallel. The initial policy here will be to give priority to the aggregates, with a limitation. The data load will have no limitation, but will have the lowest priority. The expectation is that the massively parallel part of the data load will consume any CPU capacity left unused by the priority workloads.

Each of the columns in Figure 1-45 represents one of the five System p5 servers in the stage 2 hardware configuration.

Figure 1-45 LPARs in the Shared Processor Pool per p5-595

Table 1-7 lists the micro-partitioning configurations which will be used at the start of Phase 2. The objective is to share as much as possible, while maintaining strict control of the resources.

Table 1-7   Forecast of the micro-partitioning configurations for the 60 TB tests

  LPAR            Capacity entitlement          Max number of processors         Priority
                  (number of CPUs guaranteed)   (number of virtual processors)   (weight)
  DB2             8                             42                               128
  Online          16                            22                               64
  Upload          16                            58                               16
  Aggregate       16                            42                               32
  Storage agent   5                             8                                0

At the time of writing, Phase 2 testing was not completed.
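How these settings interact can be sketched with a simplified, single-round model of uncapped processor sharing: each LPAR is guaranteed its entitlement, may grow up to its virtual-processor count, and competes for spare pool capacity in proportion to its weight. This is an illustration of the policy, not the hypervisor algorithm, and the 64-CPU pool size is an assumption.

```python
def share_spare(lpars, pool_cpus):
    """lpars: {name: (entitlement, max_virtual_procs, weight)} -> {name: CPUs}.

    Simplified single-round model: entitlements are guaranteed; spare capacity
    is split by weight, capped at each LPAR's virtual-processor count."""
    alloc = {name: ent for name, (ent, _vp, _w) in lpars.items()}
    spare = pool_cpus - sum(alloc.values())
    total_weight = sum(w for (_e, _vp, w) in lpars.values())
    for name, (ent, max_vp, weight) in lpars.items():
        if spare > 0 and total_weight and weight > 0:
            alloc[name] = min(ent + spare * weight / total_weight, max_vp)
    return alloc

# Table 1-7 values, assuming a 64-CPU shared pool:
lpars = {
    "DB2": (8, 42, 128),
    "Online": (16, 22, 64),
    "Upload": (16, 58, 16),
    "Aggregate": (16, 42, 32),
    "Storage agent": (5, 8, 0),
}
print(share_spare(lpars, pool_cpus=64))
# DB2, with the highest weight, receives the largest share of the spare capacity.
```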

[Figure 1-45 diagram: on each p5-595, the LPARs in the Shared Processor Pool - the DB LPAR (DB0 with the CI on the first server; DB1 to DB4 on the others), the Online LPAR, the Aggregates/Extractors and Upload LPARs, and the Storage Agent LPAR - with PRIORITY 1 through PRIORITY 4 controlled by entitlement, virtual processors, and priority.]


Chapter 2. The SAP NetWeaver BI perspective

This chapter addresses the project details from a SAP technical expert perspective. It covers the following topics:

• The SAP NetWeaver BI definitions

• The SAP NetWeaver BI architecture

• The database configuration

• The System p5 logical partitions used for the tests

• Comments about our specific environment

• The SAP NetWeaver BI processes

• Information about the tuning options for the upload scenario


© Copyright IBM Corp. 2007. All rights reserved. 65


2.1 SAP NetWeaver BI overview

SAP NetWeaver BI is an information factory solution: the data warehousing solution developed by SAP AG for SAP-centric data warehouses. SAP NetWeaver BI can be seen as a complete suite of products that enables data warehousing, with reporting and analysis tools. Predefined models known as Business Content enable rapid startup and customization of prebuilt objects.

SAP NetWeaver BI is a combination of databases and database management tools that are used to support management decision making. It is the primary component of the SAP Business Intelligence group of products.

SAP NetWeaver BI is a comprehensive business intelligence product centered around a data warehouse, optimized for (but not limited to) the SAP R/3 environment, to enable analytical reporting. The initial purpose was to develop a reporting server for applications running in the R/3 systems. Today, SAP NetWeaver BI is the central reporting tool for almost all SAP business solutions, and can be seen as the information backbone engine for SAP and the SAP landscape.

SAP NetWeaver BI is a pre-configured, integrated solution that is able to combine and summarize data from SAP R/3 applications and other external sources into a management-level database, linking a data warehouse with the R/3 applications based on a uniform business process model. SAP NetWeaver BI comprises three main components: the Business Information Warehouse Server, the Business Explorer, and the Administrator Workbench. SAP NetWeaver BI can be used to extract and integrate data from a variety of sources, such as relational databases, spreadsheets, flat files, Extensible Markup Language (XML), and so on.

SAP NetWeaver BI includes preconfigured data extractors, analysis and report tools, and business process models. Among the specific features of SAP NetWeaver BI are: Business Application Programming Interfaces (BAPIs) that enable connections to non-R/3 applications; pre-configured business content; an integrated Online Analytical Processing (OLAP) processor; automated data extraction and loading routines; a metadata repository; administrative tools; multiple language support; and Business Explorer, a Web-based user interface.

SAP NetWeaver BI supports industry standards such as XML, XML for Analysis (XMLA), OLE-DB for OLAP, Common Warehouse Metadata Interchange (CWMI), the ABAP programming language, Java™ 2 Platform Enterprise Edition (J2EE™), and JDBC™ interfaces.

Figure 2-1 on page 67 illustrates the SAP NetWeaver BI components.



Figure 2-1 The SAP NetWeaver BI components

2.1.1 The SAP NetWeaver BI information model

The SAP NetWeaver BI information model is a basic structural element in the SAP NetWeaver BI architecture. It supports conceptual layers of data warehousing such as:

• Data warehouse, to hold the data that has been integrated from the business processes across the enterprise

• Operational data store, to hold current data loads and updates from the operational transaction systems of the business

• Multidimensional models, which enable the views of the data required for the analysis

The information model is based on a fundamental building block known as an InfoObject, which contains metadata that describes the data contained in the object such as data type, field length, or business definitions. SAP NetWeaver BI contains several thousand InfoObject templates.

In addition to InfoObjects, the BW information model contains the following key elements:

• DataSources

Data is transferred into SAP NetWeaver BI in a flat structure, and DataSources are the flat data structures containing data that logically belongs together. A DataSource is a table rather than a multidimensional data structure. DataSources contain the definitions of the source data, and are responsible for extracting and staging data from various source systems.

• Persistent Staging Area (PSA)

In the SAP NetWeaver BI information model, data is physically stored in Persistent Staging Area (PSA) objects. PSA objects are collections of flat tables holding extracted data that has not yet been transformed. The PSA is the initial storage area of data, where requested data is saved unchanged from the source system, according to the structure defined in the DataSource.

• InfoSources

InfoObjects that belong together logically, from a business point of view, are grouped into InfoSources. InfoSources are metadata that describe a relationship between the data fields extracted from external source systems and SAP NetWeaver BI InfoObjects. They can represent both transactional data (online and volatile data like the number of ordered pieces) and master data (like the piece number or a customer address).

• Operational Data Store (ODS)

ODS objects describe a consolidated dataset from one or several InfoSources. Data in ODS objects is stored in flat, transparent, database tables. ODS object data can be updated into InfoCubes or other ODS objects using a delta update. Data in an ODS object can be analyzed with the SAP NetWeaver BI Business Explorer (BEx) tool, provided by SAP BI.

• InfoCubes

From a reporting point of view, an InfoCube describes a self-contained dataset, for example, of a business-oriented area. This dataset can be accessed by the SAP NetWeaver BI Business Explorer for reporting and Online Analytical Processing (OLAP) analysis.

Technically speaking, an InfoCube is a set of relational tables arranged according to the multidimensional star schema: a logical fact table surrounded by several dimension tables that establish the relation between the transactional data in the fact tables and the master data.

InfoCubes are containers that organize data around its multidimensionality, in terms of business dimensions. They consist of characteristics to facilitate the analysis from various business perspectives, and they are used to answer complex business questions.

• RemoteCubes

A RemoteCube is an InfoCube whose transaction data is managed externally rather than in SAP NetWeaver BI. Only the structure of the RemoteCube is defined in SAP NetWeaver BI.

• InfoProviders

Objects that can be analyzed are called InfoProviders; they include all the data targets: InfoObjects, InfoCubes, ODS objects, and master data tables.

• MultiProviders

MultiProviders are virtual information providers; they are used to combine data from various objects and make it available for reporting and analysis. A MultiProvider itself does not contain any data.

• InfoSet

An InfoSet describes data sources that are defined, as a rule, as joins of DataStore objects, standard InfoCubes, and InfoObjects. An InfoSet is a BI-specific view of data. InfoSets allow you to report on several InfoProviders by using combinations of master data-bearing characteristics, InfoCubes, and ODS objects. The information is collected from the tables of the relevant InfoProviders.

In our test environment, there were two InfoCubes, each having a different structure. They were chosen to ensure that a complex scenario of intensive DB and application server workload could be demonstrated while stressing the system with many simulated users online.


2.1.2 The SAP NetWeaver BI functions and technologies

This section defines the main technologies used by SAP NetWeaver BI. For a more detailed breakdown of the key features of SAP NetWeaver BI, refer to IBM Redbook Building and Scaling SAP Business Information Warehouse on DB2 UDB ESE, SG24-7094.

The extended star schema

To define and create InfoCubes, SAP NetWeaver BI is based on the extended star schema, a technique that organizes data in terms of data facts and business dimensions. This database design has a central database table, known as a FactTable, which is surrounded by dimension tables.

• The FactTable stores key figures (for example, fixed costs, number of employees, revenue), and the dimension tables store the characteristics needed for evaluating and reporting (for example, a period or a region).

• The dimension tables are independent of each other; they contain references to the tables that contain master data; they are linked to the FactTable using unique keys, one key per dimension table; and each dimension key uniquely identifies a row in the associated dimension table.

• The SAP NetWeaver BI extended star schema is implemented with two FactTables: the F-fact table, used as a staging fact table, and the E-fact table, which contains the final, consolidated data.

Figure 2-2 describes the extended star schema.

Figure 2-2 The extended star schema

The extended star schema is the physical representation of a SAP NetWeaver BI InfoCube. An InfoCube can have up to 16 dimensions in SAP NetWeaver BI.
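As a toy illustration of the star-schema join pattern described above, the following sketch builds a miniature fact table keyed by two dimension tables and runs a typical reporting query against it. All table and column names are invented for illustration; they are not the table names SAP NetWeaver BI actually generates.

```python
import sqlite3

# Toy star schema: one central fact table referencing two dimension tables
# via unique dimension keys. Names are illustrative, not SAP-generated.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE dim_region (dim_id INTEGER PRIMARY KEY, region TEXT);
CREATE TABLE dim_period (dim_id INTEGER PRIMARY KEY, period TEXT);
CREATE TABLE fact_sales (region_id INTEGER, period_id INTEGER, revenue REAL);
INSERT INTO dim_region VALUES (1, 'EMEA'), (2, 'Americas');
INSERT INTO dim_period VALUES (1, '2006-Q4'), (2, '2007-Q1');
INSERT INTO fact_sales VALUES (1, 1, 100.0), (1, 2, 150.0), (2, 1, 80.0);
""")

# A reporting query joins the fact table to a dimension on the unique key
# and aggregates the key figure over the requested characteristic.
rows = con.execute("""
SELECT r.region, SUM(f.revenue)
FROM fact_sales f
JOIN dim_region r ON r.dim_id = f.region_id
GROUP BY r.region
ORDER BY r.region
""").fetchall()
print(rows)  # [('Americas', 80.0), ('EMEA', 250.0)]
```

The same pattern scales to the 16 dimensions an InfoCube may have: one join per dimension table, with the key figures aggregated from the fact table.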


Chapter 2. The SAP NetWeaver BI perspective 69


The extract, transform, and load process

When data is loaded into SAP NetWeaver BI, it is integrated, standardized, synchronized, and enriched; this is performed through processes known as extract, transform, and load (ETL). These processes ensure that the full range of required data is loaded and that the multiple formats and data types are understood. In SAP NetWeaver BI, this layer also serves as the staging area for intermediate data storage.

SAP NetWeaver BI has a fully featured ETL toolset, which enables standard ETL routines to be built and linked together as process chains. These process chains can be scheduled to run as either time-based or event-based processes. SAP NetWeaver BI can work with IBM WebSphere DataStage™1 and other products to enable very large volumes of data processing, if required.

The monitoring of scheduled jobs can be managed through a Graphical User Interface (GUI) front-end that allows users to see successful, failed, and running jobs, along with detailed statistics about how a job has processed its load.

A data modelling tool is available to build new objects such as InfoCubes, InfoSets, and MultiProviders. This tool can make use of existing business content and expand upon it.

A simple but useful metadata management toolset exists to view data structures and provide logical views of star schemas and data flows.
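The process-chain scheduling and monitoring described above can be sketched minimally: steps run in order, each step's status is tracked the way the monitoring GUI would show it, and a failure stops the chain. The step names and the simulated failure are invented for illustration.

```python
# Minimal sketch of a process chain: ordered steps run in sequence, and each
# step's status is recorded so a monitor can show successful, failed, and
# running jobs. Step names and statuses are illustrative only.
def run_chain(steps):
    """Run callables in order; stop at the first failure, as a chain would."""
    log = []
    for name, step in steps:
        log.append((name, "running"))
        try:
            step()
            log[-1] = (name, "success")
        except Exception:
            log[-1] = (name, "failed")
            break  # downstream steps wait until the failed step is repaired
    return log

chain = [
    ("load_psa", lambda: None),           # extract into the PSA
    ("activate_ods", lambda: None),       # transform into the ODS object
    ("load_infocube", lambda: 1 / 0),     # simulated failure during the load
    ("rollup_aggregates", lambda: None),  # never reached
]
print(run_chain(chain))
```

A real chain would also distinguish time-based from event-based triggers; this sketch only shows the sequencing and status-tracking idea.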

The Online Analytical Processing engine

The Online Analytical Processing (OLAP) engine is the functional component of SAP NetWeaver BI that processes the query. InfoCubes are accessed by this component. The OLAP engine splits the request into several database queries, looks for the best possible aggregate for each of these database queries, and generates SQL statements. The SQL statements are executed on the underlying database system; the OLAP engine resolves potential hierarchy restrictions, consolidates the results of the executed SQL statements, and sends this information back to the Business Explorer.

Multidimensional OLAP (MOLAP) systems store data in the multidimensional InfoCubes. Relational OLAP (ROLAP) systems store data in the relational database.
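The aggregate-selection step can be sketched as follows: for each database query, the engine looks for the smallest aggregate whose characteristics cover the query, and falls back to the fact table otherwise. The aggregate names, characteristic sets, and row counts below are illustrative only (the counts loosely echo aggregate sizes reported later in this chapter).

```python
# Sketch of aggregate selection: pick the smallest aggregate that covers all
# the characteristics a query needs; otherwise read the full fact table.
def best_provider(query_chars, aggregates, fact_rows):
    candidates = [(rows, name) for name, (chars, rows) in aggregates.items()
                  if query_chars <= chars]   # aggregate covers the query
    if not candidates:
        return "fact"                        # no covering aggregate exists
    return min(candidates)[1]                # smallest covering aggregate

aggregates = {
    "agg_region":        ({"region"}, 90_896),
    "agg_region_period": ({"region", "period"}, 3_290_616),
}
# A query on region alone can use the small aggregate...
print(best_provider({"region"}, aggregates, 48_680_468))              # agg_region
# ...but a query that also needs the customer must read the fact table.
print(best_provider({"region", "customer"}, aggregates, 48_680_468))  # fact
```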

Business Intelligence Suite

A number of options to deliver data to users or other applications are available. Reporting is available through a Web-based spreadsheet (an Excel® tool known as Business Explorer), as well as various options to extract data to flat files, XML, and so on. A Web application design tool is an integral part of the software suite; it allows the front end to be heavily customized in order to deliver information in the variety of formats that a business may require. Integration with predesigned portlets can be built, with portal integration features available within the tool.

SAP NetWeaver BI specialties

SAP NetWeaver BI is somewhat different from a classical data warehouse. Because its history lies with SAP R/3, it is seen as independent of the underlying database technology (which brings many challenges), and the physical schema associated with the star schema is complex. In particular:

• SAP NetWeaver BI generates complex SQL queries which must be optimized “on the fly” (they cannot be bound to a particular access path).

1 For more information about WebSphere DataStage, visit: https://www-306.ibm.com/software/data/integration/datastage/


• The SQL generator can sometimes use in excess of 60 tables in one SQL statement, making the explain plans very complex and the use of specific data warehouse capabilities, such as “star joins”, unlikely.

• SAP NetWeaver BI, in general, does not exploit specific database functionality. For example, new data added to a table is nearly always managed by inserts rather than DB2 load utilities, which places a very heavy workload on the DB server in terms of transaction logging.

• Although SAP NetWeaver BI can be used with external ETL tool vendor software (such as Ascential™ or Informatica), many clients choose to use the SAP internal features or build customized ABAP scripts. This can lead to scenarios that heavily stress SAP NetWeaver BI's application server. Therefore, in this IBM Redbook, we focus particular attention on the application design of the solution around this area.

2.1.3 The SAP NetWeaver BI architecture summary

Figure 2-3 shows an overview of the SAP NetWeaver BI architecture and SAP NetWeaver BI data flow. SAP NetWeaver BI is flexible and allows a variety of techniques to load data into InfoCubes. Data can flow via a PSA (persistent staging area), which can be used to transform and cleanse data as in any data warehouse.

Transactional data (the detailed atomic data that holds facts about the business) can be held here in its raw state, along with the reference data that allows the transactional data to be analyzed in a variety of ways that are useful to the business. This second type of data is known as master data within SAP NetWeaver BI.

Data can be loaded from source systems (such as SAP R/3, flat files, or virtually any external DataSource) into the PSA. Both transactional data and master data are stored in the PSA. SAP NetWeaver BI can process master data into specific, independent master data tables that are shared across all InfoCubes, and transactional data can be loaded into ODS objects or directly into InfoCubes.

Figure 2-3 Simple schematic of BW application layer

(The figure depicts the SAP NetWeaver BI data flow: source systems such as SAP R/3, SAP APO, SAP CRM, SAP BBP, flat files, and external providers feed the PSA, ODS objects, InfoCubes, and master data tables within the SAP BW system, while the OLAP processor, driven by metadata, serves the user interface.)


ODS objects are simply flat tables and generally hold very detailed data. They are often used as staging areas themselves to seed InfoCubes, or as a means to integrate different areas of data (sales, stock, and so on) into secondary ODS objects to preprocess data before loading to InfoCubes.

For example, complex update rules (transformations) can be managed in the ODS layer once and then reused several times for many InfoCubes, rather than having to run the update rules for each individual InfoCube. ODS objects can also be used for simple SAP NetWeaver BI queries, such as getting information about a single invoice. Finally, ODS objects may be used as an input data store for other SAP NetWeaver BI or external systems (by use of the InfoHub capability now in SAP NetWeaver BI).

SAP NetWeaver BI InfoCubes are multi-dimensional data structures represented by multiple tables in a relational database (SAP sees this as an extended star schema). They are designed to isolate the master data from the dimensional models and potentially offer good performance.

As with other data warehouses, online analytical processing (OLAP) is the primary use of the SAP NetWeaver BI. When executing a SAP NetWeaver BI query from the user interface, the query is transferred to the SAP NetWeaver BI application server, where it is processed by the OLAP processor. Based on the SAP NetWeaver BI meta data, the OLAP processor generates SQL statements that access the InfoCube, ODS, and master data tables. It then returns the SAP NetWeaver BI query result to the user interface.

2.2 SAP NetWeaver BI solution configuration: the logical views

IBM has for many years been developing the Global Services Method2, a design approach for many different IT-related solutions. One of these approaches is known as the Business Intelligence Method.

The Business Intelligence Method uses several generic work products, along with specific BI work products, to deliver a robust and well-tested method to develop a BI solution. Within this methodology, the BI Reference architecture is used to describe the major functional components of a Business Intelligence solution.

The IBM Business Intelligence reference architecture is a layered logical interpretation of an Enterprise Data Warehouse (EDW). From SAP NetWeaver BI 3.1 onward, SAP has a solution that supports all the components necessary to build an EDW. Figure 2-4 on page 73 shows how the key objects (master data, ODS objects, InfoObjects) support such an architecture.

2 For more information, visit: http://w3-5.ibm.com/services/emea/3emgs.nsf/pages/GSMHomepage


Figure 2-4 The SAP NetWeaver BI key objects architecture

In our case, not all SAP NetWeaver BI components were used for the KPI tests. Our primary goal was to test loading, aggregate builds, and query throughput, both individually and in parallel, to identify any bottlenecks at either the infrastructure or the application level.

The high-level logical component model, shown in Figure 2-5 on page 74, comprises the following:

• SAP NetWeaver BI connects via HTTP for Web-based front ends or via TCP/IP connections to the thick-client SAP GUI. From here, all connectivity to the DB server is managed by the Web application server. One or more application servers were used during the tests to scale the solution appropriately.

(The figure maps the SAP NetWeaver BI objects onto the layers of the reference architecture. Its annotations note the following:

• Operational systems: SAP sources such as SAP Retail, SAP Finance, and SAP HR. Data is loaded from the operational systems into the staging area as a copy of operational data; SAP uses its own extractors based on ALE, or third-party ETL tools.

• ETL layer and staging area: known as the Persistent Staging Area (PSA) in SAP, a transient data store whose primary purpose is to decouple the operational systems from the BW environment. The ETL process standardizes, cleanses, and prepares data for loading to the ODS area.

• Warehouse layer: the ODS is a set of relational tables constructed to deliver the function the business requires. It can consist of several layers relating to System of Record, Distribution, Feedback, and external data.

• Propagation layer and analytics: SAP BW marts, known as InfoCubes, are relationally based star schemas; many default schemas exist. They are the source of OLAP analysis for SAP.

• Access: the standard reporting tool is BEx, although third-party tools can be used, and SAP can deliver through PC, browser, portal, or mobile devices. BEx can report at all levels within the SAP environment, in particular the InfoCube and ODS layers; on rare occasions, PSA reporting may be done for near real-time analysis.)


Figure 2-5 The logical architecture

• A three-tier configuration was deployed (thin client, application server(s), and DB server) to deliver results to the end user. However, to simulate many users working in parallel, an external load tool was used.

This tool allowed front-end mouse clicks (and more) to be captured, and scripts to be developed from them. A set of scripts that could generate a complex query load was developed to enable long-running tests to be run, either on their own or in conjunction with data load jobs (built in process chains).

• We used SAP NetWeaver BI Version 3.5, based on customer preference and on the relative newness of SAP NetWeaver 2004s BI (which is also known as SAP NetWeaver BI 7.0). Although this choice limited using some of the newer functions available in DB2, it was considered important to retain a close match to the customer scenario to ensure the testing was valid.

• SAP NetWeaver BI connects to the DB server via the DB2 client and uses TCP/IP to communicate over the network. DB2 used the Database Partitioning Feature (DPF), a separately licensed feature that distributes a large database over multiple partitions (logical or physical), utilizing a shared-nothing architecture.

DPF can be beneficial both to standalone SMP servers and to environments consisting of more than one server. The DB2 component is described in Chapter 3, “The DB2 perspective” on page 119.
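The shared-nothing distribution idea behind DPF can be sketched by hashing each row's partitioning key to one of the partitions. The modulo scheme below is our own approximation for illustration, not DB2's actual partitioning map.

```python
# Rough sketch of hash distribution over a shared-nothing database:
# each row is routed to a partition by hashing its partitioning key.
import hashlib
from collections import Counter

def partition_for(key, n_partitions=32):
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % n_partitions   # stand-in for DB2's partition map

# With a well-chosen key, 100,000 rows spread almost evenly over 32 partitions.
counts = Counter(partition_for(f"row-{i}") for i in range(100_000))
print(min(counts.values()), max(counts.values()))
```

An even spread matters because, in a shared-nothing design, the slowest partition gates the whole query; a skewed partitioning key concentrates both data and work on a few partitions.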

• Storage was managed via a large SAN (DS8300), capable of holding up to 64 TB of data. Tivoli and the backup components of the solution are covered in “The storage physical environment” on page 175 and “Using IBM Tivoli Storage Manager to manage the storage environment” on page 193.

Figure 2-6 on page 75 shows which components are part of the different tests:

• The query tests mainly involve the client interface and the SAP NetWeaver BI data marts.

• The aggregate build tests mainly involve the SAP NetWeaver BI application servers.

• The data load tests mainly involve the warehouse and the propagation layers.

(The figure shows LoadRunner injectors and the SAP GUI as the front-end access tools, connecting to the SAP Web Application Server node(s); the application servers access the database server, consisting of DB2 node 1 through node N+1, each with its own log, temp, and data-plus-index storage, through the SAN fabric.)


Figure 2-6 SAP NetWeaver BI components used for the test scenarios

2.3 SAP NetWeaver BI database configuration

It was planned that several tests would be run, including online querying of the database while data loads were taking place. The layout simulated how data would be laid out in real life to enable multiple geographies to be run from a single DB2 instance.

Initially, the data was spread over 6 partitions in a relatively simple manner. The plan was to move this to 33 partitions to enable better parallelism where appropriate, and offer a better way of managing multiple geographies from a single site. The initial database partition layout is shown in Figure 2-7.

Figure 2-7 Original table layout for SAP NetWeaver BI

PSA, temp, logs, and so on are not shown here because they were not part of the migration effort to move data to more nodes; they could simply be recreated at a later date. The database was migrated to a more distributed model that used 32 data partitions (plus the coordinator partition) and was closer to the recommendations of the BI best practices.

(The test-scenario figure highlights which layers each test exercises: the query test the SAP BW marts and client access, the aggregate build test the propagation layer, and the data load test the warehouse layer. The original-layout figure shows the dimension tables, master data, and BW temporary tables, together with the ODS objects, InfoCubes, and aggregates, spread over partitions 0 through 5.)


The BI best practices are derived from a set of goals defined by discussion with customers over a number of years. These are:

• A balanced system design, providing a scalable performance ratio of disk I/O, memory, CPU, and network.

• Stability.

• Scalability:
– Proven TPC-H and SAP NetWeaver BI benchmarks.
– The inherent scalability of the shared-nothing architecture.

• Fault tolerance.

• Industry-standard components.

• Minimal solution permutations.

• Modular growth at a predictable cost.

• Ease of installation and implementation.

A simple rough sizing guideline is 150 GB of active data per CPU.

This led to the development of the Balanced Configuration Unit (BCU), which is a standard set of components that offers best practices utilizing a proven building block methodology on top of a fully validated, all-IBM solution; see Figure 2-8.

Figure 2-8 The balanced configuration unit design

For more details about the BCU, visit:

http://www-306.ibm.com/software/data/db2bi/tab_bcu.html
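The rough sizing guideline of 150 GB of active data per CPU lends itself to a quick calculation. The helper below and its example inputs are our own illustration, not part of the BCU specification.

```python
# Rough CPU sizing from the "150 GB of active data per CPU" rule of thumb.
import math

def cpus_needed(active_data_gb, gb_per_cpu=150.0):
    """Round up: even a partial quota of data occupies a whole CPU."""
    return math.ceil(active_data_gb / gb_per_cpu)

# Example: a 20 TB warehouse in which roughly half the data is "active".
print(cpus_needed(10_000))  # 67 CPUs by this rule of thumb
```

Such a figure is only a starting point; the balanced-design goal above then sets the matching memory, disk I/O, and network capacity per CPU.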

The goal of our project was to merge these best practices for DB2 BI with the client’s current practices. The aims were to:

• Ensure minimal impact while the change to the new design was taking place.

• Ensure the investment in existing technology could be built upon.

• Retain applicable existing SAP BI best practices.

(The figure shows the data warehouse assembled from identical building blocks, BCU 1 through BCU n; each BCU is an AIX node with a fixed complement of CPUs and memory hosting a set of DB2 balanced partition units (BPUs).)


However, based on BCU best practices, the solution needed significant changes to the initial architecture:

• The disk subsystem must be isolated from other applications. Without this arrangement, experience has shown that contention between applications and the data warehouse results in disks suffering “hot spots”, impacting the overall performance of the solution.

• All the LPARs are configured as 8-way nodes. This ensures that the operating system and file systems are less stressed.

• The number of DB2 partitions is increased to ensure a similar disk-to-CPU ratio as that used by the BCU. The chosen configuration caters for growth and enables the capacity of the machines to be doubled from the initial starting point.

The migration process is covered in detail in 3.5, “The process for redistributing the DB2 partitions” on page 145. Figure 2-9 describes the final tablespace layout after data migration.

Figure 2-9 The final tablespace layout

2.4 DB2 partitions and LPAR evolution

As described in Figure 2-10 on page 78, initially five LPARs were created to support the 33 DB partitions created for DB2:

• LPAR 0 was used to support the coordinator node.

• Each remaining LPAR (1 to 4) supported 8 data partitions.

(The final tablespace layout in Figure 2-9 shows partition 0 holding the dimension tables, master data, and BW temporary tables; partitions 1 to 5 left empty; and the ODS objects, InfoCubes, and aggregates spread over data partitions 6 through 37, distributed across the DB partitions and LPARs.)

Chapter 2. The SAP NetWeaver BI perspective 77

Page 92: Front cover Infrastructure Solutions - IBM Redbooks · Front cover Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

Figure 2-10 LPARs initial configuration

• We started with data partitions that had 4 GB to 5 GB of memory per CPU, based on previous experiences. LPAR 0 had a greater ratio (9 GB per CPU) because of the additional workload of managing the connections and coordinating the workload across all the other partitions. We retained 8 CPUs and 36 GB of RAM to cater for any additional testing or expansion that might be needed.

• LPAR 5 was created to support the SAP NetWeaver BI central instance (CI) and application servers. We allocated 30 CPUs to this LPAR, based on the number of dialog processes the system could run in parallel during batch load. Essentially, each data packet would use a single CPU, which meant we could run up to 30 dialog processes in parallel.
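The dialog-process ceiling can be sketched as follows: a pool of 30 workers stands in for the 30 dialog processes, and each simulated data packet occupies one worker (roughly one CPU) at a time. The packet workload here is invented.

```python
# Sketch of the dialog-process ceiling during batch load: at most 30 data
# packets are processed in parallel, mirroring the 30-CPU LPAR 5 sizing.
from concurrent.futures import ThreadPoolExecutor

DIALOG_PROCESSES = 30

def load_packet(packet_id):
    # Stand-in for transforming and inserting one data packet.
    return packet_id

with ThreadPoolExecutor(max_workers=DIALOG_PROCESSES) as pool:
    done = list(pool.map(load_packet, range(120)))  # 120 packets, 30 at a time
print(len(done))  # 120
```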

A single ODS was laid out over 12 DB partitions spread over 3 (of the 4) LPARs available, as shown in Figure 2-11 on page 79. The figure illustrates how the first 4 ODS objects are laid out across all the partitions (and LPARs); the remaining 4 simply continue the same pattern, which ensures an even balance of data across all partitions. For example, ODS01 resides in LPAR 1, DB partitions 1-3-5-7; LPAR 2, DB partitions 1-3-5-7; and LPAR 3, DB partitions 1-3-5-7: three LPARs and 12 DB partitions.

Note: Later iterations of the tests changed the LPAR configurations, and these changes are detailed in 1.2, “The execution of the project - a technical summary” on page 26. Figure 2-10 on page 78 describes the starting point for the initial set of tests.

(The figure shows a System p5 595 with 64 CPUs and 256 GB of RAM hosting six LPARs: LPAR 0 (DB2 partition 0) with 4 CPUs and 36 GB; LPARs 1 to 4 (DB2 partitions 1 to 8, 9 to 16, 17 to 24, and 25 to 32) with 8 CPUs and 36 GB each; and LPAR 5 (SAP CI, App1, and App2) with 30 CPUs and 48 GB.)


Figure 2-11 The ODS layout

A single InfoCube was laid out over 8 DB partitions spread over all 4 data LPARs, as shown in Figure 2-12.

Figure 2-12 The InfoCube layout

In this way, all 100 InfoCubes (and their indices/aggregates) could be spread over all the data LPARs.

When reviewing the size of InfoCubes, we identified that no single InfoCube was greater than 250 GB, so it was decided that distributing the data over 8 data partitions would be sufficient. By using two data partitions per LPAR, we also ensured that the distribution of work over all LPARs was balanced.

This also produced a ratio of approximately 1.33 CPUs per DB2 partition, which was considered to be a good rule of thumb for the 1.9 GHz CPU. Each partition contained no more than approximately 580 GB of data and was distributed evenly over all LPARs. The partitioning key used by SAP NetWeaver BI for each InfoCube is automatically generated and gave a good spread of data over the partitions used for that InfoCube.

Figure 2-13 on page 80 shows the distribution of data over all 33 partitions (0 for the coordinator partition, 6-37 for the data partitions).

(Figures 2-11 and 2-12 tabulate the ODS and InfoCube placement described in the text: each ODS object occupies four alternating partitions in each of three LPARs, 12 DB partitions in total, and each InfoCube, C1 through C100, occupies two DB partitions in each of the four data LPARs, eight DB partitions in total.)


Figure 2-13 Data distribution among partitions

Finally, a MultiProvider was created for a set of InfoCubes. This enabled two levels of partitioning: one at the database layer and one at the application layer. DB2-level partitioning can break tables over table spaces held on a number of partitions, enabling parallelism, and SAP NetWeaver BI can use many InfoCubes together, viewed as a MultiProvider. The MultiProvider simply enables SQL statements to be run against each InfoCube in a MultiProvider in parallel.

Figure 2-14 on page 81 shows how a MultiProvider can be used with InfoCubes to exploit both types of partitioning.


Figure 2-14 The multicube layout

This example shows how a MultiProvider can be built over 6 InfoCubes to report across them. This scenario enables SAP NetWeaver BI to build one query against the MultiProvider, which launches 6 parallel SQL statements against the underlying InfoCubes, assuming enough dialog processes are available (in our environment, we restricted all queries to run against one InfoCube in order to have better control of the number of query executions).

Each of the 6 InfoCubes is partitioned at the database level. That means that the fact tables inside the respective star schema, which physically represents an InfoCube, are partitioned. In this case, we spread each InfoCube and its aggregates across 8 DB partitions, two in each of the four data LPARs.
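The MultiProvider fan-out can be sketched as follows: one query against the MultiProvider becomes one SQL statement per underlying InfoCube, run in parallel, with the partial results consolidated afterward. The per-cube query function and its values are simulated for illustration.

```python
# Sketch of MultiProvider parallelism: one logical query is split into one
# statement per InfoCube, run concurrently, and the results are consolidated.
from concurrent.futures import ThreadPoolExecutor

def query_infocube(cube):
    # Stand-in for the generated SQL against one InfoCube's star schema.
    return {"I1": 10, "I2": 20, "I3": 30, "I4": 40, "I5": 50, "I6": 60}[cube]

cubes = ["I1", "I2", "I3", "I4", "I5", "I6"]
with ThreadPoolExecutor(max_workers=len(cubes)) as pool:  # one dialog per cube
    partials = list(pool.map(query_infocube, cubes))
print(sum(partials))  # the consolidated MultiProvider result: 210
```

Because each per-cube statement itself runs over eight DB partitions, the two levels of parallelism multiply: six concurrent statements, each parallelized by DPF.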

2.5 The profile of InfoCubes and the population process

We used the SAP program SAP_INFOCUBE_DESIGNS to check the structure of the InfoCubes to ensure they had a “good” profile (a good profile means that the dimension tables are small relative to the number of fact table rows).

As shown by the report in Figure 2-15 on page 82, there was only one dimension table (indicated by T in the figure) that did not meet these criteria. It was considered that this one dimension alone was not a sufficient reason to change the structure of the InfoCubes, and it was left “as is” for the duration of the tests.

(The figure shows a multicube built over six InfoCubes: six InfoCubes make up one multicube, each InfoCube has its own aggregates, and each set of InfoCubes and aggregates is spread across all 32 DB partitions and exploits all the LPARs.)


Figure 2-15 Checking the InfoCube design

Each InfoCube contained between 43 and 48 million rows of data. These were loaded from the ODS objects in three requests. The initial load was small, to seed the fact table and roll up the aggregates. The initial plan was that the next two requests would contain data created from the original small load such that, when rolled up, the aggregates would simply be updated rather than additional rows being created.

However, during the population of the second request, it became clear that the data was not in this form, and new aggregate rows were created. We left the solution “as is” because it was decided that this was a more realistic model of the growth of the system, and because the time to roll back and repopulate all the aggregates would have been prohibitive.

Figure 2-16 illustrates the basic flow of data for the population process.

Figure 2-16 Basic flow of data for population process

(The figure shows the three ODS object requests, 1 to 3, each loading a request into the 'F' fact cube, which is then rolled up into the 'E' aggregates, Agg 1 through Agg X.)


Each InfoCube was populated from an ODS object. When each request is managed, the data is loaded into an InfoCube with the appropriate dimensions created for that InfoCube. Each SAP NetWeaver BI InfoCube physically contains two fact tables, the 'F' fact table and the 'E' fact table.

• The 'F' table was designed for fast data loads. The primary key retains the request identification (request ID) as part of the key, to allow data to be rolled out easily if identified as invalid.

• The 'E' table was designed for querying. When it is loaded from the 'F' table, the request ID is removed and duplicate rows are merged.

The load was designed, via process chains, to load 20 million records within a request into each 'F' table. The aggregates were then built and compressed (that is, these records were compressed from the 'F' aggregate tables into the 'E' aggregate tables). This process then truncates the 'F' aggregate tables, leaving no data within those tables.
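The compression step can be sketched as follows: rows move from the request-keyed 'F' table into the 'E' table, where the request ID is dropped and duplicate rows are merged by summing the key figures. The row layout and field values below are invented for illustration.

```python
# Sketch of InfoCube compression: 'F' rows (keyed by request ID) collapse into
# 'E' rows keyed only by the dimension key, with key figures summed.
from collections import defaultdict

def compress(f_rows):
    """f_rows: (request_id, dim_key, key_figure) tuples -> 'E' table dict."""
    e_table = defaultdict(float)
    for _request_id, dim_key, key_figure in f_rows:
        e_table[dim_key] += key_figure   # request ID dropped, duplicates merged
    return dict(e_table)

f_fact = [
    (1, ("EMEA", "2006-Q4"), 100.0),     # request 1
    (2, ("EMEA", "2006-Q4"), 50.0),      # request 2, same dimension key
    (2, ("Americas", "2006-Q4"), 80.0),
]
e_fact = compress(f_fact)
print(e_fact)  # {('EMEA', '2006-Q4'): 150.0, ('Americas', '2006-Q4'): 80.0}
f_fact = []    # after compression, the 'F' table is truncated
```

Dropping the request ID is what makes the merge possible, and it is also why compressed data can no longer be rolled out by request.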

InfoCubes were populated in batches to ensure that enough log and temp space were available to complete them. Once complete, each InfoCube and its aggregates held the number of rows shown in Figure 2-17.

Figure 2-17 Number of rows in aggregates - InfoCube A

There were 25 InfoCubes of type A (out of 50 InfoCubes in total) used for query reporting, which had filled and activated aggregates. Thus, we had approximately 48 million*25 rows across the fact tables and 62 million*25 rows across the aggregate tables, for a combined total of 2.75 billion rows.

Figure 2-18 Number of rows in aggregates - InfoCube B

Cube A ZGTFC065 (rows in 'E' fact table: 0; rows in 'F' fact table: 48,680,468)

Agg number   'E' aggregate    Rows in 'E' agg.   'F' aggregate    Rows in 'F' agg.
102485       /BIC/E102485     90,896             /BIC/F102485     0
102487       /BIC/E102487     81,408             /BIC/F102487     0
102489       /BIC/E102489     3,290,616          /BIC/F102489     0
102491       /BIC/E102491     71,944             /BIC/F102491     0
102493       /BIC/E102493     1,830,328          /BIC/F102493     0
102495       /BIC/E102495     264,336            /BIC/F102495     0
102496       /BIC/E102496     4,152,752          /BIC/F102496     0
102498       /BIC/E102498     3,932,712          /BIC/F102498     0
102500       /BIC/E102500     2,731,320          /BIC/F102500     0
102502       /BIC/E102502     2,778,128          /BIC/F102502     0
102504       /BIC/E102504     91,964             /BIC/F102504     0
102507       /BIC/E102507     3,246,496          /BIC/F102507     0
102508       /BIC/E102508     3,924,872          /BIC/F102508     0
102510       /BIC/E102510     4,057,312          /BIC/F102510     0
102512       /BIC/E102512     3,496,592          /BIC/F102512     0
102514       /BIC/E102514     1,191,632          /BIC/F102514     0
102516       /BIC/E102516     3,262,672          /BIC/F102516     0
102518       /BIC/E102518     3,889,536          /BIC/F102518     0
102520       /BIC/E102520     6,145,912          /BIC/F102520     0
102522       /BIC/E102522     5,043,656          /BIC/F102522     0
102525       /BIC/E102525     8,669,608          /BIC/F102525     0
Totals                        62,244,692                          0

Cube B ZGTFC064 (rows in 'E' fact table: 0; rows in 'F' fact table: 43,056,546)

Agg number   'E' aggregate    Rows in 'E' agg.   'F' aggregate    Rows in 'F' agg.
102466       /BIC/E102466     6,832              /BIC/F102466     0
102468       /BIC/E102468     164                /BIC/F102468     0
102470       /BIC/E102470     140,152            /BIC/F102470     0
102472       /BIC/E102472     139,752            /BIC/F102472     0
102473       /BIC/E102473     224,936            /BIC/F102473     0
102475       /BIC/E102475     297,592            /BIC/F102475     0
102477       /BIC/E102477     604,656            /BIC/F102477     0
102479       /BIC/E102479     5,570,632          /BIC/F102479     0
102481       /BIC/E102481     7,241,784          /BIC/F102481     0
102483       /BIC/E102483     7,488,696          /BIC/F102483     0
Totals                        21,715,196                          0

Chapter 2. The SAP NetWeaver BI perspective 83


As shown in Figure 2-18 on page 83, we also used 25 of this type of InfoCube (type B) for reporting, again with their aggregates filled and activated. This resulted in 43 million × 25 rows in the fact tables and 21 million × 25 rows across the aggregate tables, which yields a combined total of roughly 1.62 billion rows.

2.6 SAP NetWeaver BI processes

In order for the BW scalability benchmark to provide any meaningful and re-usable results, the specific processes defined for the benchmark needed to be based on an existing productive implementation. In this way, we could verify that the results obtained in the benchmark matched the present experience. This would also give credibility to the benchmark results achieved when scaled from 20 TB to 60 TB in Phase 2 (which is not documented in this book).

Keep in mind, however, that the specific processes benchmarked here only represent a subset of the processes that a production system would be subject to. Nevertheless, we believe these processes either define the core processes, or processes that may be exposed to potential bottlenecks.

The following processes were benchmarked:

• The upload process

In this process, data is uploaded from the ODS to the target InfoCubes. On its way, it is subjected to expensive transformation rules implemented in the start routines.

• The query process

In this process, the BW system is subject to users running queries.

• The rollup process

In this process, aggregates are updated with newly uploaded data from the underlying InfoCubes.

2.6.1 The upload process

The upload process reads data from one or more ODS sources and inserts the data into one or more InfoCubes. The data read from ODS is usually subjected to transformation rules before being uploaded into the respective InfoCubes.

Figure 2-19 on page 85 describes the separate steps taken by the upload process. In our example, we assume a "1:2 pipe"; that is, a single source ODS and two target InfoCubes.

Benchmarking the core SAP NetWeaver BI processes: the upload, the query and the rollup.

The BI background management functions.


Figure 2-19 The upload process steps

The process flow must be defined in a process chain in order for it to be run in a SAP NetWeaver BI system; this process chain is depicted in Figure 2-20. Note that the process chain depicted fulfills the requirements for the tests run during the benchmark, but may not fulfill the requirements for a production customer.

In terms of best practices, a production customer would drop indexes before uploading and recreate them after the upload. Dropping indexes before the upload allows for a much faster upload.

Figure 2-20 The upload process chain

When a process chain is scheduled, a BGD_trigger process[3] is started. This process represents the start block in the process chain, and it triggers the subchains. It runs against an instance defined when the process chain is scheduled; in our example, the instance used was sys1btc. Transaction SM37 provides the information shown in Figure 2-21 on page 86.

[3] To learn more about the background processes, visit: http://help.sap.com/saphelp_erp2005vp/helpdata/en/38/4f6e420c48c353e10000000a1550b0/frameset.htm

(Figure content: ODS ZGTF0004 feeds two pipes; each pipe runs 1 BGD_extract, x DIA_submit, and x DIA_upload processes, loading InfoCubes PAZGTFC026 and PAZGTFC028.)


Figure 2-21 Upload process: example step 1

Subsequently, the BGD_trigger process starts two BGD_loading processes, which represent the two subchains in the process chain. At this point, the BGD_trigger process changes its state to Finished. The BGD_loading processes run against the same user-defined instance as the BGD_trigger process. Transaction SM37 provides the information shown in Figure 2-22.

Figure 2-22 Upload process: example step 2

The BGD_loading processes each start a BGD_extract process, which runs for the remainder of the upload process. At this point, the BGD_loading processes change their state to Finished. The BGD_extract process runs against a server defined in the ROIDOCPRMS table. Transaction SM37 provides the information shown in Figure 2-23.

Figure 2-23 Upload process: example step 3

The main role of the BGD_extract process is to mine data from the source ODS object and submit it to a DIA_submit process. The DIA_submit processes have very short runtimes and run under a single user, RFC_BIWEB8S, which remains logged on until the end of the upload process. To see these users, use transaction AL08.

The DIA_submit process starts a DIA_update process via RFC and transfers the package data to it. The new DIA_update process runs on one of the instances provided in the logon group defined in transaction SM59; in our example, the logon group used is EB8PRBW102.

The DIA_update processes apply the transformation rules and update a single data package for all target InfoCubes in sequence. They run under a single user, RFC_BIWEB8. Transaction SM66 shows these users in action.

After a request has completed and been uploaded to an InfoCube, the aggregates will be rolled up if automatic rollup has been switched on.

Parameters affecting the upload process

The parameters described in these sections have a profound impact on the way the upload process runs.


Note that the parameters we specified are only a subset of the potential parameters; that is, the ones which were meaningful in our environment. Other parameters may be more meaningful in other environments.

• RFC-related parameters: rdisp/rfc_min_wait_dia_wp

This parameter represents the quota for the number of dialog work processes that should be kept free for users. When no more dialog work processes are free, no resources are given to the calling program.

• DIA process and packet sizes

The following parameters are available in tables ROOSPRMS and/or ROIDOCPRMS, as illustrated in Figure 2-24.

Figure 2-24 DIA process and packet sizes parameters

In particular:

– MAXPROCS: specifies the maximum number of parallel DIA_update processes.

– MAXSIZE: the individual records are sent to the Business Information Warehouse in packages of varying sizes. This parameter determines the maximum size of such a package, and therefore how much main memory may be used to create the data package; in particular, it drives the memory requirement of the updating dialog work processes in the BW system.
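As a rough sketch of the packaging idea (not SAP's actual implementation; the uniform record size and the kB figures are invented for the example):

```python
def build_packages(records, max_size_kb, record_size_kb=1):
    """Split extracted records into data packages no larger than MAXSIZE.
    Each package is what one updating dialog work process receives."""
    per_package = max(1, max_size_kb // record_size_kb)
    return [records[i:i + per_package]
            for i in range(0, len(records), per_package)]

# 100 records at 1 kB each with MAXSIZE = 30 kB -> packages of 30, 30, 30, 10
packages = build_packages(list(range(100)), max_size_kb=30)
```

A smaller MAXSIZE means more, smaller packages: less memory per updating work process, but more hand-overs between the extractor and the dialog processes.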

Defining the right number of processes

Defining the number of processes required for the upload process requires an understanding of the relationships between the following objects and processes, as explained here:

• InfoPackage to ODS and InfoCube

The InfoPackage defines a relationship between source and target objects of an upload process. It defines what sources can send information to what targets, or in our case, what InfoCubes can be uploaded with data from which ODS objects.

The InfoPackage is always related to a single DataSource (ODS Export DataSource object), but may allow several target InfoCubes. Furthermore, it defines restrictions on the data that is extracted from the DataSource using selections on source fields. The InfoPackage is used in process chains and allows us to specify the parameters for the upload.

• ODS-to-InfoCube

The relationship of ODS-to-InfoCube is determined by the InfoPackage. In our test scenario, the InfoPackage definition allowed for a maximum ODS-to-InfoCube relationship of 1:7; we speak of a “1 to 7 pipe”.

Note, however, that in a production environment this ratio is rarely this high and a 1:1 or 1:2 ratio is much more common.


• ODS-to-BGD_extract

For every InfoPackage and, respectively, ODS source, a single BGD_extract process will be started. The ratio ODS-to-BGD_extract is therefore always 1:1.

If a process chain contains multiple InfoPackages, then the total number of BGD_extract processes started will be equal to the number of InfoPackages and ODS sources, respectively.

• CPUs-to-DIA_update

After a DIA_update process is started, it will consume an entire CPU until it has completed its processing of the update rules and has submitted the data to the database. This process is very CPU-intensive and clock speed is crucial in terms of overall turnaround time.

The maximum number of parallel DIA_update processes that can be started is specified by the MAXPROCS parameter in table ROOSPRMS and/or ROIDOCPRMS. If we assume a 1:1 relationship of DIA_update processes to CPUs, then MAXPROCS is set to the number of available CPUs.

The number of available CPUs depends on the logon group. For example, if we have two application servers with 32 CPUs each, then assuming a 1:1 relationship, the maximum number of CPUs available to the MAXPROCS parameter is 64.

With a 1:1 relationship and a configuration of two application instances with 32 CPUs each, we observed average CPU utilization rates of around 90%. To drive utilization even higher, it is possible to increase the ratio of DIA_update processes to CPUs to, for instance, 1:1.1.

Note, however, that driving the utilization to levels exceeding 90% may result in memory or CPU bottlenecks, and an overall adverse effect. In our tests we mainly used a 1:1 relationship.

Also note that, depending on how fast the BGD_extract process can provide the packages, there will be up to MAXPROCS DIA_update processes running simultaneously.

Finally, we need to point out that the SAP instance profiles have to be configured with sufficient DIA processes.

• DIA_submit-to-DIA_update

There is no one-to-one relationship between the DIA_submit and DIA_update processes. However, the DIA_submit processes are short-running compared to the DIA_update processes, and are only triggered by the extractor if the total number of DIA_update processes for the running request is lower than MAXPROCS.

To calculate the number of DIA_update processes and BGD_extract processes required, and to set the MAXPROCS parameter, we use the following formulas:

– MAXPROCS = (Application Server CPUs/Number of InfoPackages in the process chain) * Ratio

– N DIA_update = MAXPROCS + rdisp/rfc_min_wait_dia_wp

– N BGD_extract = Number of InfoPackages in the process chain

Example

In this example, we assume two application servers with 32 CPUs each, and apply a 1:1 ratio of DIA_update processes to CPUs. Figure 2-25 on page 89 depicts the process flow.


Figure 2-25 Upload process flow - Example

Figure 2-26 shows the process chain for this process flow.

Figure 2-26 Upload process chain - Example

In our example, MAXPROCS needs to be set to 32 for ODS ZGTFCO01 and ZGTFCO03 if we want to use all application resources available. The absolute minimum number of DIA processes required per application server(s), assuming a value of three for rfc_min_wait_dia_wp, is therefore 35. In terms of the BGD_extract process, a minimum of two batch processes would be required on the executing system.

We used the following numbers:

• MAXPROCS = ((32+32)/2) * 1 = 32

• N DIA_update = 32 + 3 = 35

• N BGD_extract = 2
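The sizing arithmetic above can be expressed as a small sketch. The function and its defaults are illustrative, not an SAP tool; the quota of three free dialog work processes mirrors the rfc_min_wait_dia_wp value assumed in the example.

```python
def upload_sizing(cpus_per_server, n_servers, n_infopackages,
                  ratio=1.0, rfc_min_wait_dia_wp=3):
    """Apply the upload sizing formulas:
    MAXPROCS      = (application server CPUs / InfoPackages) * ratio
    N DIA_update  = MAXPROCS + rdisp/rfc_min_wait_dia_wp
    N BGD_extract = number of InfoPackages in the process chain"""
    total_cpus = cpus_per_server * n_servers
    maxprocs = int((total_cpus / n_infopackages) * ratio)
    n_dia_update = maxprocs + rfc_min_wait_dia_wp
    n_bgd_extract = n_infopackages
    return maxprocs, n_dia_update, n_bgd_extract

# Two application servers with 32 CPUs each, two InfoPackages, 1:1 ratio:
sizing = upload_sizing(32, 2, 2)
# -> MAXPROCS = 32, 35 DIA_update processes, 2 BGD_extract processes
```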

(Figure content: ODS ZGTFC001 and ODS ZGTFC003 each run 1 BGD_extract, x DIA_submit, and x DIA_upload processes, loading InfoCubes PAZGTFC002, PAZGTFC004, PAZGTFC008, PAZGTFC010, PAZGTFC029, PAZGTFC033, PAZGTFC035, and PAZGTFC037.)


Defining the data to be loaded

The data selected by the BGD_extract process is definable in the InfoPackages (also accessible via transaction RSPC). For the unit tests in our benchmark environment, we limited the number of records read from the ODS object in order to reduce the runtimes from 5 hours to 1 hour. Figure 2-27 depicts how the records read were limited by specifying specific timeframes.

Figure 2-27 Limiting the number of records

At subsequent phases in the benchmark, we used this technique extensively to drive significantly higher throughput volumes by way of parallelization. For instance, the process chain depicted in Figure 2-28 represents six 1:6 pipes from a technical point of view; from an InfoPackage or business point of view, however, we are still looking at three 1:6 pipes. By parallelizing the workload we potentially doubled the throughput capacity (provided, of course, that we did not stumble on any bottlenecks).

Figure 2-28 A process chain with 1:6 pipes

Monitoring the upload process

Monitoring the upload process requires an understanding of the types of information available through SAP and other monitoring tools. In addition, it requires an understanding of the phases that make up the overall upload run.


The upload process consists of four main phases:

• Phase one represents the BGD_extract process reading data from the ODS.

This phase is marked by significant activity on the DB servers, mainly due to the reading of data from the ODS object table. At this point there is no activity on the application server instances defined in the logon group, and only a very small amount of activity on the instance on which the BGD_extract processes are running.

• Phase two represents the ramp-up phase and is defined by the DIA_submit processes handing over data packets to the DIA_update processes.

As more and more data packets are handed over (up to the amount set by MAXPROCS), the CPU utilization on the application servers increases steadily. Note that the steepness of the ramp-up curve is dependent on the number of BGD_extract processes, in that parallel BGD_extract processes can fetch data from ODS in parallel and therefore speed up the allocation of available DIA processes.

MAXSIZE, which limits the size of the data packages built by the BGD_extract process, also has a significant impact: a smaller value results in this process handing over data to the subsequent DIA_submit processes at an increased pace.

• Phase three represents the core of the upload phase.

During this phase, the DIA_update processes apply the transformation rules to the data packets received and submit the updated data to the DB. If the system has been well configured, then this phase is extremely CPU-intensive on the application instances defined in the logon group. With a 1:1 ratio of DIA_update processes to CPUs, an average utilization of 90% and higher is easily achieved. The load on the application server supporting the BGD_extract and DIA_submit processes is negligible.

Note that these observations are specific to this benchmark and may not apply to other implementations.

• Phase four represents the ramp-down phase.

In this phase, the number of DIA_update processes falls as the last data packets are processed and, as a result, the CPU utilization steadily decreases.
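A minimal sketch of these phases, assuming a Python thread pool stands in for the DIA work processes and a trivial doubling function stands in for the transformation rules; none of this is SAP code.

```python
from concurrent.futures import ThreadPoolExecutor

MAXPROCS = 4  # illustrative; in the benchmark this matched the available CPUs

def extract(ods_rows, package_size):
    """Phase one: the extractor reads the ODS and cuts it into packages."""
    return [ods_rows[i:i + package_size]
            for i in range(0, len(ods_rows), package_size)]

def dia_update(package):
    """Phases two to four: apply transformation rules, then post to the DB."""
    return [value * 2 for value in package]  # stand-in transformation rule

packages = extract(list(range(20)), package_size=5)

# At most MAXPROCS workers run at once: the pool ramps up as packages are
# handed over (phase two), runs fully loaded (phase three), and ramps down
# as the last packages drain (phase four).
with ThreadPoolExecutor(max_workers=MAXPROCS) as pool:
    results = list(pool.map(dia_update, packages))

loaded = [value for package in results for value in package]
```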

You can monitor the upload process in either of two ways:

• By looking directly into the table RSDDSTATWHM, which provides the source data to ST03

• By using transaction ST03

In the following sections, we describe these monitoring methods in more detail.

RSDDSTATWHM

To view the data in RSDDSTATWHM, call transaction SE16 (the SAP data browser) and enter RSDDSTATWHM. In the selection screen you can specify, for instance, the name of the InfoCube you are monitoring to see which requests were successfully uploaded. An example is shown in Figure 2-29 on page 92.


Figure 2-29 Checking the number of loads

The details of your selection are listed in Figure 2-30 (this is only a subset of the information available).

Figure 2-30 Details of number of loads

From the information in the table depicted, you can determine the number of records uploaded for a particular request, as well as determine the source ODS, start time, overall duration and so on.
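A sketch of how such monitoring data can be summarized follows; the record layout and field names are hypothetical, modeled loosely on the information described above rather than on RSDDSTATWHM's real columns.

```python
# Hypothetical load records; field names are illustrative only.
loads = [
    {"infocube": "PAZGTFC026", "source_ods": "ZGTFC001",
     "records": 20_000_000, "seconds": 3600},
    {"infocube": "PAZGTFC028", "source_ods": "ZGTFC001",
     "records": 20_000_000, "seconds": 3600},
]

def throughput(rows):
    """Total records uploaded and overall records per second of runtime."""
    total = sum(r["records"] for r in rows)
    rate = total / sum(r["seconds"] for r in rows)
    return total, rate

total_records, records_per_sec = throughput(loads)
```

Aggregating per request this way is exactly the kind of KPI that the benchmark derives from the successful-upload records.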

Note that unsuccessful uploads are not recorded in this table. An unsuccessful upload can be identified via transaction RSA1 by managing the data target. Figure 2-31 on page 93 provides an example of a load that would not be recorded in RSDDSTATWHM.


Figure 2-31 Checking the number of unsuccessful loads

ST03

Transaction ST03 provides the same information as RSDDSTATWHM, but presented in a different (and possibly more intuitive) way. To view upload information through ST03, start transaction ST03 and change to the Expert user mode. From the BW System Load tree menu, select Last Minute's Load and specify a time frame larger than the actual upload job (that is, a starting time earlier than the job start time and a finishing time later than the job finishing time). Finally, select Load Data from the Analysis Views tree menu.

Figure 2-32 shows an example of the resulting information.

Figure 2-32 Using ST03 to check the number of loads

From this figure, you can determine what data was uploaded in the specified time frames, from which ODS to which InfoCube, and how long it took.

Keep in mind that, regardless of which method you choose, it is imperative that the upload is successful and, in the case of ST03, that the time frames against which you query span the entire upload job. This also presents a difficulty with reporting, in that the entire upload phase indirectly determines the KPI time frames.

2.6.2 The query process

Queries are what SAP NetWeaver BI is all about. SAP NetWeaver BI provides a data warehouse framework against which queries are run. From a user point of view, a SAP NetWeaver BI system is "good" if the query response time is "low". Good in this context depends on SAP NetWeaver BI being optimally configured in terms of ODS objects, InfoCubes, and aggregates; low implies anything under 20 seconds (although this is quite subjective).

Introducing the query process

In order to induce a query load on the system, injectors were used. These injectors, which are systems that simulate users executing queries against SAP NetWeaver BI, were preconfigured with templates of queries suitable for the benchmark; more specifically, with queries that simulate the load currently experienced on the customer's productive system. Figure 2-33 provides a high-level logical overview of the injector infrastructure.

Figure 2-33 Infrastructure for query injection

The following physical hardware configuration was used:

• Injector controllers: IBM ThinkCentre workstation, Pentium® IV 3.4 GHz, 2 GB RAM, 70 GB disk.

• Injectors: 10 x IBM xSeries x330, dual Pentium III 1 GHz, 1 GB RAM, 20 GB disk.

The following software was configured (note that all operating systems of load generators and controllers ran the latest service packs and patches):

• For the ten generators: Windows 2000 Advanced Server + SP4 + latest patches.

• For the two controllers: Windows XP + SP2 + latest patches.

• The same tool was used for both the generators and the controllers.

Defining the queries

In order to simulate a workload representative of a customer's environment, ten queries were defined with different target behaviors. The queries are not all equal in terms of the number of records scanned and returned.

Queries simulated from customer experience using a load generator tool.


In practice, the majority of queries will scan a relatively small number of records and return an even smaller number of records. Some ad hoc queries, however, have the potential of scanning a large number of records, which potentially has a significant impact on the overall system performance and response time.

To simulate the query weight, the following definitions apply:

• 80% of queries return fewer than 1,000 rows while scanning 1,000-90,000 rows; queries 1, 2, 3, 4, 5, 6, 7, and 8 fall into this class.

• 20% of queries return between 1,000-10,000 rows while scanning 20,000-900,000 rows; queries 9 and 10 fall into this class.
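The 80/20 query mix can be sketched as a weighted random choice, much as an injector script might implement it; the code is illustrative and not the actual load generator tool.

```python
import random

SMALL = [1, 2, 3, 4, 5, 6, 7, 8]   # return fewer than 1,000 rows
LARGE = [9, 10]                    # return 1,000 - 10,000 rows

def next_query(rng):
    """Pick the next query to inject: 80% small, 20% large result sets."""
    group = rng.choices([SMALL, LARGE], weights=[80, 20])[0]
    return rng.choice(group)

rng = random.Random(42)  # seeded for reproducible runs
mix = [next_query(rng) for _ in range(1000)]
small_share = sum(q in SMALL for q in mix) / len(mix)
# small_share converges toward 0.80 for a long enough run
```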

Level of aggregation

Queries based on aggregated data induce considerably less load on the system and have much better response times. However, it is not feasible to base all queries on aggregated data. The following behaviors were defined to reflect this:

• 40% of queries (1, 2, 3, and 4) are based on aggregates of 0-1.2 million rows.

• 20% of queries (5 and 6) are based on aggregates of 3-6 million rows.

• 20% of queries (7 and 9) are based on aggregates of 7-9 million rows.

• 20% of queries (8 and 10) run against the fact tables.

Cache

The cache plays an important role in query response times. Information available in cache does not need to be read from disk, which is far slower than memory.

In practice, the cache hit ratio is very much dependent on the size of cache as well as the variety and number of queries run. We make a further distinction between OLAP cache (which resides with the SAP NetWeaver BI Application Servers) and the database cache.

Also note that in terms of our test, the query load would be triggered first and left to run for a while in order to warm up the cache. Subsequently, the rollup and upload jobs would be triggered. The following behaviors have been defined to reflect this:

• 50% of all queries hit the OLAP cache (parameter defined).

• No cache for queries 2, 4, 6, 8, and 10 (OLAP cache switched OFF).

• OLAP cache mode 1 for queries 1, 3, 5, and 7.

• OLAP cache mode 4 for query 9.

• DB cache hit ratio: 99% on fact tables, 99.5% on aggregates.

The cache mode determines whether and in what ways the query results and navigational states calculated by the OLAP processor as highly compressed data are to be saved in a cache. Cache data can be held in main memory, distributed to an application server or in a network. The option that you choose depends on various parameters. These include:

• How often the query is requested

It is recommended that you save queries that are requested particularly frequently in the cache. The main memory cache is particularly fast, but restricted by its size. Swapping cached data does cancel out limitations on the main memory, but simultaneously affects performance.


There are practically no limitations on memory space available in the database or in the file system for the persistent cache. Accessing compressed data directly in the persistent cache is also beneficial in terms of performance.

• The complexity of the query

Caching brings performance advantages particularly with more complex queries, because evaluating them requires more effort. It is recommended that complex data processed by the OLAP processor be held in the cache.

• How often data is loaded

Using the cache is barely advantageous if query-relevant data changes often and therefore has to be loaded frequently, because the cache has to be regenerated every time. If cached data is held in main memory, data from frequently requested queries can be displaced, and subsequent calls then take more time.

The following cache modes are available:

• Mode 0: the cache is inactive. The cross-transactional cache is deactivated.

• Mode 1: main memory cache without swapping

The cache data is stored in main memory. When the cache memory is exhausted, excess data is removed according to the LRU (least recently used) algorithm; that is, it is deleted from memory. When such a query is subsequently requested, the result is read from the InfoProvider once again.

• Mode 2: main memory cache with swapping

The cache data is stored in main memory. When the cache memory is used up, excess data is written to a background store and can be loaded back into the cache memory on a later request.

• Mode 3: persistent cache per application server

The cache data is stored persistently, as a database table or as a file in a directory reachable from the application server.

• Mode 4: cross-application server persistent cache

The cache data is stored persistently as a cross-application server database table, or as a file in a network file system accessed from the application server. In this mode there is no displacement of data and no restriction on memory size. More space is required, but this method also saves time.

– Database table: strain is put on the database instance that holds the table.

– File: strain is put on the operating system of the network node that serves the file.

In addition, with both storage types, the application server that calculates the data and the network communication have to be taken into account.
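Mode 1 behaves like a classic LRU cache without swapping. A minimal sketch follows, assuming query names as cache keys and Python lists as stand-ins for the compressed result sets; this models the eviction behavior only, not SAP's implementation.

```python
from collections import OrderedDict

class Mode1OlapCache:
    """Main memory cache without swapping: when full, the least recently
    used entry is deleted; a later request recomputes from the InfoProvider."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, query_key):
        if query_key not in self.entries:
            return None  # cache miss: re-read from the InfoProvider
        self.entries.move_to_end(query_key)  # mark as recently used
        return self.entries[query_key]

    def put(self, query_key, result):
        self.entries[query_key] = result
        self.entries.move_to_end(query_key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict LRU; no swap-out

cache = Mode1OlapCache(capacity=2)
cache.put("Z60_CCS_A", [1, 2, 3])
cache.put("Z60_YTD_A", [4, 5])
cache.get("Z60_CCS_A")       # touch: Z60_CCS_A becomes most recently used
cache.put("Z60_STR_A", [6])  # evicts Z60_YTD_A, the least recently used
```

Mode 2 would differ only in the eviction branch: instead of discarding the entry, it would be written to a background store and reloaded on demand.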

Query overview

Based on the query behaviors defined in the previous sections, Table 2-1 provides an overview of the ten defined queries.

Table 2-1 Queries summary

The cache plays an important role for query response time.

No.  MultiProvider  Query      Template   Cache mode  Aggregate  Records read  Records returned
1    ZGTFCMP01      Z60_CCS_A  Z60_CCS_A  1           101480     15,777        6
2    ZGTFCMP01      Z60_PLR_A  Z60_PLR_A  No          101480     19,959        61


Table 2-1 describes the queries based on the MultiProviders ZGTFCMP01 and ZGTFCMP02.

Query variables

A set of 10 scripts executes each query with random customer-definable variables. An overview of a subset of these variables is provided in Appendix B, "Query variables" on page 287.

Monitoring queries

Monitoring queries requires an understanding of the types of information available through SAP and other monitoring tools, as well as an understanding of the phases that make up the overall query run.

In contrast to the upload process, the query process does not really exhibit distinct phases apart from a start, runtime, and stop phase. The runtime phase requires most of our attention in terms of monitoring.

Monitoring of the query phase was achieved in our environment through transaction ST03 or through the injector scripts.

ST03

To view query process information through ST03, start transaction ST03 and change to the Expert user mode. From the BW System Load tree menu, select Last Minute's Load and specify a time frame larger than the actual query run (that is, a starting time earlier than the run start time and a finishing time later than the run finishing time). Finally, select Query Runtimes from the Analysis Views tree menu and switch to the Average Times tab.

Figure 2-34 on page 98 provides an example of the query results at the MultiProvider level.

Table 2-1 (continued):

No.  MultiProvider  Query      Template   Cache mode  Aggregate  Records read  Records returned
3    ZGTFCMP02      Z60_YTD_A  Z60_YTD_A  1           101490     16,026        42
4    ZGTFCMP02      Z60_STR_C  Z60_STR_C  No          101512     9             7
5    ZGTFCMP01      Z60_RIG_A  Z60_RIG_A  1           101477     89,769        60
6    ZGTFCMP02      Z60_YTD_B  Z60_YTD_B  No          101488     20,000        400
7    ZGTFCMP02      Z60_STR_A  Z60_STR_A  1           101487     35,844        205
8    ZGTFCMP02      Z60_STR_B  Z60_STR_B  No          FACT       630           3
9    ZGTFCMP01      Z60_CCS_B  Z60_CCS_B  4           101508     410,203       3,530
10   ZGTFCMP02      Z60_STR_D  Z60_STR_D  No          FACT       40,000        2,000


Figure 2-34 Monitoring queries with ST03; MultiProvider level

At query level and sorted, we obtain the information as shown in Figure 2-35.

Figure 2-35 Monitoring queries with ST03; query level

According to the data in these diagrams, the average query response time across all queries is 9.4 seconds.

2.6.3 The rollup process

After the InfoCubes have been populated with new data as a result of the upload phase, this data needs to be rolled up into the aggregates. The actual term for the new data that needs to be processed is a request.

Rolling up data in effect adds data to an aggregate, or updates existing data with relevant new information. For instance, if a report is run at the end of each week to determine the number of units sold for a particular week, then there are two main approaches:

� Either all relevant data for that week is read by a query from an InfoCube and distilled to present the relevant data

� Or an aggregate can be called upon to provide the same data very quickly and efficiently

From a systems performance point of view, the use of aggregates in queries considerably improves query response times and reduces overall system load. A well-defined aggregate makes a query much more efficient by basing it on aggregated information, as opposed to data fetched from InfoCubes or ODS objects.
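As a toy illustration of the principle (invented data, not from the study): answering a weekly question from detail rows means scanning and summing every row at query time, whereas an aggregate holds the precomputed total so the query reads a single row.

```shell
# Toy model of query-time aggregation (all file names and data invented).
# Detail rows: week, day, units sold.
cat > /tmp/sales_detail.csv <<'EOF'
2006-W40,mon,120
2006-W40,tue,95
2006-W40,wed,143
EOF

# "Query against the InfoCube": every matching detail row is read and summed.
awk -F, '$1 == "2006-W40" { sum += $3 } END { print "units:", sum }' /tmp/sales_detail.csv
```

An aggregate is, in effect, that total materialized in advance, so a query using it reads one precomputed row instead of scanning the detail rows.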

On the other hand, defining the right aggregates requires a good understanding of the queries that will use the relevant aggregate. And equally important, any changes to the source data (for example, InfoCubes with information relevant to a certain aggregate) require that aggregate to be rebuilt. Traditionally, aggregates are rolled up as part of a nighttime batch process.

Aggregates influence query response time and system load.

98 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse

Figure 2-36 depicts the roll-up process and its accompanying process chain.

Figure 2-36 The roll-up process; formal representation

This schema can be represented as in Figure 2-37.

Figure 2-37 The roll-up process; descriptive representation

When a process chain is scheduled, a BGD_trigger process is started. This BGD process represents the start block in the process chain and triggers the subchains. This process runs against an instance defined when the process chain is scheduled. In our example, the instance used was sys2btc. Transaction SM37 provides the information presented in Figure 2-38.

Figure 2-38 The trigger process

Subsequently, BGD_roll processes are started for each subchain of the process chain. At this moment, the BGD_trigger process changes its state to Finished. The BGD_roll processes run against the same user-defined instance as the BGD_trigger process. Transaction SM37 provides the information presented in Figure 2-39 on page 100.


Figure 2-39 Results provided by the SM37 transaction

The runtime for the BGD_roll processes is short because there was no new data that required rolling up in this example. In practice, however, the rollup phase may take many hours. In our tests, 7 hours was not unheard of; but keep in mind that this is very specific to each implementation.

Defining the data to be rolled up

Similar to the upload phase, we can specify what data is to be rolled up. Unlike the upload phase, however, only two variables can (or need to) be specified: the number of requests that you do not want rolled up, or only those requests that were loaded a certain number of days ago. Figure 2-40, taken from transaction RSPC, depicts the options described.

Figure 2-40 Options for the rolled-up process

Figure 2-41 on page 101, obtained through the transaction RSA1, depicts the status of requests. In this case, all requests have been rolled up.


Figure 2-41 Status of requests

The aggregate tree

Depending on the implementation, a single InfoCube will be the source for numerous aggregates. In Figure 2-42, we can see the aggregates and their technical names as they have been defined for a specific InfoCube (in this case, InfoCube 30). This information can be retrieved through transaction RSA1.

Figure 2-42 The aggregate tree - Example 1

In some cases, aggregates are dependent on other aggregates, creating a hierarchy or aggregate tree. As a result of this hierarchy, aggregate rollups are sequential jobs. This can be clearly seen in the details of the job logs. For example, in Figure 2-43 on page 102, aggregate 101946 cannot be rolled up before aggregate 101985 for InfoCube 30.


Figure 2-43 The aggregate tree - Example 2

In Figure 2-44 we have provided the sequence in which these aggregates were rolled up, as well as the elapsed time. Parallelization could be achieved to a certain degree if the specific sequence is known. Unfortunately, this functionality is not available in SAP NetWeaver BI 3.5 and therefore CPU clock speed remained our strongest ally.

Figure 2-44 The aggregate tree - Example 3

Monitoring the rollup process

Monitoring the rollup process requires an understanding of the types of information available through SAP and other monitoring tools. It also requires an understanding of the phases that make up the overall rollup run.


In contrast to the upload process, the rollup process does not really exhibit distinct phases apart from a start, runtime, and stop phase. It is the runtime phase that requires our attention in terms of monitoring.

Monitoring of the rollup phase is achieved through transaction ST03. To view rollup process information through ST03, start transaction ST03 and change to the Expert user mode. From the BW System Load tree menu, select Last Minute's Load and specify a time frame larger than the actual rollup job; that is, a starting time earlier than the job start time and a finishing time later than the job finishing time.

Finally, select Aggregates from the Analysis Views tree menu. Figure 2-45 provides an example of the aggregated results at the InfoCube level.

Figure 2-45 Rollup process monitoring - the Analysis View

From this data we can determine which InfoCubes have been rolled up, the time elapsed, and the number of records involved. In order to get a more detailed view, and preferably at aggregate level, we change the aggregation level to Aggregate. This provides the information as shown in Figure 2-46; this information has been sorted to the data relevant to InfoCube 30.

Figure 2-46 Rollup process monitoring - the Aggregate view

Note that ST03 will not report on InfoCubes where not all aggregates have rollup jobs (in other words, where a request has not entirely been serviced).


2.7 Load distribution methods

We first needed to determine which load distribution method was most suitable for our tests. Two load distribution methods were tested:

� Best: the server with the best calculated quality is used for the next logon.

� Round robin: logons are rotated across the servers, each server receiving the next logon in turn.

In this section we describe the implementation of these load distribution methods for the dialog work processes for the upload scenario. We also explain why we chose to use the round robin load distribution method.
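As a minimal sketch of the round robin policy (generic shell, not SAP code), each new logon is simply handed to the next server in the group, regardless of that server's load:

```shell
# Toy round robin dispatcher over a two-server logon group (as03, as04).
# Server names match our logon group RFC_BW; the loop itself is invented.
servers="as03 as04"
i=0
for logon in 1 2 3 4 5 6; do
  set -- $servers          # load the server list into positional parameters
  shift $(( i % $# ))      # rotate: pick the next server in turn
  echo "logon $logon -> $1"
  i=$(( i + 1 ))
done
```

The output alternates as03, as04, as03, and so on; a Best policy would instead consult the quality value computed for each server before every assignment.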

2.7.1 The Best load distribution case

The Best load distribution method implies the following steps:

� A first recommendation was to change the value rdisp/autoabaptime from 60 seconds to 15 seconds. This means that every 15 seconds, a program known as SAPMSSY6 will run. This program collects the alert values and performs profile checks.

� The dialog processes for the upload should run on the application servers as03 and as04. Using transaction SMLG, we assigned these two application servers to the logon group RFC_BW, as shown in Figure 2-47.

Figure 2-47 The Best load case: configuration (1/2)

� We changed the table RZLLICLASS for the classname RFC_BW, where RFC_BW is the logon-group for the application servers as03 and as04.

� We changed the classname RFC_BW to the distribution type Best Performance, through the two parameters TIMERERD and FAVTYPE, as shown in Table 2-2.

Table 2-2 Best performance parameters set up

Parameter | Change   | Description
TIMERERD  | 120 → 15 | Logon group time interval in seconds
FAVTYPE   | R → B    | Best

Figure 2-48 on page 105 illustrates the table RZLLICLASS after the changes.

The round robin load distribution method was chosen for the KPIs.


Figure 2-48 Best load case: configuration (2/2)

The process chain described in Figure 2-49 has been created.

Figure 2-49 Best load case: the process chain

� For the tests, we adjusted the table ROOSPRMS for ODS 1 and 3 with the following values, shown in Figure 2-50:

– Packet size (MAXSIZE) to 80000

– Number of processes (MAXPROCS) to 36

Figure 2-50 Best load case: parameters set up


The NMON report shown in Figure 2-51 indicates that the CPU utilization for sys3as03 and sys3as04 is very unbalanced; this was not a good situation for our tests.

Figure 2-51 Best load case: NMON report

2.7.2 The round robin load distribution case

The round robin load distribution method implies the following steps:

� We changed the value rdisp/autoabaptime back from 15 seconds to 120 seconds (for round robin, this functionality is actually ignored). We changed the value to avoid problems if too many requests had to go to the message server.

� To switch the load distribution from Best to round robin, we changed the table RZLLICLASS for the classname RFC_BW (RFC_BW is the Logon-group for the application servers as03 and as04), and we changed the classname RFC_BW to the distribution type round robin as shown in Table 2-3.

Table 2-3 Round robin load case

Parameter | Change   | Description
TIMERERD  | 15 → 120 | Logon group time interval in seconds
FAVTYPE   | B → R    | Round robin

Figure 2-52 on page 107 shows the table RZLLICLASS after the changes.


Figure 2-52 Round robin load case: table RZLLICLASS parameters

� We changed the logon group RFC_BW to enable the Ext. RFC-enabled option, as shown in Figure 2-53.

Figure 2-53 Round robin load case parameters

The process chain, as described in Figure 2-49 on page 105, is not changed.

NMON then reported a very balanced CPU behavior for both application servers, as shown in Figure 2-54 on page 108.


Figure 2-54 Round robin load case: NMON report

That is the behavior we needed, because we wanted to use the CPU to its full capacity.

2.8 Maximizing the upload scenario

In this section, we describe the different tests we performed before the KPI runs to adjust some of the parameters in order to build the most suitable architecture and environment. Some tuning options are discussed in 1.2.6, “Optimization and tuning options” on page 47.

2.8.1 Understand the upload processes

The upload process consists of several subprocesses, and it is important to understand where each of them runs. The full process is made up of the following subprocesses:

� BI_PROCESS_TRIGGER

This is the process chain trigger that simply starts the chain running. It will run on the selected application server in a batch work_process. It is a very short duration process.

� BI_PROCESS_LOADING

This batch process step starts the actual data extractor. It runs on the same application server as the TRIGGER and lasts only long enough to initiate the actual extractor.

� BIREQU_XXX

This generated jobname is used for the extractors. Extractors are batch jobs that run on the application server defined in the customization (described later). This process generates the parallel dialog tasks that do the actual data translation and update the target InfoCubes.

Understand where the processes run


The application server defined as host for these jobs will require a batch WP for each extractor, and a number of dialog processes to handle the qRFC distribution. The SAP Gateway of this application server will be used to distribute the dialog tasks across the system.

� DIA_submit process

This is the dialog process which is resident on the same application server as the extractor, and represents the local half of the load distribution. This dialog process initiates a qRFC request over the gateway and distributes the data packets to the actual dialog-tasks used for upload across the system.

When the remote dialog-task finishes processing its current block, it reports its status back to the local dia process. Each distributed data packet is represented by an end-to-end conversation across the local gateway to the remote gateway of the processing application server.

� DIA_update processes

This is the dialog process, spawned by the qRFC request, which actually processes the data block and updates the target InfoCubes. Its location is determined by the logon group (which application servers are participating) and the distribution policy used (round robin or Best).
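The extractor/dialog-task pattern described above can be sketched as a producer that splits its input into packets and hands each packet to a parallel worker. This is a toy shell model with invented file names, not SAP code; the packet size of 2 rows stands in for the MAXSIZE parameter.

```shell
# Toy model: one "extractor" splits rows into packets; parallel "dialog"
# workers each process one packet. All paths and data are invented.
mkdir -p /tmp/upl && rm -f /tmp/upl/pkt_*
seq 1 6 > /tmp/upl/rows.txt                   # pretend these are data rows
split -l 2 /tmp/upl/rows.txt /tmp/upl/pkt_    # packet size 2 (MAXSIZE analogue)

for p in /tmp/upl/pkt_*; do
  ( sum=0
    while read -r n; do sum=$(( sum + n )); done < "$p"
    echo "$(basename "$p") processed, checksum $sum"
  ) &                                         # one parallel worker per packet
done
wait
```

The real system adds what the toy omits: the qRFC distribution over the gateway, the status report back to the submitting dialog process, and the rotation of workers across application servers.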

Where each of these processes runs is defined differently. In particular:

� You define in which application server the processes BI_PROCESS_TRIGGER and BI_PROCESS_LOADING run when you start the upload process as shown in Figure 2-55.

Figure 2-55 Upload processes: defining the application servers (1/2)


� For the BIREQU_XXX process:

– Use the transaction SBIW to display the IMG.
– Expand Data Transfer to the SAP NetWeaver BI.
– Expand General Settings.
– Select Maintain Control Parameters for Data Transfer.
– Maintain here the parameters for data transfer.
– Save the changes in a Workbench transport request.

Figure 2-56 shows the entry for the source system EB8PRBW102. EB8PRBW102 is an RFC destination defined in the SM59 transaction. With this entry, the BIREQU_XXX jobs are running on the application server sys3as03.

Figure 2-56 The BIREQU_ parameter process

To run the RFC requests from the batch job with the best performance, we changed a few parameters. The standard values for the quotas are set restrictively, because it is likely that dialog users are also using the system and are treated with preference. To give the maximum resources to the batch processes, we used the values shown in Table 2-4 in our tests.

Table 2-4 Upload application servers parameters set up

Parameter                 | Change
rdisp/rfc_max_own_login   | 1 to 100
rdisp/rfc_max_own_used_wp | 20 to 100
rdisp/rfc_max_login       | 90 to 100

2.8.2 Tests to select the right amount of data

The KPI-A consisted of three different tests: queries, rollup, and uploads. To limit the time and the number of records to load, we selected a time frame in the process chain. We tested different time frames to find the best values for our environment.


We used the 0PSTNG_DATE field from the data transfer structure; this field is used to partition the data to get requests with a controllable amount of records, rather than doing a full load of all data contained in the ODS.

Table 2-5 shows the results in our environment.

Table 2-5 0PSTNG_DATE values

From 0PSTNG_DATE | To 0PSTNG_DATE | Records from ODS | Approx. upload time | Approx. rollup time
01.01.1900       | 01.05.2004     | 4,000,000        |                     |
01.01.1900       | 01.10.2004     | 8,000,000        | 1 h                 |
01.01.1900       | 01.01.2005     | 10,000,000       |                     |
01.01.1900       | 01.05.2005     | 14,000,000       | 2 h                 | 2.5 h
01.01.1900       | 01.07.2005     | 18,000,000       |                     |
01.01.1900       | 01.09.2005     | 23,000,000       | 3 h                 | 3.5 h
                 |                | 35,000,000       |                     |

� Upload time frame

– To test the upload scenario, 1 hour will be enough. Therefore we will use the time frame 01.01.1900 to 01.10.2004 in the InfoPackages.

– For the KPI-A test we will need an upload duration of 2 hours. In this case, the time frame will be 01.01.1900 to 01.05.2005.

Figure 2-57 on page 112 shows an example of setting up the time frame.


Figure 2-57 Test scheduling

� Rollup time frame

About 3.5 hours will be needed to test the rollup scenario.

2.8.3 Tests to select the right parameters for the upload scenario

A number of factors affect the throughput of rows per hour:

� The number of processes
� The number of processors
� The memory allocated
� The system model
� The packet size (or blocksize)
� The number of InfoCubes loaded per ODS
� The number of extractors
� The number of SAP dialog processes (DIA)

In this section we discuss some of these factors that affect the parallelization. We ran the tests listed in Table 2-6 in order to determine the best combination of these parameters.

Table 2-6 Upload preliminary tests parameters

Test number | Number of ODS | Number of InfoCubes | Maxprocs | Processors | Maxsize | Max DIA
1           | 1             | 1                   | 64       | 32 & 44    | 160,000 | 2 * 48
2           | 1             | 2                   | 64       | 32 & 44    | 160,000 | 2 * 48
3           | 1             | 4                   | 64       | 32 & 44    | 160,000 | 2 * 48
4           | 1             | 4                   | 64       | 32 & 44    | 80,000  | 2 * 48
5           | 1             | 4                   | 64       | 32 & 44    | 320,000 | 2 * 48
6           | 1             | 5                   | 64       | 32 & 44    | 160,000 | 2 * 48
7           | 1             | 7                   | 64       | 32 & 44    | 80,000  | 2 * 48
8           | 1             | 7                   | 64       | 32 & 44    | 160,000 | 2 * 48
9           | 1             | 7                   | 64       | 32 & 44    | 40,000  | 2 * 48
10          | 1             | 7                   | 73       | 32 & 44    | 80,000  | 2 * 48
11          | 1             | 7                   | 88       | 2 * 40     | 80,000  | 2 * 48

A series of tests was done to choose the right parameters for the critical factors.

The results of these tests are summarized in Table 2-7.

Table 2-7 Parallelization tests results

Test number | InfoCubes | Extractors | DIA used | Turnaround | Throughput/hr | Records/CPU | Records/DIA | Records/IC
1           | 1         | 1          | 15       | 5 mn       | 4,066,977     | 63,547      | 271,132     | 4,066,977
2           | 2         | 1          | 26       | 9 mn       | 7,508,086     | 117,314     | 288,773     | 3,754,043
3           | 3         | 1          | 48       | 17 mn      | 13,062,090    | 204,095     | 272,127     | 3,265,522
4           | 4         | 1          | 51       | 9 mn       | 14,686,996    | 229,484     | 287,980     | 3,671,749
5           | 1         | Long-running packets have negative impact
6           | 5         | 1          | 61       | 22 mn      | 15,425,640    | 241,026     | 252,879     | 3,085,128
7           | 7         | 1          | 63       | 18 mn      | 18,548,141    | 289,815     | 294,415     | 2,649,734
8           | 7         | 1          | 63       | 30 mn      | 17,399,421    | 271,866     | 276,181     | 2,485,632
9           | 1         | SAP locking problems
10          | 7         | 1          | 72       | N/A (a)    | 19,642,708    | 306,917     | 272,815     | 2,806,101
11          | 7         | 1          | 88       | N/A (a)    | 21,345,016    | 266,813     | 242,557     | 3,049,288

a. Not measured

Table 2-8 on page 114 provides some ratio calculations.

� The ratio DIA/CP is the number of SAP dialog processes per physical CPU in the application servers. This ratio gives an idea of the balance of physical resources to SAP work processes. If there are too many work processes per CPU, they begin to compete with each other for physical resources, which is detrimental to throughput.

� The scalability CP is the number of records per dialog process multiplied by the number of physical processors. This gives an idea of what could be expected with additional CPUs or a scaling of the physical landscape.


� The scalability IC is the total throughput divided by the total number of target InfoCubes. This gives an idea of the scalability by adding additional target InfoCubes.

� BKG to DIA is the number of dialog processes divided by the number of batch extractors; it gives the ratio of parallelism achieved by a single extractor. The total number of target cubes divided by the number of extractors gives the ratio of cubes to extractors. The level of parallelism achieved by a single extractor divided by the number of InfoCubes addressed by the extractor shows how many parallel processes it achieved per InfoCube.

Table 2-8 Parallelization test results ratios

Test number | Ratio DIA/CP | BKG to DIA | Scalability CP | Scalability IC
1           | 1.0          | 15.0       | 17,352,435     | 203,348,850
2           | 1.0          | 13.0       | 18,481,443     | 93,851,078
3           | 1.0          | 12.0       | 17,416,120     | 40,819,031
4           | 1.0          | 12.8       | 18,430,739     | 45,896,861
6           | 1.0          | 12.2       | 16,184,278     | 30,851,281
7           | 1.0          | 9.0        | 18,842,556     | 18,926,675
8           | 1.0          | 9.0        | 17,675,602     | 17,754,511
10          | 1.1          | 10.3       | 17,460,185     | 20,043,579
11          | 1.1          | 12.6       | 15,523,647     | 21,780,627

Blocksize

The objective for parallelism is to get as many dialog processes active as possible.

� If the data package size is too big, it will require too much memory for the dialog processes.

� If it is too small, the dialog processes return too quickly to the extractor and few parallel processes will be initialized (in extreme cases, the total number of data packages will grow so high that it might lead to locking problems on table RSREQDONE, and the application will suffer bottlenecks).

Data package sizes from 40 KB to 320 KB were tested.

Number of target InfoCubes

The dialog tasks take each block received and process it through the transformation rules for each target InfoCube. Each block read by the extractor can therefore be used multiple times, reducing the load on the database and the extractor.

Additional target InfoCubes extend the work cycle of the dialog task: the longer the dialog tasks work, the more parallel processes the extractor can keep active simultaneously. The goal is to achieve a balance between the turnaround time of the DIA tasks and the extractor's capability to spin off new packets.

One to seven target InfoCubes were tested.
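The blocksize trade-off can be made concrete with simple arithmetic: for a fixed upload volume, the packet size determines how many data packages the extractor must spin off. This sketch uses the 8,000,000-record one-hour volume from Table 2-5 and the MAXSIZE values from Table 2-6; the combination is illustrative, not a measurement from the tests.

```shell
# Number of data packages the extractor creates at each tested MAXSIZE,
# for an 8,000,000-record upload (ceiling division).
records=8000000
for maxsize in 40000 80000 160000 320000; do
  echo "MAXSIZE $maxsize -> $(( (records + maxsize - 1) / maxsize )) packages"
done
```

At MAXSIZE 40,000 the extractor must manage 200 packages; at 320,000 only 25, which explains both the RSREQDONE locking risk at the small end and the memory and long-running-packet problems at the large end.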


Multiple conclusions can be drawn from the results:

� For a single InfoCube, the scalability is very limited.

� The scalability increases when multiple target InfoCubes are involved.

� A single extractor continues to scale up in parallel in a one-to-many scenario when increasing the number of target InfoCubes.

� A reduced blocksize results in quicker turnaround.

Number of data extractors

The extractor represents a bottleneck for parallelization in a model with a minimum number of target InfoCubes. To parallelize a “one-to-few” load model, we tested a double extractor with a selection criterion on a data range.

For AIX 5.2, the traditional 1-4 configuration, as shown in Figure 2-58, was found to be the most efficient because the extractors could feed up to 42 dialog processes during the KPI tests. This number of DIA processes in parallel was the limitation of the 32-way application servers.

Figure 2-58 Number of data extractors - the traditional 1-4 configuration

(Figure 2-58 depicts one source ODS with a single batch extractor feeding four InfoCubes, Maxproc 64.)

To increase the parallelization for AIX 5.3 we tested three configurations, as shown in Figure 2-59 on page 116. In these tests, each extractor could drive another set of DIA tasks, but each extractor added additional query and read load on the database.


Figure 2-59 Number of data extractor tests

(The figure depicts the three configurations: Test 1, the current KPI-A configuration with parallel streams, each chain loading 4 InfoCubes from 1 source ODS through 1 batch extractor with Maxproc 58; Test 2, split streams, each chain loading 4 InfoCubes from 1 source ODS through 2 batch extractors with Maxproc 58; and Test 3, multiple parallel streams, four chains each loading 4 InfoCubes from 1 source ODS through 1 batch extractor with Maxproc 29.)

Table 2-9 on page 117 shows the results of these tests.


Table 2-9 Number of data extractors tests results

Test | Number of ODS | Target InfoCubes per ODS | Extractors per ODS | DIA              | MAXPROCS         | Records per hour per CP | CPU used
1    | 2             | 4 (total = 8)            | 1                  | 40 (total = 160) | 58 (total = 116) | 454,000                 | 65.6
2    | 2             | 4 (total = 8)            | 2                  | 40 (total = 160) | 29 (total = 116) | 548,000                 | 74.8
3    | 4             | 4 (total = 16)           | 1                  | 40 (total = 160) | 29 (total = 116) | 415,000                 | 96

Figure 2-60 summarizes the tests.

Figure 2-60 Number of data extractors results graphs

(The figure plots records per hour per CPU, and throughput in millions of records per hour against the physical CPUs required, for the parallel stream, split stream, and multi-stream configurations.)

The conclusion is that multiple parallel streams bring nearly the throughput of the split stream at a higher CPU cost, and that the split stream offers the best price/performance as well as the highest throughput.

To summarize, we observed the best overall throughput per hour in test 11. That is, in terms of throughput per hour, we achieved the best results with a configuration of one ODS with seven InfoCubes and maxsize = 80,000, and a DIA-to-processor ratio of 1.1.

Note: Looking only at the records per InfoCube value, we had the best throughput per InfoCube when loading one ODS to one InfoCube. But with this configuration we used 15 CPUs per InfoCube to get that throughput. In other words:

� If the customer is limited in the number of CPUs, the best result is obtained by uploading multiple InfoCubes from one ODS.

� If the customer is limited in the number of InfoCubes and has enough CPUs, the best throughput is achieved by uploading one InfoCube from one ODS.


Chapter 3. The DB2 perspective

This chapter addresses the project details from a database technical expert perspective. It covers the following topics:

� The basic components of the IBM DB2 UDB (Universal Database) ESE (Enterprise Server Edition) V 8.2 used in our tests

� The components used in our test scenarios

� Details of the DPF feature

� Tools we used to monitor the DB2 environment

� Details of why and how we distributed the DB2 partitions

� Details of how we balanced the resources in our specific scenario

� The major DB2 options and parameters used for our tests


© Copyright IBM Corp. 2007. All rights reserved. 119


3.1 DB2 overview

IBM DB2 UDB is composed of several objects. It relies on relational database concepts.

The following sections describe the IBM DB2 UDB concepts needed to understand our test scenarios. Some of the information provided here may vary in other scenarios, due to operating system particularities.

3.1.1 Instances (database manager)

In IBM DB2 UDB, the instance is the highest entity in the structure. It is responsible for setting up communication parameters, Fast Communication Manager1 (FCM) memory parameters, authentication methods, authorization, and so on. The highest roles in DB2 UDB are configured at the instance level (for example SYSADM, SYSCTRL and SYSMAINT).

An instance must always be assigned to an operating system user, normally called the instance owner. This user and its primary group are automatically added to the SYSADM group by default. On UNIX and Linux systems, when you start the DB2 UDB processes, they are owned by the instance owner user identification (userid), creating a pool of resources required by the databases.

The IBM DB2 UDB partitioned structure is also known as shared-nothing: the portion of data managed by one instance is not visible to, and cannot be accessed by, any other instance in the multiple-instance structure. This structure relies on a coordinating instance, which is responsible for coordinating requests across the remaining instances.

All instances in a partitioned scenario must run under the same userid; the partition layout is defined in a configuration file called db2nodes.cfg (refer to Example 3-1).

Example 3-1 A db2nodes.cfg file example

0 fabiohas_linux 0
1 fabiohas_linux 1
2 fabiohas_linux 2
3 fabiohas_linux 3

Example 3-1 displays these values:

� The first field on the left is the partition number; this configuration creates a partitioned instance structure with four partitions (0 through 3).

� The second field is the server name (in our case, this is called fabiohas_linux and is the same for all of them).

� The last field relates to the instance communication port and must match the service entries in the services file, as depicted in Table 3-1.

Table 3-1 Port partition definitions

DB2_fabiohas      60000/tcp
DB2_fabiohas_1    60001/tcp
DB2_fabiohas_2    60002/tcp
DB2_fabiohas_END  60003/tcp

1 For more information about the Fast Communication Manager feature of DB2, visit: http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0501nisbet/

The shared-nothing architecture and the role of the instance.


Table 3-1 on page 120 shows that DB2_fabiohas is the port for partition 0, DB2_fabiohas_1 is the port for partition 1, and so on. It is very important to understand the difference between these ports and the communication port configured in the SVCENAME instance parameter.
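As an illustration, the naming pattern from Table 3-1 can be sketched in Python. The helper below is hypothetical (it is not a DB2 tool); it simply derives the expected /etc/services entry names for a given instance name and number of FCM ports.

```python
def service_entry_names(instance, num_ports):
    """Expected /etc/services entry names for a DPF instance's port range,
    following the pattern in Table 3-1: the first port carries no suffix,
    intermediate ports are numbered, and the last port is suffixed _END."""
    if num_ports == 1:
        return ["DB2_" + instance]
    names = ["DB2_" + instance]
    names += ["DB2_{0}_{1}".format(instance, i) for i in range(1, num_ports - 1)]
    names.append("DB2_" + instance + "_END")
    return names

# Reproduces the four entries shown in Table 3-1:
print(service_entry_names("fabiohas", 4))
```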

In our tests, remote shell (rsh) was used to enable communication between the servers, but a secure shell (ssh) trust based on keys would also be an option. Prior to IBM DB2 V8.2 fixpack 3, rsh was the only supported method. With ssh, a key pair is generated with the ssh-keygen command and the public key is added to a file called authorized_keys on each server.

The Database Partitioning Feature (DPF) directly affects the instance structure: it multiplies the processes of a single instance across multiple database partitions. This structure can be used to create multiple partitions on the same server or on separate servers, spreading data and processing load across multiple servers.

When configuring DPF on multiple distinct servers or LPARs, an additional service must also be considered: relational database management systems rely on transactions and the generation of logs, so time stamps must be as accurate as possible. To achieve this, a time synchronization service must be configured, normally based on the Network Time Protocol (NTP).

In our tests, this was accomplished by the use of the xntpd daemon. The xntpd daemon sets and maintains the UNIX system time of day in compliance with Internet-standard time servers, and is a complete implementation of NTP.

To verify that xntpd is running, use the lssrc command as shown in Example 3-2; the Peer field shows which time server is configured. (Alternatively, you can check the server parameter in the /etc/ntp.conf file.)

Example 3-2 Output of the lssrc command

root@sys3db0p:/root/ # lssrc -ls xntpd
Subsystem         Group            PID     Status
 xntpd            tcpip            807146  active
Program name:     /usr/sbin/xntpd
Version:          3
Leap indicator:   11 (Leap indicator is insane.)
Sys peer:         no peer, system is insane
Sys stratum:      16
Sys precision:    -17
Debug/Tracing:    DISABLED
Root distance:    0.000000
Root dispersion:  0.000000
Reference ID:     no refid, system is insane
Reference time:   no reftime, system is insane
Broadcast delay:  0.003906 (sec)
Auth delay:       0.000122 (sec)
System flags:     bclient pll monitor filegen
System uptime:    265199 (sec)
Clock stability:  0.000000 (sec)
Clock frequency:  0.000000 (sec)
Peer: nesdata
      flags: (configured)
      stratum: 16, version: 3
      our mode: client, his mode: unspecified

When using DPF with multiple LPARs, time synchronization is needed.


For further information about NTP and the xntpd daemon, visit the AIX Information Center:

http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.doc/cmds/aixcmds6/xntpd.htm

3.1.2 Database considerations

An IBM DB2 UDB database is a physical entity, and must be created on a database path.

Configuration options

All the authorization levels that are related to the database level are granted or revoked as Data Definition Language (DDL) in the database. When a database is created, three basic table spaces are created, together with four hidden system buffer pools, one for each page size: 4 KB, 8 KB, 16 KB, and 32 KB. These buffer pools are very small and are used only when a table space with a specific page size does not match any explicitly created buffer pool, or when an error or deferred memory allocation prevents the allocation of an explicit buffer pool.

A database can be configured to use circular log or linear log. When using circular log, logs are overwritten as needed and no online backup is allowed. When using linear log, a user exit method should also be configured to archive inactive logs.

In DB2 UDB 8.2.2, a new clause relating to automatic storage was added to the database creation command. This clause allows storage and space to be managed without manual intervention.

The automatic storage feature allows the database manager to determine which containers are to be assigned to the table space, based upon the storage paths that are associated with the database. Table spaces managed by automatic storage automatically increase in size when they become full; automatic storage is enabled for the database during the database creation time, and storage paths can be added subsequently.

After automatic storage is enabled at the database level, new table spaces can optionally be created with automatic storage. In DB2 8.2.2, a table space that is not created with automatic storage cannot be changed to automatic storage afterwards, and automatic storage management is not supported for multipartition databases. So even if you do not need this feature now, it is a best practice to create the database with automatic storage enabled, because no conversion or enablement of this feature on an existing database is supported at the time of writing. This feature is implicitly set to On.

A database is composed of several objects that may relate to each other referentially. The main database objects are:

� Table spaces
� Buffer pools
� Tables and indexes

Table space categories

A DB2 table space is a logical aggregation of data containers or data files. A container is a way of defining which location on the storage device is made available for storing database objects. Containers may be assigned from file systems by specifying a directory. Containers may also reference files that reside within a directory. Additionally, containers may reference raw devices, which are identified as DEVICE containers.

Buffer pools, logs, and storage management.

DMS is the SAP DB2 table space definition.


DB2 supports two main categories of table spaces: System Managed Space (SMS) table spaces and Database Managed Space (DMS) table spaces, as explained here:

� System Managed Space (SMS) table spaces

With SMS, the operating system’s file manager allocates and manages the space where the table space is to be stored. A list of directories on the file system is assigned to a table space when an SMS table space is defined.

For an SMS table space, space is allocated on demand: for every database object, several files are created, and they grow or shrink depending on the amount of data within the object.

Containers are represented by directories in the file system. For very large SAP systems, the directories should have their own file system and reside on separate disks. By default, table spaces are created as SMS at SAP installation.

� Database Managed Space (DMS) table spaces

With DMS, the database manager controls the storage space. A list of files or devices is selected to belong to a table space when the DMS table space is defined. The space on those devices or files is managed by the DB2 database manager.

DMS table spaces use striping to ensure an even distribution of data across all containers. There are two container options with DMS: FILE-based containers, which use preallocated files, and DEVICE-based containers, which use raw devices in UNIX and partitions in Windows. When working with file containers, the database manager allocates the entire container when the table space is created.

The table space is considered full when all of the space within the containers has been used. However, unlike SMS, with DMS you can add or extend containers using the ALTER TABLESPACE statement, thus allowing more storage space to be given to the table space.

DMS is the SAP standard DB2 table space definition (except for temporary table spaces), and the recommendation is to associate each container with a different disk to take advantage of parallel I/O operations.

These containers or data files are allocated and assigned to a specific table space, which is responsible for balancing data across containers by the use of table space maps and table space extent maps. These maps define stripes, which are contiguous blocks of extents spanning the containers.

Three basic table spaces are created by default:

� The SYSCATSPACE table space holds system catalog information like the definition of indexes, tables, buffer pools, and so on. These table spaces and tables are unique for each database, meaning that there is no way to share information between databases without the use of federation or replication techniques.

� The TEMPSPACE table space is a system temporary table space used whenever temporary storage is needed, as in a table reorganization.

� The USERSPACE table space is a user table space that can be used to store user table data.

Although not created by default, another table space type that may also be required is the user temporary table space. This table space is required for applications which need the use of user temporary tables.

Table spaces can be created with different page sizes: 4 KB, 8 KB, 16 KB, and 32 KB.

� When the page size is 4 KB, the row length can be up to 4005 bytes.

� When the page size is 8 KB, the row length can be up to 8101 bytes.

� When the page size is 16 KB, the row length can be up to 16 293 bytes.


� When the page size is 32 KB, the row length can be up to 32 677 bytes.

Having a larger page size facilitates a reduction in the number of levels in any index. If you are working with online transaction processing (OLTP) applications, which perform random row reads and writes, using a smaller page size is better, because it wastes less buffer space with undesired rows. If you are working with decision support system (DSS) applications, which access large numbers of consecutive rows at a time, then using a larger page size is better, because it reduces the number of I/O requests required to read a specific number of rows.

An exception, however, is when the row size is smaller than the page size divided by 255. In such cases, there is wasted space on each page (there is a maximum of 255 rows per page). To reduce this wasted space, using a smaller page size may be more appropriate.
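A quick back-of-the-envelope check makes the 255-row effect concrete. The sketch below is our own illustration (it ignores page header overhead); it computes the fraction of a page that can actually hold row data.

```python
def usable_fraction(page_size, row_size, max_rows=255):
    """Fraction of a page occupied by row data, given the 255-row cap."""
    rows = min(page_size // row_size, max_rows)
    return rows * row_size / float(page_size)

# For 60-byte rows, a 4 KB page is almost fully used, while a 32 KB page
# wastes more than half its space because only 255 rows fit on it.
small = usable_fraction(4096, 60)    # ~0.996
large = usable_fraction(32768, 60)   # ~0.467
```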

A new feature introduced in DB2 UDB V8.2.2 is the auto-resize option in the CREATE TABLESPACE statement.

The auto-resize feature

DMS table spaces also have a feature called “auto-resize” (sometimes called “auto-extend”). As space is consumed in a DMS table space that can be automatically resized, DB2 UDB may extend one or more of its file containers. (SMS table spaces have similar capabilities for growing automatically, but the term auto-resize is used exclusively for DMS.)

By default, the auto-resize feature is not enabled for a DMS table space. To enable the auto-resize feature, specify the AUTORESIZE YES clause as part of the CREATE TABLESPACE statement, as shown in Example 3-3:

Example 3-3 Enabling the AUTORESIZE feature in the CREATE statement

CREATE TABLESPACE DMS1 MANAGED BY DATABASE USING (FILE '/db2files/DMS1' 10 M) AUTORESIZE YES

You can also enable (or disable) the auto-resize feature after a DMS table space has been created by using the AUTORESIZE clause on the ALTER TABLESPACE statement, as shown in Example 3-4.

Example 3-4 Enabling the AUTORESIZE feature in the ALTER statement

ALTER TABLESPACE DMS1 AUTORESIZE YES
ALTER TABLESPACE DMS1 AUTORESIZE NO

Two other attributes, MAXSIZE and INCREASESIZE, are associated with the auto-resize table spaces.

� Maximum size (MAXSIZE)

The MAXSIZE clause on the CREATE TABLESPACE statement defines the maximum size for the table space. Example 3-5 illustrates how to create a table space that can grow to 100 MB (per partition, if the database has multiple partitions).

Example 3-5 The MAXSIZE attribute in the CREATE statement

CREATE TABLESPACE DMS1 MANAGED BY DATABASE USING (FILE '/db2files/DMS1' 10 M) AUTORESIZE YES MAXSIZE 100 M

The MAXSIZE NONE clause specifies that there is no maximum limit for the table space. The table space can grow until a file system limit or DB2 table space limit has been reached2. No maximum limit is the default if the MAXSIZE clause is not specified when the auto-resize feature is enabled.

Auto-resize: a new feature in DB2 V8.2.2. Must be enabled.


The ALTER TABLESPACE statement changes the value of MAXSIZE for a table space that has auto-resize already enabled, as shown in Example 3-6.

Example 3-6 The MAXSIZE attribute in the ALTER statement

ALTER TABLESPACE DMS1 MAXSIZE 1 G
ALTER TABLESPACE DMS1 MAXSIZE NONE

If a maximum size is specified, the actual value that DB2 enforces may be slightly smaller than the value provided because DB2 attempts to keep container growth consistent. It may not be possible to extend the containers by equal amounts and reach the maximum size exactly.
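This rounding can be illustrated with a small sketch. The function below is purely illustrative (DB2's internal calculation is not documented here); it rounds a requested maximum down to a multiple of one equal growth step across all containers.

```python
def effective_max_size(requested_bytes, num_containers, extent_pages, page_size):
    """Illustrative only: round the requested MAXSIZE down so the table space
    can grow in equal whole-extent steps across all of its containers."""
    step = num_containers * extent_pages * page_size  # one growth increment
    return (requested_bytes // step) * step

# A 100 MB limit on 3 containers with 32-page extents of 4 KB pages is
# rounded down slightly, as the text describes.
limit = effective_max_size(100 * 1024 * 1024, 3, 32, 4096)
```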

� Increase size (INCREASESIZE)

The INCREASESIZE clause on the CREATE TABLESPACE statement defines the amount of space used to increase the table space when there are no free extents within the table space, and a request for one or more extents has been made. The value can be specified as an explicit size or as a percentage, as shown in Example 3-7.

Example 3-7 The INCREASE attribute in the CREATE statement

CREATE TABLESPACE DMS1 MANAGED BY DATABASE USING (FILE '/db2files/DMS1' 10 M) AUTORESIZE YES INCREASESIZE 5 M;

CREATE TABLESPACE DMS1 MANAGED BY DATABASE USING (FILE '/db2files/DMS1' 10 M) AUTORESIZE YES INCREASESIZE 50 PERCENT;

A percentage value means that the increase size is calculated every time that the table space needs to grow, and growth is based on a percentage of the table space size at that time. For example, if the table space is 20 MB in size and the increase size is 50%, then the table space grows by 10 MB the first time (to a size of 30 MB) and by 15 MB the next time.
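The arithmetic of this example can be checked with a few lines of Python (our own sketch, not DB2 code):

```python
def growth_sequence(start_mb, percent, steps):
    """Successive table space sizes when INCREASESIZE is a percentage:
    each increase is recomputed from the current size at that time."""
    sizes = [float(start_mb)]
    for _ in range(steps):
        sizes.append(sizes[-1] * (1 + percent / 100.0))
    return sizes

# 20 MB with INCREASESIZE 50 PERCENT: +10 MB, then +15 MB.
print(growth_sequence(20, 50, 2))  # [20.0, 30.0, 45.0]
```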

If the INCREASESIZE clause is not specified when the auto-resize feature is enabled, DB2 determines an appropriate value to use, which may change over the life of the table space. Like AUTORESIZE and MAXSIZE, you can change the value of INCREASESIZE by using the ALTER TABLESPACE statement.

If a size increase is specified, the actual value used by DB2 may be slightly different than the value provided. This adjustment in the value used is done to keep growth consistent across the containers in the table space.

For table spaces that can be automatically resized, DB2 attempts to increase the size of the table space when all of the existing space has been used and a request for more space is made. DB2 determines which containers can be extended in the table space so that a rebalance does not occur. DB2 extends only containers that exist within the last range of the table space map (the map describes the storage layout for the table space), and they are all extended by an equal amount.

The importance of sizing buffer pools

Most data manipulation takes place in buffer pools. Therefore, configuring buffer pools is the single most important aspect of tuning. Only large objects and long field data are not manipulated in buffer pools.

2 For information about Structured Query Language (SQL) limits, refer to the SQL reference at: http://nevada.torolab.ibm.com/docs/db271/db2s0/frame3.htm#vdummy


A buffer pool is memory used to cache table and index data pages as they are being read from disk or modified. The buffer pool improves database system performance by allowing data to be accessed from memory instead of from disk. Memory access is much faster than disk access, so the less often the database manager needs to read from or write to disk, the better the performance.

When an application accesses a row of a table for the first time, the database manager places the page containing that row in the buffer pool. The next time any application requests data, the database manager looks for it in the buffer pool. If the requested data is in the buffer pool, it can be retrieved without disk access, resulting in faster performance.

Memory is allocated for the buffer pool when a database is activated, or when the first application connects to the database. Buffer pools can also be created, dropped, and resized while the database manager is running. If you use the IMMEDIATE keyword when you use the ALTER BUFFERPOOL statement to increase the size of the buffer pool, memory is allocated as soon as you enter the command if the memory is available.

If the memory is unavailable, the change occurs when all applications are disconnected and the database is reactivated. If you decrease the size of the buffer pool, memory is deallocated at commit time; when all applications are disconnected, the buffer pool memory is deallocated.

To ensure that an appropriate buffer pool is available in all circumstances, DB2 creates small buffer pools, one for each page size: 4 KB, 8 KB, 16 KB, and 32 KB. The size of each buffer pool is 16 pages. These buffer pools are hidden from the user. They are not present in the system catalog or in the buffer pool system files. You cannot use or alter them directly, but DB2 uses these buffer pools in the following circumstances:

� When a buffer pool of the required page size is inactive because insufficient memory was available to create it after a CREATE BUFFERPOOL statement was executed with the IMMEDIATE keyword

In this case, a message is written to the administration notification log. If necessary, table spaces are remapped to a hidden buffer pool. Performance might be drastically reduced.

� When the ordinary buffer pools cannot be brought up during a database connect

This problem is likely to have a serious cause, such as an out-of-memory condition. Although DB2 will be fully functional because of the hidden buffer pools, performance will degrade drastically. You should address this problem immediately. You receive a warning when this occurs, and a message is written to the administration notification log.

Pages remain in the buffer pool until the database is shut down, or until the space occupied by a page is required for another page. The following criteria determine which page is removed to bring in another page:

� How recently the page was referenced.

� The probability that the page will be referenced again by the last agent that looked at it.

� The type of data on the page.

� Whether the page was changed in memory but not written out to disk (changed pages are always written to disk before being overwritten).
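The first criterion, recency of reference, can be illustrated with a toy least-recently-used (LRU) cache. This sketch is a conceptual illustration only; DB2's actual victim selection also weighs the other three criteria listed above.

```python
from collections import OrderedDict

def simulate_lru(capacity, requests):
    """Toy LRU buffer pool illustrating the recency criterion only."""
    pool = OrderedDict()   # page id -> present; order = least recent first
    hits = 0
    for page in requests:
        if page in pool:
            hits += 1
            pool.move_to_end(page)         # mark as most recently referenced
        else:
            if len(pool) >= capacity:
                pool.popitem(last=False)   # evict the least recently used page
            pool[page] = True
    return hits, len(requests)

hits, total = simulate_lru(2, [1, 2, 1, 3, 1])  # 2 hits out of 5 requests
```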

Note: To reduce the necessity of increasing the size of the dbheap database configuration parameter when buffer pool sizes increase, nearly all buffer pool memory (including page descriptors, buffer pool descriptors, and hash tables) comes out of the database shared memory set and is sized automatically.

Buffer pool size is a key tuning parameter.


It is considered a best practice to split data and indexes between different table spaces to avoid I/O contention.

Table and performance options

A table consists of data logically arranged in columns and rows. All database and table data is assigned to table spaces. Table data is accessed using the Structured Query Language (SQL). When creating a table, you can decide to store all related objects (for example, indexes and large object data) in the same table space, or keep them in separate table spaces.

A referential constraint may be defined in such a way that either the parent table or the dependent table is a part of a table hierarchy. In such a case, the effect of the referential constraint is as follows:

� Effects of INSERT, UPDATE, and DELETE statements

If a referential constraint exists, in which PT indicates a parent table and DT indicates a dependent table, the constraint ensures that for each row of DT (or any of its subtables) that has a non-null foreign key, a row exists in PT (or one of its subtables) with a matching parent key. This rule is enforced against any action that affects a row of PT or DT, regardless of how that action is initiated.

� Effects of DROP TABLE statements

– For referential constraints in which the dropped table is the parent table or dependent table, the constraint is dropped.

– For referential constraints in which a supertable of the dropped table is the parent table, the rows of the dropped table are considered to be deleted from the supertable. The referential constraint is checked and its delete rule is invoked for each of the deleted rows.

– For referential constraints in which a supertable of the dropped table is the dependent table, the constraint is not checked. Deletion of a row from a dependent table cannot result in a violation of a referential constraint.

When any table is created, the definer of the table is granted CONTROL privilege. When a subtable is created, the SELECT privilege that each user or group has on the immediate supertable is automatically granted on the subtable with the table definer as the grantor.

The maximum number of bytes allowed in the row of a table depends on the page size of the table space in which the table is created. Table 3-2 lists the row size limit and column count limit associated with each table space page size.

Table 3-2 The row size limit

Page size   Row size limit   Column count limit
4 KB        4 005            500
8 KB        8 101            1 012
16 KB       16 293           1 012
32 KB       32 677           1 012

Table 3-3 lists the byte counts of columns by data type; this is used to calculate the row size. The byte counts depend on whether VALUE COMPRESSION is active. When VALUE COMPRESSION is not active, the byte count also depends on whether the column is nullable.
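A small helper (our own illustration, not a DB2 utility) can pick the smallest page size whose row size limit accommodates a given row, using the limits from Table 3-2:

```python
# Row size limits per page size, from Table 3-2 (bytes).
ROW_LIMITS = {4096: 4005, 8192: 8101, 16384: 16293, 32768: 32677}

def smallest_page_for_row(row_bytes):
    """Smallest table space page size whose row size limit fits the row."""
    for page in sorted(ROW_LIMITS):
        if row_bytes <= ROW_LIMITS[page]:
            return page
    raise ValueError("row exceeds the 32 KB page row size limit")

print(smallest_page_for_row(5000))  # 8192: a 5000-byte row needs 8 KB pages
```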


If a table is based on a structured type, an additional 4 bytes of overhead is reserved to identify rows of subtables, regardless of whether subtables are defined. Additional subtable columns must be considered nullable for byte count purposes, even if defined as not nullable.

Table 3-3 Byte counts of columns by data type

Data type                      VALUE COMPRESSION   VALUE COMPRESSION is not active
                               is active           Column is nullable       Column is not nullable
SMALLINT                       4                   3                        2
INTEGER                        6                   5                        4
BIGINT                         10                  9                        8
REAL                           6                   5                        4
DOUBLE                         10                  9                        8
DECIMAL                        (p/2)+3 *           (p/2)+2 *                (p/2)+1 *
CHAR(n)                        n+2                 n+1                      n
VARCHAR(n)                     n+2                 n+5 (within a table)     n+4 (within a table)
LONG VARCHAR                   22                  25                       24
GRAPHIC(n)                     n*2+2               n*2+1                    n*2
VARGRAPHIC(n)                  n*2+2               n*2+5 (within a table)   n*2+4 (within a table)
LONG VARGRAPHIC                22                  25                       24
DATE                           6                   5                        4
TIME                           5                   4                        3
TIMESTAMP                      12                  11                       10
DATALINK(n)                    n+52                n+55                     n+54
Max LOB length 1024            70                  73                       72
Max LOB length 8192            94                  97                       96
Max LOB length 65 536          118                 121                      120
Max LOB length 524 000         142                 145                      144
Max LOB length 4 190 000       166                 169                      168
Max LOB length 134 000 000     198                 201                      200
Max LOB length 536 000 000     222                 225                      224
Max LOB length 1 070 000 000   254                 257                      256
Max LOB length 1 470 000 000   278                 281                      280
Max LOB length 2 147 483 647   314                 317                      316

* For DECIMAL, the byte count is the integral part of the listed expression, where p is the precision.

Note: The value compression listed in Table 3-3 is different from the row compression introduced in DB2 V9.
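As a worked example of applying Table 3-3, the sketch below (our own illustration, covering only a subset of the data types) computes column byte counts and adds them up for a sample row:

```python
def column_bytes(dtype, nullable=True, value_compression=False, n=None, p=None):
    """Byte counts for a subset of the data types in Table 3-3."""
    base = {"SMALLINT": 2, "INTEGER": 4, "BIGINT": 8, "REAL": 4, "DOUBLE": 8,
            "DATE": 4, "TIME": 3, "TIMESTAMP": 10}
    if dtype == "CHAR":
        b = n                      # CHAR(n): n bytes of data
    elif dtype == "DECIMAL":
        b = p // 2 + 1             # integral part of (p/2)+1
    else:
        b = base[dtype]
    if value_compression:
        return b + 2               # overhead when VALUE COMPRESSION is active
    return b + 1 if nullable else b  # 1-byte null indicator when nullable

# Row size of (INTEGER NOT NULL, CHAR(10) nullable, DATE nullable),
# without value compression: 4 + 11 + 5 = 20 bytes.
row = (column_bytes("INTEGER", nullable=False)
       + column_bytes("CHAR", n=10)
       + column_bytes("DATE"))
```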


Recommendations when creating tables across partitions

There are performance advantages to creating a table across several partitions in a partitioned database. For example, the work associated with the retrieval of data can be divided among the database partitions. Before creating a table that will be physically divided or partitioned, however, you need to consider the following:

� Table spaces can span more than one database partition. The number of partitions they span depends on the number of partitions in a database partition group.

� Tables can be collocated by being placed in the same table space or by being placed in another table space that, together with the first table space, is associated with the same database partition group.

� You must select a partitioning key carefully, because it cannot be changed later. Furthermore, any unique indexes (and therefore, unique or primary keys) must be defined as a superset of the partitioning key. That is, if a partitioning key is defined, unique keys and primary keys must include all of the same columns as the partitioning key (they may have more columns).

� The size limit for one partition of a table is 64 GB, or the available disk space, whichever is smaller (this assumes a 4 KB page size for the table space.) The size of the table can be as large as 64 GB (or the available disk space) times the number of database partitions.

– If the page size for the table space is 8 KB, the size of the table can be as large as 128 GB (or the available disk space) times the number of database partitions.

– If the page size for the table space is 16 KB, the size of the table can be as large as 256 GB (or the available disk space) times the number of database partitions.

– If the page size for the table space is 32 KB, the size of the table can be as large as 512 GB (or the available disk space) times the number of database partitions.

� You specify that a table is to span several database partitions when you create it, in the CREATE TABLE statement. There is an additional option when creating a table in a partitioned database environment: the partitioning key.

A partitioning key is a key that is part of the definition of a table. It determines the partition on which each row of data is stored.

If you do not specify the partitioning key explicitly, the following defaults are used:

– If a primary key is specified in the CREATE TABLE statement, the first column of the primary key is used as the partitioning key.

– If there is no primary key, the first column that is not a long field is used.

– If no columns satisfy the requirements for a default partitioning key, the table is created without one (this is allowed only in single-partition database partition groups).

A row of a table, and all information about that row, always resides on the same database partition.
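The three default rules above can be expressed as a short function (our own sketch; LONG_TYPES is an illustrative, deliberately incomplete set of long field types):

```python
LONG_TYPES = {"LONG VARCHAR", "LONG VARGRAPHIC"}  # illustrative, not exhaustive

def default_partitioning_key(columns, primary_key=None):
    """Pick the default partitioning key per the rules above.
    columns: list of (name, data_type) in table definition order."""
    if primary_key:                      # rule 1: first column of the primary key
        return primary_key[0]
    for name, dtype in columns:          # rule 2: first column that is not a long field
        if dtype not in LONG_TYPES:
            return name
    return None                          # rule 3: no default key (single-partition groups only)

print(default_partitioning_key([("note", "LONG VARCHAR"), ("qty", "INTEGER")]))  # qty
```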

For more information about DB2 objects and concepts, visit the DB2 Infocenter:

http://publib.boulder.ibm.com/infocenter/db2luw/v8//index.jsp

3.2 The major DB2 processes used for our test

All client connections to DB2, whether they are local or remote applications such as SAP, are linked with the DB2 client library. Each SAP NetWeaver BI work process (for example dialog, background, and so on) has its own database connection to DB2. Client database connections communicate using shared memory for local clients and TCP/IP for remote clients.

The following processes need to be well understood for the tests done in the project:

� The engine dispatch units

With UNIX systems, DB2 activity is managed by DB2 processes (in other environments, these are DB2 threads). These are commonly referred to as engine dispatch units (EDUs).

All client connections are allocated a coordinator agent (db2agent). This agent performs all database requests on behalf of the application. Subagents can be assigned if the server has multiple processors or is a partitioned database. All agents and subagents are managed using a pooling algorithm that minimizes the creation and destruction of EDUs.

As explained in “The importance of sizing buffer pools” on page 125, buffer pools are areas of database server memory and are key determinants of database performance. The configuration of the buffer pools, as well as of the prefetcher and page cleaner EDUs, controls how quickly data can be accessed and directly affects application response time.

� Prefetchers and page cleaners

Prefetchers retrieve data from disk into the buffer pool before applications request it, improving performance and response time. For example, an application that needs to scan large volumes of data would have to wait for data to be moved from disk into the buffer pool if there were no prefetchers. Agents of the application send asynchronous read-ahead requests to a common prefetch queue. As prefetchers become available, they fulfill those requests, fetching the requested pages from disk into the buffer pool.

Page cleaners look for, and write out, pages from the buffer pool that are no longer needed, ensuring that there is room in the buffer pool for the pages being retrieved by the prefetchers.

Without prefetchers and page cleaners, applications would have to do all of the reading and writing of data between the buffer pool and disk storage themselves. With them, applications can run faster because transactions are not forced to wait while pages are written to disk.

• The following list includes some of the important threads and processes used by each database:

– db2pfchr, for buffer pool prefetchers.

– db2pclnr, for buffer pool page cleaners.

– db2loggr, for manipulating log files to handle transaction processing and recovery.

– db2loggw, for writing log records to the log files.

– db2logts, for collecting historical information about which logs are active when a table space is modified. This information is ultimately recorded in the DB2TSCHG.HIS file in the database directory. It is used to speed up table space roll forward recovery.

– db2dlock, for deadlock detection. In a multi-partitioned database environment, an additional process called db2glock is used to coordinate the information gathered from the db2dlock process on each partition. Note that db2glock runs only on the catalog partition.

• The following threads and processes can be started to carry out various tasks:

– db2gds, the global daemon spanner on UNIX-based systems that starts new processes.

130 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse


– db2wdog, the watchdog on UNIX-based systems that handles abnormal terminations. One db2wdog exists per database partition, and it is the parent process for all DB2 processes of that partition.

– db2fcmdm, the fast communications manager daemon for handling inter-partition communication (used only in multi-partitioned databases).

– db2pdbc, the parallel system controller, handles parallel requests from remote nodes (used only in a partitioned database environment).

– db2cart, for archiving log files when accessing a database configured with USEREXIT enabled.

– db2panic, the panic agent, handles urgent requests after agent limits have been reached at a particular node (used only in a partitioned database environment).

3.3 Database Partitioning Feature

The Database Partitioning Feature (DPF) allows DB2 Enterprise Server Edition to partition a database within a single server or across a cluster of servers. DPF provides multiple benefits, including scalability to support very large databases or complex workloads, and increased parallelism for administration tasks. Databases can be partitioned within a single server (known as logical partitioning) or across separate servers (known as physical partitioning). You can also mix both approaches, with multiple partitions on each of several distinct servers.

A partitioned database environment allows a database to remain a logical whole, despite being physically divided across more than one partition. The fact that data is partitioned remains transparent to most users. Work can be divided among the database managers; each database manager in each partition works against its own part of the database.

As described in “The DB2 initial layout” on page 6, the DB2 environment was created based on an existing customer infrastructure. The customer database was basically spread across five DB2 partitions. In the database layout for SAP NetWeaver BI, the best approach to distribute the data load between partitions is a 1+n partition layout: all dimension tables are hosted on DB2 partition 0 on a dedicated physical server, and all remaining data, such as operational DataSources, InfoCubes, and aggregate tables, is distributed across the remaining partitions and physical servers.

The database used in this project was a partitioned database. The original database was configured with 1+5 database partitions. However, this design was changed to accommodate the growth of the database to 20 TB: the number of database partitions was increased to 1+32, with the first (DB2 partition 0) being the main partition holding all dimension tables, and the other 32 partitions holding all the remaining data. This environment is described in this section.

A partitioning key is a column (or group of columns) that is used to determine the partition in which a particular row of data is stored. A partitioning key is defined on a table using the CREATE TABLE statement.

If a partitioning key is not defined for a table in a table space that is divided across more than one database partition in a database partition group, then one is created by default from the first column of the primary key. If no primary key is specified, the default partitioning key is the first non-long field column defined on that table (a long field includes all long data types and all large object (BLOB) data types).

The project needed to grow from 1+5 DB2 partitions to 1+32 DB2 partitions.

Chapter 3. The DB2 perspective 131


If you create a table in a table space associated with a single-partition database partition group, and you want to have a partitioning key, you must define the partitioning key explicitly. Partitioning keys are not created by default.

If no columns satisfy the requirement for a default partitioning key, the table is created without one. Tables without a partitioning key are only allowed in single-partition database partition groups. You can add or drop partitioning keys at a later time, using the ALTER TABLE statement. Altering the partitioning key can only be done to a table whose table space is associated with a single-partition database partition group.

Choosing a good partitioning key is important. You should take the following points into consideration:

• How tables are to be accessed

• The nature of the query workload

• The join strategies employed by the database system

If collocation is not a major consideration, then a good partitioning key for a table is one that spreads the data evenly across all database partitions in the database partition group. The partitioning key for each table in a table space that is associated with a database partition group determines whether the tables are collocated. Tables are considered “collocated” when:

• The tables are placed in table spaces that are in the same database partition group.

• The partitioning keys in each table have the same number of columns.

• The data types of the corresponding columns are partition-compatible.

These characteristics ensure that rows of collocated tables with the same partitioning key values are located on the same partition.

An inappropriate partitioning key can cause uneven data distribution. Columns with unevenly distributed data, and columns with a small number of distinct values, should not be chosen as a partitioning key. The number of distinct values must be large enough to ensure an even distribution of rows across all database partitions in the database partition group. The cost of applying the partitioning hash algorithm is proportional to the size of the partitioning key. The partitioning key cannot include more than 16 columns, but fewer columns result in better performance. Unnecessary columns should not be included in the partitioning key.

The following points should be considered when defining partitioning keys:

• Creation of a multiple partition table that contains only long data types (types with LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, or DBCLOB properties) is not supported.

• The partitioning key definition cannot be altered.

• The partitioning key should include the most frequently joined columns.

• The partitioning key should be made up of columns that often participate in a GROUP BY clause.

• Any unique key or primary key must contain all of the partitioning key columns.
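For illustration, the following hypothetical DDL (table, column, and table space names are invented; the PARTITIONING KEY clause is DB2 V8 syntax) defines an explicit partitioning key whose column is contained in the primary key, as the last point requires:

```shell
# Hypothetical DDL sketch: CUSTOMER_ID is both the partitioning key and part
# of the primary key, satisfying the rule above. Names are invented.
DDL="CREATE TABLE SAPR3.DEMO_FACT (
    CUSTOMER_ID  INTEGER       NOT NULL,
    ORDER_DAY    DATE          NOT NULL,
    AMOUNT       DECIMAL(15,2),
    PRIMARY KEY (CUSTOMER_ID, ORDER_DAY)
) IN DEMO_TS
PARTITIONING KEY (CUSTOMER_ID) USING HASHING"
printf '%s\n' "$DDL"       # on a real system, this would be run through db2
```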

Hash partitioning is the method by which the placement of each row in the partitioned table is determined. The method works as follows:

• The hashing algorithm is applied to the value of the partitioning key, and generates a number between 0 and 4095.


• The partitioning map is created when a database partition group is created. Each of the partition numbers in the group is repeated sequentially, in a round-robin fashion, to fill the 4096 entries of the partitioning map.

• The generated number is used as an index into the partitioning map. The entry at that location in the partitioning map is the number of the database partition where the row is stored.
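The steps above can be sketched in shell. The hash function here is only an illustrative stand-in (cksum), not DB2's real hashing algorithm, and a database partition group with partitions 1 to 5 is assumed:

```shell
# Sketch of hash partitioning: key -> map index (0..4095) -> partition number.
# cksum stands in for DB2's internal hash; partitions 1..5 are assumed.
NPART=5

hash_key() {                 # step 1: key value -> number between 0 and 4095
    h=$(printf '%s' "$1" | cksum | awk '{print $1}')
    echo $((h % 4096))
}

map_lookup() {               # steps 2 and 3: the 4096-entry map is filled
    echo $(($1 % NPART + 1)) # round-robin, so entry i holds partition (i mod 5)+1
}

key="ZGTFC060|20060703"
idx=$(hash_key "$key")
echo "key=$key map_index=$idx partition=$(map_lookup $idx)"
```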

3.4 Monitoring tools and scripts

To monitor our environment, we used both well-known tools (such as IBM DB2 Performance Expert and NMON), and other tools that were built to satisfy specific requests. This section describes the tools and scripts used to monitor our environment, and presents some of the key reports that were generated.

As described earlier, our environment is composed of 100 SAP NetWeaver BI InfoCubes spread over the DB2 partitions. Half of these InfoCubes were used for queries, and half were expected to be used for upload and aggregates. However, because of the extensive amount of testing, some DB2 partitions began to present “out of space” issues. To prevent run failures due to this problem, we created specific DB2 checking scripts.

3.4.1 IBM DB2 Performance Expert (DB2 PE)

DB2 performance was monitored by using IBM DB2 Performance Expert (DB2 PE). Using DB2 PE to monitor the DB2 environment was crucial because of its ability to store DB2-related data for a specific period of time, feeding DB2 performance reports, overall test evaluation, and KPI achievement. The DB2 PE version used in our environment is displayed in Example 3-8.

Example 3-8 DB2 PE version used in our project

$ pelevel
==============================================================================
IBM DB2 Performance Expert Server V2
==============================================================================
IBM (c) DB2 Performance Expert Server for Multiplatforms
IBM (c) DB2 Performance Expert Server for Workgroups
Version 2.2.1.0.419, code level U221_GAHOT-U419,R221_GAHOT-L1299,N221_GAHOT-E401
==============================================================================

In our environment, we configured two DB2 PE instances, corresponding to the scenarios used for KPI-A and KPI-3. The sample startup DB2 PE output is shown in Figure 3-1 on page 134.

DB2 PE, NMON and specific scripts were used to monitor DB2.


Figure 3-1 The startup DB2 PE output

A sample instance information report from DB2 PE is displayed in Figure 3-2 on page 135; it shows the DB2 information split by node in our database environment.


Figure 3-2 Instance information report from DB2 PE

A sample (and very useful) DB2 PE report about buffer pool hit ratio, sort overflows, catalog and package cache hit ratios, and applications waiting on locks is shown in Figure 3-3 on page 136.
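As a concrete illustration of what such a report summarizes, the standard buffer pool hit ratio can be computed from two snapshot counters; the counter values below are invented for illustration:

```shell
# The standard DB2 buffer pool hit ratio formula, applied to two snapshot
# counters (values invented for illustration):
#     hit_ratio = (1 - physical_reads / logical_reads) * 100
LOGICAL_READS=1500000
PHYSICAL_READS=60000
HIT=$(awk -v l="$LOGICAL_READS" -v p="$PHYSICAL_READS" \
      'BEGIN { printf "%.1f", (1 - p / l) * 100 }')
echo "Buffer pool hit ratio: ${HIT}%"    # prints 96.0%
```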


Figure 3-3 An example of a DB2 PE report

3.4.2 NMON

NMON, a free tool that analyzes AIX and Linux performance, is not officially supported. The following site provides information about NMON:

http://www-128.ibm.com/developerworks/eserver/articles/analyze_aix/

NMON was used in this project to collect critical information such as memory usage, CPU usage, I/O activity, and network activity. NMON was started on all DB2 LPARs as part of a group of servers. One master server was configured with rsh access to all partitions, and it was responsible for starting the NMON daemon on all other servers that belonged to the system. Information was generated every 2 minutes and was dumped into a file. The script that we used to start an NMON instance on our DB2 servers is shown in Figure 3-4 on page 137. Note the following points:

• The script is based on a 14-digit time stamp in the format YYYYmmddHHMMSS.

For example, if the instance were started at 21:13:17 on July 3 2006, the time stamp would read: 20060703211317 (see line 3).

• The MACHINES_FILES variable (see line 10) lists all servers that must have the NMON instance running, and it is used in the for loop.

• The LOCK_FILE variable (see line 14) is used to hold the start time stamp so that it is visible to the stop script for finding the right process on the target server and stopping it.


Figure 3-4 NMON start script

Starting NMON
The nmon command is started with the following options, which are described in Table 3-4:

nmon -s120 -c2700 -T -t -L -W -N -S -A -m ${DIR_REMOTE_TMP} -F ${DATE}

Table 3-4 NMON start command options

The script also runs lscfg on AIX to check the configuration at the start and stop of the run. The script is very straightforward and is based on rsh enablement between the servers. To run the script, a parameter specifying the server group must be provided; in our case SYS1, SYS2, or SYS3, the pools of servers in our environment.

For example, to start NMON monitoring on SYS1, we executed this command as root:

[root@cusdata]/XmRec/CUSTOMER/SEB: start_direct_run.sh sys1
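The start logic described above can be sketched as follows; the host names, paths, and helper function are stand-ins, and the rsh commands are only echoed here rather than executed:

```shell
# Sketch of the NMON start logic: record a 14-digit time stamp in a lock file,
# then start nmon on every server in the machines file. Names are stand-ins.
DATE=$(date +%Y%m%d%H%M%S)              # 14-digit time stamp, e.g. 20060703211317
DIR_REMOTE_TMP=/tmp/nmon
MACHINES_FILE=/tmp/machines.sys1
LOCK_FILE=/tmp/nmon_run.lock

printf 'server1\nserver2\n' > "$MACHINES_FILE"    # stand-ins for the SYS1 hosts
printf '%s\n' "$DATE" > "$LOCK_FILE"              # the stop script reads this

build_nmon_cmd() {                      # $1 = target server
    echo "rsh $1 nmon -s120 -c2700 -T -t -L -W -N -S -A -m $DIR_REMOTE_TMP -F $DATE"
}

while read -r server; do
    build_nmon_cmd "$server"            # the real script evaluates this command
done < "$MACHINES_FILE"
```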

Stopping NMON
The stop nmon script was written following the same concepts, and is shown in Figure 3-5.

Option                Description

-s120                 refresh every 120 seconds

-m ${DIR_REMOTE_TMP}  nmon output directory on the target server

-F ${DATE}            output file name (time stamp YYYYmmddHHMMSS)


Based on the time stamp previously stored in the lock file, the stop script goes through the servers in the list used in the start script and kills the nmon process. The output file is then copied to the local server and moved to a Samba shared area, which we could access from Windows workstations.

Figure 3-5 NMON stop script

To stop NMON on SYS1, we executed this command as root:

[root@cusdata]/XmRec/CUSTOMER/SEB: stop_direct_run.sh sys1
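The stop logic can be sketched the same way; the lock file contents and host list are set up locally so the sketch is self-contained, and the kill commands are echoed rather than executed:

```shell
# Sketch of the NMON stop logic: recover the start time stamp from the lock
# file and build a kill command per server. Names and paths are stand-ins.
LOCK_FILE=/tmp/nmon_stop_demo.lock
MACHINES_FILE=/tmp/machines_stop_demo.sys1
printf '20060703211317\n' > "$LOCK_FILE"          # written by the start script
printf 'server1\nserver2\n' > "$MACHINES_FILE"

DATE=$(cat "$LOCK_FILE")                          # recover the start time stamp

build_stop_cmd() {       # $1 = target server; match the nmon started with $DATE
    echo "rsh $1 kill \$(pgrep -f 'nmon.*-F $DATE')"
}

while read -r server; do
    build_stop_cmd "$server"            # the real script evaluates this command
done < "$MACHINES_FILE"
```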

Collecting the data
The collected data is parsed and analyzed by a free tool called NMON analyser. For more information about this tool, visit:

http://www-941.haw.ibm.com/collaboration/wiki/display/Wikiptype/nmonanalyser

This tool is based on a Microsoft® Excel spreadsheet with Visual Basic® for Applications (VBA) code responsible for parsing and creating all NMON graphics and charts. The base version used in our project was Version 3.1.7 from May 5, 2006. We modified this version and added two new buttons: Analyze and Collect Data, and Collect Analyzed Data.


Based on this tool, we created a hybrid version of the provided spreadsheet by implementing a third button to generate all default reports and also create a consolidated report in a Microsoft Word format. An example of the spreadsheet used in our tests is shown in Figure 3-6.

Figure 3-6 NMON analyzer tool modified for the project

The Analyze and Collect Data button builds on the logic of the publicly available tool, and adds generation of a report with consolidated data about all servers in the server group. This button invokes the function that parses the nmon files and generates all the regular reports, as well as a consolidated report, which helped us to analyze the entire structure.

The last button generates the consolidated report from the reports already produced by the first button; no nmon files need to be parsed again, because the output of the analyze step feeds the consolidation into a second report.

An example of one chart generated by the consolidation feature we created is shown in Figure 3-7 on page 140.


Figure 3-7 NMON analyzer customized report

3.4.3 DB2-specific monitoring scripts

To check the progress and better understand the runs, we created two scripts:

• Script A, to monitor the evolution of records inserted in all tables involved for each run

• Script B, to monitor the rollup phase

Script A: the upload DB2 monitoring script
We created a special script to monitor the upload phase. This script collected all the InfoCube names that were passed as parameters, and then monitored the E-Fact table, the F-Fact table, or both.

The E-Fact and F-Fact tables are the main tables for an InfoCube. When you execute the upload phase, data is pulled from the ODS, transformed, and then inserted into the fact tables. In the E-Fact table the information is compressed, which helps performance and space usage but requires more processing time and other resources. The SAP NetWeaver BI pattern is to add the letter E or F at the beginning of the fact table name derived from the InfoCube name; examples are listed in Table 3-5.

Table 3-5 E-Fact table and F-Fact table names

Note: These scripts are provided in “DB2 monitoring scripts” on page 256.

InfoCube name: ZGTFC060 F-Fact Table: /BIC/FZGTFC060

InfoCube name: ZGTFC060 E-Fact Table: /BIC/EZGTFC060
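The naming pattern in Table 3-5 can be captured in a tiny helper (a sketch; /BIC/ is the SAP namespace shown in the table):

```shell
# Derive an InfoCube's fact table name from the Table 3-5 pattern.
fact_table() {            # $1 = E or F, $2 = InfoCube name
    echo "/BIC/${1}${2}"
}
fact_table F ZGTFC060     # prints /BIC/FZGTFC060
fact_table E ZGTFC060     # prints /BIC/EZGTFC060
```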


The script shown in Example A-1 on page 256 monitored all cubes listed in the CUBE_LIST parameter (see line 18). It stored all the counts for a specific point in time in a table named according to the parameter TABLE_NAME. In our case, our table was named FABIO.STAT${DATE}, where ${DATE} follows a 14-digit time stamp standard.
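A sketch of how such counting SQL could be generated for a cube list (the cube names are taken from the text; the SAPR3 schema and the WITH UR clause, used to avoid taking locks while monitoring, are assumptions):

```shell
# Generate one COUNT(*) statement per monitored InfoCube F-fact table.
# Schema SAPR3 and the WITH UR isolation clause are illustrative assumptions.
CUBE_LIST="ZGTFC002 ZGTFC004 ZGTFC008"

gen_count_sql() {
    for cube in $CUBE_LIST; do
        echo "SELECT COUNT(*) FROM SAPR3.\"/BIC/F${cube}\" WITH UR;"
    done
}
gen_count_sql          # in the real script, each statement is run through db2
```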

The describe command of one table generated by the script is depicted in Figure 3-8; it generates the table based on the number of InfoCubes that you are monitoring. In our example, we were monitoring InfoCubes ZGTFC002, ZGTFC004, ZGTFC008, ZGTFC010, ZGTFC029, ZGTFC033, ZGTFC035, and ZGTFC037. An additional column with the total number of records was inserted, and a time stamp column was also created.

Figure 3-8 The describe command generated by the upload script

The output of an uploading test would generate results similar to those displayed in Figure 3-9.

Figure 3-9 Uploading test results using the script


This output shows the number of records being inserted in a specific set of InfoCubes. This information helped us to create charts based on the number of records inserted per minute, as well as on how the numbers of records were inserted in a specific fact table. One of the charts that we generated based on this information is depicted in Figure 3-10.

Figure 3-10 Number of records inserted based on time

The chart in Figure 3-10 is related to the trend in the number of inserts per minute that is shown in Figure 3-11. Notice that the ramp-up and ramp-down phases appear at the beginning and end of the chart; for the KPI, however, the relevant results were measured during the full run period.

Figure 3-11 Number of records inserted per minute

Script B: the rollup DB2 monitoring script
The rollup phase was also monitored by the use of scripts. Because of the design of the rollup, we could not measure the amount of records added to an aggregate table during the


period that the aggregate job was running over the table due to its lock mechanism, which locked the whole table exclusively. The information that we gathered in this way may not reflect the database behavior on the rollup/aggregates test. The script that checks rollup aggregates insertions is provided in Example A-2 on page 260.

As you can see, the script looks like the first one described in “Script A: the upload DB2 monitoring script” on page 140 to monitor the upload. It generates a table with a column for each aggregate for a specific InfoCube, and stores table counts in a specific table created with a name defined by the parameter TABLE_NAME (see line 28).

(A different approach would be to use another script, also created to count records in a table: by updating statistics on the table and checking the CARD column in SYSCAT.TABLES, you may see the real state of how records are being inserted. However, the overhead involved in issuing such frequent statistics updates may not be worth the effort. We did not test this approach.)

The chart generated from the information retrieved by this script is depicted in Figure 3-12. This chart shows spikes that may not reflect the actual database behavior; but because we were unable to query the table, due to the exclusive lock held by SAP, this was the best information that we could generate online. The overall statistics after the run are not affected by this behavior, because we measure the number of records inserted over a specific amount of time.

Figure 3-12 Rollup script output

The matrix of values in one of the tables used to monitor rollup and aggregates, as shown in Figure 3-13 on page 144, clearly shows that the SAP NetWeaver BI rollup job holds the aggregate table from the beginning of the aggregation to its end. One peculiarity of this script is the use of the view SAPR3.RSDDAGGR_V (see line 63 in Example A-2 on page 260) to retrieve the aggregate tables for a specific InfoCube.


Figure 3-13 Rollup script matrix output

Following the same structure as the upload DB2 monitoring script, the describe output shows each aggregate table and a TOTAL column, with a SNAPSHOT column storing the time stamp of when the data was collected; see Figure 3-14.

Figure 3-14 The describe command generated by the rollup script

3.4.4 DB2 checking scripts

Prior to each run, some points needed to be checked on the InfoCubes, whether they were being used for upload or rollup, because lack of free space became an important issue in our scenarios.


To overcome this problem, we developed scripts to check the free space on table spaces and file systems. We checked the file systems because auto-resize was enabled on all table spaces. This feature, introduced in IBM DB2 V8.2.2, really helped to reduce the complexity of managing space compared with regular DMS table spaces. But because of its very simplicity, it could also be a trap.

We discovered that some InfoCubes were used far more heavily than others, which generated an imbalance in free space between InfoCubes. Why did this occur even with all InfoCubes balanced across DB2 partitions? With the 32-partition approach, each InfoCube's fact tables were spread over 8 partitions, with 2 partitions on each LPAR. Although this arrangement did balance things from a processing and LPAR point of view, it did not ensure that space was being balanced.

Thus, to help us balance space, we developed the following scripts:

• Script C, which is provided in Example A-3 on page 264, was used to generate a report for a specific InfoCube. This script goes through all the table spaces and containers, checking space. It was used to generate reports for InfoCube fact tables and aggregate tables.

Script C could handle more than one InfoCube at a time. However, it ran serially, checking one InfoCube after another, so it took a significant amount of time per InfoCube to gather all the information that we required.

• We soon created a second, simpler script, which performed the InfoCube verification in parallel. This script is depicted in “Output - Script D” on page 270.

Example 3-9 shows the verification of 8 InfoCubes: ZGTFC002, ZGTFC004, ZGTFC008, ZGTFC010, ZGTFC029, ZGTFC033, ZGTFC035, and ZGTFC037, executing in parallel.

Example 3-9 DB2 checking - Script D execution

check_series_infocubes.sh ZGTFC002 ZGTFC004 ZGTFC008 ZGTFC010 ZGTFC029 ZGTFC033 ZGTFC035 ZGTFC037

Although it needed some time to process all the information, this script returned some of the most useful output of the testing. It provided information about free space on the table spaces, and about whether each table space could still grow through its auto-resize option.
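The fan-out idea behind Script D can be sketched with background jobs and wait; the check function here is only a stub that stands in for the real DB2 space queries:

```shell
# Sketch of Script D's parallelism: one background check per InfoCube, then
# wait for all of them. check_infocube is a stub, not the real DB2 query.
OUTDIR=$(mktemp -d)

check_infocube() {
    echo "checking space for $1" > "$OUTDIR/$1.out"
}

for cube in ZGTFC002 ZGTFC004 ZGTFC008; do
    check_infocube "$cube" &          # run the checks in parallel
done
wait                                  # block until every check has finished
cat "$OUTDIR"/*.out
```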

This script is provided in Appendix A, Example A-4 on page 270. Referring to that output, notice that information regarding free space on the InfoCube fact table space for each specific partition is shown on line 10, and information regarding free space on the file system for each table space container is shown on line 14. Also notice the following points:

– On line 156, you can see that it moves to the index table space for the InfoCube fact table.

– On line 308, it starts analyzing the space for the aggregates data table space.

– On line 472, it provides space information for the aggregates index table space. The script retrieves all the space information required prior to any type of run, upload or rollup/aggregates.

3.5 The process for redistributing the DB2 partitions

The structure created for this benchmark was based on a customer environment whose database was architected on 1+5 DB2 partitions. Following SAP recommendations, partition 0 stored dimensional and master data tables, such as catalogs. The remaining partitions stored operational data stores, InfoCube fact tables, and aggregates.

Four steps were needed to define the 33 DB partitions from the initial DB partitions.

In the initial structure, shown in Figure 2-7 on page 75, all partitions were created on a single physical server. InfoCube fact tables and aggregate tables, as well as persistent staging area (PSA) and operational data store (ODS) tables, were spread over partition 1 to partition 5.

To achieve better results and improve scalability, the number of DB2 partitions dedicated to InfoCube fact tables, aggregate tables, and operational data store tables was increased to 32. The main node (DB partition 0) was configured with a larger amount of CPU resources because, in this environment (and normally in any large scale database), it is the DB partition that experiences the highest load.

You might consider that using 32 DB partitions is excessive, especially when compared with regular database environments of the same size. In our case, however, we chose to use 32 DB2 partitions in order to improve both scalability and some infrastructure maintenance tasks such as backup, restore, and roll forward. Be aware that such a configuration may incur additional overhead and complexity, so you would need to decide whether having a higher number of nodes would benefit your environment.

So far, we have shared our experience when increasing the number of DB partitions and the number of LPARs. In the following sections, we describe the steps needed to redistribute the nodes.

3.5.1 Step 1 - Restoring the initial image in our environment

The initial image was restored by using the DB2 regular restore process. A redirected restore was executed to redirect containers that were used externally to containers that we created on our server. Figure 3-15 on page 147 illustrates the situation that existed at that stage.

This step simply created a replica of the initial database in our environment, following the same node group concepts with 1+5 DB2 partitions. The initial database size was 8 TB. This database was restored in approximately 48 to 50 hours.
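A redirected restore follows this general flow in DB2 V8; the database name, backup path, time stamp, table space ID, and container path below are all hypothetical:

```shell
# Outline of a DB2 V8 redirected restore, written to a CLP script file.
# All names, paths, and the time stamp are invented for illustration.
cat > /tmp/redirect_restore.clp <<'EOF'
RESTORE DATABASE BWP FROM /backup TAKEN AT 20060101120000 REDIRECT;
SET TABLESPACE CONTAINERS FOR 5 USING (FILE '/db2/BWP/ts5_0' 512000);
RESTORE DATABASE BWP CONTINUE;
EOF
cat /tmp/redirect_restore.clp     # run with: db2 -tvf /tmp/redirect_restore.clp
```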


Figure 3-15 Step 1: Restoring the initial image

3.5.2 Step 2 - Adding new LPARs and creating 32 new DB partitions

In our scenarios, 4 database-dedicated LPARs were added. For each LPAR, 8 additional nodes were created, which resulted in the infrastructure shown in Figure 3-16 on page 148.
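Adding the 8 partitions of one new LPAR can be sketched with db2start's ADD DBPARTITIONNUM option (DB2 V8 syntax, which also updates db2nodes.cfg); the host name and starting partition number are illustrative, and the commands are echoed here rather than executed:

```shell
# Build the db2start commands that would add 8 new partitions on one LPAR.
# Host name db2lpar1 and starting partition number 6 are illustrative.
add_partition_cmds() {    # $1 = host name, $2 = first new partition number
    i=0
    while [ "$i" -lt 8 ]; do
        echo "db2start DBPARTITIONNUM $(($2 + i)) ADD DBPARTITIONNUM HOSTNAME $1 PORT $i"
        i=$((i + 1))
    done
}
add_partition_cmds db2lpar1 6
```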


Figure 3-16 Step 2: LPAR distribution

The redistribution process presented a number of challenges, which we evaluated in order to select the most suitable method. The primary challenges were the time it would take, the number of database logs created, the total number of tables to move, and the number of large tables.

We evaluated the following methods to perform redistribution:

• The SAP r3load utility tool

This is a tool designed to export and/or import data into new or already existing databases. This tool can work with some of the SAP data dictionary logic and information using SAP data classes to move large numbers of tables.

We used this utility to move the majority of the tables. The tool can also compress the files it creates during the export.

� The IBM DB2 High Performance Unload (DB2 HPU) utility

This is a high-speed DB2 utility for unloading and loading tables.

We used this method to move the largest tables in the system (approximately 50 tables) representing the heaviest amount of data (approximately 60% of the data in the system).

� The DB2 redistribute command

This is a standard DB2 command.

We did not use this method because it would take too long to complete, and would also generate a massive amount of logs (it uses a select/insert method to perform the move).

(Figure content: LPAR0 retains the original partitions, and the 32 new partitions, 6 through 37, are spread over the four new LPARs, LPAR1 through LPAR4.)


� The db6conv SAP ABAP utility

This is an SAP utility for table movement, and it can be configured to use DB2 Call Level Interface (CLI) load so it does not log the inserts. The utility is best suited to moving single tables, but it can be configured to use a list of tables as input.

We did not use this method because of the large number of tables that had to be moved.

� The db2move native DB2 tool

This DB2 tool is suitable for moving large numbers of tables using the DB2 export and import/load utilities.

We did not use this method because it does not work with the SAP data dictionary and uses “load from cursor”, which means that data must pass through the coordinator node, namely, DB2 partition 0.

3.5.3 Step 3 - Executing the data redistribution

After adding new LPARs, and in order to overcome future scaling challenges, data redistribution from the original database to the new DB2 partitions was required. As mentioned, we decided to use DB2 HPU (which was originally created for z/OS® environments but is becoming more common in RS/6000® and xSeries environments) and SAP r3load.

As shown in Table 3-6, the majority of the data was moved using DB2 HPU, and the majority of the tables were moved using r3load. We chose this approach because r3load is aware of SAP data classes, which helped the overall process, and the high performance of DB2 HPU paid off for the largest tables.

Table 3-6 Tools used for redistribution

In the following sections, we provide a brief description of these tools and how they were used in our tests.

IBM DB2 High Performance Unload (DB2 HPU)

In our environment, DB2 HPU was crucial for partition redistribution because it can unload data already split according to the hashing key of the new partitioned table, which speeds up the subsequent load. The hashing key is tightly related to the partitioning key; in our scenario, the partitioning key did not change, so we simply obtained a new hashing key based on the new partitioning schema. We used DB2 HPU Version 2.2.4 in our tests.

DB2 HPU can extract tables from full offline backups, whether backups of system managed space (SMS) or database managed space (DMS) table spaces or backups of the entire database. For table space backups, provided a suitable backup of the DB2 catalog exists, DB2 HPU can extract the tables even if DB2 is down.

Number of tables    Amount of data (TB)    Approach used
48                  4.5                    DB2 HPU
41716               3.5                    SAP r3load

Note: Because it is hashing key-aware, DB2 HPU is very useful for SAP NetWeaver BI/DB2 partition redistribution, and for situations in which a massive amount of data needs to be redistributed.
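As a simplified illustration of why hashing-key awareness matters, the sketch below routes each record directly to a per-partition output file, so the loads can start without a separate splitting pass. A plain modulo on the first column stands in for DB2's real partitioning map, so this is an analogy, not DB2's algorithm.

```shell
#!/bin/sh
# Simplified stand-in for a hashing-key-aware unload: each record goes
# straight to the file for its target partition. Modulo arithmetic
# replaces DB2's real partitioning map here.
split_by_key() {    # $1 = number of target partitions
    awk -v n="$1" -F, '{ print > ("part." ($1 % n) ".del") }'
}

printf '101,a\n102,b\n103,c\n104,d\n' | split_by_key 2
```

After the run, part.0.del and part.1.del each hold the rows destined for one partition, ready to be loaded in parallel.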



Version 2.2 also offers new repartitioning capabilities for data redistribution on the same system, or for data migration to a new system. High speed data unloads and splitting can now be done in a single operation, thus eliminating the use of unnecessary disk space.

In our case, we avoided the issue of unnecessary disk space by using pipes. We exported the table to a pipe and imported it back to the new table and partition.
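The pipe technique can be reproduced in miniature with a named pipe: the writer stands in for the DB2 HPU unload and the reader for the DB2 load, and no intermediate file of the full data size is ever materialized. File names here are illustrative.

```shell
#!/bin/sh
# Miniature version of the pipe technique: the writer plays the role of
# the DB2 HPU unload and the reader plays the role of the DB2 load.
pipe=/tmp/pipes.demo.del
rm -f "$pipe" loaded.del
mkfifo "$pipe"

cat "$pipe" > loaded.del &               # the "load" side, reading in the background
printf 'row1\nrow2\nrow3\n' > "$pipe"    # the "unload" side, writing
wait

rm -f "$pipe"
```

The writer blocks until the reader has opened the pipe, which is why the reader is started first, in the background; the same ordering is used in the real script (load started with &, then db2hpu).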

A script was created to execute the data unload and load phases. The DB2 HPU export/import script we used is shown in Example 3-10. The script is split into the following parts:

1. Lines 1 to 3 contain the header and the for loop over all selected tables.

2. Lines 5 to 13 gather table space and index space information and generate the Data Definition Language (DDL) creation statements for the table, its views, and its indexes.

3. Lines 14 to 22 rename the real SAP table to a temporary name. If the table has dependent views, the views are dropped, the table is renamed, and the views are recreated. This part also generates the index renaming clauses.

4. Lines 23 to 24 execute the renaming statements and create a table with the real SAP name on the new DB2 partitions, to be used as the target table.

5. Lines 25 to 41 generate the DB2 HPU unload and load control files.

6. Lines 42 to 52 create a pipe for each new DB2 partition and start the data unload and load processes.

7. Lines 53 to 59 remove the temporary files and close the loop.

Example 3-10 DB2 HPU export/import script

1 #! /bin/ksh
2 for table in `cat tables.list`
3 do
4 echo "starting process for table " $table " -- " `date` >> redistribute.log
5 # get tablespace and indexspace corresponding to table
6 ts=`grep $table redistribute.list | awk '{print $2}'`
7 is=`echo $ts | sed ' {s/ODSD/ODSI/
8 t done
9 s/D/I/
10 :done
11 }' `
12 # generate ddl for table
13 db2look -d eb8 -z SAPR3 -e -o redistribute1.ddl -xd -tw $table > /dev/null 2>>redistribute.err
14 # generate RENAME statements in tom1 file
15 rm rename view
16 sed -n -f tom1.sed redistribute1.ddl
17 # modify generated ddl to add rename statements from previous step and create table in target tablespace
18 sed '/CONNECT TO/ {
19 r view
20 r rename
21 }
22 s/IN ".*" INDEX IN ".*"/IN "'$ts'" INDEX IN "'$is'"/' redistribute1.ddl > redistribute2.ddl
23 # run ddl to rename source table and create target table
24 db2 +c -stvf redistribute2.ddl 1>> redistribute.log 2>>redistribute.err
25 # customize HPU control file
26 sed '/@@@/ {
27 {
28 s#@@@SOURCE@@@#'$table'#
29 s/BIC/TOM/
30 }
31 s#@@@OUTPUT@@@#'$table'#
32 s#@@@TARGET@@@#'$table'#
33 } ' tomunload.template > tomunload.ctl
34 # customize load control file
35 sed '/@@@/ {
36 {
37 s#@@@OUTPUT@@@#'$table'#
38 s#/BIC/##
39 }
40 s#@@@TARGET@@@#'$table'#
41 } ' tomload.template > tomload.ctl
42 # prepare pipes for loads
43 for node in 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
44 do
45 mkfifo /tmp/pipes$table.del.0$node
46 done
47 # run load
48 db2 -tvf tomload.ctl 1>> redistribute.log 2>>redistribute.err &
49 # now run HPU
50 db2hpu -f tomunload.ctl -m redistribute.msg 1>> redistribute.log 2>>redistribute.err
51 # wait 2 1/2 minutes for Load to finish building index
52 #sleep 150
53 for node in 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37
54 do
55 rm /tmp/pipes$table.del.0$node
56 done
57
58 echo "finished with process for table " $table " -- " `date` >> redistribute.log
59 done
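Lines 7 to 11 of the script derive the index space name from the data table space name with a small sed program: ODSD becomes ODSI, and otherwise the first D becomes I. It can be exercised in isolation; the second table space name below is a made-up example.

```shell
#!/bin/sh
# Derive the index space name from the data table space name, using the
# same sed logic as lines 7-11 of the redistribution script.
to_indexspace() {
    echo "$1" | sed '{s/ODSD/ODSI/
t done
s/D/I/
:done
}'
}

to_indexspace YMODSD01     # ODS data space: ODSD -> ODSI
to_indexspace YMTESTD01    # made-up name: first D is replaced
```

The `t done` branch skips the generic substitution when the ODS-specific one has already matched, so only one replacement is ever applied per name.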

The DB2 HPU control files we used for unload and load are shown in Example 3-11. Placeholders of the form @@@<word>@@@ are replaced by the unload and load script with the real SAP table names.

In the unload control file, the clause TARGET TABLE ensures that data unload is executed using the hashing key of the new table created on the new DB2 partitions.

Example 3-11 DB2 HPU control file

Unload control file:

1 GLOBAL CONNECT TO EB8
2 QUIESCE NO LOCK NO
3 ;
4 UNLOAD TABLESPACE DB2 NO
5 SELECT * FROM SAPR3."@@@SOURCE@@@" ;
6 OUTPUT ("/tmp/pipes@@@OUTPUT@@@.del" )
7 OPTIONS DOUBLE DELIM ON
8 FORMAT DEL
9 TARGET TABLE ( SAPR3."@@@TARGET@@@" )
10 ;

Load control file:

1 LOAD FROM @@@OUTPUT@@@.del
2 OF DEL
3 SAVECOUNT 1000000
4 MESSAGES @@@OUTPUT@@@.msg
5 REPLACE
6 INTO SAPR3."@@@TARGET@@@"
7 NONRECOVERABLE
8 ALLOW NO ACCESS
9 LOCK WITH FORCE
10 PARTITIONED DB CONFIG
11 PART_FILE_LOCATION /tmp/pipes/BIC
12 MODE LOAD_ONLY
13 ;
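The placeholder substitution that turns a template into a per-table control file (lines 25 to 33 of the driver script) can be tried standalone. The template below is a trimmed copy of the unload control file, and the table name is illustrative.

```shell
#!/bin/sh
# Turn a control-file template into a concrete control file by replacing
# the @@@...@@@ placeholders with a real table name, as the driver script
# does with sed. The '#' delimiter lets the table name contain slashes.
table='/BIC/AZGTFCO0100'

cat > tomunload.template <<'EOF'
UNLOAD TABLESPACE DB2 NO
SELECT * FROM SAPR3."@@@SOURCE@@@" ;
OUTPUT ("/tmp/pipes@@@OUTPUT@@@.del" )
TARGET TABLE ( SAPR3."@@@TARGET@@@" )
EOF

sed 's#@@@SOURCE@@@#'"$table"'#
s#@@@OUTPUT@@@#'"$table"'#
s#@@@TARGET@@@#'"$table"'#' tomunload.template > tomunload.ctl

cat tomunload.ctl
```

The generated tomunload.ctl contains no remaining placeholders and references the table everywhere the template did.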

The DB2 HPU basic configuration is shown in Example 3-12.

Example 3-12 DB2 HPU configuration

# HPU default configuration
bufsize=4194304
db2dbdft=EB8
db2instance=db2eb8
maxunloads=1
nbcpu=32
maxsockets=32
insthomes=db2eb8:/db2/db2eb8
instusers=db2eb8:db2eb8
doubledelim=off
portnumber=54002
db2version=V8

The process, shown in Figure 3-17 on page 153, is summarized as follows:

� Phase 1 is the initial state: all SAP NetWeaver BI tables are spread over 5 nodes in a single LPAR.

� Phase 2: dependent views are checked and dropped.

� Phase 3: the SAP NetWeaver BI table and its indexes are renamed to temporary table and index names.

� Phase 4: the empty SAP NetWeaver BI table is created using the new DB2 partition group, which is spread over 8 nodes in 4 distinct LPARs.

� Phase 5: the data is unloaded and loaded (shown as a horizontal red arrow in the Phase 5 section of the figure).

After these phases were complete, tests were executed to validate the new tables, and then the temporary tables were dropped.


Figure 3-17 DB2 HPU phases

This structure offers straightforward benefits:

� Improved fallback strategy

In our environment, we did not have to deal with production constraints. However, with this strategy, if something goes wrong during the load phase, fallback is extremely fast because the table can simply be renamed back to its original name.

� Reduced overload

Because the whole database did not have to be stopped, only processes that depended on the table being unloaded and loaded were affected. The redistribution process itself adds load and runs in parallel, but from a usage point of view the system suffered only a minor impact.
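The fallback path can be sketched as two statements: drop the partly loaded copy under the real SAP name, then rename the temporary table back. The sketch below only prints the SQL, and the table names are illustrative, not the ones from our system.

```shell
#!/bin/sh
# Fallback sketch: if the load fails, the original data is still intact
# under the temporary name, so recovery is a drop plus a rename.
# Table names are illustrative, and the SQL is printed, not executed.
fallback_sql() {    # $1 = real SAP table name, $2 = temporary name
    echo "DROP TABLE SAPR3.\"$1\";"
    echo "RENAME TABLE SAPR3.\"$2\" TO \"$1\";"
}

fallback_sql 'AZGTFCO0100' 'TOM_AZGTFCO0100'
```

Any dependent views dropped in Phase 2 would still have to be recreated after the rename, as in the forward path of the script.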

The following actions (which we did not use in our case due to complexity and time constraints) may have improved the full process:

� Postponing index creation until the end of the load phase

This would considerably improve the load time, but would add another step (index creation) at the end of the phase.

� Parallelizing the number of tables being unloaded and loaded

In our environment, one table at a time was processed. Parallelizing would have considerably sped up the redistribution. However, it would also add complexity to the scripts and some processing impact on the DB2 coordinator node.

Also note the following considerations:

� The db2hpu process is very CPU-intensive. Consider and plan the execution of this activity to best fit your own environment.



� We faced only minor issues in using the tool; some of them were related to not having the latest level of AIX in our environment.

SAP r3load

The SAP r3load tool is provided with the SAPinst installation tool. It helps you export and import data based on SAP data classes. In our case, the tables that were not redistributed by using DB2 HPU were exported and imported using SAP r3load. Overall, it took 25 hours to export and import 3.5 TB of data.
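As a rough sanity check of that figure, 3.5 TB in 25 hours corresponds to an average of about 41 MB/s:

```shell
#!/bin/sh
# Average throughput of the r3load export/import phase: 3.5 TB in 25 hours.
awk 'BEGIN {
    tb = 3.5; hours = 25
    mb_per_s = tb * 1024 * 1024 / (hours * 3600)
    printf "%.0f MB/s average\n", mb_per_s
}'
```

This is an aggregate average only; the actual rate varied per table and per data class.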

Redistributed scenario

Figure 3-18 illustrates our new environment at the completion of the redistribution steps. InfoCube fact tables, aggregates, ODSs, and PSAs were spread over the 4 new LPARs, and LPAR0 was totally dedicated to the DB2 coordinator partition (partition 0) and to storing the dimension tables, SAP NetWeaver BI temporary tables, and catalog tables.

The approach adopted in our redistribution scenario improved system availability. The only period that required SAP NetWeaver BI to be stopped was the r3load redistribution phase, because the bulk of the data had already been redistributed by using the DB2 HPU utility.

The initial partitions 1-5 were simply omitted from the db2nodes.cfg file, which produced the partition sequence 0, 6, 7, 8, ... 37. This was done merely to simplify the scenario and environment, and did not impact the testing at any level.

Figure 3-18 Final nodes distribution

(Figure content: LPAR0 holds partition 0 with the BW temporary tables, catalog tables, and dimension tables; LPAR1 through LPAR4 hold partitions 6 through 37 with the InfoCubes, aggregates, operational data stores, and PSAs.)


3.6 Balancing processing and disk usage

After redistributing the data, a set of 100 InfoCubes was created to be used in the tests. Balancing processing and disk usage requires a significant amount of time, especially in a large environment, where precautions must be taken and complexity grows quickly. In our case, DB2 partition 0 was used to store SAP NetWeaver BI temporary tables, system catalog tables, and dimension tables. The remaining 32 DB2 partitions were used to balance InfoCube fact tables, aggregate tables, PSAs, and ODSs.

A pattern was created to evenly balance processing and disk usage, as shown in Table 3-7.

Table 3-7 Pattern for evenly balancing disk usage

ODS balancing

The ODS tables were spread over 16 DB2 partitions. In a 32-partition environment, this means that two distinct DB2 partition groups were created: one starting at partition 6, and the other starting at partition 7. Two data table spaces and two index table spaces were created as follows:

� Data table spaces

– YMODSD01
– YMODSD02

� Index table spaces

– YMODSID01
– YMODSID02

Table 3-8 lists the DB2 partition group and nodes for these table spaces. The only concern was to use the same node group in the matching data table space and index table space.

Table 3-8 ODS distribution

In total, 8 operational data stores were created. Figure 3-19 on page 156 shows the data and index structure of the first 5 ODSs; the three remaining ones follow the same pattern.

Object                          Number of DB2 partitions
ODS                             16
InfoCube aggregate table        8
InfoCube fact table             8
PSA for fact tables and ODSs    8

Note: The figures in this section use the term “DB node group”; the formal term is DB partition group.

Tablespace    Nodegroup    Partitions
YMODSD01      NG_YMO01     6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36
YMODSI01      NG_YMO01     6, 8, 10, 12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34, 36
YMODSD02      NG_YMO02     7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37
YMODSI02      NG_YMO02     7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33, 35, 37
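The two partition lists in Table 3-8 are simply every second partition from 6 (or 7) through 37, so they can be generated rather than typed (this assumes GNU seq, which supports the -s separator option):

```shell
#!/bin/sh
# Generate the ODS partition group members of Table 3-8: every second
# partition, starting at 6 for NG_YMO01 and at 7 for NG_YMO02.
partitions() { seq -s ' ' "$1" 2 37; }

partitions 6    # NG_YMO01
partitions 7    # NG_YMO02
```

Each list contains 16 partitions, matching the 16-partition ODS pattern of Table 3-7.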



Figure 3-19 ODS data and index structure

The table name for active rows for ODS in SAP NetWeaver BI follows the naming pattern shown in Example 3-13.

Example 3-13 ODS naming pattern

Naming pattern:
/BIC/A<ODS_NAME>00
where <ODS_NAME> is the ODS name. For example, the active rows table for ODS ZGTFCO01 would be:
/BIC/AZGTFCO0100
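The naming pattern from Example 3-13 can be captured in a one-line helper:

```shell
#!/bin/sh
# Build the name of an ODS active-rows table from the pattern
# /BIC/A<ODS_NAME>00.
active_table() { echo "/BIC/A${1}00"; }

active_table ZGTFCO01    # the example from the text
```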

Next, we show detailed statistics of the balancing for each ODS table for active rows.

� Figure 3-20 on page 157 illustrates the balancing for ODS ZGTFCO01; the distribution starts on partition 6 and goes through all even nodes.



Figure 3-20 Partition 1 balancing

� Figure 3-21 illustrates the balancing for ODS ZGTFCO02; the distribution starts on partition 7 and goes through all odd-numbered nodes.

Figure 3-21 Partition 2 balancing

All the remaining ODS tables for active rows follow the same pattern: operational data stores with odd numbers (1, 3, 5, 7) are distributed on one partition group, and those with even numbers (2, 4, 6, 8) are distributed on the other.


3.6.1 InfoCube aggregate table balancing

As mentioned, 100 InfoCubes were created to be used in the tests. The balancing pattern for the InfoCube aggregate tables was based on five different partition groups. We cover only the data table spaces here, because all index table spaces were spread over the same partition groups.

Figure 3-22 illustrates the distribution of each InfoCube on a partition group that ranges over partition 6, 10, 14, 18, 22, 26, 30 and 34. In this partition group we spread 35 InfoCubes aggregate tables.

Figure 3-22 InfoCube aggregate balancing for partition group 1

Figure 3-23 on page 159 illustrates the distribution of each InfoCube on a partition group that ranges over partition 7, 11, 15, 19, 23, 27, 31 and 35. In this partition group we spread 20 InfoCubes aggregate tables.


Figure 3-23 InfoCube aggregate balancing for partition group 2

Figure 3-24 illustrates the distribution of each InfoCube on a partition group that ranges over partition 7, 11, 16, 20, 22, 26, 33 and 37. In this partition group we spread 20 InfoCubes aggregate tables.

Figure 3-24 Aggregate balancing for group partition 3

Figure 3-25 on page 160 illustrates the distribution of each InfoCube on a partition group that ranges over partition 8, 12, 16, 20, 24, 28, 32 and 36. In this partition group we spread 20 InfoCubes aggregate tables.


Figure 3-25 Aggregate balancing for partition group 4

Figure 3-26 illustrates the distribution of each InfoCube on a partition group that ranges from partition 9, 13, 17, 21, 25, 29, 33, and 37. In this partition group we spread 20 InfoCubes aggregate tables.

Figure 3-26 Aggregate balancing for partition group 5


3.6.2 InfoCube fact table balancing

The InfoCube fact tables were spread linearly over 4 distinct partition groups. Because we had only 2 different InfoCube types, we chose to place 2 distinct InfoCube types on each table space. With this structure, dividing the 100 InfoCubes by 4 partition groups results in an uneven number, so we balanced the InfoCubes as follows:

� Nodegroup 1 - 26 InfoCubes

� Nodegroup 2 - 26 InfoCubes

� Nodegroup 3 - 24 InfoCubes

� Nodegroup 4 - 24 InfoCubes

We only considered data table spaces, but all index table spaces were spread over the same node groups.
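The 26/26/24/24 split falls out of assigning the InfoCubes to node groups in consecutive pairs, round-robin over the 4 groups (50 pairs split 13/13/12/12). A small simulation confirms it:

```shell
#!/bin/sh
# Round-robin assignment of 100 InfoCube fact tables to 4 node groups,
# two consecutive InfoCubes per group (as in Figure 3-31), reproduces the
# 26/26/24/24 split.
awk 'BEGIN {
    for (cube = 0; cube < 100; cube++) {
        group = int(cube / 2) % 4    # pair the cubes, then round-robin
        count[group]++
    }
    for (g = 0; g < 4; g++) printf "nodegroup %d: %d InfoCubes\n", g + 1, count[g]
}'
```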

Figure 3-27 illustrates the distribution of each InfoCube on a node group that ranges from node 6, 10, 14, 18, 22, 26, 30 and 34. In this node group we spread 26 InfoCubes fact tables.

Figure 3-27 InfoCube fact table balancing (1/4)

Figure 3-28 on page 162 illustrates the distribution of each InfoCube on a node group that ranges from node 7, 11, 15, 19, 23, 27, 31 and 35. In this node group we spread 26 InfoCubes fact tables.


Figure 3-28 InfoCube fact table balancing (2/4)

Figure 3-29 illustrates the distribution of each InfoCube on a partition group that ranges from partition 8, 12, 16, 20, 24, 28, 32 and 36. In this partition group we spread 24 InfoCubes fact tables.

Figure 3-29 InfoCube fact table balancing (3/4)

Figure 3-30 on page 163 illustrates the distribution of each InfoCube on a partition group that ranges from partition 9, 13, 17, 21, 25, 29, 33 and 37. In this partition group we spread 24 InfoCubes fact tables.


Figure 3-30 InfoCube fact table balancing (4/4)

The InfoCube fact table balancing pattern is depicted in Figure 3-31. The first two InfoCubes are on the same partition group.

Figure 3-31 The InfoCube fact table balancing pattern

Ultimately, processing and disk usage balancing were achieved through the implementation of this design.

(Figure content: each InfoCube fact table is spread over 8 partitions, 2 per LPAR; for example, ZGTFC001 and ZGTFC002 both use partitions 6, 10, 14, 18, 22, 26, 30, and 34, while ZGTFC003 and ZGTFC004 use partitions 7, 11, 15, 19, 23, 27, 31, and 35.)


One action that might have decreased the design complexity would have been to create larger table spaces with a greater number of containers, even though this would result in minor deviations from the Balanced Configuration Unit (BCU) best practices (the BCU is a configuration optimized for Business Intelligence environments).

3.7 The DB2 configuration

In this section, we describe the most important DB2 options and parameters used in our environment.

3.7.1 The DB2 instance

As a best practice and standard for SAP NetWeaver BI, a single database instance was created. The instance was named db2eb8 and was split into several database partitions. It was created as a 64-bit instance to allow better performance and memory utilization.

Instance level

The instance was based on the IBM DB2 UDB version and fix pack shown in Example 3-14.

Example 3-14 DB2 level for KPI-A

db2eb8@sys3db0p:/db2/db2eb8/ # db2level
DB21085I Instance "db2eb8" uses "64" bits and DB2 code release "SQL08025"
with level identifier "03060106".
Informational tokens are "DB2 v8.1.1.112", "s060429", "U807381", and FixPak "12".
Product is installed at "/usr/opt/db2_08_01".

The initial database was spread over 1+5 nodes, with the first node responsible for the catalog and the dimension tables and the 5 remaining nodes responsible for the data, including operational data stores, InfoCubes, and aggregation tables. During the database redistribution, a new set of 32 partitions was added to the scenario, resulting in the db2nodes.cfg entries listed in Example 3-15.

Notice that partition 0 is on the first LPAR (sys3db0p), and that the next partition is 6 on the second LPAR (sys3db1p). The "missing" DB2 partitions (1 to 5) were used to restore the first SAP NetWeaver BI image following its initial layout; as mentioned earlier, this is irrelevant and did not impact our KPI-A tests.

Example 3-15 Database distribution over the DB2 partitions

0 sys3db0p 0
6 sys3db1p 0
7 sys3db1p 1
8 sys3db1p 2
9 sys3db1p 3
10 sys3db1p 4
11 sys3db1p 5
12 sys3db1p 6
13 sys3db1p 7
14 sys3db2p 0
15 sys3db2p 1
16 sys3db2p 2
17 sys3db2p 3
18 sys3db2p 4
19 sys3db2p 5
20 sys3db2p 6
21 sys3db2p 7
22 sys3db3p 0
23 sys3db3p 1
24 sys3db3p 2
25 sys3db3p 3
26 sys3db3p 4
27 sys3db3p 5
28 sys3db3p 6
29 sys3db3p 7
30 sys3db4p 0
31 sys3db4p 1
32 sys3db4p 2
33 sys3db4p 3
34 sys3db4p 4
35 sys3db4p 5
36 sys3db4p 6
37 sys3db4p 7

A major reason for adopting this distribution is that, in IBM DB2 and SAP NetWeaver BI environments, partition 0 normally tends to be overloaded and to become a bottleneck in the infrastructure, because it has the coordinator partition role.
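The db2nodes.cfg layout of Example 3-15 is regular enough to be generated: partition 0 on sys3db0p, then 8 logical partitions (logical port numbers 0 to 7) on each of the 4 database LPARs. A small generator makes the pattern explicit:

```shell
#!/bin/sh
# Regenerate the db2nodes.cfg layout of Example 3-15: partition 0 on the
# coordinator LPAR, then partitions 6-37 spread 8 per LPAR over
# sys3db1p..sys3db4p, each with logical ports 0-7.
gen_nodes_cfg() {
    echo "0 sys3db0p 0"
    p=6
    for lpar in 1 2 3 4; do
        for port in 0 1 2 3 4 5 6 7; do
            echo "$p sys3db${lpar}p $port"
            p=$((p + 1))
        done
    done
}

gen_nodes_cfg
```

The output is 33 lines: one coordinator partition plus 32 data partitions, matching Example 3-15.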

Instance settings

The db2eb8 instance was configured following IBM and SAP best practices and recommendations. The instance parameters are illustrated in Figure 3-32 on page 166.


Figure 3-32 Database configuration parameters


In particular, note the following use of some of the parameters:

� SHEAPTHRES

Because SAP NetWeaver BI generates a significant number of sorts and hash joins, consider configuring this parameter with a large number of pages. In our scenario, it was set to 1,600,000 pages of 4 KB.

This parameter is tightly related to the database parameter SORTHEAP and the number of concurrent agents running against the database.

� RQRIOBLK

Row blocking reduces database manager overhead for cursors by retrieving a block of rows in a single operation. This parameter is also used to determine the I/O block size at the database client when a blocking cursor is opened. This memory for blocked cursors is allocated out of the application's private address space, so you should determine the optimal amount of private memory to allocate for each application program. If the database client cannot allocate space for a blocking cursor out of an application's private memory, a non-blocking cursor will be opened.

In our scenario, this parameter was set to the highest size (65,535 bytes).

� FCM_NUM_BUFFERS

This parameter specifies the number of 4 KB buffers that are used for internal communications (messages) both among and within database servers.

In our scenario, we designed the architecture with logical nodes on the same machine, which also requires a considerable increase in the value of this parameter (we set this parameter to 32,768 pages of 4 KB).
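For reference, converting those page-based settings to absolute sizes (4 KB pages, as stated above) gives roughly 6.1 GB for SHEAPTHRES and 128 MB for the FCM buffers:

```shell
#!/bin/sh
# Convert DB2 page-based parameters (4 KB pages) into megabytes.
pages_to_mb() { echo $(( $1 * 4 / 1024 )); }

echo "SHEAPTHRES:      $(pages_to_mb 1600000) MB"    # 1,600,000 pages
echo "FCM_NUM_BUFFERS: $(pages_to_mb 32768) MB"      # 32,768 pages
```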

Registry and environment settings

Other parameters that influence database and instance performance are set at the registry and environment levels; these are called registry and environment parameters.

With DB2 Version 8.2, a new registry setting was implemented to optimize the settings for SAP environments. There are a number of internal and external features and changes designed for SAP workloads in DB2 V8.2.2. Many of these features are enabled via registry variables or configuration parameters. To make the configuration and maintenance of SAP on DB2 more transparent, the SAP tuning knob can be used in DB2 V8.2.2 to set the DB2 environment to be optimized for SAP.

This feature is implemented as a single registry variable, DB2_WORKLOAD, which you set to SAP. It alleviates the complexity of ensuring that all of the SAP-related features are enabled for your environment, and sets the corresponding registry variables correctly. Quite simply, you put DB2 on the SAP setting.

You can use the db2set -gd DB2_WORKLOAD=SAP command to list the default settings for this aggregate registry variable, as shown in Example 3-16.

Example 3-16 Registry and environment parameters

DB2_USE_FAST_PREALLOCATION=ON
DB2_WORKLOAD=SAP
DB2_TRUNCATE_REUSESTORAGE=IMPORT [DB2_WORKLOAD]
DB2_MDC_ROLLOUT=YES [DB2_WORKLOAD]
DB2_SKIPINSERTED=YES [DB2_WORKLOAD]
DB2_VIEW_REOPT_VALUES=YES [DB2_WORKLOAD]
DB2_OBJECT_TABLE_ENTRIES=65532 [DB2_WORKLOAD]
DB2_OPTPROFILE=YES [DB2_WORKLOAD]

Chapter 3. The DB2 perspective 167


DB2_IMPLICIT_UNICODE=YES [DB2_WORKLOAD]
DB2_FORCE_APP_ON_MAX_LOG=YES
DB2_USE_LATCH_TRACKING=YES
DB2_INLIST_TO_NLJN=YES [DB2_WORKLOAD]
DB2_MINIMIZE_LISTPREFETCH=YES [DB2_WORKLOAD]
DB2_UPDATE_PART_KEY=YES [DB2_WORKLOAD]
DB2_REDUCED_OPTIMIZATION=4,INDEX,JOIN [DB2_WORKLOAD]
DB2NOTIFYVERBOSE=YES [DB2_WORKLOAD]
DB2_INTERESTING_KEYS=YES [DB2_WORKLOAD]
DB2_EVALUNCOMMITTED=YES_DEFERISCANFETCH [DB2_WORKLOAD]
DB2_VENDOR_INI=/db2/EB8/dbs/tsm_config/vendor.env
DB2_APM_PERFORMANCE=1,2,4,5,6,7,8,9
DB2_ANTIJOIN=EXTEND [DB2_WORKLOAD]
DB2_STRIPED_CONTAINERS=ON
DB2_CORRELATED_PREDICATES=S
DB2ATLD_PORTS=6000:6500
DB2_HASH_JOIN=YES
DB2MEMMAXFREE=2000000 [O]
DB2MEMDISCLAIM=YES
DB2ENVLIST=INSTHOME SAPSYSTEMNAME dbs_db6_schema DIR_LIBRARY LIBPATH
DB2_RR_TO_RS=YES [DB2_WORKLOAD]
DB2_BLOCK_ON_LOG_DISK_FULL=ON
DB2_FORCE_FCM_BP=YES [O]
DB2DBDFT=EB8
DB2COMM=TCPIP [O]
DB2CODEPAGE=1208
DB2_PARALLEL_IO=*

You can check the registry and environment parameters by issuing the SQL query, as shown in Example 3-17.

Example 3-17 SQL query to check the registry and environment parameters

SELECT SUBSTR(REG_VAR_NAME, 1, 25) AS "REG_VAR_NAME",
       SUBSTR(REG_VAR_VALUE, 1, 15) AS "REG_VAR_VALUE",
       IS_AGGREGATE,
       SUBSTR(AGGREGATE_NAME, 1, 15) AS "AGGREGATE_NAME",
       LEVEL
FROM TABLE(SYSPROC.REG_LIST_VARIABLES()) AS REGISTRYINFO

In our scenario, the execution of this query returned the data shown in Figure 3-33 on page 169, where:

� LEVEL indicates the level at which the DB2 registry variable acquires its value. The possible values are:

– I for instance
– G for global
– N for database partition
– E for the environment

� AGGREGATE_NAME is the name of the aggregate from which the DB2 registry variable obtains its value. If the registry variable is not being set through an aggregate, or if it is set through an aggregate but has been overridden, the value of AGGREGATE_NAME is NULL.

168 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse


Figure 3-33 Registry and environment parameters set up

For further information about the REG_LIST_VARIABLES table function, visit the DB2 Information Center.

It is also a best practice to unset all environment and registry parameters before setting DB2_WORKLOAD=SAP. All of the parameters marked [DB2_WORKLOAD] in the listing were configured automatically by setting DB2_WORKLOAD=SAP.

Some parameters were configured based on our specific environment, as explained here:

� DB2_USE_FAST_PREALLOCATION=ON

DB2 UDB began using J2_METAMAP space preallocation for DMS files on JFS2 file systems in 64-bit DB2 UDB Version 8.1 FixPak 7. This behavior was changed in Version 8.1 FixPak 9 because of issues that had been encountered.

By default, J2_METAMAP space preallocation is not used as of Version 8.1 FixPak 9. As a result, space allocation using the DB2 ALTER or CREATE TABLESPACE commands may take longer on 64-bit DB2 UDB Version 8.1 FixPak 9 and above than on FixPaks 7 and 8.

However, the faster space allocation method may still be enabled via the registry variable DB2_USE_FAST_PREALLOCATION by using the command db2set DB2_USE_FAST_PREALLOCATION=YES.


Note that space which was allocated while running 64-bit DB2 UDB Version 8.1 FixPaks 7 and 8 may still be vulnerable to problems due to the use of the JFS2 preallocation feature.

� DB2_FORCE_APP_ON_MAX_LOG=YES

The MAX_LOG database configuration parameter controls the percentage of log space that a unique application can use.

If the environment variable DB2_FORCE_APP_ON_MAX_LOG is set to TRUE, the application that exceeds the percentage configured on MAX_LOG is forced off the database and the unit of work is rolled back.

If this parameter is set to FALSE, the current statement fails. The application can still commit the work completed by the previous statements in the unit of work, or it can roll back the work completed to undo the unit of work.

� DB2_USE_LATCH_TRACKING=YES

DB2_USE_LATCH_TRACKING is set to YES so that trap files contain a list of the latches held or being waited for by the process.

� DB2_VENDOR_INI=/db2/EB8/dbs/tsm_config/vendor.env

DB2_VENDOR_INI points to a file containing all vendor-specific environment settings. The value is read when the database manager starts. In our case, this file held the Tivoli Storage Manager environment settings, which are shown in Example 3-18.

Example 3-18 Vendor environment settings (vendor.env)

db2eb8@sys3db0p:/db2/db2eb8/fabio/ # cat /db2/EB8/dbs/tsm_config/vendor.env
XINT_PROFILE=/db2/EB8/dbs/tsm_config/initEB8.utl
TDP_DIR=/db2/EB8/dbs/tsm_config/tdplog
BACKOM_LOCATION=/usr/tivoli/tsm/tdp_r3/db264/backom

� DB2_APM_PERFORMANCE=1,2,4,5,6,7,8,9

This variable was set to work around latch contention issues.

� DB2_STRIPED_CONTAINERS=ON

By default, DB2 UDB uses the first extent of each DMS container (file or device) to store a container tag. The container tag is DB2 metadata for the container.

In earlier versions of DB2 UDB, the first page was used for the container tag, instead of the first extent, and as a result less space in the container was used to store the tag. (In earlier versions of DB2 UDB, the DB2_STRIPED_CONTAINERS registry variable was used to create table spaces with an extent-sized tag. However, because this is now the default behavior, this registry variable no longer has any effect.)

� DB2_CORRELATED_PREDICATES=ON

The default for this variable is ON. When there are unique indexes on correlated columns in a join, and this registry variable is ON, the optimizer attempts to detect and to compensate for correlation of join predicates.

When this registry variable is ON, the optimizer uses the KEYCARD information of unique index statistics to detect cases of correlation, and dynamically adjusts the combined selectivity of the correlated predicates, thus obtaining a more accurate estimate of the join size and cost.


� DB2ATLD_PORTS=6000:6500

The DB2ATLD_PORTS registry variable replaces the value of the PORT_RANGE load configuration option. The default range is from 6000 to 6063. For the DB2ATLD_PORTS registry variable, the range should be provided in the following format:

<lower-port-number>:<higher-port-number>
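The range format is simple enough to validate with a few lines of code. The following sketch is illustrative only: `parse_port_range` is a hypothetical helper, not part of DB2; the default range comes from the text above.

```python
def parse_port_range(value, default=(6000, 6063)):
    """Parse a '<lower-port-number>:<higher-port-number>' string.

    Returns the default range when no value is set, mirroring the
    documented DB2ATLD_PORTS behavior."""
    if not value:
        return default
    lower, higher = (int(part) for part in value.split(":"))
    if lower > higher:
        raise ValueError("lower port must not exceed higher port")
    return lower, higher

parse_port_range("6000:6500")  # the range used in our scenario
```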

� DB2_HASH_JOIN=YES

DB2_HASH_JOIN specifies hash-join as a possible join method when compiling an access plan. The DB2_HASH_JOIN registry variable should be used, but it needs to be tuned to get the best performance.

Hash-join performance is best if you can avoid hash loops and overflow to disk. To tune hash-join performance, estimate the maximum amount of memory available for the sheapthres configuration parameter, and then tune the sortheap configuration parameter. Increase its value until you avoid as many hash loops and disk overflows as possible, but do not reach the limit specified by the sheapthres configuration parameter.

� DB2MEMMAXFREE=2000000 [O]

DB2MEMMAXFREE specifies the maximum number of bytes of unused private memory that is retained by DB2 processes before unused memory is returned to the operating system.

� DB2MEMDISCLAIM=YES

On AIX, memory used by DB2 processes may have some associated paging space. This paging space may remain reserved even when the associated memory has been freed and it depends on the AIX system's (tunable) virtual memory management allocation policy.

The DB2MEMDISCLAIM registry variable controls whether DB2 agents explicitly request AIX to disassociate the reserved paging space from the freed memory.

A DB2MEMDISCLAIM setting of YES results in smaller paging space requirements, and possibly less disk activity from paging.

A DB2MEMDISCLAIM setting of NO will result in larger paging space requirements, and possibly more disk activity from paging. In some situations (for example, if paging space is plentiful and real memory is so plentiful that paging never occurs), a setting of NO provides a minor performance improvement.

� DB2ENVLIST=INSTHOME SAPSYSTEMNAME dbs_db6_schema DIR_LIBRARY LIBPATH

This variable lists specific variable names for either stored procedures or user-defined functions. By default, the db2start command filters out all user environment variables except those prefixed with DB2 or db2.

If specific environment variables must be passed to either stored procedures or user-defined functions, you can list the variable names in the DB2ENVLIST environment variable. Separate each variable name by one or more spaces.

� DB2_BLOCK_ON_LOG_DISK_FULL=ON

DB2_BLOCK_ON_LOG_DISK_FULL is a registry variable that you can set to prevent “disk full” errors from being generated when DB2 cannot create a new log file in the active log path.

DB2 attempts to create the log file every five minutes and writes a message to the db2diag.log file after each attempt.


� DB2_FORCE_FCM_BP=YES [O]

This registry variable is applicable to DB2 UDB ESE for AIX with multiple logical partitions. When DB2START is issued, DB2 allocates the FCM buffers either from the database global memory or from a separate shared memory segment, if there is not enough global memory available. These buffers are used by all FCM daemons for that instance on the same physical machine.

The kind of memory allocated is largely dependent on the number of FCM buffers to be created, as specified by the fcm_num_buffers database manager configuration parameter.

If the DB2_FORCE_FCM_BP variable is set to YES, the FCM buffers are always created in a separate memory segment so that communication between FCM daemons of different logical partitions on the same physical node occurs through shared memory.

Otherwise, FCM daemons on the same node communicate through UNIX Sockets. Communicating through shared memory is faster, but there is one fewer shared memory segment available for other uses, particularly for database buffer pools. Enabling the DB2_FORCE_FCM_BP registry variable reduces the maximum size of database buffer pools.

� DB2DBDFT=EB8

DB2DBDFT=EB8 specifies the database alias name of the database to be used for implicit connects. If an application has no database connection but SQL statements are issued, then an implicit connect will be made if the DB2DBDFT environment variable has been defined with a default database.

� DB2COMM=TCPIP [O]

The DB2COMM registry variable allows you to set communication protocols for the current DB2 instance. If the DB2COMM registry variable is undefined or set to null, no protocol connection managers are started when the database manager is started.

� DB2CODEPAGE=1208

DB2CODEPAGE=1208 specifies the code page of the data presented to DB2 by a database client application. You should not set DB2CODEPAGE unless explicitly directed to do so in the DB2 documentation or by DB2 service. Setting DB2CODEPAGE to a value not supported by the operating system can produce unexpected results.

Normally, you do not need to set DB2CODEPAGE because DB2 automatically derives the code page information from the operating system.

� DB2_PARALLEL_IO=*

The DB2_PARALLEL_IO registry variable is used to change the way DB2 UDB calculates the I/O parallelism of a table space. When I/O parallelism is enabled (either implicitly, by the use of multiple containers, or explicitly, by setting DB2_PARALLEL_IO), it is achieved by issuing the correct number of prefetch requests. Each prefetch request is a request for an extent of pages.

If this registry variable is not set, the degree of parallelism of any table space is the number of containers of the table space. For example, if DB2_PARALLEL_IO is set to null and a table space has four containers, there will be four extent-sized prefetch requests issued.

If this registry variable is set, the degree of parallelism of the table space is the ratio between the prefetch size and the extent size of this table space. For example, if DB2_PARALLEL_IO is set for a table space that has a prefetch size of 160 and an extent size of 32 pages, there will be five extent-sized prefetch requests issued. A wildcard character can be used to tell DB2 UDB to calculate the I/O parallelism for all table spaces this way.


In I/O subsystems that support striping the physical spindles beneath each DB2 UDB container (for example, with a RAID device), the number of disks underneath each DB2 UDB container should be taken into account when choosing a prefetch size for the table space. If the prefetch size of the table space is AUTOMATIC, DB2 UDB automatically calculates the prefetch size of a table space using the following equation:

Prefetch size = (number of containers)*(number of disks per container)*extent size

The number of disks per container specified in the registry variable is then used by DB2 UDB to fill in that term of the equation.

If only an asterisk is used and a number is not specified, a default of 6 disks per container is used.

The DB2_PARALLEL_IO registry variable can be used to tell DB2 UDB the number of disks per container. For example, if DB2_PARALLEL_IO="1:4" and table space 1 has three containers, the extent size 32, and prefetch size AUTOMATIC, then the prefetch size is calculated as 3 * 4 * 32 = 384 pages. The I/O parallelism of this table space is 384 divided by 32 = 12. If the prefetch size of a table space is not AUTOMATIC, this information about the number of disks per container is not used.

Any table space that is covered only by the wildcard entry of the DB2_PARALLEL_IO variable is assumed to use six disks per container. For example, if DB2_PARALLEL_IO=*,1:3, all table spaces will use 6 as the number of disks per container, except for table space 1, which will use 3. Values other than 6 can be specified in the registry variable.
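These rules can be sketched in a few lines of arithmetic. This is an illustrative sketch only: `parallel_io_disks` and `automatic_prefetch` are hypothetical helper names, not DB2 APIs, and the example values come from the text above.

```python
def parallel_io_disks(setting, tablespace_id, default_disks=6):
    """Return the assumed number of disks per container for a table space,
    following the DB2_PARALLEL_IO syntax described above (e.g. "*", "1:4",
    "*,1:3"). Returns None if the table space is not covered at all."""
    covered = None
    for entry in setting.split(","):
        if entry == "*":
            covered = default_disks  # wildcard: default of 6 disks per container
        elif ":" in entry:
            ts, disks = entry.split(":")
            if int(ts) == tablespace_id:
                return int(disks)  # an explicit per-table-space value wins
    return covered

def automatic_prefetch(containers, disks_per_container, extent_size):
    # Prefetch size = (number of containers) * (disks per container) * extent size
    return containers * disks_per_container * extent_size

# The example from the text: DB2_PARALLEL_IO="1:4", table space 1 with
# three containers, extent size 32 pages, prefetch size AUTOMATIC.
disks = parallel_io_disks("1:4", 1)
prefetch = automatic_prefetch(3, disks, 32)  # 3 * 4 * 32 = 384 pages
parallelism = prefetch // 32                 # 384 / 32 = 12
```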


Chapter 4. The storage physical environment

This chapter describes the physical storage implementation that we used in the project. It also explains the project from the perspective of a storage expert. The following topics are covered:

� The storage architecture from the test environment

� How the disks were configured

� The Storage Area Network components.

� Details of the two main features used for our tests

� Information about the storage capacity and the options chosen


© Copyright IBM Corp. 2007. All rights reserved. 175


4.1 Storage design description

The objective of this project was to implement a DB2 multi-partition shared nothing architecture, meaning that ideally, each DB2 partition would get dedicated CPUs, RAM and disks. This configuration:

� Offers a nearly linear scalable architecture with the number of partitions

� Increases DB2 parallelism (and, therefore, performance)

� Provides a flexible architecture that makes it possible to add or move DB2 partitions as needed

For the 20 TB configuration, we used 33 DB2 partitions with a usable capacity of 27.6 TB for production and a usable capacity of 24 TB for the backup. The total capacity was then 51.6 TB.

The DS8000 disk space was divided into data, indexes, DB2 logger, and temporary tablespaces.

� 600 GB was allocated for the base tables, master tables, and dimension tables.

� The ratio between the data-indexes, the DB2 logger, and the temporary tablespaces was as follows:

– DB2 logger: 3.25 TB
– DB2 temporary tablespace: 1.75 TB
– DB2 data and index: 20 TB

� For each DB2 partition, the following amounts were allocated:

– Partition 0

• DB2 logger: 400 GB
• DB2 temporary tablespace: 400 GB
• DB2 data and index: 1200 GB

– Partition 6-37 (each)

• DB2 logger: 100 GB
• DB2 temporary tablespace: 100 GB
• DB2 data and index: 600 GB

All the DB2 assumptions we used for sizing the storage are summarized in Table 4-1.

� The first row lists the minimum capacities as defined in the initial requirement for each component.

� The second row lists the real capacity formatted in the DS8000 after a complete layout study, including an optimum size and number of LUNs, had been done (for example, it includes some extra space needed for the LUN definition and assignment).

Table 4-1 Storage sizing (in TB)

                                          Data and index TS   Temp TS   Logger   Total
DB2 partition 0        Minimum required   0.60                0.36      0.25     1.21
                       Allocated          1.20                0.40      0.40     2
DB2 partition 6-37     Minimum required   0.606               0.043     0.093    0.742
(each)                 Allocated          0.6                 0.1       0.1      0.8

27.6 TB for production data and 24 TB for backup data using 512 disk drives.
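The allocated figures in Table 4-1 can be cross-checked with simple arithmetic; the numbers below come straight from the table and the surrounding text.

```python
# Allocated capacity per component, in TB, from Table 4-1
partition0 = {"data_index": 1.2, "temp": 0.4, "log": 0.4}
partition_n = {"data_index": 0.6, "temp": 0.1, "log": 0.1}  # partitions 6-37
n_partitions = 32  # DB2 partitions 6 to 37

total_p0 = sum(partition0.values())                  # 2 TB for partition 0
total_each = sum(partition_n.values())               # 0.8 TB per partition
production = total_p0 + n_partitions * total_each    # 27.6 TB of production data
grand_total = production + 24.0                      # plus 24 TB backup = 51.6 TB
```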


We used one DS8300 model from the DS8000 series in our environment¹; it was a frame storage subsystem with 80 arrays, for a maximum of 640 disks. Each disk drive module had a capacity of 146 GB and a rotation speed of 15 krpm, for a full usable capacity of 54 TB (54,016 GB). Only 512 disks, providing 27.6 TB of production data capacity, were actually used in our environment.

Figure 4-1 depicts a DS8000 unit.

Figure 4-1 DS8000 front view

Figure 4-2 on page 178 illustrates that each AIX LPAR had its own set of source LUNs for the production database and target LUNs for the backup. A link between the source LUNs and target LUNs, called the FlashCopy relationship, was created. We used incremental mode.

Table 4-1 totals:

DB2 total              Minimum required   20                  1.75      3.25     25
                       Allocated          20.4                3.6       3.6      27.6

1 To learn more about DS8000 architecture, refer to IBM Redbook IBM System Storage Solutions Handbook, SG24-5250.



Chapter 4. The storage physical environment 177


Figure 4-2 Storage mapping between DB2 servers and DS8000

As shown in Figure 4-3 on page 179, the DS8300 is mainly composed of:

� Two controllers (POWER5 technology)

� Fibre Channel Host Adapters (HA) to connect to the servers. 16 FC ports were used in our environment.

� Eight device adapters (DA) to connect to the switched back-end.

The storage is designed to balance the resources between the controllers; half of the logical volumes (LUNs) are managed by controller/server 0, and the other half is managed by controller/server 1.



Figure 4-3 The DS8000 main components

Figure 4-4 on page 180 illustrates the configuration with a total of 64 arrays and 8 device adapters (DA):

� 32 arrays with 6 hard disk drives (HDD) for data, 1 HDD for parity, and 1 HDD for spare

� 32 arrays with 7 hard disk drives (HDD) for data and 1 HDD for parity

Every DA is connected to a group of 8 arrays (4 of each type). These arrays are shown in greater detail in Figure 4-5 on page 181.



Figure 4-4 Array implementation

The next section explains how the data was distributed on the arrays to optimize I/O parallelism.

4.2 The storage and AIX file systems layout

Here we describe the various options available for the storage layout for the SAP NetWeaver BI database.

� For the DB2 production LUNs, there were two options:

– Spread all the DB2 partitions on all the arrays by using small LUNs, and have one LUN for each DB2 partition in each array

– Or, dedicate a group of arrays for each DB2 partition by using large LUNs in the group of arrays

We chose the second option. With a dedicated group of arrays for each DB2 partition and a small number of LUNs on an array, potential I/O contention is reduced on this array and administrative task performance may be improved.

� For the FlashCopy LUNs, there were two options:

– Dedicate the arrays.

– Or, share the same arrays for the production and for the FlashCopy LUNs.

We chose the second option. By sharing the same arrays, we optimized the production workload by providing more physical drives.

� For data/index, temporary files and log production LUNs, there were two options:

– Keep it separate, on different physical arrays

– Or, share the same group of arrays for all types: data, temp, and log



We chose the second option. By sharing the same group of arrays for the data/index, temp, and log LUNs, the number of disks is minimized and the shared-nothing architecture between the DB2 partitions is maintained.

To summarize, and as illustrated in Figure 4-5, we had four arrays/ranks for each DB2 partition with the following rules:

� No arrays shared between DB2 partitions 1 to 16.

� No arrays shared between DB2 partitions 17 to 32.

� The same array is shared between DB2 partitions 1 and 17, 2 and 18, n and n+16, until 16 and 32.

� The same array is shared between DB2 partition 0 and partitions 1, 17, 5, 21, 3, 19, 7, 23.

Figure 4-5 Summary - the implementation in our environment

The following concepts were followed in the storage layout:

� The data/index, temporary and log LUNs share the same group of arrays.

� Each DB2 partition was mapped to four arrays with six HDDs for data and one for parity, and four arrays with seven HDDs for data and one for parity. This configuration made the best use of the disks and internal servers. Having a minimal number of LUNs allows better management in production and better performance for backup/restore, disaster recovery, and FlashCopy.

� Only two sizes for the LUNs (25 GB and 150 GB) were used.

This implementation provided us with an easy way to monitor, predict, and understand performance issues, as well as an easy way to migrate DB2 partitions from one AIX LPAR to another, and from one DS8300 to another DS8300.

Each DB2 partition (6 to 37) had four file systems for the data/index tablespaces, four file systems for the temporary tablespaces, and one file system for the logs. A DB2 container was defined in each file system, and the content of each table was spread across those four containers (and thus over the four file systems) using a block size of 32 KB. Each group of four file systems was hosted on four LUNs in the DS8000.



Figure 4-6 illustrates this implementation for DB2 partition 6.

Figure 4-6 Details of the arrays implementation for the DB2 partition 6

Two options were possible:

� Use one LUN for one file system, as illustrated in Figure 4-7.

Figure 4-7 Array implementation option: one LUN per one file system



� Or, use the four LUNs grouped together using AIX LVM spreading (max policy on) for the four file systems, as illustrated in Figure 4-8.

Figure 4-8 Array implementation option: LUNs grouped together

We chose the second option for manageability and scalability reasons. The first option might improve performance, but it was not tested. Because of the large number of tables (several hundred) and their random distribution across the file systems, we expect any difference to be insignificant.

The following file systems were defined for the DB2 partitions 6 to 37:

� Four file systems per DB2 partition for data and index tablespace (sapdata1 to sapdata4) mapped to eight LUNs

– DB2/EB8/sapdata1/NODE000X
– DB2/EB8/sapdata2/NODE000X
– DB2/EB8/sapdata3/NODE000X
– DB2/EB8/sapdata4/NODE000X

� Four file systems per DB2 partition for the temporary tablespace (sapdatat1 to sapdatat4) mapped to eight LUNs

– DB2/EB8/sapdatat1/NODE000X
– DB2/EB8/sapdatat2/NODE000X
– DB2/EB8/sapdatat3/NODE000X
– DB2/EB8/sapdatat4/NODE000X

� One file system per DB2 partition for logger (log_dir) mapped to eight LUNs

– DB2/EB8/log_dir/NODE000X

To summarize, as illustrated in Figure 4-9 on page 184 for LPAR 1, nine file systems mapped to 24 LUNs were defined for each DB2 partition (6 to 37). In total, 313 file systems were defined (25 + 32 x 9 = 313), with a maximum of 72 file systems per System p server (8 x 9 = 72).
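These totals follow directly from the per-partition layout; the following quick sanity check uses only the counts given in the text.

```python
# File systems per DB2 partition 6-37: 4 data/index + 4 temp + 1 log
fs_per_partition = 4 + 4 + 1
partitions = 32          # DB2 partitions 6 to 37
fs_partition0 = 25       # DB2 partition 0: 12 data/index + 12 temp + 1 log

total_fs = fs_partition0 + partitions * fs_per_partition  # 25 + 32 * 9 = 313
fs_per_system_p = 8 * fs_per_partition                    # 8 partitions per System p = 72
```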


Chapter 4. The storage physical environment 183


Figure 4-9 Array and file system distribution for DB2 partitions 6 to 37 (LPAR1 as an example)

The following file systems were defined for DB2 partition 0:

� 12 file systems for DB2 partition 0 for the data and index tablespace (sapdata1 to sapdata12), mapped to 12 LUNs

– DB2/EB8/sapdata1/NODE0000
– DB2/EB8/sapdata2/NODE0000
– …
– DB2/EB8/sapdata11/NODE0000
– DB2/EB8/sapdata12/NODE0000

� 12 file systems for DB2 partition 0 for the temporary tablespace (sapdatat1 to sapdatat12), mapped to 24 LUNs

– DB2/EB8/sapdatat1/NODE0000
– DB2/EB8/sapdatat2/NODE0000
– …
– DB2/EB8/sapdatat11/NODE0000
– DB2/EB8/sapdatat12/NODE0000

� One file system for DB2 partition 0 for logger (log_dir) mapped to 24 LUNs

– DB2/EB8/log_dir/NODE0000

To summarize, as shown in Figure 4-10 on page 185, 25 file systems were defined for DB2 partition 0, with a total of 60 LUNs.
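A quick sketch (shell only, our own helper) that regenerates the partition 0 mount-point list just described and checks the totals; the leading /db2 path form follows the figures, while the running text writes the paths without it:

```shell
# Generate the 25 partition 0 mount points and recompute the LUN total
p0_list() {
  for i in $(seq 1 12); do
    echo "/db2/EB8/sapdata$i/NODE0000"    # data/index: 12 FS on 12 LUNs
    echo "/db2/EB8/sapdatat$i/NODE0000"   # temporary:  12 FS on 24 LUNs
  done
  echo "/db2/EB8/log_dir/NODE0000"        # logger:      1 FS on 24 LUNs
}

P0_FS=$(p0_list | wc -l | tr -d ' ')
P0_LUNS=$((12 + 24 + 24))

echo "partition 0: $P0_FS file systems, $P0_LUNS LUNs"
```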

The detail for LPAR1 (DB2 partition 6, NODE0006, as an example):

space name       disk arrays   #LUN/array  total LUNs  LUN size  total size  file system names               #LUN/file system
data/index TS    A0,A1,A4,A5   1           4           150 GB    600 GB      db2/EB8/sapdata1-4/NODE0006     4
temp TS          A0,A1,A4,A5   1           4           25 GB     100 GB      db2/EB8/sapdatat1-4/NODE0006    4
logger           A0,A1,A4,A5   1           4           25 GB     100 GB      db2/EB8/log_dir/NODE0006        4


Figure 4-10 Array and file system distribution for DB2 partition 0

The difference in the number of file systems between DB2 partition 0 and the other partitions is purely the result of our successive tests. The first system had 6 DB2 partitions with 12 file systems each. When we extended the DB2 architecture from 6 to 33 DB2 partitions, the possible number of file systems per partition ranged from 1 to 12; we chose 4 as a compromise between manageability, performance, and the number of arrays reserved for each DB2 partition.

4.3 The SAN design

Zoning at the SAN level and LUN masking at the DS8300 level were defined to have a maximum of four paths for every LUN, as explained here:

� A zone would include all ports of a DS8300 and the p5-595 LPAR connected to it.

� Each group of LUNs in a DS8300 would belong to a DS8000 Volume Group with four hostconnects having a maximum of one I/O port each, for a maximum of four paths.

� LPAR 0 (DB2 partition 0) would share one FC port in each group of four FCs that were used for LPAR1 to LPAR4.

Figure 4-11 on page 186 illustrates the SAN components set up in our environment.
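As a hedged illustration of these zoning and masking rules (all names, WWPNs, volume IDs, and I/O port IDs below are hypothetical placeholders, not our actual configuration, and the syntax is abbreviated), the switch side might use Brocade-style zoning while the DS8300 side uses dscli volume groups and hostconnects:

```shell
# Switch side: one zone holding the DS8300 ports and the p5-595 LPAR's HBAs
zonecreate "z_lpar1_ds8300", "10:00:00:00:c9:2e:11:01; 50:05:07:63:0e:01:23:45"
cfgadd "prod_cfg", "z_lpar1_ds8300"
cfgenable "prod_cfg"

# DS8300 side (dscli): group the LPAR's LUNs into a volume group, then define
# four hostconnects, each restricted to a single I/O port, so every LUN is
# reachable over at most four paths
mkvolgrp -type scsimask -volume 1000-101F vg_lpar1
mkhostconnect -wwname 10000000C92E1101 -profile "IBM pSeries - AIX" \
              -volgrp V11 -ioport I0001 lpar1_hba1
```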

The detail for LPAR0 (DB2 partition 0, NODE0000):

space name      disk arrays                       #LUN/array  total LUNs  LUN size  total size  file system names                      #LUN/file system
data/index TS   A4,A13,A20,A29,A52,A61,A36,A45    1           8           150 GB    1200 GB     db2/EB8/sapdata1-12/NODE0000 (total)   8
temp TS         A6,A15,A22,A31,A54,A63,A38,A47    2           16          25 GB     400 GB      db2/EB8/sapdatat1-12/NODE0000 (total)  16
logger          A6,A15,A22,A31,A54,A63,A38,A47,   1           16          25 GB     400 GB      db2/EB8/log_dir/NODE0000               16
                A0,A9,A16,A25,…

Note: Hostconnect, I/O port, and Volume Group are specific DS8000 definitions, and do not have to be interpreted as UNIX terminology. To learn more about DS8000 concepts and terminology, refer to IBM Redbook IBM System Storage DS8000 Series: Architecture and Implementation, SG24-6786.


Figure 4-11 The SAN fabric

4.4 The backup and FlashCopy design and implementation

Next, we describe the backup and FlashCopy processes and options we used for our tests. For more comprehensive information about the details of these functions, refer to IBM Redbook IBM TotalStorage DS8000 Series: Copy Services in Open Environments, SG24-6788.

4.4.1 Backup

Three types of backup are available: backup via the LAN, LAN-free backup, and server-free backup. They are illustrated in Figure 4-12 on page 187.

� Backup via the LAN

In a traditional LAN environment, the Tivoli Storage Manager backup and archive client or application reads data from locally attached disks. It then sends the data over the LAN to the Tivoli Storage Manager backup server.

The server receives the data and then writes it out to its storage pool, based on predefined policies and server configuration. Data is read and written by both the Tivoli Storage Manager client and Tivoli Storage Manager Server machines. In addition, control information is also sent over the LAN to the Tivoli Storage Manager server.

� LAN-free backup

SAN technology provides an alternative path for data movement between the Tivoli Storage Manager client and the server. Shared storage resources (disk, tape) are accessible to both the client and the server through the SAN. Data movement is offloaded from the LAN and from the server processor.

LAN-free backups decrease the load on the LAN by introducing a Storage Agent. The Storage Agent handles the communication with the Tivoli Storage Manager server over


the LAN, but sends the data directly to SAN-attached tape devices, thus relieving the Tivoli Storage Manager server from performing the actual I/O transfer.

� Server-free backup

Server-free backup/restore capability is available in Tivoli Storage Manager Version 5. In a server-free backup environment, data is copied directly from the SAN-attached Tivoli Storage Manager client disk to the SAN-attached tape drive via the SAN Data Gateway data mover. The Storage Agent used in LAN-free backups is not used.

The data movement is performed by a SAN Data Gateway (SDG) or similar device on the SAN. Therefore, both Tivoli Storage Manager client and server machines do not have to read and write the data at all. Instead, the Tivoli Storage Manager server sends commands to the SDG device to tell it which blocks to move from which SAN-attached disk to which SAN-attached tape device. The data is actually copied rather than moved from one location to another. This provides a way to back up and restore large volumes of data between client-owned disks and storage devices by using a method that considerably reduces overhead on the Tivoli Storage Manager server and the client.

Only volume images, not individual files, can be moved by server-free data movement. The data is transferred block-by-block, rather than by doing file I/O. Both raw and Windows NT® file system (NTFS) volumes can be backed up using the server-free backup capability. Data that has been backed up using this technique can be restored over a server-free path, over a LAN-free path, or over the LAN itself.

The impact on application servers is minimized with this type of backup. It reduces both Tivoli Storage Manager client and server CPU utilization.

The data mover device can be anywhere in the SAN, but it has to be able to address the LUNs for both the disk and tape devices it is moving data between.

Figure 4-12 Types of backup with Tivoli Storage Manager

We chose to use server-free backup in our tests because it offers the benefit of not using the LAN and production DB2 server resources.
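For reference, the LAN-free path described earlier is typically switched on in the Tivoli Storage Manager client options file on the client machine; a minimal sketch of a dsm.sys stanza (the server name and address are placeholders, not our configuration):

```shell
# dsm.sys stanza enabling LAN-free data transfer through the Storage Agent
SErvername         TSMPROD
   COMMMethod        TCPip
   TCPServeraddress  tsmprod.example.com
   ENABLELANFree     YES
   LANFREECommmethod TCPip
```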



4.4.2 FlashCopy

FlashCopy creates a copy of a logical volume at a specific point in time, which we also refer to as a Point-in-Time Copy, instantaneous copy, or t0 copy (time-zero copy).

When you perform a FlashCopy, a relationship is established between a source and a target; together they are considered a FlashCopy pair. Using FlashCopy, you can copy all physical blocks (full copy), or copy only those blocks that change in the production data after the FlashCopy has been established (using the nocopy option).

The three main steps of a FlashCopy operation, as explained here, are: establishing the FlashCopy relationship, reading from the source, and writing to the source.

� Establishing the FlashCopy relationship

When the FlashCopy is started, the relationship between the source and the target is established within seconds. This is done by creating a pointer table, including a bitmap, for the target.

Assume all bits in the target's bitmap are set to their initial values; this represents the fact that no data block has been copied so far. The data on the target is not touched during the setup of the bitmaps. After the relationship has been established, read and write I/Os can be performed on both the source and the target.

� Reading from the source

Data can be read immediately after the creation of the FlashCopy relationship.

� Writing to the source

Whenever data is written to the source volume while the FlashCopy relationship exists, the storage subsystem makes sure that the time-zero-data is copied to the target volume prior to overwriting it in the source volume.

Figure 4-13 on page 189 illustrates the FlashCopy process.

With a normal FlashCopy, a background process is started that copies all data from the source to the target. Incremental FlashCopy provides the capability to refresh a FlashCopy relationship. With incremental FlashCopy, the initial relationship between a source and a target volume is maintained.

In our tests, we used incremental FlashCopy with the copy option (full volume copy); during a refresh, the updates that took place on the source volume since the last FlashCopy are copied to the target volume. Also, the updates done on the target volume will be overwritten with the contents of the source volume.
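With the DS8000 command-line interface (dscli), an incremental, full-copy FlashCopy of this kind is established with the -record and -persist options and later refreshed with resyncflash; a hedged sketch in which the storage image ID and the volume pair are placeholders (the pair follows the addressing scheme described in 4.5):

```shell
# Establish the incremental FlashCopy pair: full background copy (no -nocp);
# -record/-persist keep the change-recording bitmaps for later refreshes
mkflash -dev IBM.2107-75ABCD1 -record -persist 0000:A000

# Later refresh: copy only the tracks changed since the last FlashCopy
resyncflash -dev IBM.2107-75ABCD1 -record -persist 0000:A000

# Inspect the FlashCopy relationship
lsflash -dev IBM.2107-75ABCD1 0000:A000
```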



Figure 4-13 The FlashCopy process

The target LUNs are used during the server-free backup with FlashCopy, as explained here:

� Only the data/index and temporary tablespace LUNs will be flashcopied.

� The target FlashCopy LUNs are spread on all the arrays of the DS8000 (there are no dedicated arrays assigned to the target FlashCopy LUNs).

� The target FlashCopy LUNs are on different arrays from the source FlashCopy LUNs, as shown in Figure 4-14.

Comparing Figure 4-5 on page 181 to Figure 4-14, notice the shift of the device adapter (DA). For example, note that DB2 partition 6 has moved from DA2 to DA0.

Figure 4-14 Target LUNs mapping with DB2 partitions

� The FlashCopy LUNs pairs are across device adapters, and on the same internal server (server/controller 0 or 1).


� All the FlashCopy relationships are started almost at the same time, so the FlashCopy completes very quickly and the source and target LUNs become available very quickly.

� A maximum of 16 background copies occur in parallel. This is due to the fact that each device adapter of the DS8300 allows 4 background copies (2 outbound and 2 inbound). In our implementation, we have 8 device adapters with all the relationships across the device adapters, so we have a total of 8 x 2 background copies.

The LUNs which are defined and copied or refreshed by FlashCopy are used by Tivoli Storage Manager Data Protection for FlashCopy, DB2, and SAP.

4.5 DS8300 internal addressing and total capacity consideration

Internal addressing is used to identify and attribute a number to all the LUNs within the DS8300. This address is used on the server side, in the different LPARs, to map the hdisk (see the note below) to the correct file system.

There are three levels in the hierarchy:

� The first level (indicated by X in Figure 4-15 on page 191), which is also called address group, identifies the LPAR and the production versus the backup LUNs:

– For LPAR 0 (DB2 partition 0), all production LUN identification numbers start with zero (0) and the backup LUN identification numbers start with A.

– For LPAR 1 (DB2 partition 6 to 13), all production LUN identification numbers start with 1 and the backup LUN identification numbers start with B.

– For LPAR 2 (DB2 partition 14 to 21), all production LUN identification numbers start with 2 and the backup LUN identification numbers start with C.

– For LPAR 3 (DB2 partition 22 to 29), all production LUN identification numbers start with 3 and the backup LUN identification numbers start with D.

– For LPAR 4 (DB2 partition 30 to 37), all production LUN identification numbers start with 4 and the backup LUN identification numbers start with E.

� The second level (indicated by Y in Figure 4-15 on page 191), which is also called Logical Subsystem (LSS), identifies the DB2 partition within a LPAR. It also assigns a LUN to controller/server 0 (if the number is even) or to controller/server 1 (if the number is odd). For example, DB2 partition 16 in the LPAR 2 has the addresses 24 and 25.

� The third level (indicated by A and B in Figure 4-15 on page 191) identifies (for all DB2 partitions except 0):

– Data/index: 00, 01
– Temp: 02, 03
– Log: 04, 05

Note: A hdisk is an AIX term representing a logical unit number (LUN) on an array.
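The three-level addressing just described can be condensed into a small helper (our own illustration, not from the configuration): the even LSS of a production pair is X times 16 plus twice the partition's position within its LPAR, the odd LSS is one higher, and the FlashCopy pair uses X plus 0xA as its first digit.

```shell
# lss_pair PARTITION -> "<prod even> <prod odd> <FC even> <FC odd>" in hex
lss_pair() {
  p=$1
  if [ "$p" -eq 0 ]; then
    lpar=0; i=0                    # DB2 partition 0 lives in LPAR 0
  else
    lpar=$(( (p - 6) / 8 + 1 ))    # partitions 6-13 -> LPAR 1, 14-21 -> 2, ...
    i=$(( (p - 6) % 8 ))           # position of the partition within its LPAR
  fi
  even=$(( lpar * 16 + 2 * i ))              # production LSS (even)
  fc_even=$(( (lpar + 10) * 16 + 2 * i ))    # FlashCopy LSS: X shifted to A-E
  printf '%X %X %X %X\n' "$even" "$((even + 1))" "$fc_even" "$((fc_even + 1))"
}

lss_pair 16   # -> 24 25 C4 C5  (DB2 partition 16 in LPAR 2, as in the text)
lss_pair 0    # -> 0 1 A0 A1
```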



Figure 4-15 LUN identification

Figure 4-16 is a summary of the total capacity and its distribution between the LPARs and the types of LUNs (production, FlashCopy, logger, temporary, and data and index).

Figure 4-16 Total LUNs layout

The content of Figure 4-15, condensed: a LUN number is X Y A B, where X Y is the LSS number and A B identifies the space.

� Production and FlashCopy LSS pairs (even/odd):

– LPAR0: DB2P0 = 0/1; FC_DB2P0 = A0/A1
– LPAR1: DB2P6 to DB2P13 = 10/11 to 1E/1F; FC_DB2P6 to FC_DB2P13 = B0/B1 to BE/BF
– LPAR2: DB2P14 to DB2P21 = 20/21 to 2E/2F; FC pairs = C0/C1 to CE/CF
– LPAR3: DB2P22 to DB2P29 = 30/31 to 3E/3F; FC pairs = D0/D1 to DE/DF
– LPAR4: DB2P30 to DB2P37 = 40/41 to 4E/4F; FC pairs = E0/E1 to EE/EF

� A B values:

– D (data/index tablespace): partition 0 = 00, 01, 02, 03; partitions 6-37 = 00, 01
– T (temporary tablespace): partition 0 = 10 to 17; partitions 6-37 = 02, 03
– L (logs): partition 0 = 20 to 27; partitions 6-37 = 04, 05

The content of Figure 4-16:

                                 # LUNs prod  size (GB)  total prod  # LUNs FlashCopy
Partition 0-5 (LPAR0)
  DB2 logger                     16           25         400 GB      -
  DB2 temporary tablespace       16           25         400 GB      16
  DB2 data and index             8            150        1200 GB     8
Partitions 6-13, total (LPAR1)
  DB2 logger                     32           25         800 GB      -
  DB2 temporary tablespace       32           25         800 GB      32
  DB2 data and index             32           150        4800 GB     32
Partitions 14-21, total (LPAR2): same as LPAR1
Partitions 22-29, total (LPAR3): same as LPAR1
Partitions 30-37, total (LPAR4): same as LPAR1


Chapter 5. Using IBM Tivoli Storage Manager to manage the storage environment

This chapter describes the storage management processes used in the project, addressed from a storage manager expert perspective. The following topics are covered:

� Management products and backup techniques, particularly in the SAP context

� The test environment, as well as our conclusions prior to undertaking the formal testing


© Copyright IBM Corp. 2007. All rights reserved. 193


5.1 Introducing IBM Tivoli Storage Management

IBM Tivoli Storage Manager, the core application of the Tivoli Storage Management solution set, is an enterprise-wide storage management application for the network. It provides automated storage management services, including backup and restore, archive and retrieve, hierarchical space management, and disaster recovery, for multi-vendor workstations, personal computers, mobile laptops, and servers of all sizes and operating systems, connected via wide area network (WAN), local area network (LAN), and storage area network (SAN).

IBM Tivoli Storage Manager is available in three editions: Express, Basic Edition, and Extended Edition, as described here:

� IBM Tivoli Storage Manager Express is a new product aimed at two market segments: the small to medium business with a less sophisticated IT environment, and the enterprise department that does not need the full suite of IBM Tivoli Storage Manager features.

IBM Tivoli Storage Manager Express provides a subset of Tivoli Storage Manager features, focusing on backup and recovery for between 5 and 20 client machines.

� IBM Tivoli Storage Manager Basic Edition contains a rich set of features and provides the core functions of backup, recovery, and archive management.

� The Extended Edition of IBM Tivoli Storage Manager expands on the features and possibilities of the Basic Edition. IBM Tivoli Storage Manager Extended Edition adds disaster recovery planning capability for the server, Network Data Management Protocol (NDMP) control for network-attached storage (NAS) filers, and support for larger capacity tape libraries and more tape drives.

Tivoli Storage Manager includes the following components:

� The server

The server provides backup, archive, and space management services to its defined clients. The server maintains its own database and recovery log for information about IBM Tivoli Storage Manager resources, users, and user data including all backed-up, archived and migrated files. The client data itself is stored in server-controlled entities called storage pools. These are groups of random and sequential access media that store backed-up, archived, and space-managed files.

The IBM Tivoli Storage Manager server is responsible for maintaining the integrity of client sessions, reliably receiving client data, storing client data in storage pools, and efficiently managing that data internally so that it can be restored or retrieved when required. You can set up multiple servers in your enterprise network to balance storage, processor, and network resources. IBM Tivoli Storage Manager allows you to manage and control multiple servers from a single interface that runs in a Web browser, the enterprise console.

� The administrative interface

The administrative interface allows administrators to control and monitor server activities, define management policies for client files, and set up schedules to provide services at regular intervals. Administrative functions are available from an administrative client command line and from a Web browser interface. A server console is also available.

� Backup-Archive client

The Backup-Archive client allows users to maintain backup versions of their files, which they can restore if the original files are lost or damaged. Users can also archive files for long-term storage and retrieve the archived files when necessary. A command line interface, native GUI interface, and Web browser interface are available for the Backup-Archive clients.



� Application program interface (API)

The API allows users to enhance existing application programs with backup, archive, restore, and retrieve services. When users install the Tivoli Storage Manager API client on their systems, they can register as client nodes with a Tivoli Storage Manager server.

The IBM Tivoli Storage Manager family of offerings includes:

� IBM Tivoli Storage Manager for Application Servers

IBM Tivoli Storage Manager for Application Servers is a software module that works with IBM Tivoli Storage Manager to better protect the infrastructure and application data and improve the availability of WebSphere Application Servers. It works with the WebSphere Application Server software to provide an applet graphical user interface (GUI) to do reproducible, automated online backup of a WebSphere Application Server environment, including the WebSphere configuration data, and deployed application program files.

� IBM Tivoli Storage Manager for Databases

IBM Tivoli Storage Manager for Databases is a software module that works with IBM Tivoli Storage Manager to protect a wide range of application data via the protection of the underlying database management systems holding that data. Tivoli Storage Manager for Databases exploits the backup-certified utilities and interfaces provided for Oracle, Microsoft SQL Server, and Informix®.

In conjunction with Tivoli Storage Manager, this module automates data protection tasks and allows database servers to continue running their primary applications while they back up and restore data to and from offline storage. (This same functionality is included in IBM's DB2 Universal Database™ package, allowing it to work directly with Tivoli Storage Manager without the need to buy any additional modules.)

Regardless of which brand of database is used, Tivoli Storage Manager for Databases allows the centralized and automated data protection capabilities of Tivoli Storage Manager to be applied to up and running database servers.

� IBM Tivoli Storage Manager for Enterprise Resource Planning

IBM Tivoli Storage Manager for Enterprise Resource Planning (ERP) is a software module that works with IBM Tivoli Storage Manager to better protect the vital enterprise data and improve the availability of SAP database servers.

Specifically designed and optimized for the SAP environment, IBM Tivoli Storage Manager for ERP provides automated data protection, reduces the CPU performance impact of data backups and restores on the SAP database server, and greatly reduces the administrator workload necessary to meet data protection and storage management requirements.

Tivoli Storage Manager for ERP seamlessly integrates with the database-specific utilities of DB2 UDB (db2 admin) and Oracle Recovery Manager (RMAN) or with the SAP br-tools, a set of database administration functions integrated with SAP for Oracle databases. The Storage Manager for ERP software module allows multiple SAP database servers to share a single Tivoli Storage Manager server to automatically manage the backup data.

As the intelligent interface to SAP databases, Tivoli Storage Manager for ERP is SAP-certified in heterogeneous environments, supporting large-volume data backups, data recovery, data cloning, and disaster recovery of multiple SAP database servers.

� IBM Tivoli Storage Manager for Advanced Copy Services

IBM Tivoli Storage Manager for Advanced Copy Services (formerly known as IBM Tivoli Storage Manager for Hardware) is an optional software module for AIX that integrates with Tivoli Storage Manager Extended Edition.


Tivoli Storage Manager for Advanced Copy Services protects mission-critical data that must be available 24x7, and integrates hardware- and software-based snapshot capabilities with Tivoli Storage Manager and the corresponding Data Protection module (IBM Tivoli Storage Manager for Databases, or IBM Tivoli Storage Manager for ERP) to provide a “near zero-impact” data backup and “near instant” recovery solution.

Tivoli Storage Manager for Advanced Copy Services supports a wide range of hardware: IBM Enterprise Storage Server (ESS), IBM DS6000™, IBM DS8000, SAN Volume Controller (SVC) and all IBM and non-IBM devices supported by the SVC.

Tivoli Storage Manager for Advanced Copy Services also provides snapshot support for DS8000 for SAP on DB2 UDB, and coordinated FlashCopy backup of multi-partition DB2 UDB databases distributed across multiple host systems. Support of FlashCopy and snapshot functionality allows for “zero impact” backups and instant recovery. Data transfer to the Tivoli Storage Manager server is handled from a separate storage server, allowing the primary production data to remain online and undisturbed.

� IBM Tivoli Storage Manager for Mail

IBM Tivoli Storage Manager for Mail is a software module for IBM Tivoli Storage Manager that automates the data protection of e-mail servers running either Lotus® Domino® or Microsoft Exchange. This module utilizes the application program interfaces (APIs) provided by e-mail application vendors to perform online “hot” backups without shutting down the e-mail server and improve data-restore performance. As a result, it can help protect the growing amount of new and changing data that should be securely backed-up to help maintain 24 x 365 application availability.

� IBM Tivoli Storage Manager for System Backup and Recovery

IBM Tivoli Storage Manager for System Backup and Recovery (known as SysBack™) offers a flexible backup method for the AIX systems. It helps to protect data and to provide bare metal restore capabilities. It offers a comprehensive system backup, restore, and reinstallation tool.

Tivoli Storage Manager can include the two following optional solutions:

� IBM Tivoli Storage Manager for Storage Area Networks

IBM Tivoli Storage Manager for Storage Area Networks extension allows SAN-connected Storage Manager servers and Storage Manager client computers to make maximum use of their direct network connection to storage. This software extension introduces storage agents which allow both servers and client computers to make the bulk of their backup/restore and archive/retrieve data transfers over the SAN instead of the LAN, either directly to tape or to the Storage Manager disk storage pool.

This ability greatly reduces the impact of data protection on the LAN, while also reducing CPU utilization on both client and server. Should a tape library be connected directly to a SAN, this software extension also allows multiple IBM Tivoli Storage Manager servers to share the library over this high-bandwidth data connection.

� IBM Tivoli Storage Manager for Space Management

Often, some percentage of your data is inactive; that is, it has not been accessed in weeks, if not months. IBM Tivoli Storage Manager for Space Management, using Hierarchical Storage Management (HSM) techniques, can automatically move inactive data to less-expensive offline storage or near-line storage, thus freeing online disk space for more important active data. IBM Tivoli Storage Manager for Space Management complements both IBM Tivoli Storage Manager and IBM Tivoli Storage Manager Extended Edition.

For a full list of products, you can visit:

http://www.ibm.com/software/tivoli/products/

196 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse

5.1.1 Backup techniques for databases

In this section we discuss integrating database backup strategies with IBM Tivoli Storage Manager, using built-in functionality for DB2 UDB and the IBM Tivoli Storage Manager for Databases.

IBM Tivoli Storage Manager for Databases exploits the backup-certified utilities and interfaces provided for Oracle, Microsoft SQL Server, and older versions of Informix. In conjunction with IBM Tivoli Storage Manager, it automates data protection tasks and allows database servers to continue running their primary applications while they back up and restore data to and from offline storage.

Here we provide an overview of the fundamental structure of a database, such as tables, table spaces, data files, control files, parameter files, and configuration files. Different types of database backups are presented, as well as techniques for online and offline backups.

Relational database components

RDBMSs share a common set of principles and, conceptually, similar logical and physical structures. Figure 5-1 shows their fundamental structure: tables, table spaces, log files, and control files. These are generic terms, and each vendor's RDBMS may use different terminology or structures. For example, there is no “table space” concept in Microsoft SQL Server, and log files in Oracle are called “redo logs”.

It is important to understand the basic RDBMS structures so that you can put an effective backup and recovery strategy in place—because you must back up more than the database itself to ensure a successful recovery.

Figure 5-1 Fundamental structure of a database

Tables

An RDBMS holds its data in the form of two-dimensional tables, also referred to as relations. These two-dimensional tables are easy for users to understand and manipulate. They also enable different users and applications to view and process the same data in different ways, without requiring complex structures.

Chapter 5. Using IBM Tivoli Storage Manager to manage the storage environment 197

Table spaces

Table spaces are logical concepts that many RDBMSs use. When a user creates tables in an RDBMS that supports table spaces, the tables are created within a table space. Table spaces provide a convenient way of separating the user's view of data from some of the practical considerations associated with storing that data on disk. In many UNIX environments, table spaces can be implemented using either files or raw devices. A table space provides the link between the logical view of a database that the user sees and the data files that the database uses to hold the data.

Log files

Most RDBMSs maintain details of updates to databases in log files. If, for some reason, a transaction that updates a database fails to complete successfully, the RDBMS recovery procedure uses the log file to detect that an update may be only partially complete and to undo any changes made to the database.

Log files can also be used to maintain database consistency in the event of an error or failure. Different RDBMS suppliers use different terms for log files. Some RDBMSs support using log files to perform forward recovery (also called roll-forward recovery). Forward recovery takes advantage of the fact that log files hold details of all changes that have been made to the database: you do not necessarily have to undo changes, but can instead reapply them. Log files can be used for forward recovery with both online and offline backup techniques.
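
The roll-forward idea can be sketched in a few lines of Python. This is a toy model with hypothetical in-memory structures, not DB2's or any real RDBMS's log format: restore the backup image, then reapply every logged change in order.

```python
# Minimal sketch of forward (roll-forward) recovery: restore a backup image,
# then reapply the logged changes instead of undoing them.
# All structures here are hypothetical illustrations, not a real log format.

def roll_forward(backup_image, log_records):
    """Rebuild database state from a backup plus the change log."""
    db = dict(backup_image)  # restore the backup image
    for rec in log_records:  # reapply every logged change, in order
        if rec["op"] == "set":
            db[rec["key"]] = rec["value"]
        elif rec["op"] == "delete":
            db.pop(rec["key"], None)
    return db

backup = {"account_1": 100, "account_2": 50}          # state at backup time
log = [                                               # changes made after the backup
    {"op": "set", "key": "account_1", "value": 80},
    {"op": "set", "key": "account_3", "value": 25},
    {"op": "delete", "key": "account_2"},
]
recovered = roll_forward(backup, log)
print(recovered)  # {'account_1': 80, 'account_3': 25}
```

The key point is that recovery time grows with the amount of log to replay, which is why more frequent backups shorten recovery windows.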

RDBMSs have complex schemes to manage log files, and typically maintain multiple sets of them to ensure the proper recording of database transactions. Most RDBMSs have a set of online log files as well as offline (or archived) log files. Online log files record the current database transaction activity; when an online log becomes full, it becomes an offline log and is moved to another location. Typically, backup applications back up the offline log files.

Control files

Each RDBMS holds information about the physical structure of the database, such as which physical files are used by each table space and which is the current log file. We call this information control data. Some RDBMSs hold this data in separate files, and others hold it within the database itself.

Backup techniques

There are several techniques that you can use to back up data managed by an RDBMS. These techniques are, at least at a conceptual level, common to most RDBMSs. A combination of the following techniques may be used:

- Disk mirroring
- Database export
- Offline backup
- Online backup
- Full database backup
- Partial database backup
- Log file backup
- Incremental backup
- Backup of RDBMS supporting files
- Backup using storage server advanced copy services

A combination of backup techniques is needed.

Disk mirroring

Mirroring is the process of writing the same data to multiple storage devices at the same time. Disk mirroring is a useful technique to maximize database availability, because users can continue working after a media failure has occurred. However, mirroring is a high-availability technique rather than a backup solution; you still need to back up mirrored databases.

Database export

All RDBMSs provide export and import utilities, which operate on logical objects as opposed to physical objects. For example, you can use an export command to copy an individual table to a file system file; if you want to restore the table at some later time, you use the import command. Export and import are time consuming and are not designed as backup and restore utilities. They are better suited to moving data, for example for workload balancing or migration.

Offline backup

To make an offline backup, you must shut down the database before starting the backup and restart it after the backup is complete. The obvious but significant disadvantage is that neither users nor batch processes can access the database (for read or write) while the backup is in progress. Most databases do not require offline backups if you perform online backups; online backups, along with the log files, are sufficient to recover the database.

Online backup

Most RDBMSs enable backups to be performed while the database is started and in use. Clearly, if a database is being backed up while users are updating it, the backed-up data is likely to be inconsistent. RDBMSs that support online backup use log files during the recovery process to bring the database back to a fully consistent state. This approach requires that you retain the RDBMS log files and that you indicate to the RDBMS when you are about to start the backup and when you have completed it. Be aware that this method can negatively affect RDBMS performance.
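
The online-backup protocol can be illustrated with a small sketch: tell the database engine when the backup window opens and closes, and retain every log record produced in between so a restore can bring the (possibly inconsistent) image to a consistent state. The class below is a hypothetical stand-in, not a real database API.

```python
# Sketch of the online-backup bracketing described above. A real RDBMS exposes
# this through its own commands; this toy class only models the bookkeeping.

class OnlineBackup:
    def __init__(self):
        self.in_backup = False
        self.retained_logs = []

    def begin(self):
        self.in_backup = True        # "about to start the backup"
        self.retained_logs = []

    def record_change(self, log_record):
        if self.in_backup:           # logs written during the window must be kept
            self.retained_logs.append(log_record)

    def end(self):
        self.in_backup = False       # "backup completed"
        return list(self.retained_logs)

b = OnlineBackup()
b.begin()
b.record_change("update t1 set ...")
b.record_change("insert into t2 ...")
logs_needed_for_restore = b.end()
print(len(logs_needed_for_restore))  # 2 log records must accompany the image
```

Both retained log records must be restored alongside the data files for the recovery to reach a consistent state.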

Full database backup

A full database backup is a copy of all of the data files used to hold user data. In some database products, full database backups also include copies of the data files that hold tables used by the RDBMS itself, RDBMS log files, and any control files and parameter files that the RDBMS uses. Many RDBMSs provide both full online and full offline database backups; however, the backup is different in each case.

An offline full backup can be done using operating system utilities, RDBMS utilities, or the IBM Tivoli Storage Manager backup-archive client to back up the data files that constitute the database. An online backup requires an RDBMS utility to create data files containing a copy of the database. You can then use IBM Tivoli Storage Manager to back up these data files along with the parameter files that you use to start the RDBMS.

The simplest approach to database backup is to perform only full, offline backups at regular intervals. This approach is relatively easy to administer, and recovery is relatively straightforward. However, it may not be practical to take databases offline for the period of time necessary to perform full backups at the frequency you need. You may have to adopt a more flexible approach.

Some database products provide incremental backup, which only backs up changed database pages or blocks. This is called a true incremental backup, as opposed to a simulated incremental backup (called a log file backup).

Partial database backup

Many RDBMSs allow both online and offline partial database backups. A partial database backup is a backup of a subset of the full database (such as a table space, or the data files that make up a table space). Backing up only a subset of a database is often not the best approach, because you must ensure that what you back up represents a complete logical unit of recovery from the perspective of both the application and the RDBMS itself.

If you have added a new data file to a table space, you must ensure that any control file that the RDBMS uses to define the relationship between data files and table spaces is also backed up. You may need to back up data files that the RDBMS does not manage.

Log file backup

For some applications, the units of recovery are too large to be backed up on a daily basis (for example, by performing a full daily backup). The constraining factor might be the backup window, or the network and CPU overhead of transferring all the data.

An alternative is to capture only the changes to the database by backing up the RDBMS log files. This type of backup is sometimes called an incremental backup (versus a full daily backup), but it is really a simulated incremental backup, as opposed to a “true” incremental backup. A true incremental backup backs up changed database blocks or pages, whereas a simulated incremental backup backs up the database transactions.

Recovery from a simulated incremental can be much longer than from a true incremental, because you must reapply all of the transactions in the logs.

Incremental backup

Some RDBMSs can back up only the data that has changed since the last offline or online database backup. This saves tape or disk space, but might not reduce the backup duration, because the RDBMS still has to read each data block to determine whether it has changed since the last backup. At recovery time, the full database backup and the incremental backups are both required to fully recover the database. Incremental backups are useful for saving space, or for saving bandwidth when backing up over the network.
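
The contrast between a true incremental and a simulated incremental can be made concrete with a short sketch. The page and transaction structures below are hypothetical illustrations: a true incremental ships only the changed pages and restores by overlaying them, while a simulated incremental (log file backup) must replay every transaction.

```python
# Sketch contrasting the two "incremental" styles discussed above.
# All structures are hypothetical, not a real page or log format.

def true_incremental(pages, changed_page_ids):
    # The RDBMS still reads every page to decide whether it changed.
    return {pid: data for pid, data in pages.items() if pid in changed_page_ids}

def restore_from_true_incremental(full_backup, incremental):
    restored = dict(full_backup)
    restored.update(incremental)              # overlay changed pages: quick
    return restored

def restore_from_log_backup(full_backup, transactions):
    restored = dict(full_backup)
    for page_id, new_value in transactions:   # reapply each transaction: slower
        restored[page_id] = new_value
    return restored

full = {1: "A", 2: "B", 3: "C"}               # state at the full backup
current = {1: "A2", 2: "B", 3: "C2"}          # state now (pages 1 and 3 changed)
inc = true_incremental(current, changed_page_ids={1, 3})
print(inc)  # only the changed pages travel in the incremental image

# Both restore paths end at the same state; the log path does more work.
assert restore_from_true_incremental(full, inc) == current
assert restore_from_log_backup(full, [(1, "A2"), (3, "C2")]) == current
```

The asserts at the end show why recovery from a simulated incremental can take much longer: its cost grows with the number of transactions, not with the number of changed pages.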

Backup of RDBMS supporting files

Most RDBMSs require certain files to operate, but do not back them up when using their backup utilities. These files can be initialization parameter files, password files, files that define the environment, or network configuration files.

They are external files and are not part of the database because they must be accessible for reading or editing even when the database is down. For example, the password file provides authentication in order to administer a database, especially for starting up a database from a remote site.

You must ensure that these files are also backed up using operating system tools or third-party tools such as IBM Tivoli Storage Manager.
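
A minimal sketch of this practice is to gather the external files into a single archive with operating system tools. The file names below (init.ora, passwd.db, tnsnames.ora) are purely illustrative, not a statement about any particular RDBMS installation.

```python
# Sketch of backing up the RDBMS supporting files (parameter, password, and
# network configuration files) outside the database's own backup utilities.
# File names are illustrative only.

import os
import tarfile
import tempfile

def archive_supporting_files(paths, archive_path):
    """Pack the external configuration files into a single tar archive."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for p in paths:
            tar.add(p, arcname=os.path.basename(p))
    return archive_path

workdir = tempfile.mkdtemp()
supporting = []
for name in ("init.ora", "passwd.db", "tnsnames.ora"):   # illustrative names
    p = os.path.join(workdir, name)
    with open(p, "w") as f:
        f.write("example contents\n")
    supporting.append(p)

archive = archive_supporting_files(supporting, os.path.join(workdir, "dbconfig.tar.gz"))
with tarfile.open(archive) as tar:
    print(sorted(tar.getnames()))  # ['init.ora', 'passwd.db', 'tnsnames.ora']
```

In practice such an archive would then be sent to IBM Tivoli Storage Manager like any other file-level backup, so the files remain restorable even when the database is down.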

Backup using storage server advanced copy services: FlashCopy

A backup may potentially degrade the performance of a production system. In a 24x7 environment, or with very large databases, it is particularly important to run backups without interfering with normal operation. To free the production system from the overhead of backup, it is valuable to have a copy of the database for backup, reporting, or other purposes.

Some intelligent storage servers, such as the IBM TotalStorage DS6000, DS8000, and SAN Volume Controller, provide an advanced copy service: FlashCopy. A FlashCopy is an identical and independent copy of one or more disk volumes, called a FlashCopy pair, which is created within the storage server. Normally these copies can be established in a very short time (5 to 20 seconds, depending on the machine).

If the database resides on a storage server that supports FlashCopy, a copy of the disk volumes can be established and assigned to another (backup) machine. On the backup machine, the (backup) database can be accessed exclusively for backup or other purposes.

It is important that the data on the disk volumes is consistent when the FlashCopy volumes are created. One way to achieve this is to shut down the database and synchronize to disk all of the data that may reside in memory. After the FlashCopy is established, the database can be started again.

If the database cannot be stopped, then the database itself must provide features to ensure that the data on the disk will be in a consistent state when establishing the FlashCopy pair.
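
For a database that stays up, the consistency protocol amounts to: suspend writes, take the near-instant point-in-time copy, resume writes. The sketch below models that sequence with a hypothetical Database class standing in for DB2's write suspend/resume capability; it is an illustration, not the actual mechanism.

```python
# Sketch of the consistency protocol for FlashCopy while the database stays up:
# suspend writes, take the point-in-time copy, resume writes.
# The Database class is an illustrative stand-in, not DB2 itself.

class Database:
    def __init__(self, volumes):
        self.volumes = volumes
        self.write_suspended = False

    def write(self, key, value):
        if self.write_suspended:
            raise RuntimeError("writes are suspended for FlashCopy")
        self.volumes[key] = value

def flashcopy_with_suspend(db):
    db.write_suspended = True            # analogous to DB2 "write suspend"
    try:
        snapshot = dict(db.volumes)      # logical point-in-time copy of the LUNs
    finally:
        db.write_suspended = False       # analogous to DB2 "write resume"
    return snapshot

db = Database({"lun1": "data-v1"})
snap = flashcopy_with_suspend(db)
db.write("lun1", "data-v2")              # production continues after the flash
print(snap["lun1"], db.volumes["lun1"])  # snapshot keeps data-v1; live DB has data-v2
```

The try/finally guarantees writes are resumed even if the copy step fails, mirroring the requirement that the production database must not stay suspended.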

5.1.2 Backing up SAP using FlashCopy

In this section we detail the FlashCopy techniques in the SAP environment and we explain the configuration parameters we used for our tests.

Introduction to Tivoli Data Protection for Storage for SAP on DB2

Data Protection for Disk Storage and SVC for SAP (also known as Data Protection for FlashCopy Devices for SAP) is the SAP-specific component of Tivoli Storage Manager for Advanced Copy Services. Data Protection for FlashCopy Devices for SAP communicates with storage devices using the Storage Management Interface Standard (SMI-S). It supports storage subsystems such as the IBM Enterprise Storage Server (ESS), IBM TotalStorage DS6000, DS8000, and SAN Volume Controller (SVC).

Tivoli Storage Manager for Advanced Copy Services implements the backup/recovery technique known as copy backup: using the FlashCopy capabilities of storage subsystems, a logical copy of the production data is created on a backup server. This copy can then be used for:

- Postprocessing on the backup server; that is, creating a backup copy on the Tivoli Storage Manager server, starting a clone database, performing backup verification, and so on.

- Performing a FlashBack restore.

Because the backup to Tivoli Storage Manager uses the resources of the backup system, the production system is not adversely impacted by backing up the data to a Tivoli Storage Manager server; a traditional backup window is no longer required. This gives administrators the operational flexibility to initiate backups any time (backing up the database more often in order to reduce the number of database logs to be applied during a forward recovery), and to balance the load on the Tivoli Storage Manager server (for example, by delaying backups to tape).

As long as a backup eligible for restore is available as FlashCopy on the backup server, this can be used instead of the backup data saved on Tivoli Storage Manager. In this case, the restore is done by flashing back the data to the production system. This way, a “minute restore” can be achieved. When combined with more frequent backups, this will yield massively shortened database restoration windows.

With the additional FlashCopy cloning offering, database clones can be created any time on short notice.

Data Protection for Storage for SAP executes on the production DB server as well as on the backup server. It provides complete and automated FlashCopy processes through seamless integration with:

- The database administration tools: the built-in backup and restore commands for DB2 UDB, and BR*Tools (splitint) for Oracle.

- Data Protection for SAP, which is used to back up to, and restore from, a Tivoli Storage Manager server. Database log files are saved to Tivoli Storage Manager directly from the production system, and are therefore also handled by Data Protection for SAP.

Figure 5-2 illustrates this integration and the interaction of the components at a high level.

Figure 5-2 Data Protection for Storage for SAP

Referencing the numbers in Figure 5-2, the integrated FlashCopy function performs the following tasks:

- A database backup is started from the backup server (1).

- The database administration tools route backup requests to Data Protection for FlashCopy Devices for SAP (2), which prepares the FlashCopy (3).

- After the database administration tools indicate that the database on the production server is ready for backup, the data is flashed (4) and a background process starts copying the data to the backup server using the FlashCopy functionality of the disk subsystems (5).

- Immediately after the data is flashed, the database on the production server can continue running without any further impact. If a backup to Tivoli Storage Manager is requested, the data is backed up using the copy on the backup server as the source (6, 7).

Database restore requests are initiated from the production server. The database administration tools route restore requests to Data Protection for SAP. Data Protection for SAP then interacts with Data Protection for FlashCopy Devices for SAP to determine whether the data is still available as a copy on disk or whether a restore from Tivoli Storage Manager is required.
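
The restore routing can be sketched as a simple decision over a backup catalog. The dictionary below is a hypothetical stand-in for the Data Protection components' bookkeeping, and the backup identifiers are invented for illustration.

```python
# Sketch of the restore decision described above: prefer a FlashBack restore
# from the disk copy when that copy still exists on the backup server, and
# fall back to restoring from the Tivoli Storage Manager server otherwise.
# The catalog is a hypothetical model, not the real components' metadata.

def choose_restore_source(catalog, backup_id):
    """Return which path would serve the restore for the given backup."""
    entry = catalog.get(backup_id)
    if entry is None:
        raise KeyError(f"unknown backup: {backup_id}")
    # FlashBack restore ("minute restore") is preferred when the FlashCopy
    # target volumes still hold this backup.
    return "flashback-from-disk" if entry["on_disk"] else "restore-from-tsm"

catalog = {
    "backup-42": {"on_disk": True},    # FlashCopy still present on backup server
    "backup-41": {"on_disk": False},   # only the TSM (tape) copy remains
}
print(choose_restore_source(catalog, "backup-42"))  # flashback-from-disk
print(choose_restore_source(catalog, "backup-41"))  # restore-from-tsm
```

The design point is that the caller (the database administration tools) never needs to know where the data lives; the routing layer decides.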

If the backup copy is still available on the backup server, Data Protection for FlashCopy Devices for SAP performs a FlashBack restore from the copy to the production system; otherwise, Data Protection for SAP restores the data from Tivoli Storage Manager.

The test environment

Our test environment consisted of the DB2 RDBMS executing on an AIX server attached to an IBM DS8000 disk storage subsystem. This AIX server was the production system (SYS3, in our case). Another AIX server, the backup system, was attached to the same storage system to back up FlashCopy copies of the production system to the Tivoli Storage Manager server. This was done by the concerted action of the two Data Protection for FlashCopy components (tdphdwdb2 and splitint) and Data Protection for SAP (the shared vendor library and prole).

Data Protection for FlashCopy requires Data Protection for SAP to perform the actual backup or restore to or from the Tivoli Storage Manager server. Figure 5-3 depicts the hardware and software environment in which Data Protection for FlashCopy operates.

Figure 5-3 The Data Protection for FlashCopy

When you create a FlashCopy (locally or remotely), the storage system creates a logical copy of a logical disk (or set of disks) over a window of time, without freezing or inhibiting application changes to those disks. Proper synchronization with the database system is therefore required, and DB2 provides capabilities to ensure this synchronization.

FlashCopy using Tivoli Data Protection for FlashCopy Devices for SAP for DB2

After the tdphdwdb2 process has been started on the backup system with the parameters to perform a backup of a FlashCopy, it first interacts with the DB2 database on the production system via a DB2 remote connection to get the DB2 database information. It then calls the splitint function, whose components (CORE, HCI, and LVM) ensure that the tdphdwdb2 process gets all the target volumes with the files that it needs to run a full DB2 database backup. When performing the backup, the tdphdwdb2 process calls the DB2 backup command, which calls the vendor library (DP for SAP) to perform the actual backup. The tdphdwdb2 process can be executed on the command line, but it normally runs as a batch job.

The major function of Tivoli Data Protection for FlashCopy is to enhance the database backup and restore processes by embedding FlashCopy capabilities into those processes. To use these capabilities, TDP for FlashCopy requires a sequence of tasks to be put in place, as described in Figure 5-4.

FlashCopy backup

Tivoli Data Protection for FlashCopy provides the functions necessary to prepare the environment on the backup host for a backup of the production system's database. Such a backup must be started on the backup system by using the tdphdwdb2 process.

Tivoli Data Protection for FlashCopy uses the DB2 write suspend and write resume features, together with the db2inidb ... as standby function, to activate the backups from the disk image copies.

Figure 5-4 Flow of FlashCopy/Backup/Withdraw operations

Note: Steps 1, 3 and 5 in Figure 5-4 are performed using the tdphdwdb2 command. Steps 2, 4 and 8 are performed using the splitint command. Step 7 is performed using the brbackup and splitint commands.

The tasks are described here:

1. Using a DB2 client for a DB2 remote connection, the tdphdwdb2 process determines all container files that make up the DB2 database on the production system, puts the names into a file list, and adds the local database directory to this file list.

2. Volume and file system information for the files listed in step 1 is obtained from the DP for FlashCopy program splitint.

3. The DB2 database on the production system is set in write suspend mode using the tdphdwdb2 command.

4. A request for a FlashCopy of the database volumes (source/target) is made by calling the DP for FlashCopy program splitint.

5. The DB2 database on the production system is set in write resume mode using the tdphdwdb2 command. The production system is now operational again.

6. The disks (target volumes) are mounted on the backup system after an image copy (FlashCopy) is available.

7. Backup of the DB2 database to the TSM Server is performed with the db2 backup command via DP for SAP on the backup system.

8. A request is made to splitint to withdraw all FlashCopy source/target relationships.
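
The eight steps above can be sketched as one ordered plan. The Python below only mirrors the sequence for illustration; real deployments run tdphdwdb2 and splitint, and the step descriptions are paraphrased from the list, not command syntax.

```python
# Sketch of the eight-step FlashCopy backup flow as an ordered plan:
# (step number, responsible component, action). Descriptions are paraphrases.

def flashcopy_backup_plan():
    return [
        (1, "tdphdwdb2", "list database containers via DB2 remote connect"),
        (2, "splitint", "map container files to volumes and file systems"),
        (3, "tdphdwdb2", "set production database in write suspend mode"),
        (4, "splitint", "start FlashCopy of source volumes to target volumes"),
        (5, "tdphdwdb2", "set production database in write resume mode"),
        (6, "backup host", "mount FlashCopy target volumes"),
        (7, "db2 backup + DP for SAP", "back up database image to TSM"),
        (8, "splitint", "withdraw FlashCopy source/target relationships"),
    ]

plan = flashcopy_backup_plan()
suspended = next(n for n, _, d in plan if "write suspend" in d)
resumed = next(n for n, _, d in plan if "write resume" in d)
print(suspended, resumed)  # production writes are paused only between steps 3 and 5
```

Laying the flow out this way makes the design point visible: the production database is affected only for the short suspend-flash-resume bracket, while the long-running tape backup (step 7) happens entirely on the backup host.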

FlashBack Restore

Tivoli Data Protection for FlashCopy provides the capability to restore the target volumes created during a FlashCopy Backup to the original source volumes. This is referred to as FlashBack Restore. In contrast to FlashCopy Backup, all database restore activities with TDP for FlashCopy must be started on the production system.

For more details about this topic, refer to IBM Tivoli Storage Manager for Advanced Copy Services Data Protection for Snapshot Devices, SC33-8208.

Our backup method

Important: In our test environment, we chose to implement the full database backup with a FlashCopy of the volumes that SAP and DB2 use, as illustrated in Figure 5-5 on page 206.

Figure 5-5 The backup method used for our tests

This process employs Tivoli Storage Manager for Hardware (Tivoli Storage Manager Advanced Copy Services), Tivoli Data Protection for DS R/3 (on the AIX server), and Tivoli Data Protection for Disk Storage and SVC for SAP (also known as Data Protection for FlashCopy Devices for SAP). The choice was motivated by the following reasons:

- Performance: this represented the fastest method, and we needed to back up 20 TB in less than 8 hours.

- Availability: with this method, there was no need to shut down the database, and the data remained available during backup.

- Quality of service: using this method meant that there was no impact on users during backup.

5.2 DB2 backup coordination across 32 partitions and multiple LPARs

The goal of scenario 3a, as described in 1.1.7, “The Key Performance Indicators test requests” on page 15, was to prove the feasibility of backing up a FlashCopy image of a 20 TB database to tape in a reasonable time using Tivoli Storage Manager. The test simulated a backup cycle window of 8 hours. Achieving a high backup rate in the test would avoid the need for multiple FlashCopy target sets in the DS8000 storage server.
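
A back-of-envelope calculation shows what the 8-hour window implies for the tape layer. The data volume (15770 GB net) and the parallel session count (up to 16 tape sessions) come from this chapter's test description; the arithmetic itself is just a sketch, not a measured result.

```python
# Required-rate check for the backup window: aggregate rate = data / window,
# per-drive rate = aggregate / parallel tape sessions. Figures from the text.

def required_rates(total_gb, window_hours, parallel_tapes):
    aggregate_mb_s = total_gb * 1024 / (window_hours * 3600)
    return aggregate_mb_s, aggregate_mb_s / parallel_tapes

agg, per_tape = required_rates(total_gb=15770, window_hours=8, parallel_tapes=16)
print(round(agg), round(per_tape, 1))  # ~561 MB/s aggregate, ~35 MB/s per tape
```

In other words, the configuration must sustain roughly 561 MB/s in aggregate, or about 35 MB/s per tape session, to fit the window.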

The overall backup process was controlled using the following components:

- Tivoli Storage Manager for Hardware
- Tivoli Data Protection for Enterprise Resource Planning
- Tivoli Storage Manager Server
- DB2 itself

During the backup cycle, the database servers would initiate a write suspend on the database. FlashCopy operations would then be started for all database source volumes. After the FlashCopy had been invoked, write operations would be resumed and the database servers would continue with normal operation. The Tivoli Storage Manager server would then access the FlashCopy target volumes and start the DB2 database to be able to do the backup. The backup of the database image could use up to 16 tape drives in parallel.

5.2.1 The process description

A database can be created across one or more nodes or machines. Each node in a partitioned database holds a subset of the overall database; these subsets are called database partitions. As shown in Figure 5-6, a node contains a database partition that has its own data, configuration files, and transaction logs.

Figure 5-6 Database partitioning

Database partitioning fits the massively parallel processing (MPP) hardware architecture, which is called a shared-nothing architecture because each partition has its own data, configuration files, and transaction logs. When using storage media managers such as Tivoli Storage Manager, you should install and configure the media manager on all machines that host database partitions. A backup must also be done for each database partition. Recovering to a point in time must be carefully planned with database partitioning.

Database partitions are initially defined in a file called db2nodes.cfg. Each database partition is assigned a node number; a partition is therefore sometimes called a node.

A DB2 partition can be physical or logical:

- Physical DB partitions are nodes assigned on separate machines.
- Logical DB partitions are nodes residing on the same machine.

DB2 logical partitions can be useful for exploiting the symmetric multiprocessor (SMP) architecture.

It is therefore possible to have more than one database partition for one database residing on the same machine.
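
The partition map can be sketched from a db2nodes.cfg-style file. The sample below follows the common layout (node number, hostname, logical port); the hostnames echo Table 5-1 but the specific entries, port numbers, and the database name "BIW" are illustrative assumptions, as is the command text itself.

```python
# Sketch of reading a db2nodes.cfg-style partition map and classifying
# partitions: entries sharing a hostname are logical partitions on one
# machine; distinct hostnames are physical partitions. Sample content is
# illustrative, not the study's actual configuration.

from collections import defaultdict

SAMPLE = """\
0 SYS3db0p 0
6 SYS3db1p 0
7 SYS3db1p 1
8 SYS3db1p 2
"""

def parse_db2nodes(text):
    hosts = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        node_num, hostname, _logical_port = line.split()[:3]
        hosts[hostname].append(int(node_num))
    return dict(hosts)

def backup_commands(hosts, dbname):
    # One backup per database partition, as the text requires; the command
    # string is indicative only, not real DB2 syntax.
    return [f"backup partition {n} of {dbname} on {h}"
            for h, nodes in hosts.items() for n in nodes]

hosts = parse_db2nodes(SAMPLE)
print(hosts["SYS3db1p"])  # [6, 7, 8]: three logical partitions share this LPAR
print(len(backup_commands(hosts, "BIW")))  # one backup command per partition
```

Grouping by hostname is also how a backup scheduler would know which media-manager instance must run each partition's backup.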

In our environment, the production database had 33 database partitions distributed across five System p LPARs:

- The first LPAR hosted partition 0 (called the initial node).

- The four other LPARs hosted eight partitions each.

The Tivoli Storage Manager backup-archive client uses, by default, the nodename as the hostname of the machine. In our case, however, we used a different naming convention as shown in Table 5-1. For example, in the name SYS3dbXpY, SYS3 is the name of the physical server, X identifies the number of the LPAR partition, and Y identifies the index of the DB2 partition (from 0 to 7). LPAR SYS3db0p hosted one DB partition (partition 0); all our other DB server LPARs hosted eight DB partitions.

Table 5-1 Our DB2 partition names

Figure 5-7 shows the distribution of data across the partitions. The allocated space in the data tablespaces was about 17.4 TB. Excluding the freespace, 15.4 TB of net data had to been backed up in total for all database partitions. Partition 0 has less data (263 GB net), and all other partitions were distributed about a average net size of 485 GB plus or minus 30 GB. (485 GB reflects the average value; 30 GB reflects the standard deviation of the data distribution).

Figure 5-7 Distribution of data across all partitions

Hostname    DB2 partitions
SYS3db0p    Partition 0
SYS3db1p    Partitions 6 to 13
SYS3db2p    Partitions 14 to 21
SYS3db3p    Partitions 22 to 29
SYS3db4p    Partitions 30 to 37

[Figure 5-7 is a histogram of the number of partitions per backup image size, ranging from under 250 GB to 575 GB per partition; total amount of data backed up: 15,770 GB across all 33 database partitions.]

208 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse


The database storage was spread across 512 disks in a single DS8000 disk storage subsystem. Both the source volumes and the FlashCopy target volumes were distributed across these disks. The DS8000 had Licensed Internal Code (LIC) level 6.1.700.4 installed.

For each Logical Unit Number (LUN) hosting a filesystem with database data, temporary tablespace, or the local database directory, a FlashCopy target volume of the same size was available. Access to the disks was provided by a SAN connection with multiple paths.

IBM Subsystem Device Driver (SDD) 1.6.0.49 was used for multipathing. The source volumes were accessible on the database servers through three Fibre Channel (FC) adapters per LPAR, using multiple paths to each LUN.

All FlashCopy target volumes were accessible by the backup server, each individual FlashCopy LUN through two paths. The FlashCopy volumes were connected to the backup server using FC adapters; the number of FC adapters varied depending on the type of test. Each of the LPARs had multiple network adapters.

The IBM TotalStorage Common Information Model (CIM) Agent for the DS Open application programming interface (API) was installed on an Intel® Windows 2000 server. The IBM TotalStorage DS Open API is a non-proprietary storage management client application that supports routine LUN management activities, such as LUN creation, mapping, and masking, and the creation or deletion of RAID-5 and RAID-10 volume spaces. It also enables the configuration and use of copy services, such as FlashCopy, Metro Mirror (PPRC), and Global Mirror (formerly Asynchronous PPRC).

The DS Open API supports these activities through the Storage Management Initiative Specification (SMI-S), as defined by the Storage Networking Industry Association (SNIA). The DS Open API helps integrate configuration management support into storage resource management (SRM) applications, allowing customers to benefit from existing SRM applications and infrastructures. It also enables the automation of configuration management through customer-written applications.

The DS Open API presents another option for managing storage units by complementing the use of the IBM TotalStorage DS Storage Manager Web-based interface and the IBM TotalStorage DS Command-Line Interface. You must implement the DS Open API through the IBM TotalStorage Common Information Model (CIM) agent, a middleware application that provides a CIM-compliant interface.

The DS Open API uses the CIM technology to manage proprietary storage units as open system storage units through storage management applications. The DS Open API allows these storage management applications to communicate with a storage unit. The DS Open API supports the IBM TotalStorage DS8000, IBM TotalStorage DS6000, and the IBM TotalStorage Enterprise Storage Server. It is available for the AIX, Linux, and Windows 2000 (or later) operating system environments, and must be used on storage units having fibre ports.

In our tests, both the production and backup LPARs had a local area network (LAN) connection to the CIM Agent.

Figure 5-8 on page 210 describes the complete environment used for the final tests.

CIM and DS Open API considerations.


Figure 5-8 The complete test environment

The database server LPARs had AIX 5.2 Maintenance Level 06 (ML06) installed. The backup server had AIX 5.2 ML05 installed. On the backup server, Tivoli Storage Manager Server 5.3.2.0 (64 bit) was installed. On all database production LPARs, Tivoli Storage Manager Client 5.3.0.12 and API Client 5.3.0.12 were installed.

For LAN-free backups, IBM Tivoli Storage Agent 5.3.2.0 was installed on all database production LPARs. The locale en_US.ISO8859-1¹ was available on all LPARs. IBM Tivoli Data Protection for DS and SVC 5.3.1.2 and IBM Tivoli Data Protection for SAP (DB2) 5.3.2.2 were installed on all LPARs.

DB2 Universal Database (UDB) ESE 64-bit V8.1 FixPack 12 (FP12) was installed on all LPARs. DB2 was configured using the new logfile management, therefore the SAP database administrator (DBA) tools were not required.

The Pegasus CIM client package (Pegasus CIM Server Runtime Environment 2.5.0.0 and Base Providers for AIX OS 1.2.5.0) and the Secure Sockets Layer and cryptography libraries and tools 0.9.7g-1 were installed on all LPARs. All filesystems for the EB8 database were created as JFS2 filesystems with external JFS2 logs (no inline logs).

Logical Volume Manager (LVM) mirroring was not implemented. The Concurrent I/O (CIO) mount option was not used for the filesystems; instead, CIO was activated through the DB2 tablespace configuration. All data tablespaces and the user temporary tablespaces were DMS tablespaces. The DB2 database was operated in log retain mode.

A tape library with 16 LTO3 tape drives was connected to the backup server; each drive was connected using a dedicated FC adapter.

1 To learn more about the Locales, refer to: http://nlmas.ams.nl.ibm.com/sunsoft/files/SUN/docs/manuals/806-0169.pdf

[Figure 5-8 shows two IBM p595 servers (64 CPUs at 1.9 GHz, 256 GB RAM each). The first hosts the five database LPARs (sys1db0p with DB2 partition 0, sys1db1p with partitions 6 to 13, sys1db2p with partitions 14 to 21, sys1db3p with partitions 22 to 29, and sys1db4p with partitions 30 to 37); the second hosts the Tivoli Storage Manager server LPAR (deats005). The backup server uses 16 fibre channel adapters for the FlashCopy disks and 16 for the tapes; the DS8000 holds the source and FlashCopy target volumes, and 16 LTO3 tape drives are assigned to the TSM server.]


5.2.2 Influencing factors

One individual FlashCopy backup run can be subdivided into several phases, as shown in Table 5-2.

Table 5-2 Flashcopy phases

The duration of each step will influence the duration of the overall process. In the next sections, we discuss the different parameters affecting the individual steps.

Phase 1 influencing factors
The duration of Phase 1 is influenced by the initial state of the FlashCopy target volumes.

We intended to use incremental mode in our test for the FlashCopy backup. However, before the test, all FlashCopy source-target pairs were withdrawn.

With no existing FlashCopy source-target relationships, the check phase of the tdpdb2hdw process does not check the status of the pairs individually and is therefore shorter. Repeating the FlashCopy a second time (with the incremental FlashCopy relationships in place) checks the state of all source-target pairs and takes several minutes longer: checking all 280 source-target relationships for the LUNs takes up to 30 minutes.

Phase 2 influencing factors
The duration of Phase 2 is influenced by the actual workload on the database.

The DB2 write suspend commands have to compete with the current workload on the database server. Under a high workload, it can take more time (up to a minute) for the commands to execute on all partitions. Compared to the duration of the backup to tape, this is a negligible amount of time; however, it adds to the offline time, during which the database is in write suspend mode.

Phase 3 influencing factors
The duration of Phase 3 is influenced by the number of LUNs and disk devices.

During the FlashCopy backup process, the FlashCopied image has to be accessed on the backup server. After the FlashCopy has been invoked in the storage server, the backup server runs AIX configuration manager processes (cfgmgr). The FlashCopied volume groups are redefined and varied online; afterwards, all the filesystems are mounted.

In our test environment, 280 LUNs spread across 66 volume groups had to be discovered during this task; see Table 5-3 on page 212. The multiple tdpdb2hdw processes synchronized themselves and ran sequentially through this step, because the configuration manager processes cannot run in parallel. Depending on the number of paths in the SDD configuration, one vpath (LUN) maps to several hdisks.

Phase   Action
1       Check DB2 database state, check storage layout, match FlashCopy source-target pairs, and check FlashCopy source-target volume relationships.
2       Invoke FlashCopy operations via the CIM agent.
3       Discover target volumes and mount the filesystems on the backup server.
4       Start DB2 and perform the backup to tape.

Optimizing the duration of a FlashCopy backup in our environment.


Table 5-3 Discovering 280 LUNs spread across 66 volume groups and mounting 331 filesystems

The time required to configure all LUNs on the backup server scales roughly linearly with the number of SDD paths, so the number of SDD paths should be kept to a minimum. On the other hand, having too few paths to the LUNs can restrict read throughput from the disks and thereby limit the DB2 backup; in our tests, two paths appeared to be the lower limit. A compromise between the time required to access the disks and the backup throughput therefore has to be found to optimize the overall process.
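The near-linear scaling can be checked against the measured points. This is an illustrative calculation of ours; the durations come from Table 5-3 and the LUN count from the test environment:

```python
# Measured duration (in minutes) of the cfgmgr/varyon/mount step on the
# backup server versus the number of SDD paths per LUN, from Table 5-3.
# With 280 LUNs, each extra path multiplies the number of hdisk devices
# that cfgmgr has to discover.
measured_minutes = {2: 45, 4: 90, 8: 165}   # paths -> duration
luns = 280

for paths, minutes in measured_minutes.items():
    hdisks = luns * paths   # one hdisk per path per LUN under SDD
    # Minutes per path stays roughly constant (~21-23), i.e. near-linear scaling.
    print(paths, hdisks, round(minutes / paths, 1))
```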

Phase 4 influencing factors
The duration of Phase 4 is influenced by multiple factors: the type of tapes, the FC adapters, and features of the Tivoli components or DB2.

LTO2 versus LTO3 throughput
The kind of tapes used for the backup influences the duration of Phase 4.

To compare the throughput capability between the LTO2 tape type and the LTO3 tape type, a unit test was performed with a backup of two single partitions in parallel. We used LTO2 drives in the first run and LTO3 drives in the second run. Table 5-4 shows the results of this test.

Table 5-4 Backup throughput per tape session (in MB/sec)

The measured throughput in the POC environment for the LTO3 drives was about 141% of the LTO2 throughput. According to the specification, each LTO3 drive can handle 90 MB/s (read or write) without compression, so achieving a rate of 140 MB/s corresponds to a compression ratio of about 1.55.

Throughput of a single FC adapter
The FC adapter influences the result only if it becomes a bottleneck in the overall process. A unit test was performed with multiple database backups running in parallel, but with the tapes connected through one FC adapter only. Using a single adapter, we observed saturation at 175 MB/sec.

Combining the results for tapes and FC adapters, it seems best to dedicate an FC adapter to each connected tape drive. If, for example, two tape drives are connected through one adapter, the throughput per drive is limited to about 62% (LTO3) or 88% (LTO2) of the possible value.
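The quoted percentages follow from simple arithmetic on the measured rates. This is a cross-check of ours, using the per-session figures from Table 5-4 and the observed 175 MB/sec adapter saturation point:

```python
# Cross-check the quoted percentages from the measured per-session rates
# (Table 5-4) and the observed single-adapter saturation point.
lto3_mb_s = 140
lto2_mb_s = 99
adapter_limit_mb_s = 175   # saturation of one FC adapter

print(round(lto3_mb_s / lto2_mb_s * 100))              # 141 (% of the LTO2 rate)
print(round(lto3_mb_s / 90, 2))                        # 1.56 (the text rounds this to 1.55)
print(round(adapter_limit_mb_s / (2 * lto3_mb_s) * 100))  # two LTO3 drives -> ~62% each
print(round(adapter_limit_mb_s / (2 * lto2_mb_s) * 100))  # two LTO2 drives -> ~88% each
```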

Tivoli Data Protection DB2 backup scheduling capabilities
This feature can influence the duration of Phase 4 and needs to be aligned with the available hardware (for example, with the number of tape drives available for backup purposes).

Number of paths to LUN   Duration (hh:mm)
2                        00:45
4                        01:30
8                        02:45

Tape type   Backup throughput (MB/sec)
LTO3        140
LTO2        99


In order to launch a FlashCopy backup of a multipartitioned database, a dedicated tdpdb2hdw process has to be started on the backup server for each LPAR hosting DB partitions. The production database is distributed across five LPARs, so five tdpdb2hdw processes have to be started in parallel. All five processes communicate with each other.

In any case, partition 0 must be backed up first, before the others. Each tdpdb2hdw process controls the scheduling for its own database partitions. Backups scheduled by different tdpdb2hdw processes are done in parallel.

For one individual tdpdb2hdw process, the scheduling of the backups of the database partitions is controlled by the TDP profile parameter DB2_EEE_PARALLEL_BACKUP:

- If this parameter is set to NO (the default), all database backups aligned to one tdpdb2hdw process are done sequentially.

- If it is set to YES, the backups aligned to the tdpdb2hdw process are done in parallel.

Because the different tdpdb2hdw processes run in parallel, the backups of the 32 database partitions other than partition 0 can all be done in parallel. To do so, sufficient tape drives have to be available.

In our test environment, we used 16 LTO3 tape drives. This is less than the number of partitions (32 partitions plus partition 0). However, it is possible to start 32 backups in parallel with only 16 tape drives available by using the following parameter values:

- DB2_EEE_PARALLEL_BACKUP must be set to YES.

- The Tivoli Storage Manager server options RESOURCETIMEOUT and MAXSESSIONS must be set to sufficiently high values.

All backups are then started in parallel. The second half of the sessions must wait for the first half to complete; the second set then occupies the tape resources. This option may be satisfactory for a test environment; however, we do not recommend it for a production environment.
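The behavior described above, 32 backups started at once against 16 drives, amounts to a simple two-wave schedule. The following is a minimal model of ours; the real tdpdb2hdw/TSM scheduling is more involved, and this only illustrates the waves:

```python
# Minimal model of 32 partition backups contending for 16 tape drives:
# the first 16 sessions get drives immediately; the other 16 wait (kept
# alive by a generous RESOURCETIMEOUT) until a drive frees up.
def waves(backups, drives):
    """Group backup jobs into waves of at most `drives` concurrent jobs."""
    return [list(range(i, min(i + drives, backups)))
            for i in range(0, backups, drives)]

schedule = waves(backups=32, drives=16)
print(len(schedule))     # 2 waves of backups
print(len(schedule[0]))  # 16 backups obtain drives immediately
```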

When running four backup streams (SYS1db1p, SYS1db2p, SYS1db3p, SYS1db4p) in parallel and varying the number of sessions (1, 2, or 3), the total throughput does not depend linearly on the number of sessions (equivalently, the number of tape drives used). Running more backups in parallel, each with only one tape session, is advantageous compared to running fewer backups with more tape sessions each.

Figure 5-9 on page 214 shows the throughput by number of tapes. The data points displayed as blue diamonds (at the bottom and middle of the figure) show the aggregated throughput (the sum over all backup streams and sessions) in GB per hour for DB2 backups using one, two, or three sessions. Because four backup streams run in parallel, a total of four, eight, or twelve tapes are used concurrently.

Obviously, the overall rate does not scale linearly: using three times more tapes (12) does not even double the total throughput. The data point displayed as a pink square (at the upper right corner of the figure) shows the aggregated throughput for running 16 backup streams in parallel, with each DB2 backup using one session.


Figure 5-9 Backup throughput by number of tapes

DB2 backup command arguments
DB2 is able to self-optimize the backup depending on the settings for the number of buffers, the buffer size, and the parallelism. Compared to the initial settings provided by the customer, the throughput without specifying any special arguments is about 10% lower. No significant change could be achieved by modifying the settings: for the backup runs, the number of buffers was set to 14, the buffer size to 1024, and the parallelism to 8.
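For reference, these settings map onto a DB2 backup invocation of the following shape. This is a sketch of ours, assembled from the option values quoted above and standard DB2 8.x backup syntax; verify it against your DB2 release before use:

```python
# Assemble a DB2 backup invocation matching the tested settings: 14 buffers,
# buffer size 1024 pages, parallelism 8, and a configurable number of TSM
# sessions (EB8 is the database name used in this study).
def backup_command(db, sessions, buffers=14, buffer_size=1024, parallelism=8):
    return (f"db2 backup db {db} use tsm open {sessions} sessions "
            f"with {buffers} buffers buffer {buffer_size} "
            f"parallelism {parallelism}")

print(backup_command("EB8", sessions=4))
```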

Figure 5-10 shows the throughput per number of sessions.

Figure 5-10 Backup performance depending on number of sessions

[Figure 5-9 plots the total backup throughput (GB/h, 0 to 6000) against the total number of LTO3 tapes used (0 to 16): four parallel backup streams with one, two, or three sessions each, and 16 parallel backup streams with one session each.]

[Figure 5-10 shows the average rate per session (GB/h) for the backup of SYS3, broken down by LPAR (sys3db0p with partition 0 through sys3db4p with partitions 30 to 37), comparing backups of database EB8 using one or four sessions, with and without the explicit settings of 14 buffers, buffer size 1024, and parallelism 8; the measured rates range from about 210 to 417 GB/h.]


Chapter 6. Test results

This chapter presents the results of the tests that were performed in the environment as described in the previous chapters. It covers the following areas:

- Online test performance

- Infrastructure test results


© Copyright IBM Corp. 2007. All rights reserved. 215


6.1 Online tests: the KPI-A results

The KPI-A is a combined load of query throughput, InfoCube load, and aggregate rollup, as described here:

- Query throughput

The objective was to run 50 navigations per minute for 100 concurrent users; that is, to simulate 0.8 navigations per second. The average response time had to be less than 20 seconds.

The load was driven by injector scripts; the scenario selected the target InfoCubes randomly from a set of 50 InfoCubes. The target InfoCubes had 40 million records.

- InfoCube load

The target was to write 25 million rows per hour to any number of target objects, the target objects being InfoCubes.

- Aggregate rollup

The target was a throughput of 5 million records per hour.

The KPI-Frame definition
The KPI-Frame is defined as the period during which all loads are running concurrently. We built a specific diagram, shown in Figure 6-1 on page 217, to illustrate the results. This diagram was generated by an Excel tool that was written to show the achievements of the concurrent combined load in Phase 1.

The diagram represents the two cube aggregation jobs, which defined the runtime frame: the aggregation jobs ran longest and extended beyond the high-load phase of the test.

The dataload graphic depicts eight concurrent load requests, a load request being one info-packet (an info-packet represents the throughput of one extractor to all its target cubes).

The data for the aggregation achievement and the load throughput were taken from SAP ST03 statistics. The query throughput and response times for the test, with regard to the KPIs, are also taken from ST03 for the KPI-Frame time frame.

The graphical depiction of the query response times over a time axis was taken from the injector statistics. These statistics from LR are used only for the graphical view, to show the behavior during the high-load phase.


Figure 6-1 The KPI Frame

6.1.1 KPI-A 7 TB results

This section presents the results of the query, aggregate rollup, and upload scenarios with a DB data warehouse of 7 TB.

Results for the query scenario
Figure 6-2 shows the detailed results for the query scenario.

Figure 6-2 Query scenario results: details

Table 6-1 is a summary of the results, which shows a final number of 1.68 navigations/sec.

Table 6-1 Query scenario results: summary

[Figure 6-1 plots the rollup and upload throughput (records per hour, up to 35,000,000) and the query response time (up to 20 seconds) over the test period (approximately 14:25 to 18:07) for the two rollup jobs and the eight upload jobs; the KPI-Frame marks the interval during which the query load (LR), the aggregates, and the dataload all ran concurrently.]

Results
Job start time                        14:00:00
Job stop time                         18:36:05
Elapsed time                          04:36:05
Total navigational steps              12,627
Elapsed time KPI                      02:05:00
Total navigation steps per second     1.68
Average navigation steps per minute   101.02
Average response time (sec)           13.60
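The summary figures are internally consistent; the following quick cross-check of ours reproduces the per-minute and per-second rates from the step count and the KPI window:

```python
# Cross-check the query KPI summary: 12,627 navigation steps during the
# 02:05:00 KPI window should reproduce the reported rates.
steps = 12_627
kpi_minutes = 2 * 60 + 5            # the 02:05:00 KPI window

per_minute = steps / kpi_minutes
per_second = per_minute / 60
print(round(per_minute, 2))  # 101.02 navigation steps per minute
print(round(per_second, 2))  # 1.68 navigation steps per second
```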

Chapter 6. Test results 217


Results for the aggregate rollup scenario
Figure 6-3 presents the results of the aggregate rollup test, where:

- A shows the result for InfoCube ZGTFC060.

- B shows the result for InfoCube ZGTFC030.

- C is the total by InfoCube.

- 1 and 2 are the numbers of records loaded per hour.

Figure 6-3 Aggregate rollup scenario final results: details

Table 6-2 is a summary of the results, which shows a final number of 6,493,607 records per hour.

Table 6-2 Aggregate rollup scenario final results: summary

Results for the upload scenario
Figure 6-4 on page 219 shows the detailed results for the upload scenario.

                        Rollup 1     Rollup 2     Total
Start date              20.07.2006   20.07.2006
Job start time          14:30:37     14:30:37
Job stop time           18:05:08     18:04:42
Elapsed time            03:34:31     03:34:05
Total records inserted  6,934,934    6,346,817
Elapsed time KPI        02:02:55     02:02:30
Records per hour        3,385,005    3,108,603    6,493,607

Note: The elapsed time KPI is defined in “The KPI-Frame definition” on page 216.


Figure 6-4 Upload scenario results: details

Table 6-3 is a summary of the numbers by upload number, which shows a final number of 25,353,800 records per hour.
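The per-upload rates can be reproduced from the elapsed times. This is a consistency check of ours, assuming rate = records divided by the elapsed KPI time; small rounding differences against the table are expected:

```python
# Reproduce the "records per hour" column of the upload summary: each of the
# eight uploads inserted 6,529,019 records; four ran for 02:09:52 and four
# for 02:09:21 (the elapsed KPI time).
def records_per_hour(records, h, m, s):
    return records * 3600 / (h * 3600 + m * 60 + s)

r_first = records_per_hour(6_529_019, 2, 9, 52)   # table: 3,016,642
r_second = records_per_hour(6_529_019, 2, 9, 21)  # table: 3,028,380
aggregate = 4 * r_first + 4 * r_second            # table total: 24,180,088
```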

Table 6-3 Upload scenario results: summary by upload number

6.1.2 KPI-A 20 TB results

Table 6-4 presents the results for the KPI-A 20 TB tests.

Table 6-4 KPI-A 20TB results

Note that all three targets were reached regardless of the number of application servers, although increasing the number of application servers improves the results.

Table 6-5 on page 220 presents the CP resource utilization for the KPI-A 20 TB test.

                                            Uploads 1 to 4 (each)   Uploads 5 to 8 (each)   Total
Job start time                              15:00:59                15:01:05
Job stop time                               17:10:51                17:10:26
Elapsed time                                02:09:52                02:09:21
Records inserted                            6,529,019               6,529,019
Elapsed time KPI                            02:09:52                02:09:21
Records per hour                            3,016,642               3,028,380               24,180,088
Ramp-up data read time                                                                      00:06:00
Records per hour (excluding ramp-up time)                                                   25,353,800

Test                      Required     Achieved 7 TB   Achieved 20 TB (2 AS)   Achieved 20 TB (4 AS)
Query: transactions/sec   0.8          1.68            1.19                    1.29
Query: average RT         < 20 sec     13.60 sec       12.9 sec                11.5 sec
Cube load                 25,000,000   25,353,800      26,920,390              27,591,632
Aggregate rollup          5,000,000    6,493,607       6,206,325               6,927,170


Table 6-5 KPI-A CP utilization

Table 6-6 presents the memory resource utilization for the KPI-A 20 TB test.

Table 6-6 KPI-A Memory utilization

All the KPI-A tests show that the objectives were reached, and they demonstrate the good scalability of the environment.

The KPI-A was performed using AIX Version 5.2, as requested in the initial project. However, tests were run after moving to AIX Version 5.3; their results are provided in the next section.

6.1.3 KPI-A53 20 TB results

The KPI-A53 test is the same test as the KPI-A, but was performed using AIX Version 5.3 instead of AIX Version 5.2. Using Version 5.3 with a POWER5 system offers the benefit of exploiting the new Simultaneous Multi-Threading (SMT) feature.

Simultaneous Multi-Threading is the ability of a single physical processor to simultaneously dispatch instructions from more than one hardware thread. In AIX 5L Version 5.3, a dedicated partition created with one physical processor is configured as a logical two-way by default.

The basic concept is that no single application can fully saturate the processor, so it is better to have multiple applications providing input at the same time. Two hardware threads can run on one physical processor at the same time. Using SMT is a good choice when overall throughput is more important than the throughput of an individual thread. Web servers and database servers, for example, are good candidates for SMT.

Multiple tests were performed, as described here:

- Using AIX 5.3 with SMT OFF for the application servers, while using AIX 5.2 for the DB server

Physical CPUs total   KPI-A 7 TB   KPI-A 20 TB (2 AS)   KPI-A 20 TB (4 AS)
For application       72.75        73.58                72.01
For DB                36.28        32.67                34.36
Overall               109.02       106.25               106.38

              7 TB                                            20 TB
Component     Memory (avg, MB)   Working storage (max, MB)    Memory (avg, MB)   Working storage (max, MB)
Application   123,257            141,241                      82,096             99,826
DB            186,849            192,870                      203,262            206,546
Overall       310,106            334,111                      285,361            306,372

Note: Some measurements for the KPI-D are provided in 1.2.4, “Online test results summary” on page 39, but the architecture needs to be changed to succeed completely. It was decided to run the KPI-D tests in Phase 2 of the project, which is not documented in this IBM Redbook.

Using Simultaneous Multi-Threading on IBM System p.


- Using AIX 5.3 with SMT ON for the application servers, while using AIX 5.2 for the DB server

- Using AIX 5.3 for both the application servers and the DB server, with SMT OFF

- Using AIX 5.3 for both the application servers and the DB server, with SMT ON

Two application models were tested:

- A hybrid model, where one ODS source used two batch extractors to populate four InfoCubes and the second ODS source used one batch extractor to populate its four InfoCubes

- A dual extractor model, where every ODS source used two batch extractors

We wanted to measure the impact of the AIX version; for these KPI-A53 tests, we measured only the number of records loaded. The queries and the aggregate rollup were beyond the scope of these tests.

The results are listed in Table 6-7; they show that the SMT feature provides a performance advantage.
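The size of the SMT advantage can be quantified directly from the Table 6-7 figures. This is a simple ratio calculation of ours, using the Records/CP column for the test pairs that differ only in the SMT setting:

```python
# Quantify the SMT benefit from the Table 6-7 results (records loaded per CP).
# Tests 1 and 2 differ only in SMT (42 CPs, AIX 5.3 app servers / 5.2 DB);
# tests 8 and 7 differ only in SMT (64 CPs, dual extractor model).
def smt_gain_percent(with_smt, without_smt):
    return (with_smt / without_smt - 1) * 100

print(round(smt_gain_percent(251_737, 240_927), 1))  # ~4.5% (tests 2 vs 1)
print(round(smt_gain_percent(433_642, 297_489), 1))  # ~45.8% (tests 7 vs 8)
```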

Table 6-7 KPI-A53 Results

6.2 Infrastructure test results

Next, we look at the detailed results of all the infrastructure tests. For each test, we provide:

- The objective

- The results

- Observations that may illuminate other situations and help to provide better results

6.2.1 KPI-1 results: flashback and roll forward

Objective
Flashback and roll forward of 500 GB in less than eight hours.

Results
Figure 6-5 on page 222 displays the log retrieve and recovery for all DB partitions.

Test   App   DB    SMT   Type             CPs   Records/CP   CP resources
1      5.3   5.2   OFF   -                42    240,927      119
2      5.3   5.2   ON    -                42    251,737      115
3      5.3   5.2   ON    -                58    289,463      102
4      5.3   5.2   ON    -                64    292,588      100
5      5.3   5.2   ON    Hybrid           64    337,590      107.5
6      5.3   5.3   ON    Hybrid           64    356,680      101.4
7      5.3   5.3   ON    Dual extractor   64    433,642      99.8
8      5.3   5.3   OFF   Dual extractor   64    297,489      108.5


Figure 6-5 KPI-1: Log retrieve and recovery for all DB partitions

Referring to Figure 6-5, the following timing can be observed (expressed in hh:mm):

- 00:00 (10:32 on the clock) → The TDP processes are invoked on all DB servers.

  – One tdpdb2hdw process is started on each server hosting DB partitions.

  – Tivoli Data Protection checks for available backups (FC and tape), the configurations on the DB servers, the source-target volume relationships, the filesystems, and so on.

- 00:15 → TDP invokes the flashback for all relevant DB volumes.

  – It unmounts/removes all filesystems and exports volume groups.

  – It starts FlashBack for all volumes using the CIM agent.

  – It acquires disk devices and volume groups for all FC volumes and mounts filesystems.

  – It mounts all FC filesystems on the backup server.

- 00:40 → Reactivation of the database.

  – It runs "db2inidb as mirror".

  – The manual "retrieve of logs" is skipped due to direct archiving.

- 00:55 → The Tivoli Data Protection processing is finished.

- 01:00 → The roll forward starts.

  – Direct archiving retrieves logs directly from Tivoli Storage Manager during the roll forward.

- 04:30 → End of the roll forward recovery of 850 GB of logs.
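From these timings, the effective log-apply rate works out as follows (an approximate calculation of ours, derived from the 01:00 start and 04:30 end marks):

```python
# Effective log-apply rate during the roll forward: 850 GB of archived logs
# were applied between the 01:00 and 04:30 marks, that is, in 3.5 hours.
logs_gb = 850
hours = 4.5 - 1.0

rate_gb_per_hour = logs_gb / hours
print(round(rate_gb_per_hour))  # ~243 GB of logs applied per hour
```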

Figure 6-6 on page 223 shows the log retrieve and recovery per DB server.

[Figure 6-5 plots the amount of recovered logs (GB, 0 to 1000) over time (10:00 to 15:30) for all DB partitions, marking the start and end of tdpdb2hdw (flashback), the log retrieve and recovery, and the roll forward start and end.]


Figure 6-6 KPI-1: Log retrieve and recovery per DB server

Observations
- In the direct archiving step, each DB partition retrieves its logfiles individually and opens direct sessions to TSM. This is a constraint on the number of tape drives and the number of cartridges during the first phase (there are fewer tape drives than partitions). In summary, retrieve logs from the disk storage pool to avoid waiting on tapes during roll forward recovery.

� The roll forward recovery is limited by CPU resource availability.

� The automatic roll forward recovery step is not automatically invoked out of TDP because the time-out is reached.

� If more data has to be managed, one recommendation is to archive log files to the disk storage pool first, and then collocate log files to separate cartridges, each partition owning cartridges. That would allow you to retrieve log files directly from tape during the roll forward phase. This approach would require different Tivoli Storage Manager nodenames per DB partition, sufficient tape cartridges, and having tape drives available in parallel.

6.2.2 KPI-2 results: database restore and roll forward

Objective

Database restore using the Tivoli Storage Manager server and roll forward of 2 TB of logs in less than 18 hours, with a successful start of the SAP system.

Environment

Figure 6-7 on page 224 displays the components of this test:

- 16 tapes are linked to the 33 DB partitions.
- One DS8000 is dedicated to the test.

Configuration:

- SYS1DB0: 10 CPUs, 32 GB memory
- SYS1DB1 to SYS1DB4: 4 CPUs, 32 GB memory each

(Chart: LOG retrieve and recovery per DB server — amount of recovered logs [GB], 0 to 250, against time from 10:00 to 15:30, for DB partition 0 and partition groups 6–13, 14–21, 22–29, and 30–37.)

Chapter 6. Test results 223


Figure 6-7 Components of the KPI infrastructure tests

Results

This test consists of two steps: the restore and the roll forward.

- Figure 6-8 shows the results of the restore step. Each individual backup uses two sessions; the four parallel backups used eight LTO3 tapes. The following timing can be observed (expressed in hh:mm):
  – 00:00 → The SAP system DB08 is stopped.
  – 00:15 → The restore is started.
  – 07:40 → The restore is finished.

Figure 6-8 KPI-2: restore results

- Figure 6-9 on page 225 shows the results for the roll forward step. The following timing can be observed (expressed in hh:mm):
  – 00:00 → The roll forward is started.
  – 03:34 → The roll forward recovery phase is stopped.
  – 04:24 → The roll forward is stopped.
  – 04:40 → The SAP system is up and running.

(Figure 6-7 diagram: one DS8300 [Source1] with 16 LTO3 tapes assigned to the DB partitions, LAN-free.)

(Figure 6-8 charts: the distribution of data across all 33 database partitions — total amount of data backed up: 15,770 GB — by size of backup image per partition [GB], and the restore scheduling and runtime of the 4 parallel restore streams with 2 sessions each on sys1db0p to sys1db4p, 00:00 to 08:00.)

Figure 6-9 KPI-2: roll forward results

The total run time for this test is the sum of the restore and roll forward times: 12 hours and 20 minutes.

Observations

- The restore sequence for the partitions and the number of sessions have to be the same as during the backup.
- The roll forward recovery is limited by CPU resource availability.
- One recommendation would be to provide one 2 Gb or 4 Gb Fibre Channel adapter per tape drive to avoid bandwidth limitations.
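To illustrate the first observation, a per-partition restore invocation mirroring the backup command used later in this chapter might look like the following. This is a sketch, not the actual test script: TIMESTAMP is a placeholder for the backup image timestamp, and the session count must match the one used at backup time.

```shell
# Illustrative restore for one DB partition (sketch only). The session count
# (2) and the TDP library must match those used at backup time; TIMESTAMP is
# a placeholder for the backup image timestamp.
restore_cmd='db2_all "<<+6< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at TIMESTAMP without prompting"'
printf '%s\n' "$restore_cmd"
```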

6.2.3 KPI-3a results: FlashCopy with no online workload

Objective

Tape backup of a FlashCopy in less than 8 hours (no online workload).

Results

Figure 6-10 on page 226 shows the backup scheduling and runtime.

(Figure 6-9 chart: retrieve/roll forward of 2 TB of logs — amount of logs [GB], 0 to 2000, over 4 hours, for the sum, partition 0, and partition groups 6–13, 14–21, 22–29, and 30–37.)

Figure 6-10 KPI-3a: backup scheduling and runtime

Figure 6-11 shows the total throughput.

Figure 6-11 KPI-3a: total throughput

Figure 6-12 on page 227 shows the McData view.

(Figure 6-10 chart: backup scheduling and runtime — the FlashCopy and mount phase, then the backup of partition 0 and of partitions 6 to 37 across sys1db0p to sys1db4p, 00:00 to 06:00.)

(Figure 6-11 chart: total throughput on the Fibre Channel disk adapters vs. time, with 4 backups running in parallel using 2 sessions each — total read [MB/s], averages of 265 MB/s and 913 MB/s.)

Figure 6-12 KPI-3a: McData view

The following timing can be observed (in hh:mm):

- 00:00 → Tivoli Data Protection is started on the backup server, with one process for each LPAR hosting DB partitions.
  – Tivoli Data Protection checks the configuration and the source-target volume relationships.
- 00:10 → Tivoli Data Protection invokes FlashCopy for all relevant DB volumes.
  – The database is set to "write suspend" for all DB partitions.
  – FlashCopy is started for all LUNs via the CIM agent.
  – The database is set to "write resume" for all DB partitions.
- 00:55 → The FlashCopy LUNs are mounted on the backup server.
  – TDP acquires the disk devices and volume groups and mounts the filesystems.
  – db2inidb runs in standby mode.
  – The backup of partition 0 (with two sessions) is started.
- 01:15 → The backup of partition 0 is finished.
  – The backup of the remaining partitions is started, with one stream per DB server; in our test, four streams run in parallel.
  – Each individual backup uses two tapes; in our test, eight tapes are used.
- 06:15 → The backup of all partitions is finished; start of the unmount of the filesystems, the export of the volume groups, and the removal of the disk devices.
- 06:35 → The cleanup on the backup server is finished.
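The write-suspend window around the FlashCopy can be sketched with the following DB2 commands. This is illustrative only: in the test, TDP for FlashCopy issues the equivalent on every DB partition, and the FlashCopy itself is established through the CIM agent.

```shell
# Sketch of the suspend/FlashCopy/resume window (illustrative; TDP issues the
# equivalent on every DB partition, and the FlashCopy goes through the CIM
# agent rather than a manual step).
window='db2 set write suspend for database
# ... FlashCopy of all LUNs is established through the CIM agent ...
db2 set write resume for database'
printf '%s\n' "$window"
```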

Observations

- A gap in db2nodes.cfg is not allowed. In our case, the DB partitions not involved in the test (partitions 1 to 5) were covered with a temporary empty tablespace.

- The background copy for the full amount of data (24 TB) takes about 15 hours; the incremental FlashCopy after the first daily workload takes less than one hour.
- Four parallel backups to eight LTO3 tape drives consume about 6 CPUs. We observed that the CPU utilization scales linearly with the rate of transmitted data.

- The data rate for a single DB2 backup does not scale linearly with the number of sessions, and we observed some saturation. Given a fixed number of available tapes, it may be advantageous to increase the number of parallel backups of the DB2 partitions and decrease the number of sessions for each backup.

- The task of mounting the FlashCopy target devices on the Tivoli Storage Manager server takes a noticeable amount of time in the overall process.

- If there is more data to manage, we recommend the following:
  – Use more LTO3 tape drives to exploit the parallel backup of all partitions.
  – Avoid mounting the target devices on a single Tivoli Storage Manager server; use several data movers to optimize the mount phase for the FlashCopy target devices.
  – Use the Subsystem Device Driver Path Control Module (SDDPCM) to reduce the number of devices and optimize the mount task for the FlashCopy target devices.

6.2.4 KPI-3b results: FlashCopy with online workload

Objective

FlashCopy in less than 8 hours with online workload, and observe the query and load response times.

The online workload comprised online queries (25 navigations per minute, with a target average response time of 20 seconds) and a load of 25 million records per hour.

Methodology

As illustrated in Figure 6-13 on page 229, the steps to run this test were as follows:

- The initial state consisted of production LUNs (also called source FlashCopy disks) and backup LUNs (also called target FlashCopy disks) fully synchronized after a first FlashCopy (arrow 0 in Figure 6-13 on page 229).
- Then the content of the production LUNs was changed to reflect the expected modifications in the customer production system: an average day of 8 hours of upload activity, representing 200 million records, and 18 hours of queries, as shown with arrow 1.
- The test combined the incremental backup of the database with the online activities (queries and load) for 6 hours (arrow 2).
- The percentage of degradation caused by the FlashCopy activity was measured and reported as part of the deliverables of this test.


Figure 6-13 KPI-3b steps

Results

Figure 6-14 displays the results of the reference run, the run without FlashCopy. The wide blue line at the bottom of the figure illustrates the period of time when the queries are running, from 17:25 to 23:40.

Figure 6-14 KPI-3b: reference run

(Figure 6-13 diagram: initial state — production [PROD] and backup [BKUP] FlashCopy LUNs fully synchronized [arrow 0]; preparation and reference run — 18 hours of online production modifying a delta through the upload of 200 million records plus queries [arrow 1]; KPI run — FlashCopy of the 8-hour delta with background copy while the online load tests run [arrow 2].)

(Figure 6-14 chart: runtimes of the queries, rollups 1 and 2, the backup, and uploads 1 to 8, from 17:20 to 23:40.)

Figure 6-15 displays the results with the incremental FlashCopy. The short grey line shows the FlashCopy duration, in this case from 12:25 to 13:00, with the queries running from 11:55 to 18:05.

Figure 6-15 KPI-3b: Incremental FlashCopy

Figure 6-16 displays the result of the full FlashCopy. The FlashCopy runs from 19:20 to 09:35, with the queries running from 20:40 to 02:50.

Figure 6-16 KPI-3b: Full FlashCopy

Table 6-8 on page 231 summarizes the results.

(Figure 6-15 chart: runtimes of the queries, the incremental FlashCopy, and uploads 1 to 8, from 11:50 to 18:10.)

(Figure 6-16 chart: runtimes of the full FlashCopy, the queries, and uploads 1 to 8, from 19:15 to 09:50.)

Table 6-8 KPI-3b: Results summary

Observations

- The storage capacity equivalent of the modifications made in the production system in an average 8-hour day was about 400 GB, which is less than 2% of the overall FlashCopy capacity.
- The 280 FlashCopy relationships were established in 2 minutes.
- The background copy took at most 13h24 for 24.3 TB with DS8000 R1, which is an average of 1.8 TB per hour.
- Up to 16 background-copy relationships ran in parallel: 4 relationships per device adapter (2 sources and 2 targets), with 8 device adapters used as source and target, giving 16 relationships.
- The impact on production was minimized because at most 16 source LUNs were copied to 16 target LUNs spread over 32 different arrays (out of a total of 64), maximizing parallelism.
- The average data rate over all ranks was 1100 MB/sec, with a maximum of 1350 MB/sec.

6.2.5 KPI-3c results: tape backup with online workload

Objective

Observe the impact of a tape backup of the DB2 database on the online workload.

The online workload comprises online queries (25 navigations per minute, with a target average response time below 20 seconds), a cube load with a target of 25 million records per hour, and an aggregate rollup with a target of 5 million records per hour.

Methodology

The backup was done sequentially, as follows:

1. DB2 partition 0, using 4 sessions.

2. One DB2 partition at a time per LPAR, using 2 sessions each. With the 4 DB2 LPARs, we used 8 drives in parallel for DB partitions 6 to 37.

To perform the backup, we used a main script, do_backup. This script executed the script named backup_node0.ksh, and then started four scripts in the background (one per LPAR), named backup_sys3db1.ksh to backup_sys3db4.ksh.

The do_backup script ran under user db2eb8, which owned the DB2 database on the LPAR that contained DB partition 0 (sys3db0).

Workload  Parameter              Target         Base        Incremental FC  Full FC
Queries   Average Nav steps/sec  > 0.42         0.85        1.13            0.99
          Average Nav steps/mn   > 25.00        51.25       67.69           59.37
          Average Response Time  < 20.00 sec    14.60       10.80           8.50
Loads     Throughput             > 25,000,000   27,179,748  26,987,376      27,028,935


These scripts are provided in Appendix C, “Scripts for storage” on page 295.
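The flow of do_backup described above can be sketched as follows. This is a stub sketch, not the Appendix C scripts: the two shell functions merely stand in for backup_node0.ksh and the per-LPAR backup_sys3db*.ksh scripts, to make the sequencing visible.

```shell
# Stub sketch of the do_backup flow (illustrative; the real scripts are in
# Appendix C). The functions stand in for backup_node0.ksh and the per-LPAR
# backup_sys3db*.ksh scripts.
backup_node0() { echo "backup of DB partition 0 (4 sessions)"; }
backup_lpar()  { echo "backup stream on $1 (2 sessions per partition)"; }

backup_node0                              # partition 0 is backed up first
for lpar in sys3db1 sys3db2 sys3db3 sys3db4; do
  backup_lpar "$lpar" &                   # one background stream per LPAR
done
wait                                      # block until all four streams finish
```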

Here is the main command of the backup_node0.ksh script:

db2_all "<<+0< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 4 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"

The main parameters of this command are described in Table 6-9. (Note that the backup and the restore operations need to use the same values.)

Table 6-9 Script main parameters

For the backup_sys3db*.ksh scripts, the same command was issued 8 times (once per DB2 partition on the LPAR), with the number of sessions set to 2. To specify the partition to back up, the zero in "<<+0<" is replaced with the number of the DB2 partition to be backed up.

The backup was performed with UTIL_IMPACT_LIMIT set to 50% in DB2 to limit the impact of the backup.
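Putting the pieces together, a backup_sys3db1.ksh-style script for partitions 6 to 13 could generate its eight sequential backup commands along these lines. This is illustrative only; the actual scripts are in Appendix C.

```shell
# Illustrative generation of the eight per-partition backup commands that a
# backup_sys3db1.ksh-style script runs sequentially (partitions 6 to 13,
# 2 sessions each). The real scripts are in Appendix C.
for part in 6 7 8 9 10 11 12 13; do
  echo "db2_all \"<<+${part}< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions with 14 buffers buffer 1024 parallelism 8 without prompting\""
done
```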

Parameter    Value                                                Description
sessions     4 for DB partition 0; 2 for the other DB partitions  Sets the number of LTO drives used to back up this DB partition.
buffers      14                                                   Specifies the number of buffers to be used for the DB2 backup.
buffer       1024                                                 Specifies the size, in 4 KB pages, of the buffer used when building the DB2 backup image.
parallelism  8                                                    Specifies the number of tablespaces that can be read in parallel by the DB2 backup.


Figure 6-17 KPI-3c steps

Results

Figure 6-18 displays the KPI-A results, which were used as the reference run for this test. This graph shows an average cube load of 27.6 million records per hour and an average aggregate rollup of 6.9 million records per hour (left scale). The blue curve, with the scale on the right, shows the response time for the query load: on average, 11.55 seconds.

Figure 6-18 KPI-3c profiling: KPI-A results reference

Figure 6-19 on page 234 shows the results of the same workload during the tape backup.

(Figure 6-17 diagram: the backup of DB partition 0 on LPAR sys3db0, followed by the backup of nodes 6 to 37 across LPARs sys3db1 to sys3db4, while the data load, aggregate rollup, and queries run; total duration 6h58.)

(Figure 6-18 chart: rollup and upload throughput [records/h, left scale, 0 to 35,000,000] and query response time [right scale, 0 to 30], from 20:02 to 00:44, for rollups 1 and 2 and uploads 1 to 8.)

Figure 6-19 KPI-3c test results

Table 6-10 lists the KPI-3c results compared to the KPI-A reference.

Table 6-10 KPI-3c: Results

Observations

- The InfoCube upload is not impacted by the backup, because this activity is mostly CPU-intensive on the application servers and accesses the database very little.
- The aggregate rollup is slightly more impacted by the backup.
- The query activity is a heavy user of the database and does many reads on the disk storage subsystem. However, even though query performance decreases by 35%, the average response time is still within the target of 20 seconds.
- The CPU usage, shown in Figure 6-20 on page 235, shows a user time slightly higher during KPI-3c than during KPI-A (35% versus 26%).

  The system time is more affected by the backup; the average system time increases from 16% for KPI-A to 29% for KPI-3c. This is due to the high number of I/Os performed for the backup and to the use of loopback communication between DB2 and the TSM storage agent. Figure 6-20 on page 235 shows the CPU resource usage in the KPI-A reference test versus the KPI-3c test.

(Figure 6-19 chart: rollup and upload throughput [records/h] and query response time during the tape backup, from 16:06 to 18:57.)

Description                           Target        KPI-A       KPI-3c      Delta to KPI-A
Load throughput (records/hour)        > 25,000,000  27,591,632  27,858,407  +1.0%
Aggregate rollup (records/hour)       > 5,000,000   6,927,170   6,043,773   -14.6%
Queries: average nav steps/sec        > 0.80        1.29        0.94        -37.8%
Queries: average nav steps/mn         > 48.00       77.60       56.30       -37.8%
Queries: average response time (sec)  < 20.00       11.55       17.70       +35%


Figure 6-20 KPI-3c: CPU usage

In the next two graphs, summarized in Figure 6-21 on page 236, we compare the cumulative Fibre Channel disk throughput between KPI-A and KPI-3c on one of the four LPARs that hosted eight DB2 partitions.

(Figure 6-20 charts: CPU usage — User%, Sys%, Wait% — on sys3db1p for the KPI-A run of 8/9/2006 and the KPI-3c run of 8/11/2006.)

Figure 6-21 KPI-3c: Fiber channel resource usage

One LPAR uses four LTO3 tape drives. The throughput is well sustained, with an average of 480 MB/s and some peaks around 580 MB/s.

Figure 6-22 on page 237 compares the cumulative tape throughput between KPI-3c and a full backup with the same infrastructure (four tape drives per LPAR), but without online activity and with the DB2 parameter UTIL_IMPACT_LIMIT set to 100%.

This graph shows that the throughput on the tape drives was reduced by 30% during KPI-3c. This is due to the DB2 parameter UTIL_IMPACT_LIMIT and to the competition with the queries on the disk storage subsystems.

(Figure 6-21 charts: disk adapter throughput [KB/s] per Fibre Channel adapter on sys3db1p for KPI-A, 8/8/2006, and KPI-3c, 8/11/2006.)

Figure 6-22 Tape throughput comparison

Observations

- The impact of the tape backup on the production activity falls mainly on the user activities. The impact is moderate or nil on batch activities such as the InfoCube upload.
- To reduce the impact of the backup, it is always possible to lower the UTIL_IMPACT_LIMIT parameter.
- The number of drives used during the backup is another lever. To decrease the number of drives, you can decrease the number of sessions in the db2 backup command, which decreases the number of DB2 processes that read the data, and so reduces the load on DB2 and on the disk storage subsystem, and in the end reduces the impact on the other activities, such as queries. On the other hand, this increases the duration of the backup.
- Eight drives, two per LPAR, are the minimum number of drives necessary to complete the backup within the requested time frame of 8 hours.
- For this test, we backed up each DB2 partition of an LPAR sequentially, and some peaks in the throughput can be observed. With a bigger instance and the same number of DB2 partitions, it could be interesting to consider backing up all 32 partitions in parallel, with only one drive per DB2 partition, to try to smooth out the peaks.

6.2.6 KPI-4 results: creation of indexes

Objective

Create indexes on 3 TB of data in less than 2 hours. The online workload comprises online queries (25 navigations per minute, with a target average response time of 20 seconds) and a load of 25 million records per hour.

(Figure 6-22 chart: cumulative throughput of all tape drives [MB/s] over the backup duration — KPI-3c with UTIL_IMPACT_LIMIT=50% and 2 drives per LPAR versus a backup-only run with UTIL_IMPACT_LIMIT=100% and 2 drives per LPAR.)

Methodology

The first task was to find the large tables, based on the field db6tsize in the table db6treorg; 38 tables were identified, for a total of 3,476,154,752 bytes.

The indexes were marked invalid with db2dart.
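For illustration, marking an index invalid with db2dart takes the tablespace ID and object ID of the index. This is a sketch only: the option names follow the db2dart tool of that DB2 generation, and the ID values shown are placeholders, not taken from the test.

```shell
# Illustrative db2dart invocation to mark an index invalid (sketch; the /TSI
# tablespace ID and /OI object ID values are placeholders, not from the test).
mark_invalid='db2dart EB8 /MI /TSI 4 /OI 10'
printf '%s\n' "$mark_invalid"
```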

Results

As shown in the figures in this section, the runtime was 1h05mn: DB2 was started at 22:34:36, the index build was started at 22:36:12, and the index build finished at 23:39:09.

Figure 6-23 shows the total data rate on the Fibre Channel adapters, with three FC adapters per LPAR and a maximum throughput of 200 MB/s per adapter.

Figure 6-23 KPI-4: total data rate

Figure 6-24 on page 239 shows the read data rate.

(Figure 6-23 chart: total data rate [MB/s] per individual DS8000 port, 9:51 to 12:21, for the 32 ports 0x0 to 0x333.)

Figure 6-24 KPI-4 Read data rate

Figure 6-25 on page 240 shows the CPU resource usage for DB2 partition 0 and for partition 1. DB2 partition 0 (SYS1DB0) was configured with 12 CPUs and 40 GB of memory; each of DB2 partitions 1 to 4 (SYS1DB1 to SYS1DB4) was configured with 10 CPUs and 40 GB of memory.

(Figure 6-24 chart: read data rate [MB/s] per individual DS8000 port, 9:51 to 12:21.)

Figure 6-25 KPI-4: CPU resource usage for the DB2 partitions

Observations

- The tests were done in a constrained FC adapter environment: the number of FC channels was reduced for project management purposes. Because the KPI-4 objectives were met, the tests were not redone in an optimized environment.
- DB2 was optimized:
  – The DB2 INTRA_PARALLEL parameter was enabled.
  – INDEXSORT was set to NO to avoid disk I/O.
  – The buffer pools were decreased and the sort heaps increased, to allow more in-memory index creation.
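The tuning items above map to commands along the following lines. This is a sketch only: the parameter names should be verified against the DB2 level in use, the INDEXSORT and buffer pool changes are release-dependent and omitted, and the SORTHEAP value shown is purely illustrative.

```shell
# Sketch of the DB2 tuning described above (illustrative; verify parameter
# names against your DB2 level; the SORTHEAP value is a placeholder).
tuning='db2 update dbm cfg using INTRA_PARALLEL YES
db2 update db cfg for EB8 using SORTHEAP 100000'
# The INDEXSORT setting and the buffer pool reduction are release-dependent
# and are not shown here.
printf '%s\n' "$tuning"
```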


Chapter 7. Proposal for more scalability

The results of the tests described in this IBM Redbook have demonstrated that this environment was able to scale at the level requested by the customer with the expected management capabilities.

Looking to the future, the expectation is that the number of users will grow to such a level that a data warehouse of 60 TB needs to be investigated.

This chapter discusses how this growth can be sustained in terms of performance and manageability. It also describes the potential solutions and the options that need to be tested. The 60 TB tests had not been performed at the time of writing.


© Copyright IBM Corp. 2007. All rights reserved. 241


7.1 Options to scale from 20 TB to 60 TB

Table 7-1 summarizes the results of testing when scaling from 7 TB to 20 TB. From this table you can see that the targets were achieved, in terms of performance and manageability. This section discusses the lessons learned from the 20 TB tests and the options available to scale above 20 TB, with the target of sustaining a data warehouse of 60 TB.

Table 7-1 20 TB results summary

The 60 TB architecture must be built to meet the following three criteria:

� Performance and throughput

Demonstrate the ability of the technologies to support the proposed architecture with the requested performance numbers.

� Flexibility

Demonstrate the ability of the architecture to support a broad mix of workloads, including online queries, data loads, aggregate rollup, backup and restore, index rebuild, and FlashCopy.

� Manageability

Demonstrate the management capabilities of the environment to cope with the hazards of a real production environment.

The high availability features are not considered in the proposed architecture.

Description      KPI target                          KPI achievement

On-line KPIs
KPI-A: 7 TB      � load: 25 M records/hr             � load: 25,353,800 records/hr
                 � aggregate: 5 M records/hr         � aggregate: 6,493,607 records/hr
                 � queries: 50 navigations/mn        � queries: 101 navigations/mn
                   with average RT of 20 sec           with average RT of 13.6 sec

KPI-A-20 TB      � load: 25 M records/hr             � load: 27,993,432 records/hr
                 � aggregate: 5 M records/hr         � aggregate: 6,206,325 records/hr
                 � queries: 50 navigations/mn        � queries: 77 navigations/mn
                   with average RT of 20 sec           with average RT of 12.8 sec

KPI-A53 20 TB    � load: 25 M records/hr             � load: 41,223,971 records/hr
                 � aggregate: 5 M records/hr         � aggregate: 7,009,570 records/hr
                 � queries: 50 navigations/mn        � queries: 71 navigations/mn
                   with average RT of 20 sec           with average RT of 14.6 sec

Infrastructure KPIs
KPI-1            < 8 hr                              04h30mn
KPI-2            < 18 hr                             12h20mn
KPI-3a           < 8 hr                              06h35mn
KPI-3b           < 8 hr                              00h30mn
KPI-3c           Measure degradation                 < 15%
KPI-4            < 2 hr                              01h05mn



7.1.1 Lessons learned from the 20 TB tests

Looking at the 20 TB tests, the following recommendations can be made:

� The 33-partition layout for DB2 was a good choice in terms of manageability and performance.

� The partitioning layout was performing well for the chosen application.

� Using multiple LPARs per server is a better approach than using a single machine.

� Long-running DB2 queries could be better optimized using the optimizer enhancements and the DB2 V9 Statsview implementation.

Be aware that the SAP workload may change its resource consumption profile at this scale. For example, the memory requirements may exceed the current 196 GB limitation for DB2.

7.1.2 Improving performance in a BI environment

As a first step in improving performance in a BI environment, consider using the latest available hardware technologies:

� The new POWER5+ processor (2.2 GHz)
� A new DS8000 model, the DS8300 Turbo
� AIX 5.3 and its virtualization capability

Next, consider the use of the following features:

� Increase the number of aggregates.

Using aggregates reduces query runtimes, and the rollup of data is optimized so that aggregates that have already been rolled up and are available can be reused.

In addition to the runtime for the queries, a complete optimization must also take into account the dependencies of the aggregates, their memory requirements, and the time taken to roll up new data.

� Use the SAP BI accelerator option.

The BI accelerator (BIA) allows you to improve the performance of BI queries when data is read from an InfoCube. The data of a BI InfoCube is made available as a BI accelerator index in a compressed but not aggregated form.

BI accelerator is particularly useful in cases where relational aggregates or other BI-specific methods of improving performance (such as database indexes) are not sufficient, are too complex, or have other disadvantages.

� Use the OLAP cache.

OLAP is a core component of data warehousing and analytics. It gives users the ability to interrogate data by intuitively navigating from summary to detail data.

All OLAP solutions rely on a relational database management system to source and dynamically query data and to support drill-through reports. The cache mode determines whether, and in what ways, the query results and navigational states that are calculated by the OLAP processor as highly compressed data are to be saved in a cache.

You can set the cache mode in Customizing as the standard value for an InfoProvider, and in the Query Monitor for an individual query. Caching brings performance advantages, particularly with more complex queries; we recommend that complex data processed by the OLAP processor be held in the cache.

Observe the usual best practices to improve performance in a BI environment.

Chapter 7. Proposal for more scalability 243


However, using the cache is barely advantageous if query-relevant data is often changed and therefore has to be loaded frequently, because the cache has to be regenerated every time.

� Exploit the DB2 Performance Features for SAP BI.

The Database Partitioning Feature (DPF) allows customers to partition a database within a single server or across a cluster of servers. The DPF capability provides the customer with the benefits of scalability to support very large databases or complex workloads, and of increased parallelism for administration tasks.
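As an illustration of how DPF spreads a database across hosts, the partition layout is defined in the db2nodes.cfg file, one line per database partition in the form "partition-number hostname logical-port". The host names and the small 8-partition layout below are hypothetical examples, not this project's 33-partition configuration:

```shell
# Hypothetical db2nodes.cfg for an 8-partition DPF database on two hosts.
# Format per line: <partition number> <hostname> <logical port>
cat <<'EOF' > db2nodes.cfg
0 dbhost1 0
1 dbhost1 1
2 dbhost1 2
3 dbhost1 3
4 dbhost2 0
5 dbhost2 1
6 dbhost2 2
7 dbhost2 3
EOF
```

Each host runs as many database partitions as it has lines in the file, and DB2 hashes table rows across those partitions.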

� Run popular queries in background and push summary views of updated data to users.

� Increase the number of application servers.

This means having a very large number of blades, running Linux or AIX. However, this does not satisfy the flexibility criterion of balancing resources across a broad mix of workloads.

� Exploit the DB2 Balanced Configuration Unit.

The IBM Data Warehousing Balanced Configuration Unit (BCU) combines IBM System p5, TotalStorage, and DB2 into a single, scalable offering that can run on AIX or Linux. A BCU solution represents hardware and software that IBM has integrated and tested as a pre-configured scalable solution for data warehousing systems. The BCU has been designed around the concept of a balanced infrastructure through the use of modular nodes or building blocks.

� Exploit MultiDimensional Clustering.

MultiDimensional Clustering (MDC) is a feature in DB2 (since Version 8.1) that is designed to improve the performance of large DB2 databases, particularly data warehouses and data marts, and dramatically improve the speed at which records can be retrieved. This DB2 feature improves the SAP query performance by a large factor; there is no administration to consider. However, using this feature may impact the application design.

� Use DB2 compression.

The DB2 compression feature, available with DB2 V9, allows you to reduce storage requirements, improve I/O efficiency, and provide quicker access to data from disk. This technology is capable of achieving storage cost savings of up to 50% or more, by reducing the amount of disk space (and disk subsystem peripherals) required for storing data. The size of database logs can also be reduced, because DB2 compresses user data within log records.

This technology can also improve performance in some scenarios, despite the CPU overhead of compression and decompression. Accessing data from disk is the slowest database operation. By storing compressed data on disk, fewer I/O operations need to be performed to retrieve or store the same amount of data. Therefore, for disk I/O-bound workloads, query processing time can be noticeably improved.

Furthermore, DB2 keeps the data compressed on both disk and memory (DB2 buffer pools), thereby reducing the amount of memory consumed, and freeing it up for other database or system operations. This can further improve database performance for queries and other operations.
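As a back-of-the-envelope sketch of the savings quoted above (the 20 TB data volume and the 50% ratio are assumptions for illustration; real ratios vary per table), the effect of a given compression ratio on the disk footprint and on the I/O needed to scan the same data is:

```shell
# Back-of-the-envelope sketch of DB2 row compression savings.
# Assumed inputs: 20 TB of uncompressed table data, "up to 50%" savings.
DATA_GB=20480        # 20 TB of uncompressed data
SAVINGS_PCT=50       # assumed compression savings

COMPRESSED_GB=$(( DATA_GB * (100 - SAVINGS_PCT) / 100 ))
echo "compressed footprint: ${COMPRESSED_GB} GB"          # 10240 GB

# Fewer pages on disk means proportionally fewer I/Os for a full scan:
PAGES_BEFORE=$(( DATA_GB * 1024 * 1024 / 16 ))            # assuming 16 KB pages
PAGES_AFTER=$(( PAGES_BEFORE * (100 - SAVINGS_PCT) / 100 ))
echo "pages to read: ${PAGES_BEFORE} -> ${PAGES_AFTER}"
```

The same halving applies to buffer pool consumption, because DB2 keeps the rows compressed in memory as well.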

7.2 The proposal

The following sections describe the environment proposed for the 60 TB tests.



7.2.1 Options selected

The following options have been selected for running new performance and manageability tests:

� SAP

SAP NetWeaver BI Version 3.5 will be used, with no functional changes.

� AIX

AIX 5L Version 5.3 will be used to add more management flexibility, because the virtualization features will help balance resources in a broad mix of workloads.

� Servers

– System p5 servers will be POWER5+ based.
– Five System p5 595s with 64 CPUs each are expected.

� Storage

– Four DS8300 machines with 512 disks each will be used.
– One disk subsystem will be allocated per DB system.
– A shared-nothing architecture will be implemented.

� DB2

– DB2 V9 will be used, because new functions available in Version 9 will improve performance, reduce the disk footprint, and optimize the backup and restore processes.

– A 32-node database design will be used.

� TSM/TDP

Tivoli Storage Manager will still be used to manage the storage environment.

7.2.2 The logical architecture

There will be no change in the logical architecture for the 60 TB environment (described in Figure 7-1), as compared to the 20 TB environment (described in Figure 2-5 on page 74).

Figure 7-1 60TB: the logical architecture

The 20 TB logical architecture will not be changed for the 60 TB environment.

(The figure shows the front-end access tools and injectors connecting through the SAP application servers to the database server, with DB2 partitions 0 through 37 each owning its own log, temporary, and data+index storage on the database storage behind the SAN fabric.)



7.2.3 The SAP architecture and the data model

We expect the following InfoProvider distribution:

� Four AIX application servers.

� Two ODS objects per partition.

� 25 InfoCubes per AIX partition. The InfoCubes used for reporting may have up to 200 million records; the InfoCubes used for loading will have between 100 and 200 million records.

The following SAP application servers will be used, as described in Figure 7-2:

� A central instance will be used for administrative purposes.

� One application server will be used for the extractors and the aggregation rollup.

� One application server will be used for reporting.

� Four application servers will be used for loading.

Figure 7-2 60 TB: SAP application servers

Figure 7-3 on page 247 describes the relationship between all these application servers.

Use additional LPARs to separate the load profiles.

(The figure shows the LPARs on the p595 servers: the SYSXCI central instance (5 DIA, 5 BTC; administrative), the SYSXONL application server (60 DIA, 10 BTC; reporting), the SYSXBTC application server (8 DIA, 12 BTC; extractors and aggregate rollup), and the SYSXAS03 to SYSXAS06 application servers (each 48 DIA, 6 BTC; loading: update rules). The larger LPARs have 40 CPUs and 128 GB RAM; the smaller ones have 6-11 CPUs and 40-48 GB RAM.)



Figure 7-3 60 TB: SAP application servers for the load distribution

For the reporting, the injector query requests are directly dispatched to the SYSXONL application server.

For the loading:

� The “BTC extractor” processes are directed to the SYSXBTC application server. The number of extractors is controlled by the number of source ODS objects and the number of InfoPackages per ODS.

� The “Update Dialog” processes are directed to the SYSXAS03-04-05-06 application servers.

� The “aggregate rollup BTC” processes are controlled by the Process Chain server. They run on the SYSXBTC application server.

Two extractors per ODS will be used, as described in Figure 7-4 on page 248. More dialog work processes will process the update rules, and the throughput is expected to be higher.




Figure 7-4 60 TB: dataload split scenario

7.2.4 The DB2 environment

The disk space is used by the DB2 data, the index tablespaces, the temporary tablespaces, and the DB2 logger. Multiplying the warehouse size by three, we expect the following distribution:

� DB2 Partition 0

– The DB2 logger will use 0.75 TB.
– The DB2 temporary tablespaces will use 1.08 TB.
– The DB2 data and index will use 1.08 TB.

� Each DB2 partition

– The DB2 logger will use 300 GB.
– The DB2 temporary tablespaces will use 300 GB.
– The DB2 data and index will use 2,400 GB.

� DB2 total:

– The DB2 logger will use 9.75 TB.
– The DB2 temporary tablespaces will use 5.25 TB.
– The DB2 data and index will use 80 TB.
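As a quick cross-check (a sketch using only the totals quoted in this section), the three space categories can be summed:

```shell
# Sum of the stated DB2 space totals (TB): logger + temporary + data/index.
awk 'BEGIN { log_tb = 9.75; temp_tb = 5.25; data_tb = 80
             printf "total DB2 space: %.2f TB\n", log_tb + temp_tb + data_tb }'
# -> total DB2 space: 95.00 TB
```

This roughly 95 TB of database space, plus FlashCopy targets and headroom, is what drives the overall storage sizing later in this chapter.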




Figure 7-5 DB2 layout assumptions for the 60 TB tests

We recommend:

� Using a 33-partition layout for DB2.

� Splitting the 33 partitions across five physical machines.

� Sharing system resources with the SAP applications.

The following DB2 Version 9 features will be used to address the DB2 optimizations:

� Long-running query optimizations:

– Improved Statsview implementation
– Optimizer enhancements

� Reduced disk footprint of a large SAP NetWeaver BI system:

– DB2 9 row compression

� Backup/restore optimizations for a large database:

– Ability to rebuild the entire database, including DPF, from a set of tablespace backup images

7.2.5 The System p5 environment

A major advantage of using AIX 5.3 with the POWER5 system is the use of the Advanced POWER™ Virtualization (APV). This feature is a combination of hardware and software that supports and manages the virtual I/O environment on POWER5 and POWER5+ systems. The main technologies of the APV are:

� Virtual Ethernet
� Shared Ethernet Adapter
� Virtual SCSI Server
� Micro-Partitioning technology
� Partition Load Manager

The Micro-Partitioning technology in particular will help optimize CPU resources. Micro-Partitioning is the ability to divide a physical processor’s computing power into fractions of a processing unit, and then share them among multiple logical partitions. It is an option for which you must obtain and enter an activation code for most IBM System and IBM server



models, except for the p5-590 and p5-595 models, where it is included automatically in the configuration. The benefit of Micro-Partitioning is that it allows increased overall utilization of CPU resources within the managed system. Better granularity of CPU allocation in a logical partition means more efficient use of processing power.

Partitions with AIX 5L Version 5.2 are supported on servers with dedicated processors. A partition with AIX 5L Version 5.3 consists of dedicated processors or shared processors with a specific capacity entitlement running in capped or uncapped mode, dedicated memory region, and virtual or physical I/O adapter slots. All of these resources can be dynamically changed.

For more information about the APV, refer to IBM Redbook Advanced POWER Virtualization on IBM System p5, SG24-7940.

In our environment, as shown in Figure 7-6, we propose the following:

� Giving the batch aggregates the lowest priority in the LPAR, with a generous fixed entitlement value. The load on aggregates drives the load in the database, which will have first priority in all LPARs. Aggregates are therefore predictable and can be controlled.

� Giving the upload the lowest priority with a low entitlement, but virtually unlimited. So, any free resource in the machine can be utilized as long as a higher-priority workload does not require it.

� Giving the online query the highest priority over all concurrent load types. That will drive the aggregates and the upload to their defined entitlements. We expect that the peaks in the load types will lead to effective load sharing even in high load situations.

Figure 7-6 Using the priority and resource distribution policy

Other features, such as the Enhanced Journaled File System (JFS2), increase the maximum file and file system sizes to 16 terabytes (TB) with the 64-bit kernel; the 32-bit kernel continues to support 1 TB. These features can be of interest when performing the tests.

CPU virtualization capabilities are used for dynamic resource sharing.




7.2.6 The storage environment

We expect that the workload will be equally distributed on all DB2 partitions, and that the I/O activity for each of the 32 DB2 partitions will be the same.

Two options can be investigated for the DB2 production LUNs:

� Spread all DB2 partitions on all the arrays by using small LUNs and having one LUN for each DB2 partition in each array.

� Dedicate a group of arrays for each DB2 partition by using a large LUN in the group of arrays.

We expect that with a dedicated group of arrays for each DB2 partition, and a small number of LUNs per array, I/O contention on each array will be reduced, as will the administrative tasks.

Two options can be investigated for the FlashCopy LUNs:

� Dedicate arrays for FlashCopy LUNs.

� Share the same arrays for production and FlashCopy LUNs.

We expect that sharing the same arrays will optimize the production workload by providing more physical drives.

From a manageability and a scalability perspective, we recommend that you plan for 20% to 30% of extra capacity. For example, for a 60 TB test, plan to have a minimum of 72 TB.

Except for DB2 partition 0, eight DS8300 arrays will be set up per DB2 partition, based on four DS8300s with 64 arrays each (512 disk drives) and eight DB2 partitions per DS8300.
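The array-per-partition arithmetic can be sketched as follows (assuming an even spread of the 32 data partitions over the four DS8300s, with partition 0 handled separately):

```shell
# Arrays available versus DB2 data partitions (partition 0 excluded).
DS8300S=4
ARRAYS_PER_DS8300=64          # 64 RAID arrays = 512 disk drives per DS8300
DB2_PARTITIONS=32

TOTAL_ARRAYS=$(( DS8300S * ARRAYS_PER_DS8300 ))                # 256 arrays
echo "arrays per DB2 partition:  $(( TOTAL_ARRAYS / DB2_PARTITIONS ))"   # 8
echo "DB2 partitions per DS8300: $(( DB2_PARTITIONS / DS8300S ))"        # 8
```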

7.2.7 Tivoli Storage Manager and Tivoli Data Protection environment

Figure 7-7 Storage infrastructure set up for the 60 TB tests

About 100 TB to be managed, needing four DS8000s.

(The figure shows six p595 servers and four DS8300s with FlashCopy source and target LUNs, the TSM server, and five Storage Agents handling DB2 partition 0 and partition groups 6…13, 14…21, 22…29, and 30…37, together with the SAP application servers. 68 LTO3 tape drives are assigned to the TSM server and Storage Agents.)



For backup and restore functions:

� When the backup of partition 0 is finished, the backups for all 32 remaining partitions can be run in parallel (with either one or two sessions per backup).

� There will be no blocking of tape cartridges or waiting for free tape drives during the roll-forward retrieve phase.

The Flashcopy backup will be done from the StorageAgent “Data Mover” partitions, as explained here:

� Each “Data Mover” LPAR will handle the backup of 8 partitions (symmetric to the partition distribution of the production DB).

� On each “Data Mover” LPAR, up to 8 backups will be run in parallel using the DB2_EEE_PARALLEL_BACKUP YES parameter.

The following options can be tested:

1. Using the DB2_EEE_PARALLEL_BACKUP YES parameter, run 32 backups in parallel and test with one or two sessions, respectively with 32 or 64 LTO3 tape drives.

2. Using the DB2_EEE_PARALLEL_BACKUP NO parameter, run 4 backups in parallel and test with one, two, three, or four sessions, respectively with 4, 8, 12, and 16 LTO3 tape drives. (Note, however, that this option may not be sufficient to meet the time window with a 60 TB DB size.)

Using the first option, we expect that the following should still be possible:

� The FlashCopy backup to tape should complete in less than 8 hours.

� The FlashCopy restore and simultaneous roll forward of 500 GB should complete in less than 8 hours.

� The database restore from tape using the Tivoli Storage Manager server, plus a roll forward of 2 TB of logs, should complete in less than 18 hours.
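Whether the first option fits the 8-hour window can be estimated with a simple throughput sketch. The 60 TB size and the 64 drives come from the text; treating 1 TB as 1024 GB and ignoring tape compression are simplifying assumptions:

```shell
# Required sustained rate to back up 60 TB to 64 LTO3 drives in 8 hours.
DB_TB=60
WINDOW_H=8
DRIVES=64

TOTAL_MB=$(( DB_TB * 1024 * 1024 ))                  # 60 TB expressed in MB
AGGREGATE_MBS=$(( TOTAL_MB / (WINDOW_H * 3600) ))    # total MB/s needed
PER_DRIVE_MBS=$(( AGGREGATE_MBS / DRIVES ))          # MB/s per tape drive
echo "aggregate rate needed: ${AGGREGATE_MBS} MB/s"  # ~2184 MB/s
echo "per-drive rate needed: ${PER_DRIVE_MBS} MB/s"  # ~34 MB/s
```

About 34 MB/s per drive is comfortably below an LTO3 drive's native rate of roughly 80 MB/s, so under these assumptions the tape drives themselves should not be the bottleneck.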

7.2.8 The physical architecture

Compared to the physical architecture used for the 20 TB tests, more servers and the latest technologies (POWER5+ and Turbo models) will be used for the 60 TB tests.

Figure 7-8 on page 253 illustrates the environment we will use: five p5-595s and four DS8300s will be installed in the IBM PSSC Center. This infrastructure and the architecture may evolve, however, depending on the results.



Figure 7-8 60 TB: the physical architecture






Appendix A. The DB2 scripts

This appendix provides the scripts that were developed for this project to manage the DB2 environment.




DB2 monitoring scripts

The usage of these scripts is explained in 3.4.3, “DB2-specific monitoring scripts” on page 140.

Script A: Upload monitoring scriptExample: A-1 Upload Monitoring script A

1 ###################################################################################### 2 ## ## 3 ## statistics.sh - Script to check how many rows are being inserted ## 4 ## ## 5 ## ## 6 ## ## 7 ## ## 8 ## Version: 20060716 ## 9 ## Author: Fabio Hasegawa ([email protected]) ## 10 ## Modifications: ## 11 ## Property of IBM ## 12 ## ## 13 ###################################################################################### 14 15 # Setting initial variables 16 17 # CUBES THAT WILL BE MONITORED 18 CUBE_LIST="060 030" 19 # PREFIX OF THE FACT TABLE 20 PREFIX_FACT_TABLE="/BIC/FZGTFC" 21 # TIMESTAMP 22 DATE=$(date '+%Y%m%d%H%M%S') 23 # DATE 24 DAY=$(date '+%Y-%m-') 25 # DATABASE NAME 26 DBNAME="EB8" 27 # TABLE NAME TO STORE STATISTICS 28 TABLE_NAME="FABIO.STAT${DATE}" 29 # REFRESH SCREEN TIME 30 REFRESH=60 31 # NUMBER OF RETRIES THAT IT WILL WAIT IF TOTAL DID NOT UPDATE 32 let RETRIES=40 33 34 # Reseting variables used to create the count statement 35 let TOTAL_LAST_RUN=0 36 let TOTAL_THIS_RUN=0 37 let NUMBER_OF_INSERTS=0 38 let LOOP_ZERO=0 39 let FIRST_LOOP=0 40 COLUMNS="" 41 POPULATE="" 42 JOIN="" 43 SELECT="" 44 AGREGATED="" 45 let COUNT=0 46 let DISPLAY_COUNT=0

256 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse

Page 271: Front cover Infrastructure Solutions - IBM Redbooks · Front cover Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

47 48 for i in ${CUBE_LIST} 49 do 50 if [ ${COUNT} -eq 0 ];then

51 export COLUMNS="CUBE${i} integer" 52 export POPULATE=" CUBE${i} AS (select count(*) AS COUNT_${i}, '1' AS KEY from SAPR3.\"/BIC/FZGTFC${i}\")" 53 export JOIN="CUBE${i}" 54 export SELECT="CUBE${i}.COUNT_${i}" 55 export AGREGATED="INT(COUNT_${i})" 56 else 57 export COLUMNS="CUBE${i} integer, ${COLUMNS}" 58 export POPULATE=" CUBE${i} AS (select count(*) AS COUNT_${i}, '1' AS KEY from SAPR3.\"/BIC/FZGTFC${i}\"), ${POPULATE}" 59 export JOIN="CUBE${i},${JOIN}" 60 export SELECT="CUBE${i}.COUNT_${i}, ${SELECT}" 61 export AGREGATED="INT(COUNT_${i})+${AGREGATED}" 62 63 fi 64 let COUNT=${COUNT}+1 65 66 done 67 let NUMBER_CUBES=${COUNT} 68 69 # CREATING THE STATISTICS TABLE 70 71 # EXECUTE THE COUNT OVER THE FACT TABLES 72 73 db2 "connect to ${DBNAME}" > /dev/null 74 75 # CREATING THE STATISTICS TABLE 76 db2 "CREATE TABLE ${TABLE_NAME} (${COLUMNS}, TOTAL INTEGER, TIME TIMESTAMP)">/dev/null 77 78 echo "" 79 echo "Calculating data..." 80 81 while true 82 do 83 db2 "WITH ${POPULATE} SELECT ${SELECT}, (${AGREGATED})AS TOTAL , CURRENT TIMESTAMP FROM ${JOIN}" > rst.tmp 84 85 let COUNT=0 86 VALUES="" 87 88 for i in `cat rst.tmp | grep "${DAY}"` 89 do 90 let COUNT=${COUNT}+1

91 if [ ${COUNT} -le ${NUMBER_CUBES} ];then 92 VALUES="${VALUES} ${i}," 93 else 94 if [ `echo ${i} | grep ${DAY} | wc -l` = "1" ];then 95 VALUES="${VALUES} '${i}'"

96 else 97 VALUES="${VALUES} ${i}," 98 let TOTAL_THIS_RUN=${i}

Appendix A. The DB2 scripts 257

Page 272: Front cover Infrastructure Solutions - IBM Redbooks · Front cover Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

99 fi100 fi101 done102 rm rst.tmp

103 104 db2 "INSERT INTO ${TABLE_NAME}(${JOIN}, TOTAL, TIME) VALUES(${VALUES})" > /dev/null 105 106 let DISPLAY_COUNT=${DISPLAY_COUNT}+1 107 let NUMBER_OF_INSERTS=${TOTAL_THIS_RUN}-${TOTAL_LAST_RUN} 108 echo "RUN COUNT: ${DISPLAY_COUNT}" 109 echo "Reporting table in use: ${TABLE_NAME}" 110 if [ ${FIRST_LOOP} -eq 1 ];then 111 echo "Number of rows added since last refresh: ${NUMBER_OF_INSERTS} record(s)" 112 fi 113 printf "Start time of collecting data: " 114 db2 "SELECT MIN(TIME) AS START_TIMESTAMP FROM ${TABLE_NAME}" | grep -v "record(s) selected" | grep -v "\-\-\-" | grep -v "TIMESTAMP" | while read line 115 do 116 for i in ${line} 117 do 118 echo "${i}" 119 done

120 done121 printf "Last time run:"

122 db2 "SELECT MAX(TIME) AS LAST_RUN_TIMESTAMP FROM ${TABLE_NAME}" | grep -v "record(s) selected" | grep -v "\-\-\-" | grep -v "TIMESTAMP" | while read line 123 do 124 for i in ${line} 125 do 126 echo "${i}" 127 done 128 done 129 echo "Running for:" 130 db2 "SELECT DECIMAL((DOUBLE((HOUR(max(time)-min(time))*60)+(MINUTE(max(time)-min(time)))+(INT(SECOND(max(time)-min(time))/60)))/60), 10,2) AS HOURS, (HOUR(max(time)-min(time))*60)+(MINUTE(max(time)- min(time)))+(INT(SECOND(max(time)-min(time))/60)) as MINUTES FROM ${TABLE_NAME}" | grep -v "record(s) selected" | grep -v "\-\-\-" | grep -v "HOURS" | while read line 131 do 132 let COUNT=0 133 for i in ${line} 134 do 135 let COUNT=${COUNT}+1 136 if [ ${COUNT} -eq 1 ];then 137 printf "${i} hours or " 138 else 139 echo "${i} minutes" 140 fi

141 done 142 done 143 echo "" 144 printf "Probable number of rows inserted: " 145 if [ ${FIRST_LOOP} -eq 1 ];then

146 db2 "SELECT INT((MAX(TOTAL)-MIN(TOTAL))/(DOUBLE(DOUBLE((HOUR(max(time)-min(time))*60)+(MINUTE(max(time)-min(

258 Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse

Page 273: Front cover Infrastructure Solutions - IBM Redbooks · Front cover Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

time)))+(INT(SECOND(max(time)-min(time))/60)))/60))) AS PROBABLE_RECORD_HOUR FROM ${TABLE_NAME}" | grep -v "record(s) selected" | grep -v "SQL0801N" | grep -v "\-\-\-" | grep -v "PROBABLE" | while read line 147 do 148 for i in ${line} 149 do 150 echo "${i} records" 151 done 152 done 153 else

154 echo "Unable to get statistics, first round inplace" 155 fi 156 printf "Total number of rows inserted: " 157 db2 "select MAX(TOTAL)-MIN(TOTAL) AS TOTAL_INSERTED FROM ${TABLE_NAME}" | grep -v "record(s) selected" | grep -v "\-\-\-" | grep -v "TOTAL_INSERTED" | while read line 158 do 159 for i in ${line} 160 do 161 echo "${i} records" 162 done 163 done 164

165 echo "___________________________________________________________________________________________"

166 echo "" 167 echo "Refresh time (seconds): ${REFRESH}" 168 let TOTAL_LAST_RUN=${TOTAL_THIS_RUN} 169 sleep ${REFRESH} 170 171 if [ ${NUMBER_OF_INSERTS} -eq 0 ];then 172 let LOOP_ZERO=${LOOP_ZERO}+1 173 else 174 let LOOP_ZERO=0 175 fi 176 177 if [ ${LOOP_ZERO} -ge ${RETRIES} ];then 178 echo "The screen was refreshed ${RETRIES} times and the total did not increase" 179 echo "Do you want to continue? (1-Yes,2-No) " 180 select menulist in Y N 181 do 182 case ${menulist} in 183 Y) 184 echo "Continuing.... " 185 sleep 2 186 print "" 187 let LOOP_ZERO=0 188 break;; 189 N) 190 echo "Finalizing... Table name: ${TABLE_NAME} "191 echo ""

192 echo "" 193 exit 0 194 break;; 195 esac 196 done


197 fi 198 let FIRST_LOOP=1 199 echo "Waiting for the next run... " 200 echo "" 201

202 203 done 204
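All of the monitoring scripts in this appendix consume db2 CLP output the same way: the query result is piped through a chain of grep -v filters that discard the column header, the dashed separator line, and the "record(s) selected" trailer, leaving only the data row for a while read loop. The following is a minimal, self-contained sketch of that pattern; the CLP output is simulated with a shell variable so the snippet runs without a database, and the column name TOTAL_INSERTED is illustrative (an extra blank-line filter is added for the simulated input):

```shell
# Simulated db2 CLP output for a query such as:
#   db2 "select MAX(TOTAL)-MIN(TOTAL) AS TOTAL_INSERTED FROM ..."
# (a real run would pipe directly from the db2 command)
clp_output='TOTAL_INSERTED
--------------
         12345

  1 record(s) selected.'

# Strip the header, separator, and trailer as the scripts do, then
# read the remaining data row and print each field
echo "${clp_output}" | grep -v "record(s) selected" | grep -v "\-\-\-" \
    | grep -v "TOTAL_INSERTED" | grep -v "^$" | while read line
do
    for i in ${line}
    do
        echo "${i} records"
    done
done
```

Because read strips leading whitespace, only the numeric value survives the filters, so the loop prints "12345 records".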

Script B: rollup monitoring script
Example A-2 Rollup monitoring script B

1 ###################################################################################### 2 ## ## 3 ## statistics.sh - Script to check how many rows are being inserted AGGREGATES ## 4 ## ## 5 ## ## 6 ## ## 7 ## ## 8 ## Version: 20060717 ## 9 ## Author: Fabio Hasegawa ([email protected]) ## 10 ## Modifications: ## 11 ## Property of IBM ## 12 ## ## 13 ###################################################################################### 14 15 # Setting initial variables 16 17 # CUBES THAT WILL BE MONITORED 18 CUBES="030" 19 # PREFIX OF THE FACT TABLE 20 PREFIX_FACT_TABLE="/BIC/F" 21 # TIMESTAMP 22 DATE=$(date '+%Y%m%d%H%M%S') 23 # DATE 24 DAY=$(date '+%Y-%m-') 25 # DATABASE NAME 26 DBNAME="EB8" 27 # TABLE NAME TO STORE STATISTICS 28 TABLE_NAME="FABIO.AGGR${DATE}" 29 # REFRESH SCREEN SNAPSHOT 30 REFRESH=120 31 # NUMBER OF RETRIES THAT IT WILL WAIT IF TOTAL DID NOT UPDATE 32 let RETRIES=2 33 34 # Reseting variables used to create the count statement 35 let TOTAL_LAST_RUN=0 36 let TOTAL_THIS_RUN=0 37 let NUMBER_OF_INSERTS=0 38 let LOOP_ZERO=0 39 let FIRST_LOOP=0 40 COLUMNS="" 41 POPULATE="" 42 JOIN=""


43 SELECT="" 44 AGREGATED="" 45 let COUNT=0 46 let DISPLAY_COUNT=0 47 AGGR_LIST="" 48 49 # adding entries for aggregate cubes 50

51 db2 "connect to ${DBNAME}" 52 let N_CUBES=0

53 54

55 while true 56 do

57 58 for SUFFIX in ${CUBES} 59 do 60 CUBE="ZGTFC${SUFFIX}" 61 CUBE_LIST="" 62 echo "Calculating data..." 63 db2 "select DISTINCT(AGGRCUBE) from sapr3.RSDDAGGR_V where infocube='${CUBE}'" | grep -v "record(s) selected" | grep -v "\-\-\-" | grep -v "AGGRCUBE" | while read line 64 do 65 CUBE_LIST="${CUBE_LIST} ${line}" 66 done 67 68 for i in ${CUBE_LIST} 69 do 70 if [ ${COUNT} -eq 0 ];then 71 export COLUMNS="AGGR${i} integer" 72 export POPULATE=" AGGR${i} AS (select count(*) AS COUNT_${i}, '1' AS KEY from SAPR3.\"/BIC/F${i}\")" 73 export JOIN="AGGR${i}" 74 export SELECT="AGGR${i}.COUNT_${i}" 75 export AGREGATED="INT(COUNT_${i})" 76 else 77 export COLUMNS="AGGR${i} integer, ${COLUMNS}" 78 export POPULATE=" AGGR${i} AS (select count(*) AS COUNT_${i}, '1' AS KEY from SAPR3.\"/BIC/F${i}\"), ${POPULATE}" 79 export JOIN="AGGR${i},${JOIN}" 80 export SELECT="AGGR${i}.COUNT_${i}, ${SELECT}" 81 export AGREGATED="INT(COUNT_${i})+${AGREGATED}" 82 83 fi # if [ ${COUNT} -eq 0 ];then 84 let COUNT=${COUNT}+1 85 done # for i in ${CUBE_LIST} 86 let NUMBER_CUBES=${COUNT} 87 88 AGGR_LIST="${AGGR_LIST} ${CUBE}" 89 90 # CREATING THE STATISTICS TABLE 91 if [ ${DISPLAY_COUNT} -lt 2 ];then 92 db2 "CREATE TABLE ${TABLE_NAME}_${CUBE} (${COLUMNS}, TOTAL INTEGER, SNAPSHOT TIMESTAMP)" 93 fi 94


95 echo "Retrieving data from database..." 96 97 db2 "WITH ${POPULATE} SELECT ${SELECT}, (${AGREGATED}) AS TOTAL, CURRENT TIMESTAMP FROM ${JOIN}" | grep "${DAY}" > ${CUBE}_rst.tmp 98 99 let COUNT=0

100 VALUES="" 101

102 for i in `cat ${CUBE}_rst.tmp` 103 do 104 let COUNT=${COUNT}+1 105 if [ ${COUNT} -le ${NUMBER_CUBES} ];then

106 VALUES="${VALUES} ${i}," 107 else 108 if [ `echo ${i} | grep "${DAY}" | wc -l` = "1" ];then 109 VALUES="${VALUES} '${i}'" 110 else 111 VALUES="${VALUES} ${i}," 112 fi 113 fi # if [ ${COUNT} -le ${NUMBER_CUBES} ];then 114 done # for i in `cat ${CUBE}_rst.tmp` 115 116 db2 "INSERT INTO ${TABLE_NAME}_${CUBE}(${JOIN}, TOTAL, SNAPSHOT) VALUES(${VALUES})" 117 118 let DISPLAY_COUNT=${DISPLAY_COUNT}+1 119 let NUMBER_OF_INSERTS=${TOTAL_THIS_RUN}-${TOTAL_LAST_RUN} 120 echo "RUN COUNT: ${DISPLAY_COUNT}" 121 printf "Reporting table in use: ${TABLE_NAME}_${CUBE}" 122 echo " Monitoring aggregates on infocube ${CUBE}" 123 printf "Start time of collecting data: " 124 db2 "SELECT MIN(SNAPSHOT) AS START_TIMESTAMP FROM ${TABLE_NAME}_${CUBE}" | grep -v "record(s) selected"| grep -v "\-\-\-" | grep -v "TIMESTAMP" |while read line 125 do 126 for i in ${line} 127 do 128 echo "${i}" 129 done 130 done 131 printf "Last time run: " 132 db2 "SELECT MAX(SNAPSHOT) AS LAST_RUN_TIMESTAMP FROM ${TABLE_NAME}_${CUBE}" | grep -v "record(s) selected" | grep -v "\-\-\-" | grep -v "TIMESTAMP" |while read line 133 do 134 for i in ${line} 135 do

136 echo "${i}" 137 done 138 done 139 printf "Running for: " 140 db2 "SELECT DECIMAL((DOUBLE((HOUR(max(snapshot)-min(snapshot))*60)+(MINUTE(max(snapshot)-min(snapshot)))+(INT(SECOND(max(snapshot)-min(snapshot))/60)))/60), 10,2) AS HOURS, (HOUR(max(snapshot)- min(snapshot))*60)+(MINUTE(max(snapshot)-min(snapshot)))+(INT(SECOND(max(snapshot)-min(snapshot))/60)) as MINUTES FROM ${TABLE_NAME}_${CUBE}" | grep -v "record(s) selected" | grep -v "HOURS" | grep -v "\-\-\-" | while read line

141 do


142 let COUNT=0 143 for i in ${line}

144 do 145 let COUNT=${COUNT}+1

146 if [ ${COUNT} -eq 1 ];then 147 printf "${i} hours" 148 else 149 echo " or ${i} minutes" 150 fi 151 done 152 done 153 154 printf "Probable number of rows inserted: " 155 db2 "SELECT INT((MAX(TOTAL)-MIN(TOTAL))/(DOUBLE(DOUBLE((HOUR(max(snapshot)-min(snapshot))*60)+(MINUTE(max(snapshot)-min(snapshot)))+(INT(SECOND(max(snapshot)-min(snapshot))/60)))/60))) AS PROBABLE_RECORD_HOUR FROM ${TABLE_NAME}_${CUBE}" | grep -v "record(s) selected" | grep -v "SQL0801N"| grep -v "PROBABLE" | grep -v "\-\-\-" | while read line 156 do 157 for i in ${line} 158 do 159 echo "${i} records" 160 done 161 done 162 printf "Total number of rows inserted: " 163 db2 "select MAX(TOTAL)-MIN(TOTAL) AS TOTAL_INSERTED FROM ${TABLE_NAME}_${CUBE}" | grep -v "TOTAL" | grep -v "record(s) selected"| grep -v "\-\-\-" | while read line 164 do 165 for i in ${line} 166 do 167 echo "${i} records" 168 done 169 done 170 echo "" 171 echo "Refresh time (seconds): ${REFRESH}" 172 let TOTAL_LAST_RUN=${TOTAL_THIS_RUN} 173 sleep ${REFRESH} 174 175 176 done # for SUFFIX in ${CUBES} 177 178 done #while true 179
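The PROBABLE_RECORD_HOUR figure that Script B computes in SQL reduces to simple arithmetic: the difference between the latest and earliest row counts, divided by the elapsed time in hours. A sketch of the same calculation in shell arithmetic, using illustrative sample values rather than real snapshots:

```shell
# Illustrative samples: aggregate row counts at the start and end of a window
first_total=100000
last_total=160000
elapsed_minutes=90    # 1.5 hours between the first and last snapshot

# Rows inserted over the window
inserted=$((last_total - first_total))

# Approximate insert rate per hour (integer arithmetic, like the SQL's INT())
rate_per_hour=$((inserted * 60 / elapsed_minutes))

echo "Total number of rows inserted: ${inserted} records"
echo "Probable number of rows inserted: ${rate_per_hour} records per hour"
```

With these sample values the window covers 60000 inserted rows over 1.5 hours, so the projected rate is 40000 rows per hour.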


DB2 checking scripts
The usage of Script C is described in 3.4.4, “DB2 checking scripts” on page 144.

DB2 checking - Script C
Example A-3 DB2 checking - Script C

db2eb8@sys3db0p:/db2/db2eb8/fabio/ # cat -n check_infocube.sh 1 ###################################################################################### 2 ## ## 3 ## check_infocube.sh - Script to check space for infocubes ## 4 ## ## 5 ## ## 6 ## ## 7 ## ## 8 ## Version: 20060718 ## 9 ## Author: Fabio Hasegawa ([email protected]) ## 10 ## Modifications: ## 11 ## Property of IBM ## 12 ## ## 13 ###################################################################################### 14 15 go_to_partitions() 16 { 17 echo "This may take a long time.... " 18 let TOTAL_TABLESPACE_MB=0 19 for node in ${NODENUM} 20 do 21 PARTITION="" 22 PAGESIZE="" 23 TOTPAGES="" 24 USABLEPGS="" 25 USEDPGS="" 26 FREEPGS="" 27 AUTORESIZE="" 28 CONTAINER_ID="" 29 CONTAINER_TYPE="" 30 CONTAINER_TOTAL_PAGES="" 31 STRIPE_SET="" 32 CONTAINER_PATH="" 33 STRING_FOR_DF="" 34 NEXT_LINE_TYPE="NULL" 35 36 db2_all "<<+${node}< db2pd -d ${DBNAME} -tablespaces tablespace=${TBSPACEID}"| while read line 37 do 38 let COUNT=0 39 40 if [ ${NEXT_LINE_TYPE} != "NULL" ];then 41 if [ ${NEXT_LINE_TYPE} = "CONFIGURATION" ];then 42 let COUNT=0 43 for i in ${line} 44 do 45 let COUNT=${COUNT}+1


46 if [ ${COUNT} -eq 4 ];then 47 PAGESIZE=${i} 48 fi 49 done 50 elif [ ${NEXT_LINE_TYPE} = "STATISTICS" ];then 51 for i in ${line} 52 do 53 let COUNT=${COUNT}+1 54 if [ ${COUNT} -eq 2 ];then 55 TOTPAGES=${i} 56 fi 57 if [ ${COUNT} -eq 3 ];then 58 USABLEPGS=${i} 59 fi 60 if [ ${COUNT} -eq 4 ];then 61 USEDPGS=${i} 62 fi 63 if [ ${COUNT} -eq 6 ];then 64 FREEPGS=${i} 65 fi 66 done 67 fi 68 NEXT_LINE_TYPE="NULL" 69 elif [ `echo ${line} | grep "Type" | grep "Content" | grep "ExtentSz" | wc -l ` = "1" ];then 70 NEXT_LINE_TYPE="CONFIGURATION" 71 elif [ `echo ${line} | grep "TotPages" | grep "UsablePgs" | grep "UsedPgs" | wc -l ` = "1" ];then 72 NEXT_LINE_TYPE="STATISTICS" 73 elif [ `echo ${line} | egrep "File|Device" | wc -l ` = "1" ];then 74 let COUNT=0 75 for i in ${line} 76 do 77 let COUNT=${COUNT}+1 78 if [ ${COUNT} -eq 2 ];then 79 CONTAINER_ID="${CONTAINER_ID} ${i}" 80 fi 81 if [ ${COUNT} -eq 3 ];then 82 CONTAINER_TYPE="${CONTAINER_TYPE} ${i}" 83 fi 84 if [ ${COUNT} -eq 4 ];then 85 CONTAINER_TOTAL_PAGES="${CONTAINER_TOTAL_PAGES} ${i}" 86 fi 87 if [ ${COUNT} -eq 6 ];then 88 STRIPE_SET="${STRIPE_SET} ${i}" 89 fi 90 if [ ${COUNT} -eq 7 ];then 91 CONTAINER_PATH="${CONTAINER_PATH} ${i}" 92 fi 93 done 94 NEXT_LINE_TYPE="NULL" 95 else 96 NEXT_LINE_TYPE="NULL" 97 fi 98 done


99 echo "Action: ${SET_ACTION} Infocube:${INFOCUBE} Tablespace:${TABLESPACE} Tablespace Type:${TABLESPACE_TYPE} Partition:${node} statistics:" 100 PAGESIZE=`expr ${PAGESIZE} / 1024 ` 101 printf " Pagesize: ${PAGESIZE} K" 102 printf " Total pages on the partition: ${TOTPAGES}" 103 echo " Total usable pages on the partition: ${USABLEPGS}" 104 printf " Free space (pages): ${FREEPGS}" 105 TOTAL_K=`expr ${PAGESIZE} \* ${FREEPGS}` 106 TOTAL_M=`expr ${TOTAL_K} / 1024 ` 107 printf " Free space (MB): ${TOTAL_M} " 108 echo "" 109 let CONT=0 110 for cont_id in ${CONTAINER_ID} 111 do 112 FILE_SYSTEM_FREE="" 113 FILE_SYSTEM_USED_PERCENT="" 114 FILE_SYSTEM="" 115 C_PATH="" 116 let CONT=${CONT}+1 117 printf " Container ID: ${cont_id}" 118 echo "" 119 printf " Container Type: "`echo ${CONTAINER_TYPE}| cut -d" " -f${CONT}` 120 printf " Container Total Pages: "`echo ${CONTAINER_TOTAL_PAGES}| cut -d" " -f${CONT}` 121 printf " Container Stripe Set: "`echo ${STRIPE_SET} | cut -d" " -f${CONT}` 122 echo "" 123 printf " Container Path: "`echo ${CONTAINER_PATH} | cut -d" " -f${CONT}` 124 echo "" 125 C_PATH=$(echo ${CONTAINER_PATH} | cut -d" " -f${CONT}) 126 let COUNT_FS=0 127 db2_all "<<+${node}<df -k ${C_PATH}" | grep -v "1024-blocks" | grep -v "completed ok" | while read line 128 do 129 for i in ${line} 130 do 131 let COUNT_FS=${COUNT_FS}+1 132 if [ ${COUNT_FS} -eq 3 ];then 133 FILE_SYSTEM_FREE=${i} 134 fi 135 if [ ${COUNT_FS} -eq 4 ];then 136 FILE_SYSTEM_USED_PERCENT=${i} 137 fi 138 done 139 done 140 printf " File System Free Space: ${FILE_SYSTEM_FREE}" 141 printf " File System utilization: ${FILE_SYSTEM_USED_PERCENT}\n" 142 done 143 144 echo " Free space in the tablespace partition(MB): ${TOTAL_M} " 145 let TOTAL_TABLESPACE_MB=${TOTAL_TABLESPACE_MB}+${TOTAL_M} 146 done # for node in 147 echo "Tablespace total size (MB): ${TOTAL_TABLESPACE_MB}" 148 } 149


150 151 ### SCRIPT BEGINS 152 153 154 DBNAME="EB8" 155 V_TABLE_TYPE="F" 156 157 158 if [ $# -gt 0 ];then 159 160 db2 "connect to ${DBNAME}" > /dev/null 161 162 for arg in $* 163 do 164 TABLESPACE="" 165 TBSPACEID="" 166 NODENUM="" 167 AGGREGATE="" 168 INFOCUBE="" 169 SET_ACTION="UPLOAD check" 170 171 INFOCUBE=${arg} 172 echo "______________________________________________________________________________________________" 173 echo " Starting script to check infocube ${INFOCUBE} for data load and aggregations" 174 echo "______________________________________________________________________________________________" 175 echo "" 176 177 178 db2 "select distinct(AGGRCUBE) from sapr3.RSDDAGGR_V WHERE INFOCUBE='${INFOCUBE}' fetch first 1 row only" | grep -v "\-\-" | grep -v "selected." | grep -v "AGGRCUBE" | while read line 179 do 180 let COUNT=0 181 for i in ${line} 182 do 183 if [ ${COUNT} -eq 0 ];then 184 AGGREGATE="${i}" 185 fi 186 done 187 let COUNT=${COUNT}+1 188 done 189 190 let COUNT=0 191 192 for TABLE_TYPE in ${V_TABLE_TYPE} 193 do 194 195 echo "The INFOCUBE '${TABLE_TYPE}' fact table is spread over the following nodes:" 196 db2 "select SUBSTR(tbs.tbspace,1,25), tbs.tbspaceid, ng.nodenum from syscat.NODEGROUPDEF ng inner join syscat.tablespaces tbs on ng.NGNAME = tbs.NGNAME inner join syscat.tables tb on tbs.tbspaceid=tb.tbspaceid where tb.tabname='/BIC/${TABLE_TYPE}${INFOCUBE}'"| grep -v "record" | grep -v "\-\-" | grep -v "NODENUM" | while read line 197 do


198 let COUNT=0 199 for i in ${line} 200 do 201 if [ ${COUNT} -eq 0 ];then 202 TABLESPACE="${i}" 203 elif [ ${COUNT} -eq 1 ];then 204 TBSPACEID="${i}" 205 else 206 NODENUM="${NODENUM} ${i}" 207 fi 208 let COUNT=${COUNT}+1 209 done 210 done 211 printf "Data Tablespace: ${TABLESPACE}" 212 echo " Nodes: ${NODENUM}" 213 TABLESPACE_TYPE="Data" 214 go_to_partitions 215 216 let COUNT=0 217 NODENUM="" 218 TABLESPACE="" 219 let TOTAL_TABLESPACE_MB=0 220 221 db2 "select SUBSTR(tbs.tbspace,1,25), tbs.tbspaceid, ng.nodenum from syscat.NODEGROUPDEF ng inner join syscat.tablespaces tbs on ng.NGNAME = tbs.NGNAME inner join syscat.tables tb on tbs.tbspace=tb.INDEX_TBSPACE where tb.tabname='/BIC/${TABLE_TYPE}${INFOCUBE}'"| grep -v "record" | grep -v "\-\-" | grep -v "NODENUM" | while read line 222 do 223 let COUNT=0 224 for i in ${line} 225 do 226 if [ ${COUNT} -eq 0 ];then 227 TABLESPACE="${i}" 228 elif [ ${COUNT} -eq 1 ];then 229 TBSPACEID="${i}" 230 else 231 NODENUM="${NODENUM} ${i}" 232 fi 233 let COUNT=${COUNT}+1 234 done 235 done 236 echo "Index Tablespace: ${TABLESPACE}" 237 echo "Nodes: ${NODENUM}" 238 TABLESPACE_TYPE="Index" 239 go_to_partitions 240 241 let COUNT=0 242 NODENUM="" 243 TABLESPACE="" 244 done # for TABLE TYPE 245 246 echo "" 247 echo "" 248 249 # Doing for aggregates now


250 251 SET_ACTION="ROLLUP check" 252 253 echo "The INFOCUBE ${INFOCUBE} AGGREGATES fact tables are spread out over the following nodes:" 254 for TABLE_TYPE in ${V_TABLE_TYPE} 255 do 256 echo "The INFOCUBE '${TABLE_TYPE}' fact table is spread over the following nodes:" 257 db2 "select SUBSTR(tbs.tbspace,1,25), tbs.tbspaceid,ng.nodenum from syscat.NODEGROUPDEF ng inner join syscat.tablespaces tbs on ng.NGNAME = tbs.NGNAME inner join syscat.tables tb on tbs.tbspaceid=tb.tbspaceid where tb.tabname='/BIC/${TABLE_TYPE}${AGGREGATE}'"| grep -v "record" | grep -v "\-\-" | grep -v "NODENUM" | while read line 258 do 259 let COUNT=0 260 for i in ${line} 261 do 262 if [ ${COUNT} -eq 0 ];then 263 TABLESPACE="${i}" 264 elif [ ${COUNT} -eq 1 ];then 265 TBSPACEID="${i}" 266 else 267 NODENUM="${NODENUM} ${i}" 268 fi 269 let COUNT=${COUNT}+1 270 done 271 done 272 echo "Data Tablespace: ${TABLESPACE}" 273 echo "Nodes: ${NODENUM}" 274 TABLESPACE_TYPE="Data" 275 go_to_partitions 276 277 let COUNT=0 278 NODENUM="" 279 TABLESPACE="" 280 let TOTAL_TABLESPACE_MB=0 281 282 db2 "select SUBSTR(tbs.tbspace,1,25), tbs.tbspaceid, ng.nodenum from syscat.NODEGROUPDEF ng inner join syscat.tablespaces tbs on ng.NGNAME = tbs.NGNAME inner join syscat.tables tb on tbs.tbspace=tb.INDEX_TBSPACE where tb.tabname='/BIC/${TABLE_TYPE}${AGGREGATE}'"| grep -v "record" | grep -v "\-\-" | grep -v "NODENUM" | while read line 283 do 284 let COUNT=0 285 for i in ${line} 286 do 287 if [ ${COUNT} -eq 0 ];then 288 TABLESPACE="${i}" 289 elif [ ${COUNT} -eq 1 ];then 290 TBSPACEID="${i}" 291 else 292 NODENUM="${NODENUM} ${i}" 293 fi 294 let COUNT=${COUNT}+1 295 done 296 done 297 echo "Index Tablespace: ${TABLESPACE}"


298 echo "Nodes: ${NODENUM}" 299 TABLESPACE_TYPE="Index" 300 go_to_partitions 301 302 let COUNT=0 303 NODENUM="" 304 TABLESPACE="" 305 done # for TABLE TYPE AGGREGATES 306 307 308 done # for arg in $* - CHECK MORE THAN ONE INFOCUBE 309 310 db2 "terminate" | grep -v "DB20000I" 311 else 312 echo "" 313 echo "Missing parameter: Usage $0 INFOCUBE" 314 echo "" 315 echo "" 316 exit 0 317 fi # if [ $# -gt 0 then ];
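The free-space figures that go_to_partitions prints come from a two-step expr conversion: free pages multiplied by the page size in KB gives kilobytes, and dividing by 1024 gives megabytes. A sketch of that conversion, using the values reported for the first partition in the sample output that follows (16 KB pages, 31328 free pages):

```shell
# Values taken from the first partition in the sample output (Example A-4)
PAGESIZE=16       # page size in KB, after the script divides the byte value by 1024
FREEPGS=31328     # free pages reported by db2pd for this tablespace partition

# Same conversion as go_to_partitions
TOTAL_K=`expr ${PAGESIZE} \* ${FREEPGS}`
TOTAL_M=`expr ${TOTAL_K} / 1024`

echo "Free space (MB): ${TOTAL_M}"
```

Because expr performs integer division, 501248 KB comes out as 489 MB, matching the "Free space (MB): 489" line in the sample output.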

Output - Script D
Example A-4 Output - Script D

1 ________________________________________________________________________________________ 2 Starting script to check infocube ZGTFC026 for data load and aggregations 3 ________________________________________________________________________________________ 4 5 The INFOCUBE 'F' fact table is spread over the following nodes: 6 Data Tablespace: YMFACTD14 Nodes: 7 11 15 19 23 27 31 35 7 This may take a long time.... 8 Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTD14 Tablespace Type:Data Partition:7 statistics: 9 Pagesize: 16 K Total pages on the partition: 797056 Total usable pages on the partition: 797024 10 Free space (pages): 31328 Free space (MB): 489 11 Container ID: 0 12 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 13 Container Path: /db2/EB8/sapdata1/NODE0007/YMFACTD14.container000 14 File System Free Space: 18031848 File System utilization: 89% 15 Container ID: 1 16 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 17 Container Path: /db2/EB8/sapdata2/NODE0007/YMFACTD14.container001 18 File System Free Space: 18031852 File System utilization: 89% 19 Container ID: 2 20 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 21 Container Path: /db2/EB8/sapdata3/NODE0007/YMFACTD14.container002 22 File System Free Space: File System utilization: 23 Container ID: 3


24 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 25 Container Path: /db2/EB8/sapdata4/NODE0007/YMFACTD14.container003 26 File System Free Space: 18031852 File System utilization: 89% 27 Free space in the tablespace partition(MB): 489 28 Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTD14 Tablespace Type:Data Partition:11 statistics: 29 Pagesize: 16 K Total pages on the partition: 797056 Total usable pages on the partition: 797024 30 Free space (pages): 30920 Free space (MB): 483 31 Container ID: 0 32 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 33 Container Path: /db2/EB8/sapdata1/NODE0011/YMFACTD14.container000 34 File System Free Space: 18135852 File System utilization: 89% 35 Container ID: 1 36 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 37 Container Path: /db2/EB8/sapdata2/NODE0011/YMFACTD14.container001 38 File System Free Space: 18135860 File System utilization: 89% 39 Container ID: 2 40 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 41 Container Path: /db2/EB8/sapdata3/NODE0011/YMFACTD14.container002 42 File System Free Space: 18135872 File System utilization: 89% 43 Container ID: 3 44 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 45 Container Path: /db2/EB8/sapdata4/NODE0011/YMFACTD14.container003 46 File System Free Space: 18135864 File System utilization: 89% 47 Free space in the tablespace partition(MB): 483 48 Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTD14 Tablespace Type:Data Partition:15 statistics: 49 Pagesize: 16 K Total pages on the partition: 797056 Total usable pages on the partition: 797024 50 Free space (pages): 31352 Free space (MB): 489 51 Container ID: 0 52 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 53 Container Path: /db2/EB8/sapdata1/NODE0015/YMFACTD14.container000 54 File System Free Space: 15466720 File System utilization: 91% 55 Container 
ID: 1 56 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 57 Container Path: /db2/EB8/sapdata2/NODE0015/YMFACTD14.container001 58 File System Free Space: 15466740 File System utilization: 91% 59 Container ID: 2 60 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 61 Container Path: /db2/EB8/sapdata3/NODE0015/YMFACTD14.container002 62 File System Free Space: 15466736 File System utilization: 91% 63 Container ID: 3 64 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 65 Container Path: /db2/EB8/sapdata4/NODE0015/YMFACTD14.container003


66 File System Free Space: 15466736 File System utilization: 91% 67 Free space in the tablespace partition(MB): 489 68 Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTD14 Tablespace Type:Data Partition:19 statistics: 69 Pagesize: 16 K Total pages on the partition: 797056 Total usable pages on the partition: 797024 70 Free space (pages): 30504 Free space (MB): 476 71 Container ID: 0 72 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 73 Container Path: /db2/EB8/sapdata1/NODE0019/YMFACTD14.container000 74 File System Free Space: 16002188 File System utilization: 90% 75 Container ID: 1 76 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 77 Container Path: /db2/EB8/sapdata2/NODE0019/YMFACTD14.container001 78 File System Free Space: 16002208 File System utilization: 90% 79 Container ID: 2 80 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 81 Container Path: /db2/EB8/sapdata3/NODE0019/YMFACTD14.container002 82 File System Free Space: 16002212 File System utilization: 90% 83 Container ID: 3 84 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 85 Container Path: /db2/EB8/sapdata4/NODE0019/YMFACTD14.container003 86 File System Free Space: File System utilization: 87 Free space in the tablespace partition(MB): 476 88 Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTD14 Tablespace Type:Data Partition:23 statistics: 89 Pagesize: 16 K Total pages on the partition: 797056 Total usable pages on the partition: 797024 90 Free space (pages): 30784 Free space (MB): 481 91 Container ID: 0 92 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 93 Container Path: /db2/EB8/sapdata1/NODE0023/YMFACTD14.container000 94 File System Free Space: 16905208 File System utilization: 90% 95 Container ID: 1 96 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 97 Container Path: /db2/EB8/sapdata2/NODE0023/YMFACTD14.container001 98 File 
System Free Space: 16905196 File System utilization: 90% 99 Container ID: 2 100 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 101 Container Path: /db2/EB8/sapdata3/NODE0023/YMFACTD14.container002 102 File System Free Space: 16905180 File System utilization: 90% 103 Container ID: 3 104 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 105 Container Path: /db2/EB8/sapdata4/NODE0023/YMFACTD14.container003 106 File System Free Space: 16905236 File System utilization: 90% 107 Free space in the tablespace partition(MB): 481


108 Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTD14 Tablespace Type:Data Partition:27 statistics: 109 Pagesize: 16 K Total pages on the partition: 797056 Total usable pages on the partition: 797024 110 Free space (pages): 29840 Free space (MB): 466 111 Container ID: 0 112 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 113 Container Path: /db2/EB8/sapdata1/NODE0027/YMFACTD14.container000 114 File System Free Space: 16851072 File System utilization: 90% 115 Container ID: 1 116 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 117 Container Path: /db2/EB8/sapdata2/NODE0027/YMFACTD14.container001 118 File System Free Space: 16851056 File System utilization: 90% 119 Container ID: 2 120 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 121 Container Path: /db2/EB8/sapdata3/NODE0027/YMFACTD14.container002 122 File System Free Space: 16851048 File System utilization: 90% 123 Container ID: 3 124 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 125 Container Path: /db2/EB8/sapdata4/NODE0027/YMFACTD14.container003 126 File System Free Space: 16851060 File System utilization: 90% 127 Free space in the tablespace partition(MB): 466 128 Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTD14 Tablespace Type:Data Partition:31 statistics: 129 Pagesize: 16 K Total pages on the partition: 797056 Total usable pages on the partition: 797024 130 Free space (pages): 29784 Free space (MB): 465 131 Container ID: 0 132 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 133 Container Path: /db2/EB8/sapdata1/NODE0031/YMFACTD14.container000 134 File System Free Space: 6332664 File System utilization: 96% 135 Container ID: 1 136 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 137 Container Path: /db2/EB8/sapdata2/NODE0031/YMFACTD14.container001 138 File System Free Space: File System utilization: 139 Container ID: 2 140 
Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 141 Container Path: /db2/EB8/sapdata3/NODE0031/YMFACTD14.container002 142 File System Free Space: File System utilization: 143 Container ID: 3 144 Container Type: File Container Total Pages: 199264 Container Stripe Set: 0 145 Container Path: /db2/EB8/sapdata4/NODE0031/YMFACTD14.container003 146 File System Free Space: 6332684 File System utilization: 96% 147 Free space in the tablespace partition(MB): 465 148 Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTD14 Tablespace Type:Data Partition:35 statistics:


149 Pagesize: K Total pages on the partition: Total usable pages on the partition: 150 Free space (pages): Free space (MB): 151 Free space in the tablespace partition(MB): 152 Tablespace total size (MB): 3349 153 Index Tablespace: YMFACTI14 154 Nodes: 7 11 15 19 23 27 31 35 155 This may take a long time.... 156 Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTI14 Tablespace Type:Index Partition:7 statistics: 157 Pagesize: 16 K Total pages on the partition: 80528 Total usable pages on the partition: 80512 158 Free space (pages): 52584 Free space (MB): 821 159 Container ID: 0 160 Container Type: File Container Total Pages: 20132 Container Stripe Set: 0 161 Container Path: /db2/EB8/sapdata1/NODE0007/YMFACTI14.container000 162 File System Free Space: 18031848 File System utilization: 89% 163 Container ID: 1 164 Container Type: File Container Total Pages: 20132 Container Stripe Set: 0 165 Container Path: /db2/EB8/sapdata2/NODE0007/YMFACTI14.container001 166 File System Free Space: 18031852 File System utilization: 89% 167 Container ID: 2 168 Container Type: File Container Total Pages: 20132 Container Stripe Set: 0 169 Container Path: /db2/EB8/sapdata3/NODE0007/YMFACTI14.container002 170 File System Free Space: 18031856 File System utilization: 89% 171 Container ID: 3 172 Container Type: File Container Total Pages: 20132 Container Stripe Set: 0 173 Container Path: /db2/EB8/sapdata4/NODE0007/YMFACTI14.container003 174 File System Free Space: 18031852 File System utilization: 89% 175 Free space in the tablespace partition(MB): 821 176 Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTI14 Tablespace Type:Index Partition:11 statistics: 177 Pagesize: 16 K Total pages on the partition: 80528 Total usable pages on the partition: 80512 178 Free space (pages): 52540 Free space (MB): 820 179 Container ID: 0 180 Container Type: File Container Total Pages: 20132 Container Stripe Set: 0 181 Container Path: /db2/EB8/sapdata1/NODE0011/YMFACTI14.container000 182 File System 
Free Space: 18135852 File System utilization: 89% 183 Container ID: 1 184 Container Type: File Container Total Pages: 20132 Container Stripe Set: 0 185 Container Path: /db2/EB8/sapdata2/NODE0011/YMFACTI14.container001 186 File System Free Space: 18135860 File System utilization: 89% 187 Container ID: 2 188 Container Type: File Container Total Pages: 20132 Container Stripe Set: 0 189 Container Path: /db2/EB8/sapdata3/NODE0011/YMFACTI14.container002 190 File System Free Space: 18135872 File System utilization: 89% 191 Container ID: 3


Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0011/YMFACTI14.container003
File System Free Space: 18135864  File System utilization: 89%
Free space in the tablespace partition(MB): 820
Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTI14 Tablespace Type:Index Partition:15 statistics:
Pagesize: 16 K  Total pages on the partition: 80528  Total usable pages on the partition: 80512
Free space (pages): 52592  Free space (MB): 821
Container ID: 0
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0015/YMFACTI14.container000
File System Free Space:  File System utilization:
Container ID: 1
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0015/YMFACTI14.container001
File System Free Space: 15466740  File System utilization: 91%
Container ID: 2
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0015/YMFACTI14.container002
File System Free Space: 15466736  File System utilization: 91%
Container ID: 3
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0015/YMFACTI14.container003
File System Free Space: 15466736  File System utilization: 91%
Free space in the tablespace partition(MB): 821
Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTI14 Tablespace Type:Index Partition:19 statistics:
Pagesize: 16 K  Total pages on the partition: 80528  Total usable pages on the partition: 80512
Free space (pages): 52556  Free space (MB): 821
Container ID: 0
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0019/YMFACTI14.container000
File System Free Space: 16002188  File System utilization: 90%
Container ID: 1
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0019/YMFACTI14.container001
File System Free Space:  File System utilization:
Container ID: 2
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0019/YMFACTI14.container002
File System Free Space: 16002212  File System utilization: 90%
Container ID: 3
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0019/YMFACTI14.container003

Appendix A. The DB2 scripts 275


File System Free Space: 16002204  File System utilization: 90%
Free space in the tablespace partition(MB): 821
Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTI14 Tablespace Type:Index Partition:23 statistics:
Pagesize: 16 K  Total pages on the partition: 80528  Total usable pages on the partition: 80512
Free space (pages): 52556  Free space (MB): 821
Container ID: 0
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0023/YMFACTI14.container000
File System Free Space: 16905208  File System utilization: 90%
Container ID: 1
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0023/YMFACTI14.container001
File System Free Space:  File System utilization:
Container ID: 2
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0023/YMFACTI14.container002
File System Free Space: 16905180  File System utilization: 90%
Container ID: 3
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0023/YMFACTI14.container003
File System Free Space: 16905236  File System utilization: 90%
Free space in the tablespace partition(MB): 821
Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTI14 Tablespace Type:Index Partition:27 statistics:
Pagesize: 16 K  Total pages on the partition: 80528  Total usable pages on the partition: 80512
Free space (pages): 52508  Free space (MB): 820
Container ID: 0
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0027/YMFACTI14.container000
File System Free Space: 16851072  File System utilization: 90%
Container ID: 1
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0027/YMFACTI14.container001
File System Free Space: 16851056  File System utilization: 90%
Container ID: 2
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0027/YMFACTI14.container002
File System Free Space: 16851048  File System utilization: 90%
Container ID: 3
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0027/YMFACTI14.container003
File System Free Space: 16851060  File System utilization: 90%
Free space in the tablespace partition(MB): 820


Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTI14 Tablespace Type:Index Partition:31 statistics:
Pagesize: 16 K  Total pages on the partition: 80528  Total usable pages on the partition: 80512
Free space (pages): 52556  Free space (MB): 821
Container ID: 0
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0031/YMFACTI14.container000
File System Free Space: 6332664  File System utilization: 96%
Container ID: 1
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0031/YMFACTI14.container001
File System Free Space: 6332672  File System utilization: 96%
Container ID: 2
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0031/YMFACTI14.container002
File System Free Space: 6332684  File System utilization: 96%
Container ID: 3
Container Type: File Container  Total Pages: 20132  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0031/YMFACTI14.container003
File System Free Space: 6332684  File System utilization: 96%
Free space in the tablespace partition(MB): 821
Action: UPLOAD check Infocube:ZGTFC026 Tablespace:YMFACTI14 Tablespace Type:Index Partition:35 statistics:
Pagesize: K  Total pages on the partition:  Total usable pages on the partition:
Free space (pages):  Free space (MB):
Free space in the tablespace partition(MB):
Tablespace total size (MB): 5745

The INFOCUBE ZGTFC026 AGGREGATES fact tables are spread out over the following nodes:
The INFOCUBE 'F' fact table is spread over the following nodes:
Data Tablespace: YMAGGRD14
Nodes: 8 12 16 20 24 28 32 36
This may take a long time....
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRD14 Tablespace Type:Data Partition:8 statistics:
Pagesize: 16 K  Total pages on the partition: 22160  Total usable pages on the partition: 22112
Free space (pages): 20592  Free space (MB): 321
Container ID: 0
Container Type: File Container  Total Pages: 5540  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0008/YMAGGRD14.container000
File System Free Space: 96  File System utilization: 100%
Container ID: 1
Container Type: File Container  Total Pages: 5540  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0008/YMAGGRD14.container001
File System Free Space: 52  File System utilization: 100%


Container ID: 2
Container Type: File Container  Total Pages: 5540  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0008/YMAGGRD14.container002
File System Free Space: 72  File System utilization: 100%
Container ID: 3
Container Type: File Container  Total Pages: 5540  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0008/YMAGGRD14.container003
File System Free Space: 84  File System utilization: 100%
Free space in the tablespace partition(MB): 321
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRD14 Tablespace Type:Data Partition:12 statistics:
Pagesize: 16 K  Total pages on the partition: 22128  Total usable pages on the partition: 22080
Free space (pages): 20560  Free space (MB): 321
Container ID: 0
Container Type: File Container  Total Pages: 5532  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0012/YMAGGRD14.container000
File System Free Space: 3911880  File System utilization: 98%
Container ID: 1
Container Type: File Container  Total Pages: 5532  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0012/YMAGGRD14.container001
File System Free Space:  File System utilization:
Container ID: 2
Container Type: File Container  Total Pages: 5532  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0012/YMAGGRD14.container002
File System Free Space: 3911880  File System utilization: 98%
Container ID: 3
Container Type: File Container  Total Pages: 5532  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0012/YMAGGRD14.container003
File System Free Space: 3911856  File System utilization: 98%
Free space in the tablespace partition(MB): 321
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRD14 Tablespace Type:Data Partition:16 statistics:
Pagesize: 16 K  Total pages on the partition: 22188  Total usable pages on the partition: 22144
Free space (pages): 20624  Free space (MB): 322
Container ID: 0
Container Type: File Container  Total Pages: 5547  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0016/YMAGGRD14.container000
File System Free Space: 24454772  File System utilization: 85%
Container ID: 1
Container Type: File Container  Total Pages: 5547  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0016/YMAGGRD14.container001
File System Free Space: 24454752  File System utilization: 85%
Container ID: 2
Container Type: File Container  Total Pages: 5547  Container Stripe Set: 0


Container Path: /db2/EB8/sapdata3/NODE0016/YMAGGRD14.container002
File System Free Space: 24454796  File System utilization: 85%
Container ID: 3
Container Type: File Container  Total Pages: 5547  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0016/YMAGGRD14.container003
File System Free Space: 24454776  File System utilization: 85%
Free space in the tablespace partition(MB): 322
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRD14 Tablespace Type:Data Partition:20 statistics:
Pagesize: 16 K  Total pages on the partition: 22028  Total usable pages on the partition: 21984
Free space (pages): 20464  Free space (MB): 319
Container ID: 0
Container Type: File Container  Total Pages: 5507  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0020/YMAGGRD14.container000
File System Free Space: 23266252  File System utilization: 86%
Container ID: 1
Container Type: File Container  Total Pages: 5507  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0020/YMAGGRD14.container001
File System Free Space: 23266232  File System utilization: 86%
Container ID: 2
Container Type: File Container  Total Pages: 5507  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0020/YMAGGRD14.container002
File System Free Space: 23266212  File System utilization: 86%
Container ID: 3
Container Type: File Container  Total Pages: 5507  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0020/YMAGGRD14.container003
File System Free Space: 23266224  File System utilization: 86%
Free space in the tablespace partition(MB): 319
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRD14 Tablespace Type:Data Partition:24 statistics:
Pagesize: 16 K  Total pages on the partition: 22068  Total usable pages on the partition: 22016
Free space (pages): 20496  Free space (MB): 320
Container ID: 0
Container Type: File Container  Total Pages: 5517  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0024/YMAGGRD14.container000
File System Free Space: 13810748  File System utilization: 92%
Container ID: 1
Container Type: File Container  Total Pages: 5517  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0024/YMAGGRD14.container001
File System Free Space: 13810736  File System utilization: 92%
Container ID: 2
Container Type: File Container  Total Pages: 5517  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0024/YMAGGRD14.container002
File System Free Space: 13810764  File System utilization: 92%
Container ID: 3


Container Type: File Container  Total Pages: 5517  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0024/YMAGGRD14.container003
File System Free Space: 13810752  File System utilization: 92%
Free space in the tablespace partition(MB): 320
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRD14 Tablespace Type:Data Partition:28 statistics:
Pagesize: 16 K  Total pages on the partition: 22140  Total usable pages on the partition: 22080
Free space (pages): 20560  Free space (MB): 321
Container ID: 0
Container Type: File Container  Total Pages: 5535  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0028/YMAGGRD14.container000
File System Free Space: 18316136  File System utilization: 89%
Container ID: 1
Container Type: File Container  Total Pages: 5535  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0028/YMAGGRD14.container001
File System Free Space:  File System utilization:
Container ID: 2
Container Type: File Container  Total Pages: 5535  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0028/YMAGGRD14.container002
File System Free Space: 18316160  File System utilization: 89%
Container ID: 3
Container Type: File Container  Total Pages: 5535  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0028/YMAGGRD14.container003
File System Free Space: 18316140  File System utilization: 89%
Free space in the tablespace partition(MB): 321
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRD14 Tablespace Type:Data Partition:32 statistics:
Pagesize: 16 K  Total pages on the partition: 22200  Total usable pages on the partition: 22144
Free space (pages): 20624  Free space (MB): 322
Container ID: 0
Container Type: File Container  Total Pages: 5550  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0032/YMAGGRD14.container000
File System Free Space: 12677200  File System utilization: 92%
Container ID: 1
Container Type: File Container  Total Pages: 5550  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0032/YMAGGRD14.container001
File System Free Space: 12677196  File System utilization: 92%
Container ID: 2
Container Type: File Container  Total Pages: 5550  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0032/YMAGGRD14.container002
File System Free Space: 12677196  File System utilization: 92%
Container ID: 3
Container Type: File Container  Total Pages: 5550  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0032/YMAGGRD14.container003


File System Free Space: 12677200  File System utilization: 92%
Free space in the tablespace partition(MB): 322
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRD14 Tablespace Type:Data Partition:36 statistics:
Pagesize: 16 K  Total pages on the partition: 22120  Total usable pages on the partition: 22080
Free space (pages): 20560  Free space (MB): 321
Container ID: 0
Container Type: File Container  Total Pages: 5530  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0036/YMAGGRD14.container000
File System Free Space: 17240784  File System utilization: 89%
Container ID: 1
Container Type: File Container  Total Pages: 5530  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0036/YMAGGRD14.container001
File System Free Space: 17240784  File System utilization: 89%
Container ID: 2
Container Type: File Container  Total Pages: 5530  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0036/YMAGGRD14.container002
File System Free Space:  File System utilization:
Container ID: 3
Container Type: File Container  Total Pages: 5530  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0036/YMAGGRD14.container003
File System Free Space: 17240772  File System utilization: 89%
Free space in the tablespace partition(MB): 321
Tablespace total size (MB): 2567
Index Tablespace: YMAGGRI14
Nodes: 8 12 16 20 24 28 32 36
This may take a long time....
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRI14 Tablespace Type:Index Partition:8 statistics:
Pagesize: 16 K  Total pages on the partition: 4496  Total usable pages on the partition: 4480
Free space (pages): 3416  Free space (MB): 53
Container ID: 0
Container Type: File Container  Total Pages: 1124  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0008/YMAGGRI14.container000
File System Free Space: 96  File System utilization: 100%
Container ID: 1
Container Type: File Container  Total Pages: 1124  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0008/YMAGGRI14.container001
File System Free Space: 52  File System utilization: 100%
Container ID: 2
Container Type: File Container  Total Pages: 1124  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0008/YMAGGRI14.container002
File System Free Space: 72  File System utilization: 100%
Container ID: 3
Container Type: File Container  Total Pages: 1124  Container Stripe Set: 0


Container Path: /db2/EB8/sapdata4/NODE0008/YMAGGRI14.container003
File System Free Space: 84  File System utilization: 100%
Free space in the tablespace partition(MB): 53
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRI14 Tablespace Type:Index Partition:12 statistics:
Pagesize: 16 K  Total pages on the partition: 7824  Total usable pages on the partition: 7808
Free space (pages): 6740  Free space (MB): 105
Container ID: 0
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0012/YMAGGRI14.container000
File System Free Space: 3911880  File System utilization: 98%
Container ID: 1
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0012/YMAGGRI14.container001
File System Free Space: 3911840  File System utilization: 98%
Container ID: 2
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0012/YMAGGRI14.container002
File System Free Space: 3911880  File System utilization: 98%
Container ID: 3
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0012/YMAGGRI14.container003
File System Free Space: 3911856  File System utilization: 98%
Free space in the tablespace partition(MB): 105
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRI14 Tablespace Type:Index Partition:16 statistics:
Pagesize: 16 K  Total pages on the partition: 7824  Total usable pages on the partition: 7808
Free space (pages): 6740  Free space (MB): 105
Container ID: 0
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0016/YMAGGRI14.container000
File System Free Space: 24454772  File System utilization: 85%
Container ID: 1
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0016/YMAGGRI14.container001
File System Free Space: 24454752  File System utilization: 85%
Container ID: 2
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0016/YMAGGRI14.container002
File System Free Space: 24454796  File System utilization: 85%
Container ID: 3
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0016/YMAGGRI14.container003
File System Free Space: 24454776  File System utilization: 85%
Free space in the tablespace partition(MB): 105


Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRI14 Tablespace Type:Index Partition:20 statistics:
Pagesize: 16 K  Total pages on the partition: 4496  Total usable pages on the partition: 4480
Free space (pages): 3412  Free space (MB): 53
Container ID: 0
Container Type: File Container  Total Pages: 1124  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0020/YMAGGRI14.container000
File System Free Space: 23266252  File System utilization: 86%
Container ID: 1
Container Type: File Container  Total Pages: 1124  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0020/YMAGGRI14.container001
File System Free Space: 23266232  File System utilization: 86%
Container ID: 2
Container Type: File Container  Total Pages: 1124  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0020/YMAGGRI14.container002
File System Free Space: 23266212  File System utilization: 86%
Container ID: 3
Container Type: File Container  Total Pages: 1124  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0020/YMAGGRI14.container003
File System Free Space: 23266224  File System utilization: 86%
Free space in the tablespace partition(MB): 53
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRI14 Tablespace Type:Index Partition:24 statistics:
Pagesize: 16 K  Total pages on the partition: 7824  Total usable pages on the partition: 7808
Free space (pages): 6740  Free space (MB): 105
Container ID: 0
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0024/YMAGGRI14.container000
File System Free Space: 13810748  File System utilization: 92%
Container ID: 1
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0024/YMAGGRI14.container001
File System Free Space: 13810736  File System utilization: 92%
Container ID: 2
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0024/YMAGGRI14.container002
File System Free Space: 13810764  File System utilization: 92%
Container ID: 3
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0024/YMAGGRI14.container003
File System Free Space: 13810752  File System utilization: 92%
Free space in the tablespace partition(MB): 105
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRI14 Tablespace Type:Index Partition:28 statistics:


Pagesize: 16 K  Total pages on the partition: 7824  Total usable pages on the partition: 7808
Free space (pages): 6740  Free space (MB): 105
Container ID: 0
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0028/YMAGGRI14.container000
File System Free Space: 18316136  File System utilization: 89%
Container ID: 1
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0028/YMAGGRI14.container001
File System Free Space: 18316136  File System utilization: 89%
Container ID: 2
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0028/YMAGGRI14.container002
File System Free Space: 18316160  File System utilization: 89%
Container ID: 3
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0028/YMAGGRI14.container003
File System Free Space: 18316140  File System utilization: 89%
Free space in the tablespace partition(MB): 105
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRI14 Tablespace Type:Index Partition:32 statistics:
Pagesize: 16 K  Total pages on the partition: 7824  Total usable pages on the partition: 7808
Free space (pages): 6740  Free space (MB): 105
Container ID: 0
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0032/YMAGGRI14.container000
File System Free Space: 12677200  File System utilization: 92%
Container ID: 1
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0032/YMAGGRI14.container001
File System Free Space: 12677196  File System utilization: 92%
Container ID: 2
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0032/YMAGGRI14.container002
File System Free Space: 12677196  File System utilization: 92%
Container ID: 3
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0032/YMAGGRI14.container003
File System Free Space: 12677200  File System utilization: 92%
Free space in the tablespace partition(MB): 105
Action: ROLLUP check Infocube:ZGTFC026 Tablespace:YMAGGRI14 Tablespace Type:Index Partition:36 statistics:
Pagesize: 16 K  Total pages on the partition: 7824  Total usable pages on the partition: 7808
Free space (pages): 6740  Free space (MB): 105
Container ID: 0


Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata1/NODE0036/YMAGGRI14.container000
File System Free Space: 17240784  File System utilization: 89%
Container ID: 1
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata2/NODE0036/YMAGGRI14.container001
File System Free Space: 17240784  File System utilization: 89%
Container ID: 2
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata3/NODE0036/YMAGGRI14.container002
File System Free Space: 17240740  File System utilization: 89%
Container ID: 3
Container Type: File Container  Total Pages: 1956  Container Stripe Set: 0
Container Path: /db2/EB8/sapdata4/NODE0036/YMAGGRI14.container003
File System Free Space: 17240772  File System utilization: 89%
Free space in the tablespace partition(MB): 105
Tablespace total size (MB): 736
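The free-space figures in this listing can be cross-checked by hand: each partition reports a 16 K page size, and the "Free space (MB)" value is simply the free page count times 16 KB, truncated to whole megabytes. A minimal sketch of that conversion (the helper name is ours, not part of the Redbook script):

```python
PAGE_SIZE_KB = 16  # "Pagesize: 16 K", as reported for every partition above


def pages_to_mb(pages: int, page_size_kb: int = PAGE_SIZE_KB) -> int:
    """Convert a DB2 page count to whole megabytes, truncating as the report does."""
    return pages * page_size_kb // 1024


# Spot checks against figures taken from the listing:
print(pages_to_mb(52556))  # YMFACTI14 index partitions -> 821 MB
print(pages_to_mb(20560))  # YMAGGRD14 data partitions  -> 321 MB
print(pages_to_mb(6740))   # YMAGGRI14 index partitions -> 105 MB
```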



Appendix B. Query variables

This appendix describes the queries used to simulate a production environment.


© Copyright IBM Corp. 2007. All rights reserved. 287


Query 1
Infoprovider: ZGTFC002 (or ZGTFC004, ZGTFC006, ...)
Operating Concern: 0035
Currency Type: 10 (company currency)
Value Type (col 1): 10 (actual)
Version (col 1): 0
Fiscal Year/Period: 001.2005
Value Type (col 2): 10
Version (col 2): 0
Fiscal Year/Period: 002.2005
Value Type (col 3): 10
Version (col 3): 0
Fiscal Year/Period: 003.2005
Sales split: WOMS

Example B-1 shows these variables. All combinations are used for the runs.

Example: B-1 Query 1: variable example

Query 2
Infoprovider: ZGTFC002 (or ZGTFC004, ZGTFC006, ...), ZGTFC014
Operating Concern: 0035
Currency Type: 10 (company currency)
Value Type (col 1): 10 (actual)
Version (col 1): 0
Fiscal Year/Period: 001.2005
Value Type (col 2): 10
Version (col 2): 0
Fiscal Year/Period: 002.2005
Value Type (col 3): 10
Version (col 3): 0
Fiscal Year/Period: 003.2005
Sales split: WOMS

Example B-2 on page 289 shows these variables; the queries are run for all combinations.

TIME 1 TIME 2 TIME 3

001.2005 002.2005 003.2005

001.2005 002.2005 004.2005

001.2005 002.2005 005.2005

001.2005 002.2005 006.2005

001.2005 003.2005 004.2005
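The TIME 1/TIME 2/TIME 3 rows above appear to be lexicographic three-element combinations of fiscal periods. Assuming the periods run 001.2005 through 006.2005 (the range visible in these tables), the full set of run combinations can be sketched as:

```python
from itertools import combinations

# Fiscal periods 001.2005 .. 006.2005, formatted as in the tables above.
periods = [f"{p:03d}.2005" for p in range(1, 7)]

# itertools.combinations emits triples in lexicographic order, which
# reproduces the listed rows: (001, 002, 003), (001, 002, 004), ...
triples = list(combinations(periods, 3))

for t in triples[:5]:
    print(t)
```

The first five triples printed match the five rows shown above; the full list has C(6,3) = 20 entries.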


Example: B-2 Query 2: variable example

Query 3
Infoprovider: ZGTFC001 (or ZGTFC003, ZGTFC005, ...)
Operating Concern: 0035
Currency Type: 10
Value Type: 10
Cust Year/Week: 052.2005
Version: 0
Sales Organization:
Company Code:
Distribution Channel:
Cust Sales Channel: 00 to ZZ

Example B-3 shows these variables.

Example: B-3 Query 3: variable example

RUN WITH WEEK 010.2005 to 052.2005

Query 4
Infoprovider: ZGTFC001 (or ZGTFC003, ZGTFC005, ...)
Operating Concern: 0035
Division: 00
Distribution channel: 05
Sales Organization: RU1B
Sub-Category: RUDDBDB02
Customer level 4 Select: 841219
Planversion: 0
Value Type: 10
Fiscal Year: 2005
Sales Split: WMP WMP
Currency Type: 10
Plant: 0000 to ZZZZ

Example B-4 on page 290 shows these variables.

TIME 1 TIME 2 TIME 3

001.2005 002.2005 003.2005

001.2005 002.2005 004.2005

001.2005 002.2005 005.2005

001.2005 002.2005 006.2005

001.2005 003.2005 004.2005


Example: B-4 Query 4: variable example

Query 5
Infoprovider: ZGTFC002 (or ZGTFC004, ZGTFC006, ...)
Operating Concern: 0035 (No Text Available)
Currency Type: B0
Posting Period: 1 to 1
Sales split: WOMS
Company Code: RU10
Fiscal Year (comparison): 2005
Version (comparison): 0
Value Type (comparison): 10
Fiscal Year: 2006
Version: 0
Value Type: 10
Sales Organization:
Distribution Channel: 01
PERIOD: 1 to 1, 1 to 2, 1 to 3, 2 to 3

The queries are run for all possible combinations.

Query 6
Infoprovider: ZGTFC001 (or ZGTFC003, ZGTFC005, ...)
Operating Concern: 0035
Currency Type: 10
Value Type: 10
Cust Year/Week: 019.2005 (example; see below)
Local Product High: RUB1 (example; see below)
Version: 0
Cust Sales Channel: 00 to ZZ

DIVISION DISTR_CHAN SALESORG CUSTOMER LEVEL

00 05 RU1B 841219

00 05 RU1B 841219

... ... ... ...


The runs are done with weeks 010.2005 to 052.2005 and with the following Local Product High values.

01U1
01U8
01UC
01UD
RUB1
...
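The week values that drive these runs follow a fixed WWW.YYYY pattern, so they can be enumerated with a small helper. This is an illustrative sketch only; the `gen_weeks` function is hypothetical and was not part of the project's test harness.

```shell
#!/bin/ksh
# Sketch: enumerate the Cust Year/Week values 010.2005 .. 052.2005
# used for the Query 6 runs (format WWW.YYYY, zero-padded week number).
gen_weeks () {
  week=$1
  while [ ${week} -le $2 ]; do
    printf "%03d.2005\n" ${week}
    week=$((week + 1))
  done
}

gen_weeks 10 52
```

Running the helper prints one week identifier per line, from 010.2005 to 052.2005.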

Query 7
Infoprovider           ZGTFC001 (or ZGTFC003, ZGTFC005, ...)
Operating Concern      0035
Division               00
Distribution channel   01
Sales Organization     RU1A
Customer Hier.         06563388
Planversion            0
Value Type             10
Fiscal Year            2005
Sales Split WMP        WMP
Currency Type          10

Example B-5 shows these variables.

Example: B-5 Query 7: variable example

Query 8
Infoprovider           ZGTFC001 (or ZGTFC003, ZGTFC005, ...)
Operating Concern      0035
Division               00
Distribution channel   01
Sales Organization     RU1A
Sub-Category           RUDDBDB02
Customer level         4
Planversion            0
Value Type             10
Fiscal Year            2005
Sales Split WMP        WMP

DIVISION   DISTR_CHAN   SALESORG   CUSTOMER LEVEL
00         01           RU1A       564534
00         01           RU1A       564572
00         01           RU1A       564559
00         01           RU1A       564634
00         01           RU1A       564590


Currency Type          B0
Plant                  0000 to ZZZZ

Example B-6 shows these variables.

Example: B-6 Query 8: variable example

Query 9
Infoprovider           ZGTFC002 (or ZGTFC004, ZGTFC006, ...)
Operating Concern      0035
Currency Type          10 (company currency)
Value Type (col 1)     10 (actual)
Version (col 1)        0
Fiscal Year/Period     001.2005
Value Type (col 2)     10
Version (col 2)        0
Fiscal Year/Period     002.2005
Value Type (col 3)     10
Version (col 3)        0
Fiscal Year/Period     003.2005
Sales split            WOMS

Example B-7 shows these variables.

Example: B-7 Query 9: variable example

The queries are run for all combinations.

DIVISION   DISTR_CHAN   SALESORG   Subcategory   CUSTOMER LEVEL
00         05           RU1B       RUDDCDC01     841219
00         05           RU1B       RUDDBDB02     841219
00         05           RU1B       RUDDCDC06     841219
00         05           RU1B       RUDDCD05      841219
00         05           RU1B       RUDDBDB01     841219

TIME 1      TIME 2      TIME 3
001.2005    002.2005    003.2005
001.2005    002.2005    004.2005
001.2005    002.2005    005.2005
001.2005    002.2005    006.2005
001.2005    003.2005    004.2005


Query 10
Infoprovider           ZGTFC001 (or ZGTFC003, ZGTFC005, and so on)
Operating Concern      0035
Division
Distribution channel
Sales Organization
Sub-Category           01UUBUB01 (or 01UUBCUC01, 01UUCUC02, and so on)
Customer level         4
Planversion            0
Value Type             10
Fiscal Year            2005
Sales Split WMP        WMP
Currency Type          B0
Plant                  0000 to ZZZZ


Appendix C. Scripts for storage

This appendix provides the scripts developed for this project to manage the storage environment.


© Copyright IBM Corp. 2007. All rights reserved. 295


Do backup script

The script provided in Example C-1 is the main script and starts the script named backup_node0.

Example: C-1 Initial script

#!/bin/ksh
set -x
old=$(date "+%y%m%d_%H%M")
par1=$(date "+ begind=%m/%d/%Y begint=%H:%M")
echo "Moving old tdp log files to /db2/EB8/dbs/tsm_config/tdplog/${old}"
mkdir /db2/EB8/dbs/tsm_config/tdplog/${old}_old
mkdir /db2/EB8/dbs/tsm_config/tdplog/${old}
mv /db2/EB8/dbs/tsm_config/tdplog/*.log /db2/EB8/dbs/tsm_config/tdplog/${old}_old
date

#
# Starting backup of NODE0000
#
nohup /db2/db2eb8/scripts/bu2/backup_node0.ksh > /db2/db2eb8/scripts/bu2/backup_node0.log
date
echo Starting backup for partitions on host sys3db0p
touch /db2/db2eb8/scripts/bu2/BACKUP_sys3db0p_0_RUNNING
nohup /db2/db2eb8/scripts/bu2/backup_sys3db0p_0.ksh > /db2/db2eb8/scripts/bu2/backup_sys3db0p_0.log &
sleep 5
echo Starting backup for partitions on host sys3db1p
touch /db2/db2eb8/scripts/bu2/BACKUP_sys3db1p_0_RUNNING
nohup /db2/db2eb8/scripts/bu2/backup_sys3db1p_0.ksh > /db2/db2eb8/scripts/bu2/backup_sys3db1p_0.log &
sleep 5
echo Starting backup for partitions on host sys3db2p
touch /db2/db2eb8/scripts/bu2/BACKUP_sys3db2p_0_RUNNING
nohup /db2/db2eb8/scripts/bu2/backup_sys3db2p_0.ksh > /db2/db2eb8/scripts/bu2/backup_sys3db2p_0.log &
sleep 5
echo Starting backup for partitions on host sys3db3p
touch /db2/db2eb8/scripts/bu2/BACKUP_sys3db3p_0_RUNNING
nohup /db2/db2eb8/scripts/bu2/backup_sys3db3p_0.ksh > /db2/db2eb8/scripts/bu2/backup_sys3db3p_0.log &
sleep 5
echo Starting backup for partitions on host sys3db4p
touch /db2/db2eb8/scripts/bu2/BACKUP_sys3db4p_0_RUNNING
nohup /db2/db2eb8/scripts/bu2/backup_sys3db4p_0.ksh > /db2/db2eb8/scripts/bu2/backup_sys3db4p_0.log &
sleep 5
#
# wait for completion of all backups
#
i=0
while [[ $i -eq 0 ]]; do
  i=1
  sleep 900
  [[ -e /db2/db2eb8/scripts/bu2/BACKUP_sys3db0p_0_RUNNING ]] && i=0


  [[ -e /db2/db2eb8/scripts/bu2/BACKUP_sys3db1p_0_RUNNING ]] && i=0
  [[ -e /db2/db2eb8/scripts/bu2/BACKUP_sys3db2p_0_RUNNING ]] && i=0
  [[ -e /db2/db2eb8/scripts/bu2/BACKUP_sys3db3p_0_RUNNING ]] && i=0
  [[ -e /db2/db2eb8/scripts/bu2/BACKUP_sys3db4p_0_RUNNING ]] && i=0
done
#
# collect performance data
#
par2=$(date "+ endd=%m/%d/%Y endt=%H:%M")
mv /db2/EB8/dbs/tsm_config/tdplog/*.log /db2/EB8/dbs/tsm_config/tdplog/${old}
cd /db2/EB8/dbs/tsm_config/tdplog/${old}
/db2/EB8/dbs/tsm_config/tdplog/analyze_logs > sessions.csv
/db2/EB8/dbs/tsm_config/tdplog/collect_mount ${par1} ${par2} > mount.csv

Backup node0 script

Example C-2 provides the script used to back up the DB partition 0.

Example: C-2 Script to back up the DB partition 0

#!/bin/ksh
date
mkdir /db2/db2eb8/scripts/bu2/log 2>/dev/null
started=0
while [[ ${started} -eq 0 ]]
do
  echo db2_all "<<+0< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
  db2_all "<<+0< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node0.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node0.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 0"

    started=1
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node0.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node0.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 0"
      started=1
    else
      echo "seems that db2_all start failed for node 0; retrying ..."
      sleep 5
    fi
  fi
done
date


Backup sys3db0p script

Example C-3 provides the script to back up the other partitions of the first group.

Example: C-3 Script to back up other DB partitions of the first group

#!/bin/ksh
date
sleep 30
echo db2_all "<<+1< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+1< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node1.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node1.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 1"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node1.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node1.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 1"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 1; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30
echo db2_all "<<+2< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+2< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node2.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node2.log
  if [[ $? -eq 0 ]]
  then


    echo "db2 backup db EB8 ONLINE completed o.K. for node 2"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node2.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node2.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 2"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 2; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30
echo db2_all "<<+3< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+3< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node3.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node3.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 3"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node3.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node3.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 3"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 3; retrying ..."
      sleep 5
    fi
  fi
done


date
sleep 30
echo db2_all "<<+4< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+4< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node4.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node4.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 4"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node4.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node4.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 4"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 4; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30
echo db2_all "<<+5< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+5< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node5.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node5.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 5"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node5.log
    if [[ $? -eq 0 ]]
    then


      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node5.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 5"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 5; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30
rm /db2/db2eb8/scripts/bu2/BACKUP_sys3db0p_0_RUNNING

Backup sys3db1p script

Example C-4 provides the script to back up the DB partitions of LPAR1; the same script is used to back up the DB partitions in the other LPARs.

Example: C-4 Script to back up DB partitions in one LPAR (LPAR1, in this example)

#!/bin/ksh
date
echo db2_all "<<+6< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+6< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node6.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node6.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 6"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node6.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node6.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 6"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))


      fi
    else
      echo "seems that db2_all start failed for node 6; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30
echo db2_all "<<+7< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+7< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node7.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node7.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 7"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node7.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node7.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 7"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 7; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30
echo db2_all "<<+8< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+8< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node8.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node8.log
  if [[ $? -eq 0 ]]


  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 8"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node8.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node8.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 8"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 8; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30
echo db2_all "<<+9< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+9< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node9.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node9.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 9"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node9.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node9.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 9"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 9; retrying ..."
      sleep 5
    fi


  fi
done
date
sleep 30
echo db2_all "<<+10< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+10< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node10.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node10.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 10"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node10.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node10.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 10"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 10; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30
echo db2_all "<<+11< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+11< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node11.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node11.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 11"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node11.log


    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node11.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 11"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 11; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30
echo db2_all "<<+12< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+12< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node12.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node12.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 12"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node12.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node12.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 12"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 12; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30


echo db2_all "<<+13< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting"
retry=0
while [[ ${retry} -lt 2 ]]
do
  db2_all "<<+13< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > /db2/db2eb8/scripts/bu2/log/node13.log
  grep "completed ok" /db2/db2eb8/scripts/bu2/log/node13.log
  if [[ $? -eq 0 ]]
  then
    echo "db2 backup db EB8 ONLINE completed o.K. for node 13"
    retry=2
  else
    grep "completed rc" /db2/db2eb8/scripts/bu2/log/node13.log
    if [[ $? -eq 0 ]]
    then
      RC=$(grep "completed rc" /db2/db2eb8/scripts/bu2/log/node13.log | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC $RC for node 13"
      if [[ ${RC} -lt 4 ]]
      then
        retry=2
      else
        (( retry=retry+1 ))
      fi
    else
      echo "seems that db2_all start failed for node 13; retrying ..."
      sleep 5
    fi
  fi
done
date
sleep 30
rm /db2/db2eb8/scripts/bu2/BACKUP_sys3db1p_0_RUNNING
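The backup scripts repeat the same retry stanza once per partition, changing only the node number. A possible refactoring, shown here as a sketch only (the `backup_node` function and the `LOGDIR` override are hypothetical, not part of the project scripts), factors the stanza into a function so it is written once:

```shell
#!/bin/ksh
# Sketch: the per-node backup stanza factored into a function.
# LOGDIR is a hypothetical override for the hard-coded log directory,
# which also makes the function testable outside the project hosts.
LOGDIR=${LOGDIR:-/db2/db2eb8/scripts/bu2/log}

backup_node () {
  node=$1
  log=${LOGDIR}/node${node}.log
  retry=0
  while [ ${retry} -lt 2 ]; do
    # run the backup on the given partition and capture the output
    db2_all "<<+${node}< db2 backup db EB8 ONLINE load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 1 sessions with 14 buffers buffer 1024 parallelism 8 without prompting" > ${log}
    if grep "completed ok" ${log} > /dev/null; then
      echo "db2 backup db EB8 ONLINE completed o.K. for node ${node}"
      retry=2
    elif grep "completed rc" ${log} > /dev/null; then
      RC=$(grep "completed rc" ${log} | cut -d '=' -f2)
      echo "db2 backup db EB8 ONLINE completed with RC ${RC} for node ${node}"
      # return codes below 4 are treated as success, as in the originals
      if [ ${RC} -lt 4 ]; then retry=2; else retry=$((retry + 1)); fi
    else
      echo "seems that db2_all start failed for node ${node}; retrying ..."
      sleep 5
    fi
  done
}
```

With this helper, the per-LPAR script reduces to a loop such as `for n in 6 7 8 9 10 11 12 13; do backup_node ${n}; sleep 30; done`, which is easier to keep consistent across the five LPARs.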

Restore

The following examples provide the scripts to restore the DB partitions after they have been backed up using the scripts provided in the previous examples.

Example C-5 provides the initial restore script.

Example: C-5 Initial restore script

#!/bin/ksh
nohup /db2/db2eb8/scripts/restore/restore_0.ksh > /db2/db2eb8/scripts/restore/restore_0.log
nohup /db2/db2eb8/scripts/restore/restore_1.ksh > /db2/db2eb8/scripts/restore/restore_1.log &
nohup /db2/db2eb8/scripts/restore/restore_2.ksh > /db2/db2eb8/scripts/restore/restore_2.log &


nohup /db2/db2eb8/scripts/restore/restore_3.ksh > /db2/db2eb8/scripts/restore/restore_3.log &
nohup /db2/db2eb8/scripts/restore/restore_4.ksh > /db2/db2eb8/scripts/restore/restore_4.log &

Example C-6 provides the script used to restore the DB partition 0.

Example: C-6 Script to restore the DB partition 0

#!/bin/ksh
date
echo db2_all "<<+000< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060606221504 with 014 buffers buffer 01024 parallelism 008 without prompting"
db2_all "<<+000< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060606221504 with 014 buffers buffer 01024 parallelism 008 without prompting"
date

Example C-7 provides the script to restore one set of DB partitions from one LPAR.

Example: C-7 Script to restore one DB partition from one LPAR (LPAR1, in this example)

#!/bin/ksh
date
echo db2_all "<<+006< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060606224633 with 014 buffers buffer 01024 parallelism 008 without prompting"
db2_all "<<+006< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060606224633 with 014 buffers buffer 01024 parallelism 008 without prompting"
date
echo db2_all "<<+007< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060606232619 with 014 buffers buffer 01024 parallelism 008 without prompting"
db2_all "<<+007< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060606232619 with 014 buffers buffer 01024 parallelism 008 without prompting"
date
echo db2_all "<<+008< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607000051 with 014 buffers buffer 01024 parallelism 008 without prompting"
db2_all "<<+008< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607000051 with 014 buffers buffer 01024 parallelism 008 without prompting"
date
echo db2_all "<<+009< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607003138 with 014 buffers buffer 01024 parallelism 008 without prompting"
db2_all "<<+009< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607003138 with 014 buffers buffer 01024 parallelism 008 without prompting"
date
echo db2_all "<<+010< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607010724 with 014 buffers buffer 01024 parallelism 008 without prompting"


db2_all "<<+010< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607010724 with 014 buffers buffer 01024 parallelism 008 without prompting"
date
echo db2_all "<<+011< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607014657 with 014 buffers buffer 01024 parallelism 008 without prompting"
db2_all "<<+011< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607014657 with 014 buffers buffer 01024 parallelism 008 without prompting"
date
echo db2_all "<<+012< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607022035 with 014 buffers buffer 01024 parallelism 008 without prompting"
db2_all "<<+012< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607022035 with 014 buffers buffer 01024 parallelism 008 without prompting"
date
echo db2_all "<<+013< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607025222 with 014 buffers buffer 01024 parallelism 008 without prompting"
db2_all "<<+013< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at 20060607025222 with 014 buffers buffer 01024 parallelism 008 without prompting"
date
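Each restore command in Example C-7 hard-codes the "taken at" timestamp of the corresponding backup image. The commands could instead be generated from a partition/timestamp list; the following sketch is a hypothetical helper (`gen_restore` is not part of the project scripts), shown with two timestamps taken from Example C-7:

```shell
#!/bin/ksh
# Sketch: emit one restore command per "partition timestamp" pair read
# from stdin, instead of hard-coding every "taken at" value.
gen_restore () {
  while read node ts; do
    printf 'db2_all "<<+%s< db2 restore db EB8 load /usr/tivoli/tsm/tdp_r3/db264/libtdpdb264.a open 2 sessions taken at %s with 014 buffers buffer 01024 parallelism 008 without prompting"\n' "${node}" "${ts}"
  done
}

# Example input: the first two partitions of LPAR1 (from Example C-7)
gen_restore <<EOF
006 20060606224633
007 20060606232619
EOF
```

The generated lines can be reviewed and then executed, which avoids editing eight nearly identical commands by hand whenever a new backup generation is restored.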


Abbreviations and acronyms

ABAP Advanced Business Application Programming

AIX Advanced Interactive eXecutive

APAR Authorized Program Analysis Report

API Application Programming Interface

BAPI® Business Application Programming Interface

BCU Balanced Configuration Unit

BI Business Intelligence

BLOB Binary Large Object

CI Central Instance

CIM Common Information Model

CIMOM CIM Object Manager

CIO Concurrent I/O

CLI Command Line Interface

CPU Central Processor Unit

CWMI Common Warehouse Metadata Interchange

DA Device Adapter

DB2 HPU DB2 High Performance Unload

DB2 PE DB2 Performance Expert

DB2 UDB ESE DB2 Universal Database Enterprise Server Edition

DBA Database Administrator

DDL Data Definition Language

DIA Dialog Assembly

DMS Database Managed Space

DPF Data Partitioning Feature

DSS Decision Support System

EDU Engine Dispatch Unit

ERP Enterprise Resource Planning

ESE Enterprise Server Edition

ESS Enterprise Storage Server

ETL Extraction, Transformation and Loading

FC Fibre Channel

FCM Fast Communication Manager

FP FixPack

GB gigabyte

GHz gigahertz

GUI Graphical User Interface

HA Host Adapter


HACMP High Availability Cluster MultiProcessing

HDD Hard Disk Drive

HSM Hierarchical Storage Management

HTML Hypertext Markup Language

HTTP Hypertext Transfer Protocol

HW Hardware

IBM International Business Machines Corporation

IO Input Output

IP Internet Protocol

ITSO International Technical Support Organization

J2EE Java 2 Enterprise Edition

JCA Java EE Connector Architecture

JDBC Java Database Connectivity

KPI Key Performance Indicator

LAN Local Area Network

LIC Licensed Internal Code

LOB Large Object

LPAR Logical Partition

LRU Least Recently Used

LSS Logical Subsystem

LUN Logical Unit Number

LVM Logical Volume Manager

MB megabyte

ML Maintenance Level

MOLAP Multidimensional OLAP

MPP Massively Parallel Processor

NAS Network Attached Storage

NDMP Network Data Management Protocol

nmon Nigel’s Monitor

NTFS NT File System

NTP Network Time Protocol

ODBO OLE DB for OLAP

ODS Operational Data Store

OLAP Online Analytical Processing

OLTP Online Transaction Processing

OS operating system

PC Personal Computer


PDCU Performance Data Collection Utility

POC Proof of Concept

PPRC Peer-to-Peer Remote Copy

PSA Persistent Staging Area

PSSC Product and Solutions Support Center

PTF Program Temporary Fix

RAID Redundant Array of Independent Disks

RAM Random Access Memory

RDBMS Relational Database Management System

RMAN Oracle Recovery Manager

ROLAP Relational OLAP

rpm revolutions per minute

rsh remote shell

SAN Storage Area Network

SDDPCM Subsystem Device Driver Path Control Module

SDG SAN Data Gateway

SMI-S Storage Management Initiative Specification

SMP Symmetric Multiprocessing

SMS System Managed Space

SMT Simultaneous Multithreading

SQL Structured Query Language

ssh secure shell

SVC SAN Volume Controller

SW Software

TB terabyte

TCP/IP Transmission Control Protocol/Internet Protocol

TDP IBM Tivoli Data Protection

TPC IBM TotalStorage Productivity Center

UDB Universal Database

VBA Visual Basic for Applications

VPN Virtual Private Network

WAN Wide Area Network

WLM Workload Manager

WP SAP work process

XML Extensible Markup Language

XMLA Extensible Markup Language for Analysis


Glossary

Advanced Business Application Programming. A high-level programming language created by SAP. It is currently positioned as the language for programming SAP's Web Application Server, part of the SAP NetWeaver platform for building business applications.

agent. A software entity that represents one or more objects by sending notifications regarding events and handling requests from managers (software or hardware entities) to modify or query the objects.

aggregate. Stores the dataset of an InfoCube in a summarized form on the database. When building an aggregate from the characteristics and navigation attributes from an InfoCube, you can group the data according to different aggregation levels. Remaining characteristics that are not used in the aggregate are summarized. New data is loaded into an aggregate using logical data packages (requests). Aggregates enable you to access InfoCube data quickly for reporting.

Application Programming Interface (API). The interface that a computer system, library or application provides in order to allow requests for services to be made of it by other computer programs, and to allow data to be exchanged between them.

array. An ordered collection, or group, of physical devices (disk drive modules) that are used to define logical volumes (LVOLs) or devices. In the IBM Enterprise System Storage™, an array is a group of disks designated by the user to be managed with a Redundant Array of Independent Disks (RAID).

Authorized Program Analysis Report (APAR). A term used in IBM for a description of a problem with an IBM program that is formally tracked until a solution is provided. An APAR is created (“opened”) after a customer (or IBM) discovers a problem that IBM determines is due to a defect in its code. The APAR is given a unique number for tracking. When the support group that maintains the code solves the problem, it develops a program temporary fix (PTF).

Balanced Configuration Unit. A Balanced Configuration Unit (BCU) is composed of software and hardware that IBM has integrated and tested as a pre-configured building block for data warehousing systems. A single BCU contains a balanced amount of disk, processing power, and memory to optimize cost-effectiveness and throughput. IT departments can use BCUs to reduce design time, shorten deployments, and maintain a strong price/performance ratio as they add building blocks to enlarge their BI systems.

© Copyright IBM Corp. 2007. All rights reserved.

Binary Large Object (BLOB). A collection of binary data stored as a single entity in a database management system. A Binary Large Object (BLOB) is typically an image, audio or other multimedia object, though sometimes binary code is stored as a BLOB. Database support for BLOBs is not universal.

bit. Either of the digits 0 or 1 when used in the binary numeration system.

Business API (BAPI). A remote-enabled function module provided by SAP, used to carry out business-related functions.

Business Intelligence (BI). A broad category of applications and technologies for gathering, providing access to, and analyzing data for the purpose of helping enterprise users make better business decisions. BI encompasses the information, knowledge, and technologies that enable efficient management of organizational and individual business.

byte. A group of eight adjacent binary digits that represent one EBCDIC character.

Call Level Interface. A de facto standard software API for SQL-based database management systems, created by The Open Group.

CIM Object Manager (CIMOM). The core component of the implementation of the CIM specification. The CIMOM manages the CIM schema, instantiation, communication, and operation of the physical providers that represent the CIM classes.

cluster. A group of loosely coupled computers that work together closely so that in many respects they can be viewed as though they are a single computer. The components of a cluster are commonly, but not always, connected to each other through fast local area networks. Clusters are usually deployed to improve performance and/or availability over that provided by a single computer, while typically being much more cost-effective than single computers of comparable speed or availability. In certain filesystem types, like the File Allocation Table (FAT) filesystem of MS-DOS® or the NTFS filesystem of Windows NT, a cluster is the unit of disk space allocation for files and directories. In order to reduce the overhead of managing on-disk data structures, the filesystem does not allocate individual disk sectors, but contiguous groups of sectors, called clusters. A cluster is the smallest logical amount of disk space that can be allocated to hold a file.



Command Line Interface (CLI). An interface provided by an operating system that defines a set of commands and enables a user (or a script-like language) to issue these commands by typing text in response to the command prompt (for example, DOS commands or UNIX shell commands).

Common Information Model (CIM). An implementation-neutral, object-oriented schema for describing network management information. The Distributed Management Task Force (DMTF) develops and maintains CIM specifications.

Concurrent copy. A facility on a storage server that enables a program to make a backup of a data set while the logical volume remains available for subsequent processing. The data in the backup copy is frozen at the point in time that the server responds to the request.

Concurrent I/O (CIO). A feature of JFS2 that avoids inode locking of a file except when the inode itself needs to change (such as when the file size is changing). Concurrent I/O also implicitly uses the Direct I/O path; therefore, CIO is the equivalent of Direct I/O without inode locking. To enable CIO, the application can open a file with the O_CIO flag in the open() system call, or the filesystem can be mounted with the CIO mount option. Named mounts can also be used with Concurrent I/O, as with Direct I/O mounts.

Copy Services. A collection of optional software features, with a Web-browser interface, used for configuring, managing, and monitoring data-copy functions.

Data Definition Language (DDL). A computer language for defining data. XML schema is an example of a pure DDL; a subset of SQL's instructions forms another DDL.

data mart. An interface that enables the user to update data from one data target to another. The data mart interface allows the user to update data within a BW system and also between several other systems. If several BW systems are used, the system delivering the data is called the source SAP NetWeaver BI and the system that is receiving data is called the target SAP NetWeaver BI. The individual Business Information Warehouses in this type of setup are called data marts.

data provider. An object that delivers data for one or more Web items.

data set. A record of values that belong together in a relational database table. A data set is saved in the relational database management system (DBMS) as a row.

data warehouse. A data collection or database that is created by integrating various datasets and data from external sources. Data warehouses provide users with a global view of the data that has many applications.

DataSource. An object that makes data for a business unit available to SAP NetWeaver BI. The DataSource contains a number of logically-related fields that are provided in a flat structure for data transfer to SAP NetWeaver BI.

device adapter (DA). A physical component of the ESS that provides communication between the clusters and the storage devices. Multiple DAs are connected to the clusters in such a way that any cluster can access any storage device via multiple paths, providing fault tolerance and enhanced availability.

Dialog Assembly (DIA). Used to determine whether the system is to display a list of missing parts with information on calculated quantities and dates; that is, the components that are not fully available. You can also use this indicator to control whether the quantities and dates of the selected components can be processed interactively in the sales order.

Enterprise Resource Planning (ERP). Integrates all data and processes of an organization into a single unified system. A typical ERP system will use multiple components of computer software and hardware to achieve the integration.

Extensible Markup Language (XML). A W3C-recommended general-purpose markup language that supports a wide variety of applications. Its primary purpose is to facilitate the sharing of data across different information systems, particularly systems connected via the Internet. XML allows diverse software reliably to understand information formatted and passed in multiple languages. XML is a simplified subset of Standard Generalized Markup Language (SGML).

fact table. A table in the center of an InfoCube star schema. The data part contains all key figures of the InfoCube, and the key is formed by links to the entries of the dimensions of the InfoCube.

failover. The process of transferring all control to a single cluster when the other cluster in the storage unit fails.

Fibre Channel. A technology for transmitting data between computer devices. It is especially suited for attaching computer servers to shared storage devices, and for interconnecting storage controllers and drives.



FlashCopy. A function on the IBM Enterprise Storage Server that can create a point-in-time copy of data while an application is running. A FlashCopy image is a space-efficient image of the contents of part of the SAN file system at a particular moment. A FlashCopy mapping is a relationship between a source virtual disk and a target virtual disk. A FlashCopy service is a copy service that duplicates the contents of a source virtual disk (VDisk) on a target VDisk. In the process, the original contents of the target VDisk are lost.

Global Copy. An optional capability of the DS8000 remote mirror and copy feature that maintains a fuzzy copy of a logical volume on the same DS8000 or on another DS8000. In other words, all modifications that any attached host performs on the primary logical volume are also performed on the secondary logical volume at a later point in time. The original order of update is not strictly maintained.

Global Mirror. An optional capability of the DS8000 remote mirror and copy feature that provides a 2-site extended distance remote copy. Data that is written by the host to the storage unit at the local site is automatically maintained at the remote site.

IBM TotalStorage DS8000. A member of the IBM TotalStorage Resiliency Family of storage servers and attached storage devices (disk drive modules). The DS8000 delivers high-performance, fault-tolerant storage and management of enterprise data, affording access through multiple concurrent operating systems and communication protocols. High performance is provided by multiple symmetrical multiprocessors, integrated caching, RAID support for the disk drive modules, and disk access through a high-speed serial storage architecture interface.

IBM TotalStorage. The brand name used to identify storage products from IBM, including the IBM TotalStorage DS8000.

InfoArea. Groups meta-objects together in the Business Information Warehouse. Every data target is assigned to an InfoArea. The resulting hierarchy is then displayed in the Administrator Workbench. In addition to their property as a data target, InfoObjects can also be assigned to different InfoAreas using InfoObject catalogs.

InfoCube. A quantity of relational tables that are created according to the star schema. An InfoCube describes a self-contained dataset (from the reporting view); for example, for a business-oriented area. InfoCubes are objects that can function both as data targets as well as InfoProviders.

InfoObject. Business evaluation objects (for example, customers or sales) are called InfoObjects in SAP NetWeaver BI. InfoObjects are subdivided into characteristics, key figures, units, time characteristics, and technical characteristics (such as request numbers).

InfoPackage. Describes which data in a DataSource should be requested from a source system. The data can be precisely selected using selection parameters (for example, only controlling area 001 in period 10.1997). An InfoPackage can request the following types of data: Transaction data, Attributes for master data, Hierarchies for master data and Master data texts.

InfoProvider. An analysis-relevant view of a SAP NetWeaver BI object for which queries in SAP NetWeaver BI can be created or executed. There are two types of InfoProviders. One type includes objects that contain physical data; these are known as data targets, such as InfoCubes, ODS objects, and InfoObjects (characteristics with attributes, texts, or hierarchies). The other type includes objects that display no physical data storage, such as InfoSets, RemoteCubes, SAP RemoteCubes, and MultiProviders.

InfoSet. A semantic view of ODS objects and InfoObjects (characteristics with master data) that allows you to create reports on these objects, particularly on the joins between these objects. Unlike the classic InfoSet, this view of data is SAP NetWeaver BI-specific. In the InfoSet builder, InfoSets are created and changed. InfoSets allow you to use the query designer to define reports.

InfoSource. A quantity of all available data for a business event, or type of business event (for example, Cost Center Accounting). An InfoSource is a quantity of information that has been grouped together from information that logically belongs together. InfoSources can contain transaction data or master data (attributes, texts, and hierarchies). An InfoSource is always a quantity of InfoObjects that belong together logically. The structure where they are stored is called a communication structure.

Java EE Connector Architecture (JCA). A Java-based technology solution for connecting application servers and enterprise information systems as part of enterprise application integration solutions. While JDBC is specifically used to connect Java EE applications to databases, JCA is a more generic architecture for connection to legacy systems (including databases). The BI Java Connectors are JCA-compliant.

JDBC. An API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database. JDBC is oriented towards relational databases.



logical subsystem (LSS). Represents the logical functions of a storage controller that allow one or more host I/O interfaces to access a set of devices. The controller groups the devices according to the addressing mechanisms of the associated I/O interfaces. One or more LSSs exist on a storage controller. In general, the controller associates a given set of devices with only one LSS.

Logical Unit Number (LUN). Provided by the storage devices attached to the SAN. This number provides you with a volume identifier that is unique among all storage servers. The LUN is synonymous with a physical disk drive or a SCSI device. For disk subsystems such as the IBM Enterprise Storage Server, a LUN is a logical disk drive. This is a unit of storage on the SAN which is available for assignment or unassignment to a host server.

Logical Volume Manager (LVM). A set of system commands, library routines, and other tools that allow the user to establish and control logical volume storage. The LVM maps data between the logical view of storage space and the physical disk drive module.

master data. Includes the permitted values for a characteristic, also called characteristic values. Characteristic values are discrete names.

metadata. Metadata are data that describe other data. Generally, a set of metadata describes a single set of data, called a resource.

Metro Mirror. A function of a storage server that maintains a consistent copy of a logical volume on the same storage server or on another storage server. All modifications that any attached host performs on the primary logical volume are also performed on the secondary logical volume.

MOLAP. An analytic tool designed to allow analysis of data through the use of a multidimensional data model. MOLAP differs significantly from ROLAP in that it requires the pre-computation and storage of information in the InfoCube (the operation known as “processing”). MOLAP stores this data in an optimized multi-dimensional array storage, rather than in a relational database.

multi-dimensional data. Data in a multidimensional database. Data can include basic data values (loaded from an external source) that represent combinations of the lowest level of members in the dimensions of the database, data values that are calculated from the base data values, and rolled up data values that are created by combining values for members in dimension hierarchies. They are data suitable for business analytics. In the BI Java SDK, the term “multidimensional data” is used synonymously with “OLAP data”.

MultiProvider. A type of InfoProvider that combines data from several InfoProviders and makes it available for reporting. The MultiProvider itself contains no data; its data comes exclusively from the InfoProviders on which it is based. You can assemble a MultiProvider from different combinations of InfoProviders. MultiProviders, like InfoProviders, are objects or views that are relevant for reporting.

Operational Data Store (ODS) object. An object that stores consolidated and cleaned transaction data on a document level. An ODS object describes a consolidated dataset from one or several InfoSources. This dataset can be evaluated using a BEx query. An ODS object contains a key (for example, document number, position) as well as data fields that, as key figures, can also contain character fields (for example, customer). In contrast to multi-dimensional data stores for InfoCubes, data in ODS objects is stored in transparent, flat database tables.

Online Analytical Processing (OLAP). A multidimensional, multi-user, client/server computing environment for users who need to analyze consolidated enterprise data in real time. OLAP systems feature zooming, data pivoting, complex calculations, trend analyses, and data modeling. OLAP is an approach to quickly provide the answer to analytical queries that are dimensional in nature. It is part of the broader category business intelligence, which also includes extract, transform, and load (ETL), relational reporting, and data mining. Databases configured for OLAP employ a multidimensional data model, allowing for complex analytical and ad hoc queries with a rapid execution time. The output of an OLAP query is typically displayed in a matrix format.

OLE-DB. An API designed by Microsoft for accessing different types of data stores in a uniform manner. It is a set of interfaces implemented using the Component Object Model (COM). OLE DB separates the data store from the application that needs access to it through a set of abstractions that include the DataSource, session, command and rowsets. The consumers are the applications that need access to the data, and the provider is the software component that implements the interface and therefore provides the data to the consumer.

Peer-to-Peer Remote Copy (PPRC). A remote-copy service that provides a synchronous copy of a volume or disk for disaster recovery, device migration, and workload migration.



Persistent Staging Area (PSA). A transparent database table in which request data is stored in the form of the transfer structure. A PSA is created per DataSource and source system. It represents an initial store in SAP NetWeaver BI, in which the requested data is saved unchanged for the source system.

Process. A naturally occurring or designed sequence of changes of properties/attributes of a system or object.

Redundant Array of Independent Disks (RAID). A methodology of grouping disk drives for managing disk storage to insulate data from a failing disk drive. RAID 5 is a type of RAID that optimizes cost-effective performance while emphasizing use of available capacity through data striping; it provides fault tolerance for one failed disk drive by distributing parity across all the drives in the array. RAID 10 is a type of RAID that optimizes high performance while maintaining fault tolerance for disk drive failures by striping volume data across several disk drives and mirroring the first set of disk drives on an identical set. In both cases, the DS8000 automatically reserves spare disk drives when it assigns arrays to a device adapter pair (DA pair).

Remote Mirror and Copy. A feature of a storage server that constantly updates a secondary copy of a logical volume to match changes made to a primary logical volume. The primary and secondary volumes can be on the same storage server or on separate storage servers.

RemoteCube. An InfoCube whose transaction data is not managed in the Business Information Warehouse, but externally. Only the structure of the RemoteCube is defined in SAP NetWeaver BI. The data for reporting is read from another system using a BAPI.

ROLAP. An analytic tool designed to allow analysis of data through the use of a multidimensional model. ROLAP differs from MOLAP in that it does not require the pre-computation and storage of information. ROLAP tools access the data in a relational database and generate SQL queries to calculate information at the appropriate level when an end user requests it. With ROLAP, it is possible to create additional database tables (summary tables or aggregations) which summarize the data at any desired combination of dimensions.

Roll back. To remove changes that were made to database files under commitment control since the last commitment boundary.

Roll forward. To update the data in a restored database or table space by applying changes recorded in the database log files.

roll up. Loads data packages (requests) for an InfoCube that are not yet available into all aggregates of the InfoCube. After it has been rolled up, the new data is used in queries.

SAP work process. SAP is a multi-process application, as opposed to a multi-threaded architecture: SAP multi-tasks work across a number of defined processes. In configuring the SAP system, the administrator defines how many processes of each kind the system should manage. The types of SAP work processes (WP) are batch, dialog, enqueue, update, and update2.

source system. A system that makes data available for extraction to the Business Information Warehouse.

staging. A process that prepares (stages) data in a Data Warehouse.

Subsystem Device Driver Path Control Module (SDDPCM). A loadable path control module designed to support the multipath configuration environment in the IBM TotalStorage Enterprise Storage Server, the IBM System Storage SAN Volume Controller, and the IBM TotalStorage DS family. When the supported devices are configured as MPIO-capable devices, SDDPCM is loaded and becomes part of the AIX MPIO FCP (Fibre Channel Protocol) device driver. The AIX MPIO device driver with the SDDPCM module enhances data availability and I/O load balancing. SDDPCM manages the paths to provide:
- High availability and load balancing of storage I/O
- Automatic path-failover protection
- Concurrent download of licensed internal code
- Prevention of a single point of failure caused by a host bus adapter, Fibre Channel cable, or host-interface adapter on supported storage

Symmetric Multiprocessing (SMP). A multiprocessor computer architecture where two or more identical processors are connected to a single shared main memory. Most common multiprocessor systems today use an SMP architecture.

target system. A Business Information Warehouse (BW) system to which another BW system is connected as a source system, and into which you can load data using export DataSources.

virtual machine facility. A virtual data processing machine that appears to the user to be for the exclusive use of that user, but whose functions are accomplished by sharing the resources of a shared data processing system. An alternate name for the VM/370 IBM operating system.



XML for Analysis (XMLA). A protocol specified by Microsoft for exchanging analytical data between client applications and servers, using HTTP and SOAP as a service on the Web. XML for Analysis is not restricted to any particular platform, application, or development language. Using XML for Analysis in the Business Information Warehouse allows a third-party reporting tool that is connected to the BW to communicate directly with the Online Analytical Processing (OLAP) processor.



Related publications

The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this redbook.

IBM Redbooks

For information on ordering these publications, see “How to get IBM Redbooks” on page 318. Note that some of the documents referenced here may be available in softcopy only.

- Building and Scaling SAP Business Information Warehouse on DB2 UDB ESE, SG24-7094

- IBM TotalStorage DS8000 Series: Copy Services in Open Environments, SG24-6788

- IBM System Storage DS8000 Series: Architecture and Implementation, SG24-6786

- IBM System Storage Solutions Handbook, SG24-5250

Other publications

These publications are also relevant as further information sources:

- IBM Tivoli Storage Manager for Advanced Copy Services Data Protection for Snapshot Devices, SC33-8208

Online resources

These Web sites and URLs are also relevant as further information sources:

- IBM Global Services Method

http://w3-5.ibm.com/services/emea/3emgs.nsf/pages/GSMHomepage

- Balanced Configuration Unit

http://www-306.ibm.com/software/data/db2bi/tab_bcu.html

- Background SAP NetWeaver BI processes

http://help.sap.com/saphelp_erp2005vp/helpdata/en/38/4f6e420c48c353e10000000a1550b0/frameset.htm

- NTP and the xntpd daemon

http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.doc/cmds/aixcmds6/xntpd.htm

- Fast Communication Manager feature of DB2

http://www-128.ibm.com/developerworks/db2/library/techarticle/dm-0501nisbet/

- DB2 objects and concepts

http://publib.boulder.ibm.com/infocenter/db2luw/v8//index.jsp

- NMON

http://www-128.ibm.com/developerworks/eserver/articles/analyze_aix/



- NMON Analyser

http://www-941.haw.ibm.com/collaboration/wiki/display/Wikiptype/nmonanalyser

- Tivoli products

http://www.ibm.com/software/tivoli/products/

- Locales

http://nlmas.ams.nl.ibm.com/sunsoft/files/SUN/docs/manuals/806-0169.pdf

How to get IBM Redbooks

You can search for, view, or download Redbooks, Redpapers, Hints and Tips, draft publications and Additional materials, as well as order hardcopy Redbooks or CD-ROMs, at this Web site:

ibm.com/redbooks

Help from IBM

IBM Support and downloads

ibm.com/support

IBM Global Services

ibm.com/services



Index

Numerics
0PSTNG_DATE 111
64 bits 14, 164, 169, 210, 250

A
ABAP 32, 66, 149
address group 190
Administrator Workbench 66
aggregate 6, 98, 101, 143, 158
  aggregate tables 83, 131, 143, 158
  AGGREGATE_NAME 168
  hierarchy 33
Aggregation 41
Aggregation load 29
AIX 5L V5.2 14, 19, 115, 210, 250
AIX 5L V5.3 36, 220, 243
analysis
  benchmark 4
analysis tools 66
Analysis Views 97, 103
API 195–196
  DS Open 14, 209
  TSM 14
architecture 4, 74, 167, 175, 185, 207, 242
  BI Reference architecture 72
  shared nothing 8, 176
  shared-nothing 207
array 177
auto-extend 124
AUTOMATIC 173
automatic storage 122
AUTORESIZE 124
auto-resize 124, 145

B
backup 196
  backup of RDBMS supporting files 200
  backup with the LAN 186
  brbackup 204
  copy backup 201
  full database backup 199
  incremental 199–200
  LAN-free backup 10, 186
  LANfree backup 210
  offline 198
  online 195, 198
  partial database backup 200
  server-free backup 10, 186
  simulated incremental backup 199
  true incremental backup 199
Balanced Configuration Unit, See also BCU 76
BAPI 66
baseline 2


Basic Edition 194
batch extractor 29
BCU 164, 244
benchmark 3
best load distribution method 104
BEx 32, 68
BGD_roll 99
BGD_trigger 99
BI best practices 75
bitmap 188
blocksize 45, 114, 167
br-tools 202
buffer pool 135
Business Content 66
Business Explorer 66
Business Information Warehouse Server 66
Business Intelligence Method 72
business process 66

C
cache 95
cache mode 96
CIM 209, 222
  agent 14, 209, 222, 227
  Pegasus 14, 210
CIO 210
CLI 149
cluster 131
combined load test 29
COMPRESSION 127
conclusion
  benchmark 3
container tag 170
control data 198
control files 198
control instance 120
copy
  instantaneous copy 188
cryptography 210



Data Protection module 196
database container 9
database export 199
Database Managed Space, See also DMS 210
database partitions 164, 207
    node number 207
DataMarts 33
DataSources 67
DB2 2, 14, 206
    container 181
    data and index 176, 248
DB2 ESE 14, 119, 172, 210
DB2 Universal Database ESE, See also DB2 ESE 210
DB2 V8.2 3
    HPU 148–149
    HPU configuration 152
    HPU control file 151
    instance 164
    logger 176, 248
    monitoring 11
    multi-node 8
    process 130
    temporary tablespace 176
    version 9 28
DB2 PE 133
DB2 Performance Expert 11, 133
DB2 Performance Monitor 11
DB2 V9 41
DB2_APM_PERFORMANCE 170
DB2_BLOCK_ON_LOG_DISK_FULL 171
DB2_CORRELATED_PREDICATES 170
DB2_EEE_PARALLEL_BACKUP 213
DB2_FORCE_APP_ON_MAX_LOG 170
DB2_FORCE_FCM_BP 172
DB2_HASH_JOIN 171
DB2_PARALLEL_IO 172
DB2_STRIPED_CONTAINERS 170
DB2_USE_FAST_PREALLOCATION 169
DB2_USE_LATCH_TRACKING 170
DB2_VENDOR_INI 170
DB2_WORKLOAD 167
DB2ATLD_PORTS 171
DB2CODEPAGE 172
DB2COMM 172
DB2DBDFT 172
DB2ENVLIST 171
db2hpu 153
DB2MEMDISCLAIM 171
DB2MEMMAXFREE 171
db2move 149
db2nodes.cfg 120, 164, 207, 227
db2set 167
db6conv 149
DBConnect 32
DDL 150
decision
    benchmark 4
devices adapter 178
DIA 46, 112
    dialog process 46, 48
    dialog task 49
    dialog-task 52
dimension table 69
disk drive 177
disk mirroring 199
DMS 145, 169
DPF 14, 36, 74, 131, 244, 249
DS8000 8, 176, 185, 203
DS8300
    internal addressing 190
dual extractor 221

E
EDW 72
E-Fact table 69, 140
e-mail servers 196
Engine Dispatch Unit, See also EDU 130
environment 167
Ether channel 36
Excel spreadsheet 138
execution
    benchmark 3
Expert user mode 97, 103
Express 194
Extended Edition 194
extended star schema 69, 72
Extraction Transformation and Loading, See also ETL 70
extractor 47, 49, 112
    data extractor 115
    double extractor 115

F
fact table 69, 145–146
fact table, see also Fact-table 7
Fact-table 161
fallback 153
Fast Communication Manager (FCM) 120
FC 178, 209
    Fibre Channel Host Adapter 178
FCM 172
    FCM_NUM_BUFFERS 167
F-Fact table 69, 140
Fiber Channel, See also FC 9
file system 8, 145
FlashBack 201
    Restore 205
Flashback
    restore 6
FlashCopy 6, 188, 200, 206
    backup 204
    pair 188, 201
    relationship 177
flat structure 67
forward recovery 198
full volume copy 188


G
Global Services Method 72

H
HACMP 3
Hard Disk Drive, See also HDD 179
hash partitioning 132
hashing key 149, 151
hdisk 190
High Availability Cluster Multiprocessing (HACMP) 3
Hostconnect 185

I
I/O port 185
IBM Subsystem Device Driver 209
IBM Tivoli Data Protection 210
IBM Tivoli Data Protection for SAP 210
IBM Tivoli Storage
    Agent 210
IBM Tivoli Storage Manager 194
    Administrative interface 194
    Application program interface 195
    Backup-Archive client 194, 199
    Basic Edition 194
    Express 194
    Extended Edition 194
    for Advanced Copy Services 195
    for Application Servers 195
    for Databases 195
    for Enterprise Resource Planning 195
    for Hardware 195
    for Mail 196
    for Space Management 196
    for Storage Area Networks 196
    for System Backup and Recovery 196
    server 194
import utilities 199
improvements, benchmark 5
INCREASESIZE 124
incremental 188
incremental mode 177, 211
index 9
InfoCube 11, 29, 68–69
    compression 32
    design 32
    load 47
InfoObject 67
InfoPackages 11, 15, 111
InfoProvider 32, 68
Informix 195, 197
InfoSource 68
infrastructure test 5
initial node 208
instance 120
instance owner 120
interfaces 195

J
J2_METAMAP 169
J2EE 66
job-chain 56
job-chains 52

K
KEYCARD 170
KPI 15, 133
    KPI-A 20TB 219
    KPI-A53 220
    KPI-Frame 216
KPI-A 58
KPI-D 34, 57, 59

L
LAN 186, 209
Level of aggregation 95
log
    circular log 122
    file 19, 130, 171, 197–198
    linear 122
log file active 9
log file archive 223
log file backup 199
Logical Volume Manager (LVM) 3
Lotus Domino 196
LPAR 6, 177
    monitoring 11
lssrc 121
LTO3 9
LTO3 tape drive 9
LUN 8, 177, 209
LVM 210

M
master data 71
MAX_LOG 170
MAXSIZE 124
Methodology 27
Metro Mirror 12
Microsoft
    Microsoft Exchange 196
    Microsoft SQL Server 195, 197
Mode 0 96
Mode 1 96
Mode 2 96
Mode 3 96
Mode 4 96
model 4
MultiProvider 33, 80, 96
multi-threading 27

N
NAS 194
NMON 56, 106, 136
    analyser 138


NTP 121

O
ODS 6–7, 51, 68, 112, 140, 146, 221
    packet-size 47
Offline backup 199
OLAP 33, 68, 95
Online Query Load 16
online test 5
Operational Data Store, See also ODS 68
Oracle 195

P
p595 2
partitioning key 131, 149
passive mode 11
PDCU 11
Pegasus 14
Persistent Staging Area (PSA) 6, 67
pipe 150
Point-in-Time Copy 188
PORT_RANGE 171
POWER5 3, 178, 220, 249
POWER5+ 28, 63, 243, 249
process chain 70, 99
protect 195
PSA 6, 71, 146

Q
qrfc 47
qualification
    benchmark 3
query 39
query load 30
query process 94

R
r3load 148, 154
raw device 198
RDBMS 5, 197
rdisp/autoabaptime 104
Redbooks Web site 318
    Contact us xii
redistribute 148
reference data 71
reference point 28
REG_LIST_VARIABLES 169
registry 167
reinstallation tool 196
Relational DataBase Management System, See also RDBMS 5
relations 197
relationship 188
RemoteCube 68
report tools 66
request 98
resource requirements 42
restore 196
    LAN-free restore 11
roll-forward 6, 198
rollup 142, 144
rollup process 98
round robin 36, 104
RQRIOBLK 167
RSA1 101
rsh 121, 136
RSPC 100

S
SAN 9, 186
    connection 209
    Data Gateway 187
    Data Gateway data mover 187
SAP 14
    br-tools 195
    Graphic User Interface 12
    Job-Chains 29
    monitoring 11
    note 629541 33
SAP 196
SAP Business Intelligence 66
SAP NetWeaver 74
SAP NetWeaver BI 3.5 2
SAP NetWeaver BI 7.0 74
SAP NetWeaver BI Business Explorer 68
SAP R/3 31, 66
SAPinst 154
SAPMSSY6 104
SDDPCM 228
SDG 187
share nothing architecture 181
SHEAPTHRES 167
SMI-S 201
SMP 207
SMS 123
SMT 42, 46
SORTHEAP 167
splitint 203–204
SQL 127
SQL query 168
ssh 121
ST03 97, 103
star schema 31
storage agent 11, 186, 196
Storage Area Network, See also SAN 9
storage monitoring 11
SVC 196
SVCENAME instance parameter 121
SYSCATSPACE 123
System p 2

T
t0 copy 188
table space extent maps 123
table space maps 123
Table spaces 198
Tables 197


TAM 14
tape library 9
TDP 14
tdpdb2hdw 213, 222
tdphdwdb2 204
temporary data 9
TEMPSPACE 123
time synchronization 121
time-zero copy 188
Tivoli Data Protection 9
Tivoli Data Protection for Enterprise Resource Planning 10, 206
Tivoli Data Protection for FlashCopy 204
Tivoli Data Protection for Storage for SAP on DB2 201
Tivoli Storage Manager 9
    backup-archive client 208
    client 186, 210
    for Advanced Copy Services 201
    for ERP 11
    for Hardware 10, 206
    for SAN 11
Tivoli Storage Manager Extended Edition 195
Tivoli Storage Manager Server 186, 206, 210
TPC 11
TPCH 76
TSM 6

U
upload 29, 140, 144
usage categories 31
USERSPACE 123
utilities 195

V
VBA 138
Volume Group 185

W
WebSEAL 14
WebSphere Application Server 195
WLM 11
write suspend 206, 211, 227

X
XMPERF 11, 57


(0.5” spine) 0.475”<->0.873”, 250<->459 pages
Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver BI Data Warehouse



SG24-7289-00 ISBN 0738486078

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION

BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE

IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment.

For more information: ibm.com/redbooks

Infrastructure Solutions: Design, Manage, and Optimize a 20 TB SAP NetWeaver Business Intelligence Data Warehouse

Scalability study of SAP NetWeaver Business Intelligence on IBM System p5

Architectural description, test results, lessons learned

Manage the solution using Tivoli products

In order to improve the performance and operational efficiency of businesses worldwide, a customer using mySAP.com wanted to establish a global business program to define and implement a standardized, group-wide business process architecture and associated master data for the parameterization of the group software tools.

The expected growth in the number of users and in the size of the database would reach a level never before attained by other customers, so IBM was asked to undertake the following:

- Test the application to be sure it could sustain such growth.
- Prove the manageability of the solution.
- Provide recommendations to optimize the infrastructure architecture.

This IBM Redbooks publication describes the testing that was done in terms of performance and manageability in a SAP NetWeaver BI and DB2 environment on IBM System p when scaling a client’s solution to a data warehouse of 20 terabytes (TB). It also provides recommendations for an architecture to support a potential 60 TB data warehouse.

The book resulted from a joint cooperative effort among the PSSC, the IBM/SAP International Competency Center, the DB2-SAP Center of Excellence, SAP AG, and a customer.

Back cover