Balanced Optimizer in DataStage
2010 IBM Corporation
Information Management
Advanced DataStage Workshop
Module 07: Balanced Optimization
Module Objectives
After completing this topic, you should be able to:
Describe what DataStage Balanced Optimization is
Understand the different optimization options available to users
Understand which stages the Balanced Optimizer will consider for pushing to source and/or target
InfoSphere DataStage Balanced Optimization
Provides the same job design as traditional DataStage jobs so there is no recoding required
Leverages investments in DBMS hardware by executing data integration tasks within the DBMS
Optimizes job run-time by allowing the developer to control where the entire job or various parts of the job will execute.
Transform and aggregate any volume of information in batch or real time through visually designed logic
Architects, Developers
Optimizing run time through intelligent use of DBMS hardware
Balanced Optimization: Leveraging Best-of-Breed Systems
Optimization is not constrained to a single implementation style such as ETL or ELT
InfoSphere DataStage Balanced Optimization fully harnesses available capacity and computing power in the DBMS as well as InfoSphere DataStage
Delivering unlimited scalability and performance through parallel execution everywhere, all the time
How Balanced Optimizer Works
Standard DataStage Design
  Use the same DataStage stage/link design paradigm
  Compile it, run it, verify that it works correctly (as normal)
  Allows the process to capture rich metadata that supports impact and dependency analysis
Optimization Process
  Optimizer rewrites the job graph into a new optimized job
  Defaults push as much I/O and processing as possible into database targets, then into sources
  Run the optimized job to assess performance and resource usage
Re-optimize as required
  Supports multiple versions of the optimized job existing concurrently
  Test various versions to validate which best balances system resources and performance characteristics
Balanced Optimization: Intelligent Pattern Recognition
Intelligence based on a known and prioritized list of processing patterns
Examines the job design looking for these known patterns
Determines which patterns can be pushed to source or target database based on the specific DBMS and user options selected
Optimizes out stages which the DBMS will address as part of its optimizer
Iterative approach - after a known pattern is optimized, the job is reanalyzed until no more known patterns can be optimized
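The iterative approach described above can be sketched as a fixed-point loop over a prioritized pattern list. This is a toy model, not the Balanced Optimizer's actual implementation: the job is reduced to a flat list of stage names, and the two pattern functions and the stage labels (`DBSource`, `Transformer`, `Aggregator`, `DBTarget`) are hypothetical names chosen only for illustration.

```python
from typing import Callable, List, Optional

# A job is modeled as a flat list of stage names; real DataStage jobs are graphs.
Job = List[str]
# A pattern returns a rewritten job when it matches, or None when it does not.
Pattern = Callable[[Job], Optional[Job]]


def push_transformer_into_source(job: Job) -> Optional[Job]:
    """Hypothetical pattern: fold a Transformer that directly follows a source read."""
    for i in range(len(job) - 1):
        if job[i].startswith("DBSource") and job[i + 1] == "Transformer":
            return job[:i] + [job[i] + "+transform"] + job[i + 2:]
    return None


def push_aggregator_into_source(job: Job) -> Optional[Job]:
    """Hypothetical pattern: fold an Aggregator that directly follows a source read."""
    for i in range(len(job) - 1):
        if job[i].startswith("DBSource") and job[i + 1] == "Aggregator":
            return job[:i] + [job[i] + "+aggregate"] + job[i + 2:]
    return None


def optimize(job: Job, patterns: List[Pattern]) -> Job:
    """Apply the highest-priority matching pattern, then reanalyze from the top."""
    changed = True
    while changed:
        changed = False
        for pattern in patterns:  # list order encodes priority
            rewritten = pattern(job)
            if rewritten is not None:
                job = rewritten
                changed = True
                break  # restart the scan on the rewritten job
    return job


PATTERNS = [push_transformer_into_source, push_aggregator_into_source]
original = ["DBSource", "Transformer", "Aggregator", "DBTarget"]
optimized = optimize(original, PATTERNS)
print(optimized)
```

Each successful rewrite removes one stage from the list, so the loop is guaranteed to stop once no pattern matches.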
Balanced Optimization User Interface
Screenshot callouts:
  Repository
  Optimization Options
  Additional Properties
  Original and Optimized Job
  Detailed Trace / Logging
Balanced Optimizer: User Driven Options
User can influence the optimization process
Options that are presented are the ones relevant to the job design
Options are preset to maximize performance
User can override to tune the job as they see fit.
Balanced Optimization Options
Use Bulk Loading: Leverage high-performance bulk loads into staging with post-processing
Push all processing into the database: If all sources and targets reside in the same database and the transformation logic supports it, push all processing into the target
Staging database name: Name of an alternative database where bulk staging is to be used
Push processing to database targets: Push Transformations, Joins, Lookups, Sorts, and Aggregation into database targets where possible
Push processing to database sources: Push Transformations, Sorts, and Aggregation into database sources
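To make the relationship between these options and the stage types concrete, here is a minimal sketch in Python. It is not a DataStage API: the class name, field names, and default values are all hypothetical, and the two candidate sets contain only the stage types that the option descriptions above name explicitly.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass
class OptimizationOptions:
    # Field names paraphrase the option labels; defaults are assumptions.
    push_to_sources: bool = True
    push_to_targets: bool = True
    push_all_into_database: bool = False
    use_bulk_loading: bool = False
    staging_database_name: Optional[str] = None


# Stage types named in the option descriptions (not an exhaustive list).
SOURCE_CANDIDATES = {"Transformation", "Sort", "Aggregation"}
TARGET_CANDIDATES = {"Transformation", "Join", "Lookup", "Sort", "Aggregation"}


def pushdown_candidates(stage_type: str, opts: OptimizationOptions) -> Set[str]:
    """Return the places ('source', 'target') where a stage type may be considered."""
    places: Set[str] = set()
    if opts.push_to_sources and stage_type in SOURCE_CANDIDATES:
        places.add("source")
    if opts.push_to_targets and stage_type in TARGET_CANDIDATES:
        places.add("target")
    return places
```

For instance, `pushdown_candidates("Join", OptimizationOptions())` yields only `"target"`, because the options above list Joins and Lookups for targets but not for sources.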
Balanced Optimization Stage Optimization Overview
Table columns: Push to Source, Push to Target
Table rows:
  Transformation
  Sorting
  Aggregation
  Join, Lookup
  Funnel
  Drop unnecessary processing (e.g., sorting)
  Use bulk staging operations (load)
  Push everything into the (target) database
Footnotes:
  1 as supported by database
  2 involving data already in the target
Using Balanced Optimization
Workflow diagram:
  Design the original DataStage job in DataStage Designer
  Compile and run the job, then verify the job results
  Optimize the job with Balanced Optimization to produce a rewritten, optimized job
  Compile and run the optimized job
  Choose different options and re-optimize
  Manually review/edit the optimized job
Balanced Optimization: Performance Considerations
Minimize I/O and data copying/movement
  source data reductions
  move the processing to the data
  keep data in the database(s) - avoid target extractions
Maximize optimization within sources or targets
  indices, native optimizations, database-specific features
Maximize parallelism
  I/O from/to databases
  in the DataStage parallel engine
  inside the database(s)
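The "source data reductions" point can be illustrated with a small, self-contained sketch. It uses Python's built-in `sqlite3` module as a stand-in for the source DBMS, and the `orders` table and its contents are invented for the example. The first query extracts every row and aggregates outside the database; the second pushes the aggregation into SQL so that only one row per group is moved.

```python
import sqlite3

# In-memory SQLite database standing in for a source DBMS (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("east", 10), ("east", 20), ("west", 5), ("west", 15), ("west", 30)],
)

# Unoptimized shape: extract every row, then aggregate outside the database.
rows = conn.execute("SELECT region, amount FROM orders").fetchall()
totals_outside = {}
for region, amount in rows:
    totals_outside[region] = totals_outside.get(region, 0) + amount

# Pushed-down shape: the database aggregates, so only one row per region moves.
pushed = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region"
).fetchall()
totals_pushed = dict(pushed)

print(len(rows), "rows moved without pushdown")
print(len(pushed), "rows moved with pushdown")
```

Both approaches produce the same totals; the difference is how much data leaves the database, which is the quantity the slide recommends minimizing.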
Example
Screenshots: Original DataStage Job; Within Balanced Optimizer Dialog
Why Leverage Both Engines
Balance processing against operations that scale well on the database (like operations working on indexes) and scalability of the DataStage Parallel Engine
Processing requirements that have no direct SQL equivalents (see the sample list below)
Leveraging Data Quality components alongside other data integration tasks
Connectivity to other enterprise data sources outside the DBMS (FTP, mainframe file, ERP sources, etc.)
Sample Unique Functions
Transformer
  stage and loop variable derivations with circular references
  most macros and system variables
  a few functions and operators (see User Guide for list)
  custom transform functions
Lookup stage
  lookup-fail
  condition-not-met
Sorting
  nulls last
  unique sorts
  EBCDIC sorting
XML
  Mix hierarchical and relational processing
When Balanced Optimization is Most Attractive
Significant amount of homogeneous DBMS integration requirements
Existing DBMS infrastructure can support the capacity of processing for data integration tasks
Desire to invest future HW decisions in the DBMS so it can serve both purposes (database and data integration)
Module Summary
After completing this topic, you should be able to:
Describe what DataStage Balanced Optimization is
Understand the different optimization options available to users
Understand which stages the Balanced Optimizer will consider for pushing to source and/or target