
©2006 IBM Corporation. All Rights Reserved. Page 1

White Paper Product Release Information

IBM Information Server

What’s New in IBM® WebSphere® DataStage® 8.0

IBM Information Server is the industry’s first comprehensive, unified foundation for enterprise information architectures, capable of scaling to meet any information volume requirement. It combines the technologies within the Information Integration Solutions portfolio (WebSphere DataStage, WebSphere QualityStage, WebSphere Information Analyzer, and Information Integrator) into a single unified platform, allowing companies to easily understand, cleanse, transform, and deliver trustworthy and context-rich information.

[Figure: IBM Information Server. Four functions built on parallel processing: Understand (discover, model, and govern information structure and content); Cleanse (standardize, merge, and correct information); Transform (combine and restructure information for new uses); Deliver (synchronize, virtualize, and move information for in-line delivery). Supporting layers: Unified Deployment; Unified Metadata Management; Rich Connectivity to Applications, Data, and Content.]

This paper outlines what is new in the WebSphere DataStage 8.0 release. The release brings many enhancements and new features to data integration and transformation, including many that customers requested. Features that are specific to the parallel framework are noted as such.

For more detailed information on these features, please consult the product documentation.

Note: This document will be updated prior to the General Availability of the DataStage 8.0 release with additional information about the release.


What are the key new features and capabilities of WebSphere DataStage 8.0?

• New WebSphere Metadata Server Foundation to better integrate the products across IBM Information Server and support the enterprise. With the new metadata services, DataStage provides graphical impact analysis and job diff directly within the DataStage and QualityStage Designer

• Completely Integrated Data Quality to ensure the most accurate, complete information is made available

• Significant ease-of-use enhancements to improve usability and productivity

• New and Expanded Transformation Functionality to aid DataStage users in job design

• Continued focus on Performance and Throughput Improvements while providing detailed performance analysis and system resource estimation

• Connectivity improvements, including the next generation of connectors

• Common and Enhanced Installation, Configuration, Administration and Reporting across IBM Information Server

• General Enhancements related to specific customer technical support requests

• Migration, Upgrade, and Platform Support

WebSphere Metadata Server Foundation

Architecture

IBM Information Server embraces the design concepts of Service-Oriented Architecture (SOA) to deliver multiple discrete services that hide the complexities of distributed configurations, allowing each service to concentrate on its functionality. With this architecture, individual components within IBM Information Server can be combined to compose intricate tasks without custom programming.

A Service-Oriented Architecture enables the design of common components that are accessible to and shared by all other elements of the platform. These common components allow all of the products in IBM Information Server to operate in a uniform and well-integrated manner. By eliminating duplicated functions, the architecture makes efficient use of hardware resources and reduces the development and administrative effort required to deploy a data integration platform.


[Figure: IBM Information Server architecture. Unified User Interface (Analysis Interface, Web Admin Interface, Development Interface); Common Services (Metadata Services, Security Services, Logging & Reporting Services); Unified Metadata (Design, Operational); Unified Parallel Processing (Understand, Cleanse, Transform, Deliver); Common Connectivity (Structured, Unstructured, Applications, Mainframe); Unified Service Deployment.]

The figure above shows the five top-level components of the IBM Information Server architecture. The common repository and services are explained in the sections below. The common services layer is deployed on a J2EE-compliant application server; for the 8.0 release, this is IBM WebSphere Application Server.

Common Metadata Repository

IBM Information Server introduces the next-generation metadata repository, WebSphere Metadata Server, which is fully integrated and common across all products in IBM Information Server, including WebSphere Information Analyzer (the next-generation data profiling and analysis technology), WebSphere QualityStage, WebSphere DataStage, and WebSphere Business Glossary. The new repository resides on an open RDBMS (DB2 UDB, Oracle, or SQL Server). If you do not want to use your own database instance, IBM provides DB2 specifically for use with the 8.0 product, and it is set up as part of the installation process.


This new dynamic enterprise metadata foundation replaces the metadata prisons of the past and transforms metadata from an “end” to a “means” to manage data and simplify integration. Because the repository is common across all products, when data is profiled with WebSphere Information Analyzer, for instance, the table definitions and the pertinent profiling information (such as primary keys, foreign keys, and notes) are available to a DataStage or QualityStage user in the DataStage and QualityStage Designer with no export/import, as shown in the screen below.

WebSphere Metadata Server also provides a number of services that DataStage takes advantage of. These services are located on the server for performance and scalability. One result is impact analysis presented in a context that DataStage users can readily use. More on this in Ease-of-Use Enhancements below.

Information Services Framework

The IBM Information Server release brings the next generation of enterprise services, common to all products. This simplifies administration, operation, licensing, and deployment. These services reside on an application server; WebSphere Application Server is provided as part of the installation.

Logging

All products in IBM Information Server log messages to a common place. Customers using multiple products in IBM Information Server no longer have to look in multiple logs for problem determination. For DataStage users, the log is still available in the Director and through the existing command line interfaces. Users can also view logs from the Web Console for IBM Information Server, a browser-based interface. Administrators can assign users to a role that permits only viewing logs. This allows developers, for instance, to view logs from a browser to aid problem determination in a production environment without being able to start jobs, change jobs, delete logs, and so on, which is critical for locked-down production environments. Administrators can also create log views, which restrict users to specific entries in a log.

Security

See the Security section of Enhanced Installation, Configuration, and Administration below.

Integrated Data Quality

With the 8.0 release, IBM WebSphere QualityStage™, the best-in-class data quality product, unifies data quality and data transformation capabilities in a combined framework and enhanced design paradigm. QualityStage gains a new user interface based on visual drag-and-drop design, an improved set of features that increase productivity, and innovations in data matching through a data-driven design experience. This is all enabled by a foundational framework of dynamic metadata, active integration services, and a high-performance engine. To state it simply, QualityStage is now a set of parallel stages, which completely integrates DataStage and QualityStage.


The QualityStage stages include:

• Investigate: Analyze your information and re-use that knowledge in the match. This includes field based and pattern based analysis.

• Standardize: Cleanse your data to deliver high-quality information, using packaged rules or custom-built rules that meet your business needs.

• Match: Link matching records in your data. The new Match design interface is shown above.

• Survive: Roll up your information to create a single best-of-breed record.

Note: WebSphere QualityStage is a separately licensed product from WebSphere DataStage.

Ease-of-Use Enhancements

With the Metadata Server described above, DataStage takes advantage of its services directly in the DataStage and QualityStage Designer. These include:

• Impact Analysis

• Object Difference (job, table, and routine difference)

• Quick Find and Advanced Find

Note: These new capabilities are available for all DataStage (and QualityStage) users, regardless of job type (server, parallel, and job sequences).

Impact Analysis

From the contextual menu of an object in the repository tree or, in some cases, of a stage on the design canvas, several new options are available, including Find Dependencies and Find Where Used.


The results answer “What does this item depend on?” and “Where is this item used?”, respectively. They are shown both in a tabular view and graphically. This gives the DataStage and QualityStage user more information with which to assess, for instance, the impact of a change.
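The two questions behind Find Dependencies and Find Where Used can be pictured as walks over a dependency graph in opposite directions. The following Python sketch is only an illustration of that idea, not the Metadata Server implementation: the `Repository` class, its method names, the transitive traversal, and the `NightlySequence` object are all assumptions made for the example.

```python
from collections import defaultdict


class Repository:
    """Hypothetical in-memory stand-in for a repository of related objects."""

    def __init__(self):
        self._uses = defaultdict(set)     # object -> objects it depends on
        self._used_by = defaultdict(set)  # object -> objects that use it

    def add_dependency(self, obj, depends_on):
        self._uses[obj].add(depends_on)
        self._used_by[depends_on].add(obj)

    @staticmethod
    def _walk(start, edges):
        # Depth-first walk that collects everything reachable from start.
        seen, stack = set(), [start]
        while stack:
            for neighbor in edges.get(stack.pop(), ()):
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        return seen

    def find_dependencies(self, obj):
        """What does this item depend on? (assumed transitive here)"""
        return self._walk(obj, self._uses)

    def find_where_used(self, obj):
        """Where is this item used? (assumed transitive here)"""
        return self._walk(obj, self._used_by)


repo = Repository()
repo.add_dependency("HVCustomerContainerStanFreq", "BankDemoAccounts")
# NightlySequence is a made-up job sequence used only for illustration.
repo.add_dependency("NightlySequence", "HVCustomerContainerStanFreq")
```

Keeping both edge directions in separate maps makes each question a single traversal, at the cost of storing every relationship twice.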


The Impact Analysis also allows the selection of an object from the result list and then shows where and how that object is used in a flow in the Impact Analysis – Path Viewer.

In this example, Job HVCustomerContainerStanFreq has a process stage, which has the Standardized output link, containing 20 columns, which came from the BankDemoAccounts table.

The object editor can also be launched from this viewer.

The graphical view has navigation features including a bird’s eye view and zooming. Results can be printed or saved to an XML file for additional processing, or remote user viewing, and can also be published to the Reporting Console.


Object Difference

Object difference is now available for jobs, routines, and table definitions. A textual report in a DataStage context is returned.

Hot links inside the report bring the user to the relevant editor in the Designer for the object selected.


Jobs and table definitions can also be compared across projects. The user is required to log into the second project. The results are then shown as described above. This new feature will significantly improve productivity for DataStage and QualityStage users.

The Job Report (first introduced with DataStage release 7.5) and the Impact Analysis, Job Diff, and Advanced Find results can all be printed, and all of them can be published to the Reporting Console for viewing from a web browser by anyone with access (see below).

Find

Customers have built up DataStage repositories with literally tens of thousands of objects: jobs, job sequences, shared containers, table definitions, and more. Finding those objects can, at times, be a daunting task, even in the most organized and well-documented repositories.

The 8.0 release adds new Quick and Advanced Find features to make it easier to locate objects and work with them.

Available at the top of the Repository View and from the toolbar, the Quick Find allows users to locate objects with the following capabilities:

• Find Name (full and partial)

• Wild card support

• Find next

• Filter on object type

• Include the objects’ descriptions in the search


The Advanced Find, available from an object’s contextual menu and the toolbar, allows the user to add more advanced filtering criteria to the Find:

• Object type

• Creation

o Date/time

o User

• Last modification

o Date/time

o User

• Where used (What other objects use this object)

• Dependencies (What does this object use)

Now a user can, for example, find all the jobs changed within the past week by Keith.
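As a rough illustration of how these criteria combine, the Python sketch below filters a list of repository objects by type, last modifier, and modification time. It is not the Designer's search code: the `RepoObject` fields, the `advanced_find` function, and the sample objects and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RepoObject:
    # Hypothetical subset of the properties Advanced Find can filter on.
    name: str
    object_type: str
    modified_by: str
    modified_at: datetime


def advanced_find(objects, object_type=None, modified_by=None, modified_since=None):
    """Return names of objects matching every criterion that is not None."""
    def keep(obj):
        return ((object_type is None or obj.object_type == object_type)
                and (modified_by is None or obj.modified_by == modified_by)
                and (modified_since is None or obj.modified_at >= modified_since))
    return [obj.name for obj in objects if keep(obj)]


now = datetime(2006, 10, 16)  # fixed "current" time so the example is repeatable
objects = [
    RepoObject("LoadCustomers", "job", "Keith", now - timedelta(days=2)),
    RepoObject("LoadOrders", "job", "Keith", now - timedelta(days=30)),
    RepoObject("LoadProducts", "job", "Maria", now - timedelta(days=1)),
    RepoObject("Customers", "table definition", "Keith", now - timedelta(days=3)),
]

# "All the jobs changed within the past week by Keith"
recent_by_keith = advanced_find(objects, "job", "Keith", now - timedelta(days=7))
```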

Advanced options include the ability to restrict case and to match on “name and description” or “name or description”.

Results from both Quick Find and Advanced Find are presented in the same tabular view as Impact Analysis results. Quick Find is available anywhere you browse the repository, for example from a stage editor when you are browsing for a table definition, and in the new Export dialog.

New Repository Tree View

In the DataStage and QualityStage Designer, Folders have replaced Categories. The restriction on where objects “live” in the folder structure has also been removed. Jobs, table definitions, routines, and so on can all be in one folder or split among many folders, which the user can name. This allows users to organize the repository content in the way that suits their applications.

The new repository provides locking semantics that now allow more than one user to have a job open. The first user in opens the job for write, and the second user opens the job read-only (the second user is presented with a dialog informing them who has the job locked). This is a long-standing enhancement request from customers that provides increased collaboration.
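The first-writer, later-readers behavior described above can be sketched in a few lines of Python. This is an illustrative model only; the `JobLocks` class, its method names, and the message text are assumptions, and the real repository's locking mechanism is not described in this paper.

```python
class JobLocks:
    """Hypothetical model: the first user to open a job holds the write lock."""

    def __init__(self):
        self._owners = {}  # job name -> user holding the write lock

    def open_job(self, job, user):
        """Return (mode, message); later users get read-only plus a notice."""
        owner = self._owners.setdefault(job, user)
        if owner == user:
            return "write", None
        return "read-only", f"{job} is locked by {owner}"

    def close_job(self, job, user):
        # Only the lock holder releases the lock; readers hold nothing.
        if self._owners.get(job) == user:
            del self._owners[job]


locks = JobLocks()
first = locks.open_job("LoadCustomers", "keith")
second = locks.open_job("LoadCustomers", "maria")
locks.close_job("LoadCustomers", "keith")
third = locks.open_job("LoadCustomers", "maria")
```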

In addition, the repository tree now has an “expanded view” to optionally show more details of repository items. The properties visible are configurable and each column in this expanded view is sortable.


No More Manager and Export Improvements

The DataStage Manager capabilities are now merged into the DataStage and QualityStage Designer. The following are now directly available from within the Designer:

• DSX/XML Import/Export

• DataStage Enterprise Edition Configuration File Editor

• Message Handler Manager

• MetaBroker Import/Export

• Web Service Definition Import

• Import IMS definitions

• JCL templates editor

This means one fewer client tool, and it allows users to import metadata directly in the Designer.

A new Export dialog, launched from the Export pull-down menu, a contextual menu, or the results of a Find, is easier to use and provides functional enhancements. The new Export dialog allows users to export items of different types:

• A single item

• All items in a folder

• A project

• Several items of mixed type/folders

• Export based on the result of a Find

• Export of dependent items

The new GUI allows the original export list to be modified by adding objects, which users can locate with the Find capability described above. Filtering options also exist to export job designs and/or executables, and to filter read-only items.


Job Parameter Sets

Job parameters make it easy to parameterize a job at execution time. However, before this release, job parameters could not be shared between jobs, and each parameter had to be added through the job properties window. There was also no mechanism to easily manage and change job parameters when moving jobs through the product life cycle (Development, Test, System Test, Production).

Now with the 8.0 release, a new repository object called a Job Parameter Set contains the names and values of job parameters. A Job Parameter Set can be added and used by one or more jobs. In addition multiple parameter sets can be associated with a job.

Job Parameter Sets enable users to share job parameters and their associated values across multiple jobs. This makes it easier to share common properties and to deploy jobs across machines. And, because Job Parameter Sets are objects in the repository, the user can perform impact analysis to see which jobs use a particular Job Parameter Set.

A new parameter set dialog allows parameters to be added to the parameter set. Environment variables can also be added to a parameter set.


The Values tab on the Parameter Set dialog is used to specify sets of values to be used for the parameters in this set when executing a job. Each set of values is stored in a file of the given name when the parameter set is saved. Parameter set files are stored in the DataStage directory at the same level as the project folder. The Values file can be changed dynamically and the values will be picked up by the job when it is run.

Once a parameter set is associated with a job, it can be used in the stages of the job by referencing ParameterSet.Parameter.

When a job is executed either through the Director (see below) or the command line (dsjob...-param KTParamSet=TestSystem), the user can specify which default values to use for the parameter set.
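The relationship between a parameter set, its named sets of values, and a ParameterSet.Parameter reference can be modeled roughly as follows. This Python sketch is an assumption-laden illustration: the `ParameterSet` class, the `lookup` helper, and the `DBServer` and `Schema` parameters are invented, and the value sets are held in memory because the on-disk layout of the values files is not described here. Only the names KTParamSet and TestSystem come from the command line example above.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ParameterSet:
    """Hypothetical stand-in: default values plus named sets of override values."""
    name: str
    defaults: dict[str, str]
    value_sets: dict[str, dict[str, str]] = field(default_factory=dict)

    def resolve(self, value_set: str | None = None) -> dict[str, str]:
        """Return the defaults, overridden by the chosen value set if one is given."""
        values = dict(self.defaults)
        if value_set is not None:
            values.update(self.value_sets[value_set])
        return values


def lookup(reference: str, sets: dict[str, ParameterSet], chosen: dict[str, str]) -> str:
    """Resolve a 'ParameterSet.Parameter' reference for one job run.

    `chosen` maps a parameter set name to the value set selected for the run.
    """
    set_name, parameter = reference.split(".", 1)
    return sets[set_name].resolve(chosen.get(set_name))[parameter]


kt = ParameterSet(
    name="KTParamSet",
    defaults={"DBServer": "devhost", "Schema": "DEV"},  # made-up parameters
    value_sets={"TestSystem": {"DBServer": "testhost", "Schema": "TEST"}},
)
sets = {kt.name: kt}

test_server = lookup("KTParamSet.DBServer", sets, {"KTParamSet": "TestSystem"})
default_schema = lookup("KTParamSet.Schema", sets, {})
```

Because every job resolves the same reference against the same object, changing a value set changes what all associated jobs see on their next run.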


SQL Builder Enhancements

DataStage 7.5.1A added the SQL Builder, a graphical interface for building complex SQL. The SQL Builder contains database-specific grammars and parsers, which allow users to take advantage of database-specific functionality.

The 8.0 release expands the SQL Builder support beyond DB2 UDB and Oracle to SQL Server 2000 & 2005, Teradata v2r6/TTU 8.x, and ODBC 3.52. Support has also been added for INSERT, UPDATE, and DELETE SQL statements. This means the SQL Builder can be used for both the source and target ends of DataStage jobs.

The SQL Builder works within DataStage server and parallel jobs, and with plug-ins, parallel stages, and the new common connectors (see below).

Documentation Improvements

Error Message Manual

The DataStage parallel framework can generate a significant number of messages in the log. These messages do not have a unique identifier and sometimes problem determination can be daunting.

The 8.0 release adds unique message identifiers to every message in the DataStage parallel framework. A new Error Manual will begin to document each message. The message meaning and resolution will also be documented in 8.0 and upcoming releases.

DSEE-TFRS-00013 The record_format variable must have a sub-property type. {0} was the returned value.

Explanation: The record_format variable was used without a sub-property.

User response: You must use either the implicit or varying sub-property when you use the record_format variable. Use varying to specify one of the following blocked or spanned formats: V, VB, VS, or VBS. Data is imported by using the selected format. If you use the implicit sub-property, data is imported or exported as a stream with no explicit record boundaries.
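Unique identifiers also make log messages easy to pick out programmatically. The Python sketch below extracts an identifier from the start of a log line. The pattern (uppercase product code, uppercase component code, five-digit number) is inferred from the single example DSEE-TFRS-00013 and is an assumption, as are the `MessageId` type and the `parse_message_id` function.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from one example identifier; other shapes may exist.
MESSAGE_ID = re.compile(r"^(?P<product>[A-Z]+)-(?P<component>[A-Z]+)-(?P<number>\d{5})\b")


class MessageId(NamedTuple):
    product: str
    component: str
    number: int


def parse_message_id(log_line: str) -> Optional[MessageId]:
    """Return the identifier at the start of a log line, or None if absent."""
    match = MESSAGE_ID.match(log_line)
    if match is None:
        return None
    return MessageId(match["product"], match["component"], int(match["number"]))


parsed = parse_message_id(
    "DSEE-TFRS-00013 The record_format variable must have a sub-property type."
)
```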

Parallel Job Tutorial

A parallel job tutorial is now included to aid new users in getting started with DataStage. This will also benefit QualityStage users.

New and Expanded Transformation Capabilities

The enhancements described in this section are specific to the DataStage parallel environment.

Lookup Enhancement

The Lookup stage has been extended to support lookup on a range of values. A condition of the form “input field A is between table field B and table field C” can now return a single row or multiple rows. This is very useful for date processing.
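A range lookup of this kind can be sketched in Python as a filter over a reference table. The example is illustrative only: the `range_lookup` function, the `RATE_PERIODS` table, and its column names are hypothetical, and treating both bounds as inclusive is an assumption rather than documented Lookup stage behavior.

```python
from datetime import date

# Hypothetical reference table; each row covers a date range.
RATE_PERIODS = [
    {"start": date(2006, 1, 1), "end": date(2006, 6, 30), "rate": 0.05},
    {"start": date(2006, 7, 1), "end": date(2006, 12, 31), "rate": 0.06},
    {"start": date(2006, 6, 15), "end": date(2006, 7, 15), "rate": 0.04},  # overlaps both
]


def range_lookup(value, table, low="start", high="end", multiple=True):
    """Return reference rows where table[low] <= value <= table[high].

    With multiple=False only the first matching row is returned,
    mirroring a single-row lookup result.
    """
    matches = [row for row in table if row[low] <= value <= row[high]]
    return matches if multiple else matches[:1]


july_rates = [row["rate"] for row in range_lookup(date(2006, 7, 4), RATE_PERIODS)]
march_rate = range_lookup(date(2006, 3, 1), RATE_PERIODS, multiple=False)
```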


Surrogate Key Generation

Today, users can generate surrogate keys as required in DataStage jobs using the Surrogate Key Generator. However, the user is required to manage key generation across job runs through parameters. With DataStage 8.0, the Surrogate Key Generator stage, the Transformer, and the new SCD stage (see below) have been enhanced so that DataStage now manages the generation of surrogate keys across job runs. The keys can be managed in a file or in a DBMS (DB2 and Oracle are supported). With databases, the DBMS sequence functionality is used.
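The idea of carrying key state across runs in a file can be illustrated with a small Python sketch. It is not how DataStage stores its key state: the `FileKeySource` class, the file name, and the one-number file format are assumptions made for the example.

```python
import os
import tempfile


class FileKeySource:
    """Hypothetical key source that persists the last key used in a file."""

    def __init__(self, path):
        self.path = path
        self.last = self._load()

    def _load(self):
        # A missing or empty state file means no keys have been issued yet.
        if not os.path.exists(self.path):
            return 0
        with open(self.path) as state:
            return int(state.read().strip() or 0)

    def next_key(self):
        self.last += 1
        return self.last

    def save(self):
        """Record the last key so the next run continues after it."""
        with open(self.path, "w") as state:
            state.write(str(self.last))


state_file = os.path.join(tempfile.mkdtemp(), "customer_dim.keys")

run1 = FileKeySource(state_file)            # first job run
first_run_keys = [run1.next_key() for _ in range(3)]
run1.save()

run2 = FileKeySource(state_file)            # a later job run
second_run_keys = [run2.next_key() for _ in range(2)]
```

A database sequence plays the same role as the state file, with the DBMS rather than the job responsible for remembering the last value.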

Slowly Changing Dimension Stage

Many users use DataStage to build and populate star schema data warehouses, usually with Type 1 and Type 2 dimensions to maintain history. While DataStage provides rich capabilities to do this in existing releases, a new stage is now available with the 8.0 release that encapsulates most of the work for the user.

The new Slowly Changing Dimension (SCD) stage processes source data against a dimension table within the context of a star schema database structure. Type 1, Type 2, and a hybrid of both are supported. The SCD stage automatically performs the following actions:

• Prepare the data for loading. This means that the following process is performed for each dimension in the star schema:

o Business key(s) from the source are used to look up a surrogate key in each dimension table


o Typically the dimension row will be found. If not, a dimension row needs to be created. If a dimension row is found but needs to be updated, the update is performed

o The source data is augmented by the inclusion of the surrogate key, and is reduced by the elimination of non-fact data (i.e., data that is present in the input only for the case that a dimension row would need to be created or updated)

• The record is written or loaded into the fact table (with all surrogate keys)
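The steps above can be sketched for a single dimension in Python. This is a simplified illustration of Type 1, Type 2, and hybrid handling, not the SCD stage's implementation: the `apply_scd` function, the column names (`sk`, `is_current`, `start_date`, `end_date`), and the sample customer data are all hypothetical.

```python
from datetime import date
from itertools import count


def apply_scd(dimension, source, business_key, type1_cols, type2_cols, today, next_key):
    """Apply one source row to an in-memory dimension table.

    Returns the surrogate key that the corresponding fact row should carry.
    """
    def new_version():
        row = {"sk": next_key(), business_key: source[business_key],
               "start_date": today, "end_date": None, "is_current": True}
        row.update({col: source[col] for col in type1_cols + type2_cols})
        dimension.append(row)
        return row["sk"]

    # Look up the current dimension row by business key.
    current = next((row for row in dimension
                    if row[business_key] == source[business_key] and row["is_current"]),
                   None)
    if current is None:
        return new_version()                 # no dimension row yet: create one
    if any(current[col] != source[col] for col in type2_cols):
        current["is_current"] = False        # Type 2: expire the old version
        current["end_date"] = today
        return new_version()                 # and insert a new current version
    for col in type1_cols:
        current[col] = source[col]           # Type 1: overwrite in place
    return current["sk"]


keys = count(1)
dim = []
config = dict(business_key="cust_id", type1_cols=["email"], type2_cols=["city"],
              next_key=lambda: next(keys))

sk_a = apply_scd(dim, {"cust_id": "C1", "email": "a@x.example", "city": "Boston"},
                 today=date(2006, 1, 1), **config)
sk_b = apply_scd(dim, {"cust_id": "C1", "email": "b@x.example", "city": "Boston"},
                 today=date(2006, 2, 1), **config)
sk_c = apply_scd(dim, {"cust_id": "C1", "email": "b@x.example", "city": "Austin"},
                 today=date(2006, 3, 1), **config)
```

In this example the e-mail change is overwritten on the existing row, while the city change closes that row and starts a new one with a new surrogate key.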

The SCD stage also introduces a new “Fast Path” concept for improved usability and faster implementation. The fast path walks the user through the screens/tabs of the stage properties required to process the stage. Help is available for each tab by hovering the mouse over the “I” in the lower left.

The first tab of the fast path in the dialog above defines the output link from the SCD stage.


The second step matches the source column with the dimension column to define the lookup. For performance reasons, DataStage will only load the latest dimension records into memory for each partition.

The next tab defines how to create the surrogate key information. As described above, DataStage now handles surrogate key generation and management across job runs. In this example, a specific file is used. A job parameter can also be used to specify a file name. Alternatively, a database (DB2 or Oracle) can be used.


Step 4 defines how to detect changes to dimension records and what data to use when records are created or updated. For Type 2 dimensions, for instance, the user defines the current record indicator versus the history records of the dimension.

Finally, map the output columns coming out of the SCD stage. The next stage could be another dimension, or any other DataStage stage.

The new SCD stage will greatly enhance productivity of users that are working with star schema data warehouses.


Performance Improvements

The enhancements described in this section are specific to the DataStage parallel environment.

Job Startup Time and More

When DataStage parallel jobs start up, the framework performs job validation, sets up internal process communication, copies transformer libraries to remote nodes, and more. Depending on the job and hardware environment, this startup time is improved with the 8.0 release.

Additional internal performance enhancements have been made in the areas of buffer optimizations and the way the framework combines processes.

Job Monitoring Improvements

Job monitoring has been re-architected with the 8.0 release. Not only is performance improved, but time-based monitoring can once again be utilized.

Job monitoring provides useful information for job execution and performance problem determination. However, it should not interfere with the performance of a job executing. Adaptive Job Monitoring is a new feature with the 8.0 release which detects when CPU utilization by the parallel framework’s conductor reaches 80%. When this threshold is reached, the job monitoring data is throttled back and a warning message is issued to the user. Internally, the conductor sends control messages to each player to reduce their output rate.

When time-based monitoring is used, the monitor time interval on players is increased. When record count-based monitoring is used, the record interval is increased until the conductor’s CPU utilization becomes less than 80%.

Only monitor messages are throttled back; metadata and summary messages are not affected.
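The throttling behavior can be pictured as a simple feedback loop. In the Python sketch below, only the 80% threshold comes from the description above; the doubling back-off factor, the function names, and the sample CPU readings are assumptions, and the interval can stand for either seconds (time-based monitoring) or records (record count-based monitoring).

```python
CPU_THRESHOLD = 0.80  # conductor CPU utilization at which throttling starts


def next_interval(current_interval, conductor_cpu, backoff=2):
    """Return the monitoring interval players should use next.

    The back-off factor of 2 is an assumption made for this sketch.
    """
    if conductor_cpu >= CPU_THRESHOLD:
        return current_interval * backoff  # report less often
    return current_interval


def throttle(interval, cpu_samples):
    """Apply next_interval over a series of conductor CPU readings."""
    history = []
    for cpu in cpu_samples:
        interval = next_interval(interval, cpu)
        history.append(interval)
    return history


intervals = throttle(10, [0.50, 0.85, 0.90, 0.70])
```

The interval stops growing as soon as a reading falls below the threshold.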

Resource Estimation

Predicting the hardware resources needed to run DataStage jobs in order to meet your processing time requirements can sometimes be more of an art than a science. With new sophisticated analytical information and deep understanding of the parallel framework, IBM has added Resource Estimation to DataStage (and QualityStage) 8.0.

With a job open, a new toolbar option is available called Resource Estimation.

This option opens a new dialog called Resource Estimation. The Resource Estimation works by first creating a model of the DataStage job. There are two types of models that can be created:

• Static. The static model does not actually run the job to create the model. CPU utilization cannot be estimated, but disk space can be. The record size is always fixed. The “best case” scenario is considered when the input data is propagated. The “worst case” scenario is considered when computing record size.

• Dynamic. The Resource Estimation tool actually runs the job with a sample of the data. Both CPU and disk space are estimated. This is a more predictable model to use for estimating.


Resource Estimation is used to project the resources required to execute the job based on varying data volumes for each input data source.

A projection is then executed using the model selected. The results show the total CPU needed, disk space requirements, scratch space requirements, and more.
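To give a feel for what a projection from a dynamic model involves, the Python sketch below scales measurements from a sample run up to a target data volume. It assumes strictly linear scaling, which is a deliberate simplification and not a description of the tool's actual model; the function names and the sample figures are hypothetical.

```python
def build_dynamic_model(sample_records, cpu_seconds, disk_bytes):
    """Record what a sample run consumed (hypothetical measurements)."""
    return {"sample_records": sample_records,
            "cpu_seconds": cpu_seconds,
            "disk_bytes": disk_bytes}


def project(model, records):
    """Scale the sample measurements linearly to a target record count."""
    scale = records / model["sample_records"]
    return {"cpu_seconds": model["cpu_seconds"] * scale,
            "disk_bytes": model["disk_bytes"] * scale}


model = build_dynamic_model(sample_records=1000, cpu_seconds=2.0, disk_bytes=500_000)
projection = project(model, records=1_000_000)
```

Running `project` with several record counts against the same model corresponds to comparing projections for different data volumes.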


Different projections can be run with different data volumes and each can be saved. Graphical charts are also available for analysis, which allow the user to drill into each stage and each partition. A report can be generated or printed with these estimations.

This new feature will greatly assist users in estimating the time and machine resources needed for job execution.


Performance Analysis

Isolating job performance bottlenecks during a job execution, or even seeing what else was running on the machine during the job run, can be extremely difficult. DataStage 8.0 adds a new capability called Performance Analysis. It is enabled through a job property on the Execution tab, which collects data at job execution time. Note: by default, this option is disabled. Once it is enabled and a job is open, a new toolbar option called Performance Analysis is available.

This option opens a new dialog called Performance Analysis. The first screen asks the user which job instance to perform the analysis on.

Detailed charts are then available for that specific job run including:

• Job timeline

• Record Throughput

• CPU Utilization

• Job Timing

• Job Memory Utilization

• Physical Machine Utilization – which shows what else is happening overall on the machine, not just DataStage

Each partition’s information is available in different tabs.


A report can be generated for each chart.

Using the information in these charts, a developer can for instance pinpoint performance bottlenecks and re-design their job to improve performance.

In addition to instance performance, overall machine statistics are available. When a job is running, information about the machine is also collected and is available in the Performance Analysis tool including:

• Overall CPU utilization

• Memory utilization

• Disk utilization

Users can also correlate statistics between the machine information and the job performance.

Filtering capabilities exist to only display specific stages.

IBM understands that performance analysis can be a complex task. The information collected and shown in the Performance Analysis tool can easily be sent to IBM for assistance with performance analysis when requested through our Product Support group.

Connectivity Improvements

Next Generation “Rich” Common Connectors

The enhancements described in this section are specific to the DataStage parallel environment.

With the 8.0 release, new connectors will be available that are common to all products in IBM Information Server. The new connectors are easier to use and extend the functionality of the existing connectors. The new connectivity architecture will also make it easier for IBM to release new connectors, and enhancements to them, independently of a product (DataStage) release.

Note: The existing connectors will continue to be supported (see below).

The new common connectors for IBM Information Server are:

• ODBC

o Embedded DataDirect v5.2 Connect for ODBC drivers

• WebSphere MQ

o Adds support for “client only” configuration

• Oracle 10g

• DB2 UDB

o DPF and non-DPF environments

• Teradata

o New support for Teradata Parallel Transport (TPT)

Note: Some of the new common connectors will be delivered after DataStage 8.0 initially becomes Generally Available.

New GUI

A new common GUI is provided for each common connector. A navigator panel allows users to select stages and links easily with Explorer-style navigation. Drag and drop connection objects make it easy to configure a connection (see below). The SQL Builder is integrated to assist users to build SQL statements. The source/target and properties are validated at design time, with warning indicators for properties requiring user attention. Job parameters can be used/inserted for any property.

[Figure: common connector stage editor, with callouts for the stage/link overview, the stage/link properties in the Explorer model, and the built-in connection test]

BLOB Support

With the new common connectors, DataStage has been extended to support BLOBs. BLOB support allows BLOBs to be moved from a data source to a target without paying a huge performance penalty. Because BLOBs typically are not manipulated as part of a data integration flow, they are referenced by location in the job rather than being sent through the DataStage job itself. The BLOB is moved only when the target is written. BLOB support will be added as the new common connectors are released after the 8.0 release.
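The pass-by-reference idea can be illustrated with a small Python sketch. This is not DataStage code or a connector API: `BlobRef`, `read_rows`, `transform`, and `write_rows` are hypothetical names, and an in-memory dictionary stands in for the source system. Rows carry only a locator, and the payload is fetched once, at the point where the target is written.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a source system that stores large objects.
SOURCE_BLOBS = {"doc-1": b"x" * 1_000_000, "doc-2": b"y" * 2_000_000}


@dataclass(frozen=True)
class BlobRef:
    """A small locator that travels through the flow instead of the BLOB."""
    key: str

    def fetch(self) -> bytes:
        # Only called at the target; simulates reading the payload from the source.
        return SOURCE_BLOBS[self.key]


def read_rows():
    """Source step: emit rows that reference BLOBs by location."""
    for key in SOURCE_BLOBS:
        yield {"id": key, "payload": BlobRef(key)}


def transform(rows):
    """Intermediate step: works on ordinary columns, never touches the payload."""
    for row in rows:
        yield {**row, "id": row["id"].upper()}


def write_rows(rows):
    """Target step: the only place where the BLOB bytes are materialized."""
    target = {}
    for row in rows:
        target[row["id"]] = row["payload"].fetch()
    return target


target = write_rows(transform(read_rows()))
```

In this sketch the intermediate step copies only the small locator object, so the cost of moving each payload is paid once, at the target.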

Connection Objects

Connection objects are new objects that hold connection path information (username, password, database name, etc.) to a particular source or target, which allows saving and reusing connection information. Connection objects are created manually, during metadata import, and from a stage editor.

Connection objects are used to save stage connection properties to be later used when building a job. They can be dragged and dropped from the repository tree and also be used for metadata import from that source or target. Drag and drop the table imported from that source or target onto the canvas to create a pre-populated stage instance. Connection objects are used “by reference” at design time. The stage editor displays the current state of the Data Connection, not the state when it was first loaded in the stage instance.

Connection objects are tied to a particular stage. Connection objects are supported on the following stages:

• New common connectors

• Parallel stages: DB2 UDB, Informix XPS, Oracle, Teradata

• Server: Database DataStage plug-ins (e.g., Oracle OCI, DRS, DB2 API, etc.) and built-in stages including ODBC, Universe, and Unidata

Expanded Support for the Stored Procedure Stage

The Stored Procedure Stage is expanded to support SQL Server and Teradata databases. The support for Teradata includes: stored procedures, macros, scalar user-defined functions, table user-defined functions, and external stored procedures.

New & Enhanced Connectivity

New parallel stages exist to connect to Information Integration Federation and Netezza. There are also new parallel iWay and WebSphere Classic Federation stages for easier access to distributed and mainframe data. Support for Informix v10 has been added along with Oracle 10gR2, SQL Server 2005, Sybase ASE 15, sftp, Teradata V2R6.1/TTU 8.1, and more.

Enhanced CFF Stage

The enhancements described in this section are specific to the DataStage parallel environment.

The CFF stage has been enhanced to make it easier to read in files, particularly mainframe files, that have multiple formats for each record.
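To make “multiple formats for each record” concrete, here is a minimal Python sketch of reading a fixed-width file in which the first character identifies the record type. The layouts, field names, and sample lines are invented for illustration; the sketch says nothing about how the CFF stage itself is configured or implemented.

```python
# Hypothetical fixed-width layouts, keyed by a one-character record type.
# Each field is (name, start offset, end offset) within the line.
LAYOUTS = {
    "H": [("file_date", 1, 9)],
    "D": [("account", 1, 7), ("amount", 7, 13)],
}


def parse_record(line):
    """Pick the layout from the record type in column 0 and slice the fields."""
    record_type = line[0]
    fields = {
        name: line[start:end].strip()
        for name, start, end in LAYOUTS[record_type]
    }
    return {"type": record_type, **fields}


lines = ["H20060901", "DACC001000250", "DACC002001000"]
parsed = [parse_record(line) for line in lines]
```

Each line is interpreted with the layout that matches its own record type, so header and detail records can share one file.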

Enhanced Installation, Configuration, Administration & Reporting

Installation

The installation process has been completely rewritten, with all of the software of IBM Information Server in one platform installation process and media. Multiple CDs just for DataStage are a thing of the past. Also, authorization codes are gone. Licensing is done by IBM through a simple licensing file that is read at installation time.

Security

User administration, including the assignment of groups and roles, is now done in the Web Console for IBM Information Server. Integration with LDAP or Active Directory is also provided. All products in IBM Information Server, including DataStage, authenticate using this new service. This provides one place for userid administration for all products.

Note: LAN Manager support is removed from the DataStage Client logon screen with the 8.0 release. You will no longer see the “Omit” check box.

DataStage Administration

The DataStage Administrator client tool still exists for DataStage (and QualityStage) specific administration tasks. It is used to set up DataStage and QualityStage projects, assign users and roles, and perform other DataStage specific tasks. Only authorized DataStage administrator-level users can use the DataStage Administrator tool.

The DataStage user roles have been expanded with the DataStage 8.0 release.

• There is a new “DataStage Administrator” role at the IBM Information Server level for DataStage and QualityStage use of the DataStage Administrator tool.

• There is a new “Super Operator” role, which can run and view objects in the Designer but cannot change them.

Reporting Console

A new browser-based Reporting Console is provided with IBM Information Server. Reports are available to users who have access. The products of Information Server, such as Information Analyzer, publish reports to the Reporting Console. Information Analyzer will publish reports on the results of data profiling. DataStage and QualityStage can publish reports such as the job report, results of Find & Impact Analysis, and more.

New Source-to-Target and Target-to-Source Job and Database reports are available in the Reporting Console. Users build a report by selecting a job and the columns to see in the report. The report then follows those columns and their transformations through the job, either forward (source-to-target) or “in reverse” (target-to-source).
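As a rough illustration of the two traversal directions, the following Python sketch models the column derivations in a job as a small directed graph. The column names and the `DERIVATIONS` mapping are invented for the example, and this is not a description of how the Reporting Console is implemented.

```python
from collections import defaultdict

# Hypothetical column-level derivations: output column -> input columns.
DERIVATIONS = {
    "stg.full_name": ["src.first_name", "src.last_name"],
    "tgt.customer_name": ["stg.full_name"],
    "tgt.customer_id": ["src.id"],
}


def invert(derivations):
    """Build the forward map: input column -> columns derived from it."""
    forward = defaultdict(list)
    for output, inputs in derivations.items():
        for column in inputs:
            forward[column].append(output)
    return forward


def trace(start, edges):
    """Collect every column reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        column = stack.pop()
        for nxt in edges.get(column, []):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return sorted(seen)


# Source-to-target: where does src.first_name end up?
downstream = trace("src.first_name", invert(DERIVATIONS))
# Target-to-source: which columns feed tgt.customer_name?
upstream = trace("tgt.customer_name", DERIVATIONS)
```

The same `trace` function serves both reports; only the direction of the edge map changes.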

General Enhancements

An expand FILLER capability has been added to the CFF stage in the WebSphere DataStage MVS Edition. DataStage MVS Edition also gets the benefit of many of the services explained above including “where used” impact analysis and Find.

DataStage Enterprise Edition has better handling of failed conversions in the transformer. For example, when a string to decimal conversion fails, the failure used to be only reported in the log; the record is now sent down the reject link if one exists for the transform.
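The reject-link behavior can be pictured with a short Python sketch. It is only an analogy, assuming a list of dictionaries as the record stream and a second list standing in for the reject link; `convert_amounts` and the sample records are hypothetical and do not represent transformer internals.

```python
from decimal import Decimal, InvalidOperation


def convert_amounts(records):
    """Split records into converted output and rejects."""
    output, rejects = [], []
    for record in records:
        try:
            amount = Decimal(record["amount"])
        except InvalidOperation:
            # Conversion failed: route the original record to the reject list.
            rejects.append(record)
            continue
        output.append({**record, "amount": amount})
    return output, rejects


records = [
    {"id": 1, "amount": "19.99"},
    {"id": 2, "amount": "n/a"},
    {"id": 3, "amount": "7"},
]
output, rejects = convert_amounts(records)
```

Here the record with the unparseable amount is kept intact on the reject side, so it can be inspected or reprocessed instead of surfacing only as a log message.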

In DataStage Enterprise Edition, the CFF stage now has better support for scaled COMP types such as S9(16)V99. Previously DataStage would read such a field in as an integer, and the user would need to divide by 100 to get the right value. DataStage now handles this transparently.
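The arithmetic behind the divide-by-100 step is worth spelling out. In a picture such as S9(16)V99, the V marks an implied decimal point and the two 9s after it give a scale of two digits, so a stored integer of 123456 represents 1234.56. The Python sketch below is an illustration under that reading; `implied_scale` and `apply_scale` are hypothetical helpers that handle only simple 9 and 9(n) patterns, not a full PICTURE parser and not DataStage code.

```python
from decimal import Decimal
import re


def implied_scale(picture):
    """Return the number of implied decimal digits in a picture like S9(16)V99."""
    _, _, fraction = picture.upper().partition("V")
    # Expand repeat counts such as 9(4) into 9999 before counting digits.
    expanded = re.sub(r"9\((\d+)\)", lambda m: "9" * int(m.group(1)), fraction)
    return expanded.count("9")


def apply_scale(raw, picture):
    """Convert the stored integer to its decimal value using the implied scale."""
    return Decimal(raw).scaleb(-implied_scale(picture))


value = apply_scale(123456, "S9(16)V99")
```

Using `Decimal` rather than floating-point division keeps the result exact, which matters for the monetary amounts these fields typically hold.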

Migration and Upgrades

Existing DataStage installations from 5.x through 7.5.1A can be upgraded into the 8.0 release and repository with no changes to your job designs. For Unix Server users, DataStage 8.0 can be installed alongside the existing DataStage Server installation. Migration can be performed from your existing DataStage projects into 8.0 using export/import.

Notes:

• DataStage Version Control will not be supported in DataStage 8.0.

• The job release feature is no longer supported. Users should use the export functionality provided.

• XML Pack v1 is not supported with DataStage 8.0.

What Platforms Are Supported?

The DataStage 8.0 release supports the following server platforms:

• Microsoft Windows Server 2003

• AIX 5.2, 5.3

• HP-UX 11i (11.11), 11iv2 (11.23) for PA-RISC

• Solaris 2.9, 2.10

• Red Hat Enterprise Linux AS 4.0

• SuSE Enterprise Linux 9, 10

DataStage Enterprise MVS Edition will also be available for IBM z/OS 1.1 and OS/390 v2.6, v2.8, and v2.10.

The DataStage and QualityStage Client platform support is:

• Microsoft Windows XP SP2

More Information

Contact your IBM representative or log on to www.ibm.com.