Cindy Fung
Milano, Italy September 9 – September 10, 2003
IBM Red Brick Warehouse Server 6.3 Features
IBM Corporation 2003
European Red Brick Users’ Group Conference 2003
IBM Software Group | DB2 Information Management Software
IBM Data Management Technical Conference
6.3 Features
Major Performance Improvements:
Dynamic SmartScan
Memory mapping of dimension index/tables for STARjoin
TARGETjoin improvements for local index
Optimizer hints to specify STARindex for fact-to-fact STARjoin
Table Management Utility (TMU) Memory Tuning
Vista Maintenance Enhancement
Major Usability Improvements:
Additional SQL/OLAP functions
Expression support in the Loader
XML Improvements in Loader
Compact System Catalog Utility
6.3 Features
Major Usability Improvements (cont'd):
Allow 3GB Address Space on Windows Platform
Delimiter enhancements in Loading and Exporting
Alter Table with Working Segment
Interoperability with DB2 products
System Port Support:
HP Itanium Product Family (IPF)
OS Version Upgrades:
AIX v5.2
Sun Solaris 9
HP-UX IPF 11i
Windows32 on Server 2003
Dynamic SmartScan
Additional queries can now be considered for SmartScan segment elimination
This includes queries whose constraints are not on the segmenting column, provided the fact table is segmented by the referenced foreign key
RISQL> Select Sum(Dollars) From Sales, Period
Where Sales.perkey = Period.perkey and
Period.date >= '01-01-01' and
Period.date <= '12-31-01';
(The two Period.date predicates are dimension non-primary-key constraints.)
Selectivity estimates improve because they consider the results of both static and dynamic segment elimination
This leads to more accurate STARjoin plan selection
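The idea behind dynamic segment elimination can be sketched in Python. This is only an illustrative model, not Red Brick internals: the segment names, the perkey ranges, and the helper `segments_to_scan` are assumptions made for the example.

```python
from datetime import date

# Hypothetical Period dimension: perkey -> calendar date.
period = {
    1: date(2000, 12, 31),
    2: date(2001, 1, 15),
    3: date(2001, 6, 30),
    4: date(2002, 1, 1),
}

# Hypothetical Sales fact segments, each covering an inclusive perkey range.
sales_segments = {
    "seg_2000": (1, 1),
    "seg_2001": (2, 3),
    "seg_2002": (4, 4),
}

def segments_to_scan(lo, hi):
    """Return the fact segments that can hold rows for dates in [lo, hi]."""
    # Step 1: evaluate the non-key constraint on the dimension table.
    keys = [k for k, d in period.items() if lo <= d <= hi]
    # Step 2: keep only segments whose key range contains a qualifying key.
    return sorted(
        name
        for name, (first, last) in sales_segments.items()
        if any(first <= k <= last for k in keys)
    )

print(segments_to_scan(date(2001, 1, 1), date(2001, 12, 31)))
```

The constraint is on Period.date, not on the segmenting column, yet only the segment holding the qualifying perkey values needs to be scanned.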
MMAP Dimension Table/Index
Reduces CPU overhead and I/O system calls by memory-mapping (mmap) dimension indexes and tables into shared memory
Applies to STARjoin, TARGETjoin, and table scan plans
These plans typically contain Btree-1-1-Match (B11M) and Functional Join operators, which perform joins and row fetches against dimension tables
High benefit for queries in which the join(s) below the B11M and Functional Join operators produce a large number of rows
Mmap could potentially improve the performance of:
B11M, when the corresponding dimension primary index is memory-mapped
Functional Join, when the corresponding table is memory-mapped
Multiple queries/users share the single copy in shared memory
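As a rough illustration of why memory mapping avoids per-row I/O calls, the following Python sketch maps a small file of fixed-width rows and fetches rows directly from the mapping. The file layout, the `RECORD` format, and the helper names are hypothetical; they do not reflect how Red Brick stores dimension tables.

```python
import mmap
import os
import struct
import tempfile

RECORD = struct.Struct("<i16s")  # hypothetical row: int key + 16-byte name

def build_dimension_file(path, rows):
    """Write fixed-width rows, ordered by key, to a file."""
    with open(path, "wb") as f:
        for key, name in rows:
            f.write(RECORD.pack(key, name.encode("ascii")))

def fetch_row(mapped, position):
    """Fetch one row straight from the mapping; no read() call is issued."""
    key, raw = RECORD.unpack_from(mapped, position * RECORD.size)
    return key, raw.rstrip(b"\x00").decode("ascii")

path = os.path.join(tempfile.mkdtemp(), "period.dim")
build_dimension_file(path, [(1, "Q1"), (2, "Q2"), (3, "Q3")])

with open(path, "rb") as f:
    # Read-only mapping of the whole file.
    mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    rows = [fetch_row(mapped, i) for i in range(3)]
    mapped.close()

print(rows)
```

Once the file is mapped, each row fetch is a memory access rather than a system call, which is where the benefit for plans that fetch many dimension rows comes from.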
TARGETjoin Performance
Improves performance of TARGETjoin
Particularly for local indexes
More consistent performance between tightly and loosely constrained queries
More efficient index access for TARGETjoin and scan operators
Local index TARGETjoin improvement ranges from 0 to 500%
Preliminary test results approach STARjoin performance about 50% of the time
Allows single-column B-Tree indexes in TARGETjoin
Consider a B-Tree index on foreign keys to very large dimensions
Optimizer Hints
STAR indexes can be specified for queries on a per-table basis
Take care when overriding the optimizer's selection
Available only as a settable option; not persistent across sessions
Can be set for:
A specific STAR index
Multiple STAR indexes
A set of STAR indexes on a multi-fact table combination
TMU Memory Tuning
Better control over TMU memory resource usage
Allows TMU buffer memory to be tuned according to the load job
Introduces memory balancing between parallel loader tasks
Quickly allocates large amounts of buffer memory
Prevents excessive use of system memory by defining a maximum amount of buffer memory that the load job can use
Reports TMU buffer usage, which helps fine-tune repetitive load jobs
TMU Memory Tuning
Tuning rule of thumb: more logical I/O requires more buffers
Syntax:
SET TMU MAX BUFFERS number_of_blocks
SET TMU CONVERSION BUFFER PERCENT p
SET TMU OUTPUT BUFFER PERCENT p
SET TMU INDEX BUFFER PERCENT p
The new tunables are recommended over the SET TMU BUFFERS approach
Vista Maintenance Enhancements
Currently, precomputed views grouped by nullable columns can only be maintained by rebuilding them from the detail data
This restriction will be relaxed in 6.3
Precomputed views grouped by nullable columns can be maintained incrementally
Vista Maintenance Enhancements
create view sales_view as
(select region, year, sum(dollars) as sum_dollars
from sales, market, period
where sales.perkey = period.perkey and sales.mktkey = market.mktkey
group by region, year)
using sales_agg (region, year, sum_dollars);
Where region and year are nullable
Vista Maintenance Enhancements
RISQL> update sales set dollars = dollars+1 where promokey > 5;
** WARNING ** (1932) The precomputed view SALES_VIEW has been marked invalid.
** INFORMATION ** (212) Rows updated: 2.
** STATISTICS ** (1458) CHOOSE PLAN (ID: 1) Choice: 1 was chosen 1 times.
** STATISTICS ** (1976) Precomputed view SALES_VIEW maintained by incremental maintenance.
** STATISTICS ** (1458) CHOOSE PLAN (ID: 5) Choice: 1 was chosen 1 times.
** STATISTICS ** (1971) Precomputed view maintenance for SALES_VIEW caused rows to be updated in SALES_AGG. Rows updated: 2.
** STATISTICS ** (1458) CHOOSE PLAN (ID: 8) Choice: 1 was chosen 1 times.
** WARNING ** (1967) Precomputed view maintenance for SALES_VIEW succeeded and it has been marked valid.
More SQL/OLAP Functions
Distribution Functions
CUME_DIST
PERCENT_RANK
Inverse Distribution Functions (Median)
PERCENTILE_CONT
PERCENTILE_DISC
Scalar function ROUND
Distribution Functions
CUME_DIST() computes the position of a specified row value relative to the set of values
(# of values equal to or less than x) / (total # of values)
PERCENT_RANK() returns the percent rank of a value relative to a group of values
(rank of row in partition - 1) / (# of rows in partition - 1)
Example
Price     Percent_rank   Cume_dist
100,000   0              0.25
220,000   0.33           0.75
220,000   0.33           0.75
230,000   1              1
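The two formulas can be checked against the example table with a short Python sketch. The function names are illustrative, and returning 0 for a single-row partition is an assumption borrowed from standard SQL rather than something stated on the slide.

```python
def percent_rank(values, x):
    """(rank of x - 1) / (number of rows - 1); tied values share the lowest rank."""
    if len(values) == 1:
        return 0.0  # assumption: single-row partition yields 0
    rank = 1 + sum(1 for v in values if v < x)
    return (rank - 1) / (len(values) - 1)

def cume_dist(values, x):
    """(number of values <= x) / (total number of values)."""
    return sum(1 for v in values if v <= x) / len(values)

prices = [100_000, 220_000, 220_000, 230_000]
for p in prices:
    print(p, round(percent_rank(prices, p), 2), round(cume_dist(prices, p), 2))
```

The two tied rows at 220,000 share rank 2, which gives (2 - 1) / (4 - 1) = 0.33 for PERCENT_RANK, while three of the four values are less than or equal to 220,000, which gives 0.75 for CUME_DIST.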
Inverse Distribution Functions
Answer questions such as “What is the median (50th percentile) value of my data?”
Require a sort specification and a parameter that takes a value between 0 and 1
Use the new WITHIN GROUP clause to specify the data ordering
Compute the results within a partition
Example
Select Area, Price,
PERCENTILE_CONT(0.5) WITHIN GROUP (Order by Price) OVER (Partition by Area) as Median_cont,
PERCENTILE_DISC(0.5) WITHIN GROUP (Order by Price) OVER (Partition By Area) as Median_disc
From Homes;
Inverse Distribution Functions - 2
For the continuous distribution, interpolate between Uptown rows 2 and 3
For the discrete distribution, select a distinct value from row 2
Area       Price     Median_cont   Median_disc
Uptown     310,000   550,000       500,000
Uptown     500,000   550,000       500,000
Uptown     600,000   550,000       500,000
Uptown     700,000   550,000       500,000
Downtown   100,000   220,000       220,000
Downtown   220,000   220,000       220,000
Downtown   230,000   220,000       220,000
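The medians in the table can be reproduced with a minimal Python sketch. It assumes the usual definitions (linear interpolation at position p * (n - 1) for the continuous case, and the first value whose cumulative distribution reaches p for the discrete case); the function names are illustrative and this is not the server's implementation.

```python
import math

def percentile_cont(values, p):
    """Interpolate linearly between the two rows that surround position p."""
    ordered = sorted(values)
    pos = p * (len(ordered) - 1)  # zero-based fractional row position
    lower, upper = math.floor(pos), math.ceil(pos)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def percentile_disc(values, p):
    """Return the first actual value whose cumulative distribution is >= p."""
    ordered = sorted(values)
    for i, v in enumerate(ordered, start=1):
        if i / len(ordered) >= p:
            return v
    return ordered[-1]

homes = {
    "Uptown": [310_000, 500_000, 600_000, 700_000],
    "Downtown": [100_000, 220_000, 230_000],
}
for area, prices in homes.items():
    print(area, percentile_cont(prices, 0.5), percentile_disc(prices, 0.5))
```

For Uptown, the continuous median falls halfway between rows 2 and 3 (500,000 and 600,000), while the discrete median must be an actual row value, so it is row 2.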
ROUND() scalar function
ROUND() returns a number rounded to the given integer number of places to the right (positive) or left (negative) of the decimal point
Examples
ROUND (864.827, 2) = 864.830
ROUND (864.827, 1) = 864.800
ROUND (864.827, 0) = 865.000
ROUND (864.827, -1) = 860.000
ROUND (864.827, -2) = 900.000
ROUND (864.827, -3) = 1000.000
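The examples can be mirrored in Python with the decimal module. The helper name `sql_round` is hypothetical, and half-up rounding is an assumption; none of the slide's examples involves a tie, so they do not reveal which tie-breaking rule the server uses.

```python
from decimal import Decimal, ROUND_HALF_UP

def sql_round(value, places):
    """Round a decimal string to `places` digits right (or left, if negative) of the point."""
    # places=2 -> quantum 0.01; places=-2 -> quantum 1E+2
    quantum = Decimal(1).scaleb(-places)
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_UP)

for places in (2, 1, 0, -1, -2, -3):
    print(places, format(sql_round("864.827", places), "f"))
```

Unlike the slide, which pads every result to three decimal places, this sketch returns only the significant digits; the numeric values are the same.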
Expression Support in Loader
Input data can now be modified while being loaded to a table
Basic arithmetic operations now supported
Modification also possible based on conditions
A pseudo column can now be assigned to a target column
Multiple conditions now possible in ACCEPT/REJECT clause with some limitations
Highly requested functionality
More integrated with the server than ETL tools
Expression Support in Loader
Syntax: (snippets from TMU control file)
Arithmetic expressions:
$A POSITION(2) INTEGER EXTERNAL(10),
ColA ($A + 5)/2
Conditions:
$A POSITION(2) INTEGER EXTERNAL(10),
ColB CASE WHEN $A > 5 THEN $A+3
WHEN $A = 5 THEN $A-1 ELSE $A+1
Expression Support in Loader
Pseudo column assignment:
$A POSITION(2) INTEGER EXTERNAL(10),
ColC $A
ACCEPT/REJECT clause:
ACCEPT ($A > 5 AND $B < 10) OR ($C = 15)
Limitation: If real columns are used, only a single condition is allowed. With pseudo columns, multiple conditions are allowed (as in the example above).
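To make the effect of these control-file snippets concrete, the following Python sketch evaluates the same expressions for a few input values. The function names are illustrative only. Whether the TMU performs integer or decimal division for ($A + 5)/2 is not stated on the slides; the sketch assumes true division.

```python
def col_a(a):
    """ColA ($A + 5)/2 (assumes non-integer division)."""
    return (a + 5) / 2

def col_b(a):
    """ColB CASE WHEN $A > 5 THEN $A+3 WHEN $A = 5 THEN $A-1 ELSE $A+1."""
    if a > 5:
        return a + 3
    if a == 5:
        return a - 1
    return a + 1

def accept(a, b, c):
    """ACCEPT ($A > 5 AND $B < 10) OR ($C = 15)."""
    return (a > 5 and b < 10) or c == 15

for a in (2, 5, 7):
    print(a, col_a(a), col_b(a))
print(accept(6, 9, 0), accept(1, 1, 15), accept(1, 1, 1))
```

Each input row is transformed or filtered as it is loaded, so no separate preprocessing pass over the input file is needed.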
XML Improvements in Loader
Extends TMU and SQL Export functionality to provide additional XML support
XML multiple namespaces support
Export generates default namespace
Upgrade to the IBM XML4C v5.x parser
Key performance enhancements as well as critical fixes over the Xerces version used in 6.2
Seamless upgrade to the new parser
XML Improvements in Loader
TMU support for multiple namespaces in the input XML file
Choose elements and attributes from the input XML file by their namespace prefixes.
SQL EXPORT generates a default namespace URI.
Facilitates use of namespaces:
Data integration: data from different sources to be loaded can be identified by different namespaces.
Tools/process integration: XML data generated or used by various tools relies on namespaces.
Standards compliance: XML standards such as XML Schema and XSL make heavy use of XML namespace terminology.
Nested XML structures semantics
Multiple occurrences of an XML tag within a ‘Processing Unit’ can cause the generation of multiple row combinations:
<ALLProducts>
…
<product><brand> Aroma
<category name='Coffee' grind='Whole bean' weight='2'/>
<category name='Coffee' grind='Espresso' weight='1'/>
</brand></product>
…
</ALLProducts>
One ‘product’ Processing Unit produces multiple ‘product’ rows:
Brand   Name     Grind        Weight
Aroma   Coffee   Whole bean   2
Aroma   Coffee   Espresso     1
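The row-generation behavior can be imitated with Python's standard XML parser. This is a sketch of the semantics only, assuming a well-formed version of the document above; `product_rows` is a hypothetical helper, not part of the TMU.

```python
import xml.etree.ElementTree as ET

doc = """
<ALLProducts>
  <product>
    <brand>Aroma
      <category name='Coffee' grind='Whole bean' weight='2'/>
      <category name='Coffee' grind='Espresso' weight='1'/>
    </brand>
  </product>
</ALLProducts>
"""

def product_rows(xml_text):
    """Emit one (brand, name, grind, weight) row per category in each product."""
    rows = []
    for product in ET.fromstring(xml_text).iter("product"):
        for brand in product.iter("brand"):
            brand_name = (brand.text or "").strip()
            for cat in brand.iter("category"):
                rows.append(
                    (brand_name, cat.get("name"), cat.get("grind"), cat.get("weight"))
                )
    return rows

for row in product_rows(doc):
    print(row)
```

The single brand value is repeated for every category element inside the processing unit, which is how one ‘product’ element yields two rows.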
Compact System Catalog
Compacts free space within the system catalog
Such free space occurs when objects that are not at the end of the catalog are freed
Extends the system catalog enhancement in 6.2, which releases free space at the end of the system catalog
Rb_syscompact:
Does the compaction
Requires DBA privilege
Creates a backup file
Checks for catalog sanity before compaction
-show option reports percentage that could be recovered
3GB Address Space on Windows
Extend beyond 32-bit memory access limit
Increase virtual address space from 2GB to 3GB
Feature Advantages:
More data can be cached in physical memory
Greater scalability and performance
Supported on:
32-bit versions of Windows® 2000 Advanced Server
32-bit versions of Windows .NET Server
Enabled on executables: rb_tmu.exe, rb_ptmu.exe, rbw.exe, rbwtest.exe and risqltty.exe
3GB Address Space on Windows
Steps required before the feature takes effect:
1. Add /3GB to the boot.ini file
[boot loader]
timeout=30
default=multi(0)disk(0)rdisk(0)partition(2)\WINDOWS
[operating systems]
multi(0)disk(0)rdisk(0)partition(2)\WINDOWS="????" /3GB
2. Allocate enough swap space
Set the total paging file size to 150% of physical memory or, if your system has more than 2GB of physical memory, set the total paging file size to 3GB
A maximum of 3GB of virtual bytes can be committed per process. This behavior can be monitored through the Performance Monitor
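The paging file rule in step 2 amounts to a small calculation, sketched here in Python. The function name is hypothetical and the sketch simply restates the rule above.

```python
GB = 1024 ** 3

def recommended_paging_file_bytes(physical_memory_bytes):
    """150% of physical memory, or a flat 3GB above 2GB of physical memory."""
    if physical_memory_bytes > 2 * GB:
        return 3 * GB
    return int(physical_memory_bytes * 1.5)

for memory_gb in (1, 2, 4):
    size = recommended_paging_file_bytes(memory_gb * GB)
    print(memory_gb, "GB physical ->", size / GB, "GB paging file")
```

At exactly 2GB of physical memory both branches of the rule agree, since 150% of 2GB is 3GB.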
Loading with Multiple-Character Separator
Applies to loading and exporting in delimited format
New load format clause syntax:
format separated by '<separator>' [enclosed by '<string delimiter>']
Separator may consist of 1 to 10 characters and may be composed of single-byte or multi-byte characters
Feature Advantages:
Data generated by other ETL tools that use a multiple-character separator can be loaded directly without modification
Data containing the separator string is loaded correctly as long as the data is enclosed within the string delimiter
Loading with Multiple-Character Separator & Export with String Delimiter Support
Changes affect only loading and exporting in delimited format.
Loading:
New load format clause syntax:
format separated by '<separator>' [enclosed by '<string delimiter>']
Separator can consist of one to ten characters and can be composed of single-byte or multibyte characters.
String delimiter must be a single character, which can be single-byte or multibyte. Its use is optional.
String delimiter cannot match any separator character.
Neither the separator nor the string delimiter can contain a carriage return character, a line feed character, a single quote, a radix point, or a binary zero.
Loading with Multiple-Character Separator & Export with String Delimiter Support
If the data contains the string delimiter character, that character must be escaped (doubled) in the input file
e.g. To load the data Apt#12 with # as the string delimiter, the input file looks like: #Apt##12#|#…..#|#……#|
Feature Advantages:
Data generated by other ETL tools that use a multiple-character separator can be loaded directly without modification.
Data containing the separator string is loaded correctly as long as the data is enclosed within the string delimiter.
Export:
New export command syntax:
export to 'xxx' format delimited [by '<export delimiter>'] [enclosed by '<string delimiter>'] (<select query>);
Export delimiter and string delimiter must each be a single character, which can be single-byte or multibyte.
Loading with Multiple-Character Separator & Export with String Delimiter Support
String delimiter cannot be the same as the export delimiter.
Neither the export delimiter nor the string delimiter can be a carriage return character, a line feed character, a single quote, a radix point, or a binary zero.
If the exported data contains the string delimiter character, that character is escaped in the output file
e.g. If the data in the table is Apt#12 and the string delimiter is #, the export output file looks like:
#Apt##12#|#…#|#…#|
Feature Advantages:
Users can choose a different export delimiter for each export command.
Exported data can be loaded directly back into the database using the loader.
Export delimiter can be part of the data content when the string delimiter encloses the data.
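The delimiter rules and the escaping convention can be sketched in Python. The helper names are hypothetical, the radix point is assumed to be ".", and doubling the string delimiter is inferred from the Apt#12 example rather than taken from a formal specification.

```python
FORBIDDEN = {"\r", "\n", "'", ".", "\0"}  # assumes "." is the radix point

def check_delimiters(separator, string_delimiter=None):
    """Apply the separator and string delimiter restrictions listed above."""
    if not 1 <= len(separator) <= 10:
        raise ValueError("separator must be 1 to 10 characters")
    if FORBIDDEN & set(separator):
        raise ValueError("separator contains a forbidden character")
    if string_delimiter is not None:
        if len(string_delimiter) != 1:
            raise ValueError("string delimiter must be one character")
        if string_delimiter in FORBIDDEN or string_delimiter in separator:
            raise ValueError("string delimiter is forbidden or matches the separator")

def format_record(fields, separator, string_delimiter):
    """Enclose each field, doubling any embedded string delimiter."""
    check_delimiters(separator, string_delimiter)
    d = string_delimiter
    return "".join(d + f.replace(d, d + d) + d + separator for f in fields)

print(format_record(["Apt#12", "Main St"], "|", "#"))
print(format_record(["a|~|b"], "|~|", "#"))
```

Because every field is enclosed, a field may contain the separator string itself, as in the second call.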
Export with String Delimiter Support
Adds delimiter support to enclose a string
New export command syntax:
export to 'xxx' format delimited [by '<export delimiter>'] [enclosed by '<string delimiter>'] (<select query>);
Export delimiter and string delimiter must each be one character, which may be single-byte or multi-byte
Feature Advantages:
May specify a different export delimiter for each export command
Export data may be directly loaded back into a database using the loader
Export delimiter can be part of the data content when string delimiter encloses the data
ALTER TABLE Using a Working Segment
Provides more reliable recoverability of failed alter operations than existing alter table IN_PLACE
Working segment can be reused after the alter operation is over
The table is still altered in place; the working segment holds a backup copy
Syntax:
ALTER TABLE <table_name> [ADD | DROP] COLUMN
IN_PLACE [USING <segment_name>]
The new feature is strongly recommended over a plain IN_PLACE alter
ALTER TABLE enhancements
Much requested feature
Combines nominal space requirements of IN_PLACE alter with reliable recovery characteristic of alter in other segments
As a table segment is altered, its original contents are temporarily stored in a standard, user-defined segment (a ‘working segment’)
If the alter fails (e.g. due to a full disk), original contents of the table segment are available in the working segment for the alter to be resumed and completed successfully
Additional disk space required for the working segment is only as much as the largest segment of the table
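The recovery behavior described above can be modeled with a toy Python simulation. Everything here is hypothetical (in-memory dictionaries stand in for segments, and `DiskFull` simulates the failure); it only illustrates why a single working segment, sized for the largest table segment, is enough to resume a failed alter.

```python
class DiskFull(Exception):
    """Simulated failure while rewriting a segment."""

def add_column(table, working, altered, default, fail_on=None):
    """Add a column to each segment in place, backing up one segment at a time."""
    for name in table:
        if name in altered:
            continue  # finished in an earlier attempt
        # On resume, the untouched original comes from the working segment.
        original = working.get(name, table[name])
        working.clear()
        working[name] = list(original)  # back up before touching the segment
        if name == fail_on:
            table[name] = []  # simulate a half-written segment
            raise DiskFull(name)
        table[name] = [row + (default,) for row in original]
        altered.add(name)
    working.clear()  # the working segment can be reused afterwards

table = {"s1": [(1,)], "s2": [(2,), (3,)]}
working, altered = {}, set()
try:
    add_column(table, working, altered, 0, fail_on="s2")
except DiskFull:
    pass  # s2 is damaged, but its original rows are in the working segment
add_column(table, working, altered, 0)  # resume and complete
print(table)
```

At any moment the working segment holds the original rows of at most one table segment, which matches the space requirement stated above.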
HP-Itanium Porting Project
A native port, not architectural emulation
Yields high performance by taking direct advantage of the Itanium architecture
No need to convert data
Red Brick databases created on PA-RISC will be fully compatible with Red Brick on HP-Itanium
For TRU64, data must be unloaded externally, then loaded on HP-UX
This is required because of the little-endian to big-endian conversion
64-bit Driver Manager port from DataDirect
Currently, no vendors plan to support the XBSA Backup/Restore interface on HP-Itanium
BAR to files or UNIX tapes
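The reason binary data cannot simply be copied between little-endian and big-endian platforms can be shown with Python's struct module. This is a generic illustration of byte order, not a description of Red Brick's on-disk format.

```python
import struct

value = 1_000_000
little = struct.pack("<i", value)  # byte layout on a little-endian platform
big = struct.pack(">i", value)     # byte layout on a big-endian platform

# The same 32-bit integer is stored with its bytes in opposite order.
print(little.hex(), big.hex())

# Reading little-endian bytes as if they were big-endian gives a wrong value.
misread = struct.unpack(">i", little)[0]
print(value, misread)
```

Unloading to an external format and reloading sidesteps the problem, because the external data does not depend on the platform's binary integer layout.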
Interoperability with DB2 Products
Warehouse Manager supports Red Brick in DB2 8.1 FP2
QMF for Windows (ODBC only)
Tivoli Storage Manager
Information Integrator R1 with DB2 V8.1 FP4
Started beta in August
GA in November 2003
Intelligent Miner for Data
Under consideration
Questions ?