Teradata 12.0


Active data warehousing is the process of mixing strategic and operational intelligence on the same platform by utilizing up-to-date enterprise data. With active data warehousing, very high requirements are placed on the underlying technology.

For one, the active data warehouse must perform well on all parts of the workload. Not only must it manage and track a widely mixed workload under all conditions, but it must also be capable of managing the flow of changing data. Second, it must be possible to implement new applications quickly and easily.

Teradata 12.0 is the next major step toward making it easy for anyone to build a high-performance active data warehouse.

Q: What changes deliver new levels of performance?

A: Partitioned primary index (PPI) is extended to multiple levels, statistics are enhanced to deliver better estimates, many query estimating and planning improvements have been added, and the explain plans are extended with information that previously had to be assumed.

Q: How does multi-level PPI (ML-PPI) work?

A: The current PPI capability allows for one function to determine the horizontal partitioning of the table. With ML-PPI, those partitioning capabilities are extended to multiple levels. Separate functions within ML-PPI define the partitioning for each level, which can then be utilized together or independently for much more granular access to the desired data in the table.

For example, an insurance company may partition first by state, then month. Questions may be asked by state, month or both, allowing the quantity of data accessed to better match the requested analysis. Reducing the data accessed reduces the I/O load on the system and improves overall system throughput.
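
The state-then-month example can be sketched as a toy model in Python. This is not Teradata syntax or its internal algorithm; the list of states, the combined partition numbering and the function names are assumptions made for illustration. The sketch shows how a constraint on either level, or on both, narrows the set of partitions that must be read.

```python
from itertools import product

# Hypothetical two-level partitioning: level 1 by state, level 2 by month.
STATES = ["CA", "NY", "TX"]
MONTHS = list(range(1, 13))


def combined_partition(state, month):
    """Map (state, month) to one combined partition number (illustrative scheme)."""
    return STATES.index(state) * len(MONTHS) + MONTHS.index(month)


def partitions_to_scan(state=None, month=None):
    """Return the combined partitions a query must read for the given constraints."""
    states = [state] if state is not None else STATES
    months = [month] if month is not None else MONTHS
    return sorted(combined_partition(s, m) for s, m in product(states, months))


total = len(STATES) * len(MONTHS)
by_state = partitions_to_scan(state="NY")            # constraint on level 1 only
by_month = partitions_to_scan(month=6)               # constraint on level 2 only
by_both = partitions_to_scan(state="NY", month=6)    # constraints on both levels
print(len(by_state), len(by_month), len(by_both), "of", total)
```

In this model a query constrained only by month still skips most partitions, even though month is the second level, which is the behavior the answer above attributes to using the levels independently.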

Q: How is statistics-based estimation improved?

A: More and varied users can submit a greater assortment of queries, making accurate estimation for query planning extremely important. Statistics are crucial for guiding the cost-based optimization of queries.

The number of intervals in the statistics structure has been doubled, significantly increasing the detail, accuracy and understanding of skew. Demographics of multiple column statistics are improved with a better picture of how nulls appear in the data.

Teradata 12.0: Active performance

New release adds more value to your enterprise data warehouse.
by Todd Walter with contributions from Alan Greenspan

TECH2TECH ASK THE EXPERTS

Benefits of multi-level partitioned primary index

> Increases performance of multi-dimensional queries where the set of constrained dimensions changes from query to query
> Enhances performance of queries with constraints (including range constraints) on multiple columns without the overhead of secondary indexes
> Decreases the amount of I/O needed to execute a query
> Provides an access method that allows multiple ways to access the data efficiently
> Allows DBAs more flexibility in partitioning expressions when there are multiple choices; the DBA can define multiple partitioning expressions and obtain the benefits from each of them and in combination

A.G.

PAGE 1 | Teradata Magazine | September 2007 | 2007 Teradata Corporation | AR-5386

Q: What happens if my statistics are not current?

A: Stale statistics can lead to invalid estimations that, in turn, can negatively affect query planning. Balancing the overhead of re-collecting statistics against the potential of inaccurate statistics often demands trade-offs. For example, with certain columns (especially dates), it has been necessary to collect statistics frequently to get reasonable plans for queries at or near the end of the data range.

Teradata 12.0 significantly reduces this overhead by extrapolating appropriate values when the collected statistics are not current, thus avoiding daily collection and reducing overall system utilization. To continue with the above scenario, when statistics in a date column have not been recently re-collected, the optimizer will determine how much additional data has been added to the table and will extend the statistics to account for the new date ranges. Query planning will improve even when the statistics are not collected every day. (See figure.)
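
The date-column scenario can be illustrated with a small Python sketch. The article does not describe the optimizer's actual extrapolation method, so the approach here is an assumption: new rows are taken to arrive at the same average rows-per-day rate seen in the collected range. All function names and figures are hypothetical.

```python
from datetime import date


def extrapolate_date_stats(min_date, max_date, rows_at_collection, rows_now):
    """Extend a collected date range to cover rows added since collection.

    Assumes (for illustration only) that new rows arrive at the same
    average rows-per-day rate observed in the collected range.
    """
    collected_days = (max_date - min_date).days + 1
    rows_per_day = rows_at_collection / collected_days
    added_rows = max(rows_now - rows_at_collection, 0)
    extra_days = round(added_rows / rows_per_day)
    new_max = date.fromordinal(max_date.toordinal() + extra_days)
    return new_max, rows_per_day


def estimate_rows(query_date, min_date, known_max, rows_per_day):
    """Estimate rows for an equality predicate on the date column."""
    if min_date <= query_date <= known_max:
        return rows_per_day
    return 0.0


collected_min = date(2007, 1, 1)
collected_max = date(2007, 1, 30)   # statistics were last collected through Jan 30
new_max, rate = extrapolate_date_stats(collected_min, collected_max, 30_000, 37_000)

query_date = date(2007, 2, 5)       # beyond the collected range
stale_estimate = estimate_rows(query_date, collected_min, collected_max, rate)
extrapolated_estimate = estimate_rows(query_date, collected_min, new_max, rate)
print(new_max, stale_estimate, extrapolated_estimate)
```

With the stale range alone, the estimate for a recent date is zero rows, which is the kind of misestimate that leads to poor plans near the end of the data range; the extended range gives a usable estimate without re-collecting.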

Q: What is new in query estimation and planning?

A: Query complexity continues to rise with many nested views, subqueries, derived tables and complex joins. The Teradata optimizer has been enhanced to capture additional information about each nested level in the query and then to carry that information throughout the planning process. Improvements have been made to the estimation of the cost and results of complex joins. New query rewrite rules have been added to address many customer-provided optimization opportunities, and the rewrite engine can now execute in multiple passes, giving further opportunities to improve the plan. More accurate and consistent query plans will be produced as a result.

Q: What changes have been made in the explain plans?

A: Size of spools, join and group key columns, and qualified referenced object names in the query, along with additional information, are included in the explain plan output to provide more information to the users reading the explain.

Q: How does Teradata improve the ability to manage the ever-increasing active workload?

A: All of the aforementioned query estimation accuracy improvements pull double duty by helping the accuracy of the workload management rules. More information can be provided by the user or application about each query. Workload exceptions have been extended and a traffic cop has been added to automate changes to workload rules and settings based on user-defined or system events. An entirely new way of accessing information is being developed to provide additional management information to more data warehouse users.

Q: What is a traffic cop?

A: New workload management profiles can be implemented when system conditions change or when a user-defined event such as the end of the load window is reached. For example, if a system component has failed or the system is exceptionally busy, the traffic cop can switch rule sets to appropriately adjust the workflow based on the current system conditions.
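
The switching behavior described above can be modeled as a small event-driven sketch in Python. It is only an illustration of the idea: the rule set names, their settings and the event names are hypothetical and do not correspond to actual Teradata workload management objects.

```python
# Hypothetical rule sets; names and settings are illustrative only.
RULE_SETS = {
    "normal": {"max_concurrent_reports": 20},
    "load_window": {"max_concurrent_reports": 5},
    "degraded": {"max_concurrent_reports": 2},
}

# Hypothetical mapping from system or user-defined events to rule sets.
EVENT_TO_RULE_SET = {
    "load_window_start": "load_window",
    "load_window_end": "normal",
    "component_failed": "degraded",
    "component_restored": "normal",
}


class TrafficCop:
    """Tracks the active rule set and switches it when known events arrive."""

    def __init__(self, active="normal"):
        self.active = active
        self.history = [active]

    def on_event(self, event):
        """Switch rule sets for a known event; unknown events change nothing."""
        target = EVENT_TO_RULE_SET.get(event)
        if target is not None and target != self.active:
            self.active = target
            self.history.append(target)
        return RULE_SETS[self.active]


cop = TrafficCop()
cop.on_event("load_window_start")      # user-defined event
cop.on_event("component_failed")       # system event
cop.on_event("unrecognized_event")     # ignored
settings = cop.on_event("component_restored")
print(cop.history, settings)
```

A real implementation would need to decide what happens when events overlap, for example a component being restored while the load window is still open; this sketch simply returns to the normal rule set.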

Q: How do users and applications provide more information about the work on the system?

A: Like banding a bird to track its flight path, a query band can now be provided with each query. The band can contain any number of attributes and values that provide detailed information about the query and may be referenced by workload management rules during the query classification phase. The information is then captured in the query log, where it can be used for future analysis of the work flowing through the system.

Figure: Extrapolated statistics reduce collection requirements

Teradata 12.0 can extrapolate appropriate statistics values when collected statistics are not current. As this example shows, when a query references a date beyond the range with collected statistics, the database will automatically extrapolate the collected statistics to create an accurate plan. This process allows the reduction of frequent statistics collection while getting accurate cost-based optimizations.

Query banding is especially valuable for applications that send work through pooled sessions, such as traditional session pool applications, analytic applications using pooled sessions, business intelligence (BI) tools and new Web service applications. Information useful for workload management, or for tracking the use of the data warehouse (such as the application, work unit and requesting user), can be acquired through the query banding. After the application is adjusted to provide the information, all of the tracking and linking to workload management is handled automatically.
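
As a sketch of what "adjusting the application" might look like, the Python helper below builds a query band from name/value pairs and wraps it in a SET QUERY_BAND statement. The statement form is my understanding of the Teradata SQL syntax and should be checked against the release documentation; the attribute names (ApplicationName, WorkUnit, EndUser) and the helper functions are hypothetical.

```python
def build_query_band(attributes):
    """Serialize name/value pairs as a string of 'name=value;' pairs."""
    parts = []
    for name, value in attributes.items():
        text = str(value)
        # Reject characters that would break the pair syntax or the SQL literal.
        if any(ch in name + text for ch in "=;'"):
            raise ValueError(f"unsupported character in attribute {name!r}")
        parts.append(f"{name}={text};")
    return "".join(parts)


def set_query_band_sql(attributes, scope="SESSION"):
    """Build the statement a pooled-session application would send first."""
    if scope not in ("SESSION", "TRANSACTION"):
        raise ValueError("scope must be SESSION or TRANSACTION")
    return f"SET QUERY_BAND = '{build_query_band(attributes)}' FOR {scope};"


# Hypothetical attributes identifying the real requester behind a pooled session.
band = {
    "ApplicationName": "claims_portal",
    "WorkUnit": "monthly_summary",
    "EndUser": "jsmith",
}
statement = set_query_band_sql(band)
print(statement)
```

An application using pooled sessions would send a statement like this before each unit of work, so that the shared database logon no longer hides which application and end user submitted the query.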

Q: What new workload exceptions are available?

A: System-level workload exceptions have been added in Teradata 12.0. Now a single rule can be used to raise exceptions, regardless of the query's workload group. Multiple rules can be assigned to a single workload group to allow multiple levels of control for requests that are not behaving as expected.

Q: How will systems management information be provided in the future?

A: Teradata 12.0 includes a new application programming interface (API) to make the system management information available via standard SQL interfaces. With this capability, data can be retrieved and control settings altered. This makes the information available through the standard ODBC and JDBC interfaces.

On top of those APIs, a new system management user interface has been implemented. Using Web services, portal-delivered displays and a fully extensible architecture, a fundamentally new form of management interface is provided to administrative and end users.

Q: How will the increasing flow of change to the data warehouse be handled?

A: The continuous flow of data needs to get into the database efficiently and then must be backed up without affecting the flow. If availability and disaster recovery requirements lead to a dual system implementation, the data must be synchronized with a system in another data center.

Q: What is new for the data acquisition and integration process?

A: Many implementations use extract, load and transform (ELT) processes, which perform the transformation and apply steps within the database, taking full advantage of the parallel data engine to do the work. Teradata 12.0 adds the ability to perform bulk merge (upsert), working from a table of change data and doing all the work within the database.

Insert and merge operations have been enhanced with an option to log errors to an error table rather than initiating an abort. This means that less effort needs to be spent on getting the data perfect before applying it. Together, these two functions significantly improve the ELT capabilities for the warehouse.

ELT using bulk SQL operations avoids constraints of the load utilities, allowing greater use of physical tuning techniques for active queries (unique secondary and join indexes) and allowing more use of active features (triggers, referential integrity).
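
The combination of bulk merge and error logging can be pictured with a short Python model. This is an analogy rather than Teradata SQL: dictionaries and lists stand in for the target, change and error tables, the row-at-a-time loop stands in for what the database would do as a parallel set operation, and the validation rule is a hypothetical constraint.

```python
def bulk_merge(target, changes, validate):
    """Upsert every change row into target, logging bad rows instead of aborting.

    target   -- dict mapping primary key to row (stands in for the target table)
    changes  -- list of row dicts (stands in for the table of change data)
    validate -- callable that raises ValueError for a row violating a constraint
    """
    error_table = []
    inserted = updated = 0
    for row in changes:
        try:
            validate(row)
        except ValueError as exc:
            # Log the failing row and keep going rather than aborting the merge.
            error_table.append({"row": row, "error": str(exc)})
            continue
        if row["id"] in target:
            target[row["id"]].update(row)
            updated += 1
        else:
            target[row["id"]] = dict(row)
            inserted += 1
    return inserted, updated, error_table


def require_positive_amount(row):
    """Hypothetical constraint used only for this example."""
    if row["amount"] <= 0:
        raise ValueError("amount must be positive")


policies = {1: {"id": 1, "amount": 100}}
change_rows = [
    {"id": 1, "amount": 120},   # existing key: update
    {"id": 2, "amount": 75},    # new key: insert
    {"id": 3, "amount": -5},    # violates the constraint: logged, not applied
]
inserted, updated, errors = bulk_merge(policies, change_rows, require_positive_amount)
print(inserted, updated, len(errors))
```

The good rows are applied and the bad row is set aside for later review, which is why less effort needs to go into cleaning the change data before the merge runs.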

Q: How do we perform backup without affecting the continuous loading processes?

A: Online Archive now allows backups to be performed without stopping the load processes. It will automatically initiate a checkpoint, save a log of changes and back up the log as part of the backup. On a restore, the log will automatically be restored and rolled back to the checkpoint, ensuring that a consistent restore has been done.
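
The checkpoint-and-rollback idea can be demonstrated with a toy Python model. The article does not describe Online Archive internals, so the before-image log used here is an assumption, as are the function names; a dictionary stands in for the table and values are assumed never to be None.

```python
def online_archive(table, concurrent_changes):
    """Copy a table key by key while a load keeps changing it.

    table              -- dict of key -> value, modified in place by the load
    concurrent_changes -- dict of copy step -> (key, new_value), applied just
                          before that step to simulate the running load
    Returns the archived copy and a log of before-images since the checkpoint.
    """
    keys_at_checkpoint = sorted(table)   # the checkpoint marks the starting state
    archive, change_log = {}, []
    for step, key in enumerate(keys_at_checkpoint):
        if step in concurrent_changes:
            changed_key, new_value = concurrent_changes[step]
            # Record the before-image (None means the key did not exist yet).
            change_log.append((changed_key, table.get(changed_key)))
            table[changed_key] = new_value
        archive[key] = table[key]
    return archive, change_log


def restore(archive, change_log):
    """Restore the archive, then roll the log back to the checkpoint."""
    restored = dict(archive)
    for key, before in reversed(change_log):
        if before is None:
            restored.pop(key, None)
        else:
            restored[key] = before
    return restored


live = {"a": 1, "b": 2, "c": 3}
checkpoint_state = dict(live)
# The load changes "c" before step 1 and "a" before step 2 of the copy.
archive, log = online_archive(live, {1: ("c", 30), 2: ("a", 10)})
restored = restore(archive, log)
print(archive, restored)
```

The raw archive mixes old and new values because the load kept running during the copy, while the rolled-back restore matches the table exactly as it stood at the checkpoint.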

Q: How will the increasing volume of data changes be synchronized?

A: Teradata Replication Services has been upgraded to significantly increase the bandwidth per replication group. Greater load volumes and more tables can be handled by a single replication group.

Q: What is new for building applications?

A: Teradata's extensibility strategy takes several steps forward in Teradata 12.0, allowing for more applications to be implemented closer to the data in the data warehouse. The extensibility functions will be used to deliver Teradata functionality as well; XML support and spatial data support will be the first examples.

Q: Are we able to write our applications in Java?

A: Stored procedures (SPs) can now be implemented in Java. An implementer can choose the ANSI Stored Procedure Language, C/C++ or Java as the implementation language for procedures to run in the database. Application developers who wish to implement their entire application in Java can write the database portion in Java as well.

Q: When can we return result sets from Teradata stored procedures?

A: Teradata 12.0 includes the ability to return sets of result rows from a SQL stored procedure. This will allow SPs to perform a wider range of application functions.

Q: How can we write C/C++ procedures that easily utilize data from the warehouse?

A: With a newly defined interface, it is easier to submit SQL to Teradata from within a C/C++ stored procedure. This makes it simpler to utilize data from the warehouse while also accessing external interfaces or libraries, such as connecting to the message bus or linking to another database engine.

Q: What if we want to return more general result sets from a table function?

A: This release includes a new method for defining the result row from a table function that allows the result parameters and data types to be specified at the invocation time of the function.

A general-purpose function can now be written to operate upon a more general source of data and return data in the form desired by each request.
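
A loose Python analogy may help picture a result row that is defined at invocation time. It is not Teradata's table function interface: the generator, its parameters and the sample data are all hypothetical. One general-purpose function reads a generic delimited source, and each caller states the column names and types it wants back.

```python
def split_fields(rows, result_columns):
    """Generic table function: the caller names the result columns and types.

    rows           -- iterable of comma-delimited strings (a generic data source)
    result_columns -- list of (column_name, type) pairs chosen at invocation time
    """
    for line in rows:
        fields = line.split(",")
        # zip stops at the shorter sequence, so extra source fields are ignored.
        yield {name: cast(field) for (name, cast), field in zip(result_columns, fields)}


source = ["17,NY,125.50", "18,CA,80.00"]

# Two requests use the same function but ask for different result rows.
as_claims = list(split_fields(source, [("claim_id", int), ("state", str), ("amount", float)]))
ids_only = list(split_fields(source, [("claim_id", int)]))
print(as_claims[0], ids_only)
```

The function body never changes; only the result definition supplied with each call does, which is the flexibility the answer above describes.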

Q: What else is included in Teradata 12.0?

A: Many additional features address opportunities in scalability, security, Unicode support, system tools (such as scandisk and checktable) and index wizard support for recommending PPIs for new or existing tables. To review the detailed release documentation for a complete view, visit Teradata.com/resources and select Technical Documentation.

Activate your enterprise

Teradata has made major continuous progress in delivering the supporting technology for building an active data warehouse. Each step makes it easier and more automatic to deliver the endemic access to integrated data that every enterprise is demanding. Teradata 12.0 is the next major step in that evolution, making major progress in query planning, statistics collection, performance, workload management, system operations and extensibility.

Installing and taking advantage of this technology will position your Teradata platform for the easiest possible active data warehouse implementation or as an extension of your current implementation to provide more users the data they need to deliver the best possible value to your organization every day.

Todd Walter, CTO, Teradata Development Division, oversees R&D efforts for Teradata Database software and systems. He is also responsible for the future vision and development of the active data warehouse.

Alan Greenspan is the product marketing manager for Teradata Database and load utilities. He has been with Teradata for 15 years.

Teradata 12.0: The 12th major database release

The next release of the core Teradata product set has a new name in addition to a broad collection of exciting and innovative capabilities: Teradata 12.0. Now all of the software products at the core of the data warehouse infrastructure will have a consistent, unified numbering sequence. Beginning with this release, the set of products that are designed, developed and tested together will have a single release number.

Teradata selected the number 12 to start the unified numbering sequence because it is the 12th major database release in Teradata's history. This current release was preceded by five major database releases of Version 1 (TOS-based, Teradata Operating System), and six major database releases of Version 2 (Open OS-based: UNIX, Linux and Windows).

Effective with the new release, the numbering system will be:

> Teradata 12.0
> Teradata Database 12.0
> Teradata Tools and Utilities (TTU) 12.0

Teradata Database 12.0 and Teradata Tools and Utilities 12.0 are included in the Teradata 12.0 release. The individual client Tools and Utilities products will also adjust to the 12.0 numbering sequence.

The term Teradata Warehouse, initially developed for use within the company, was adopted externally to package releases and indicate system-level certification of the various software products with different release numbers. This certification will continue as an important step in the development cycle, but the unified numbering scheme will replace the need for a separate warehouse package designation.

Minor releases will continue to be numbered in the current method. For example, the next major release will be Teradata 13.0 and the minor release following it will be Teradata 13.1.

A.G.
