TERADATA - DAY 1
- Teradata Introduction
- Teradata Architecture
- Types of Spaces
- Teradata Data Protection Mechanisms
Prepared by AnilKumar P
Teradata Introduction:
What Is Teradata?
- Teradata is a relational database management system (RDBMS) that drives a company's data warehouse.
- The name Teradata comes from "tera-", a prefix derived from Greek that denotes a trillion.
- Teradata was the first commercial database system to scale to and support a trillion bytes (one terabyte) of data.
Teradata Advantages:
- Single data store
- Scalability
- Unconditional parallelism (parallel architecture)
- Ability to model the business
- Mature, parallel-aware Optimizer
Single Data Store
Teradata acts as a single data store, with multiple client applications making inquiries against it concurrently.
Scalability
Adding new components to the system increases performance linearly.
Adding components allows the system to accommodate an increased workload without decreased throughput.
Complexity
Teradata is adept at complex data models that satisfy the information needs throughout an enterprise.
It can perform large aggregations at query run time and can perform up to 64 joins in a single query.
Concurrent Users
Teradata can handle from hundreds to thousands of users, who are often running multiple, complex queries on the system simultaneously.
Unconditional Parallelism
Teradata manages large amounts of data through parallelism: many individual processors perform smaller tasks concurrently to accomplish an operation against a huge repository of data.
Teradata's parallelism does not depend on query tuning, limited data quantity, column range constraints, or specialized data models. In that sense, Teradata has "unconditional parallelism."
Ability to Model the Business
It supports all types of data models.
Mature, Parallel-Aware Optimizer
Teradata's Optimizer is the most robust in the industry, able to handle:
- Multiple complex queries
- 64 joins per query
- Unlimited ad-hoc processing
The Optimizer is parallel-aware: it has knowledge of the system components and determines the least expensive plan (time-wise) to process queries fast and in parallel.
Teradata System:
A Teradata system contains one or more nodes, where the processing for the Teradata Database occurs.
There are two types of Teradata systems:
Symmetric multiprocessing (SMP) - An SMP Teradata system has a single node that contains multiple CPUs sharing a memory pool.
Massively parallel processing (MPP) - Multiple SMP nodes working together comprise a larger, MPP implementation of Teradata. The nodes are connected using the BYNET, which allows multiple virtual processors on multiple nodes to communicate with each other.
BYNET
The BYNET is a high-speed interconnect that enables nodes in the system to communicate. It has several unique features:
Scalable: Adding nodes to the system increases the system size without a performance penalty, and sometimes even increases performance.
High performance: An MPP system typically has two BYNET networks (BYNET 0 and BYNET 1). Because both networks in a system are active, the system benefits from having full use of the aggregate bandwidth of both the networks.
Fault tolerant: Each network has multiple connection paths. If the BYNET detects an unusable path in either network, it will automatically reconfigure that network so all messages avoid the unusable path.
Load balanced: Traffic is automatically and dynamically distributed between both BYNETs.
BYNET Hardware and Software
The BYNET hardware and software handle the communication between the vprocs and the nodes.
Hardware: The nodes of an MPP system are connected with the BYNET hardware, consisting of BYNET boards and cables.
Software: The BYNET software is installed on every node. This BYNET driver is an interface between the PDE software and the BYNET hardware.
Parallel Database Extensions (PDE)
The Parallel Database Extensions (PDE) software layer was added to the operating system to support the parallel software environment.
Teradata Architecture
Channel Driver
Channel Driver software is the means of communication between an application and the PEs assigned to channel-attached clients. There is one Channel Driver per node.
Teradata Gateway
Teradata Gateway software is the means of communication between an application and the PEs assigned to network-attached clients. There is one Teradata Gateway per node.
Basic components of Teradata Architecture:
- The Parsing Engine
- Message Passing Layer
- Access Module Processor
Parsing Engine: A Parsing Engine (PE) is a vproc that manages the dialogue between a client application and the Teradata Database once a valid session has been established. Each PE can support a maximum of 120 sessions.
Components:
1. Parser
2. Optimizer
3. Dispatcher
Message Passing Layer:
• Carries messages between the AMPs and PEs.
• Supports point-to-point, multicast, and broadcast communications.
• Merges answer sets back to the PE.
Access Module Processor (AMP): The AMP is a virtual processor that controls its portion of the data on the system. The AMPs work in parallel, each AMP managing the data rows stored on its vdisk. AMPs are involved in data distribution and data access in different ways:
- Finding the rows requested
- Lock management
- Sorting rows
- Aggregating columns
- Join processing
- Output conversion and formatting
- Creating the answer set for the client
- Disk space management
- Recovery processing
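The way AMPs split the work can be illustrated with a small Python sketch. It assumes rows are assigned to AMPs by hashing a key column; CRC32 is only a stand-in here and is not Teradata's hashing algorithm. Each simulated AMP aggregates its own rows, and the partial results are then merged, much as answer sets are merged back to the PE. The names `amp_for`, `distribute`, and `parallel_sum`, and the value of `NUM_AMPS`, are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_AMPS = 4  # assumed number of AMPs, for illustration only


def amp_for(key: str, num_amps: int = NUM_AMPS) -> int:
    """Pick an AMP for a key. CRC32 is a stand-in, not Teradata's hash."""
    return zlib.crc32(key.encode("utf-8")) % num_amps


def distribute(rows, num_amps: int = NUM_AMPS):
    """Group (key, value) rows by the AMP that would own them."""
    amps = defaultdict(list)
    for key, value in rows:
        amps[amp_for(key, num_amps)].append((key, value))
    return amps


def parallel_sum(rows, num_amps: int = NUM_AMPS) -> int:
    """Each AMP sums only its own rows; the partial sums are then merged."""
    partials = [
        sum(value for _, value in amp_rows)
        for amp_rows in distribute(rows, num_amps).values()
    ]
    return sum(partials)


rows = [(f"cust{i}", i * 10) for i in range(100)]
total = parallel_sum(rows)
```

The merged total equals what a single processor would compute over all rows; the point of the sketch is only that each AMP touches nothing but its own portion of the data.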
Teradata Databases and Users:
- A database or user must be created from DBC or from an existing database.
- Its perm space is taken from its immediate owner.
- Perm space is used by stored objects such as tables, join indexes, and stored procedures.
- Unused perm space is available as temp or spool space.
Teradata Database Spaces :
PERM – Permanent space
- Tables, indexes, subtables, join indexes, and stored procedures use PERM space.
- PERM space is deducted from the owner database.
CREATE DATABASE WMT_EDW FROM DBC AS
    PERM = 2000000,
    SPOOL = 5000000,
    NO FALLBACK,
    NO AFTER JOURNAL,
    DUAL BEFORE JOURNAL,
    DEFAULT JOURNAL TABLE = WMT.journals;
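The rule that perm space is deducted from the owner can be sketched as simple bookkeeping. This is a simplified model, not Teradata's implementation: it ignores space already consumed by objects, and the `Database` class, the `create_child` method, and the starting size of DBC are assumptions made for illustration.

```python
class Database:
    """A database or user holding a permanent-space allocation in bytes."""

    def __init__(self, name, perm):
        self.name = name
        self.perm = perm      # perm space currently held by this database
        self.children = []

    def create_child(self, name, perm):
        """Carve `perm` bytes out of this owner's space for a new child."""
        if perm > self.perm:
            raise ValueError(
                f"{self.name} has only {self.perm} bytes of perm space"
            )
        self.perm -= perm     # the owner gives up what the child receives
        child = Database(name, perm)
        self.children.append(child)
        return child


dbc = Database("DBC", perm=10_000_000)            # assumed starting size
edw = dbc.create_child("WMT_EDW", perm=2_000_000)
print(dbc.perm, edw.perm)
```

Requesting more perm space than the immediate owner holds fails in this model, which mirrors the idea that a child's space can only come from its owner.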
SPOOL – Working space
- Temporary working space that stores intermediate query results and answer sets.
- SELECT statements use spool space.
- A large number of non-unique values, poor data distribution, or joins on such columns can result in an "Insufficient spool" error.
- Volatile and derived tables use SPOOL space.
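Why poor distribution leads to spool errors can be shown with a little arithmetic. The sketch below relies on the fact that a space limit is divided evenly among the AMPs, so one overloaded AMP can exceed its share even when the total is well under the limit. The row counts, the 100-byte row size, and the function name `amps_over_spool` are hypothetical.

```python
def amps_over_spool(rows_per_amp, bytes_per_row, spool_limit):
    """Return the AMPs whose spool use exceeds their per-AMP share."""
    per_amp_limit = spool_limit // len(rows_per_amp)
    return [
        amp
        for amp, row_count in enumerate(rows_per_amp)
        if row_count * bytes_per_row > per_amp_limit
    ]


SPOOL_LIMIT = 5_000_000                    # bytes, as in the SPOOL setting above
even = [10_000, 10_000, 10_000, 10_000]    # 40,000 rows spread evenly
skewed = [37_000, 1_000, 1_000, 1_000]     # same 40,000 rows, mostly on AMP 0

even_result = amps_over_spool(even, 100, SPOOL_LIMIT)
skewed_result = amps_over_spool(skewed, 100, SPOOL_LIMIT)
```

Both cases hold 4,000,000 bytes in total, below the 5,000,000-byte limit, yet in the skewed case AMP 0 alone needs 3,700,000 bytes against a per-AMP share of 1,250,000.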
TEMP – Working space
- TEMP space is acquired by a Global Temporary Table (GTT) when it is materialized.
Data Protection:
- Locks
- RAID
- Fallback
- Journals
- Clique
Locks: There are four types of locks, which can be applied at three levels:
- Database level
- Table level
- Row hash level
Types of locks:
- Exclusive locks
- Write locks
- Read locks
- Access locks
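How the four lock types interact can be sketched as a compatibility table. The table is not taken from the slides; it reflects the commonly documented behavior (access locks coexist with everything except exclusive, read locks coexist with read and access, write locks coexist only with access). The names `COMPATIBLE` and `can_grant` are hypothetical.

```python
# For each requested lock type, the set of already-held lock types
# it can coexist with on the same object.
COMPATIBLE = {
    "ACCESS": {"ACCESS", "READ", "WRITE"},
    "READ": {"ACCESS", "READ"},
    "WRITE": {"ACCESS"},
    "EXCLUSIVE": set(),
}


def can_grant(requested, held):
    """True if `requested` is compatible with every lock currently held."""
    return all(lock in COMPATIBLE[requested] for lock in held)


readers_share = can_grant("READ", ["READ", "ACCESS"])   # readers can share
writer_blocked = can_grant("WRITE", ["READ"])           # writer must wait
dirty_read = can_grant("ACCESS", ["WRITE"])             # access reads past a write
```

A request that cannot be granted would wait in a real system; this sketch only answers whether the locks can coexist.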
RAID: Redundant Array of Inexpensive Disks (RAID) is a storage technology that provides data protection at the disk drive level.
- RAID 1: disk mirroring
- RAID 5: parity checking
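The parity idea behind RAID 5 can be shown in a few lines. This is a minimal sketch of XOR parity over three equal-sized data blocks, not a model of a real disk array; the block contents and the helper name `xor_blocks` are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # blocks stored on three data disks
parity = xor_blocks(data)            # parity block stored on a fourth disk

# Simulate losing the second disk: XOR the surviving blocks with the parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

Because XOR is its own inverse, any single lost block can be rebuilt from the remaining blocks plus the parity. RAID 1 needs no such calculation, since the mirror is a complete copy.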
Fallback: Fallback is a Teradata feature that protects data against AMP failure. Fallback stores a second copy of each row on a different AMP in the same group of AMPs (a cluster), so data remains available and consistent if an AMP is unavailable.
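A rough Python sketch of the fallback idea follows. It assumes a single cluster of four AMPs and places the fallback copy on the next AMP in the cluster; that placement rule is a simplification chosen for the example, not Teradata's actual distribution. The functions `place` and `read_from` are hypothetical.

```python
def place(row_id, cluster):
    """Return (primary_amp, fallback_amp) for a row within one cluster."""
    primary = cluster[row_id % len(cluster)]
    # Assumed rule: the fallback copy lives on the next AMP in the cluster,
    # which is always a different AMP when the cluster has two or more AMPs.
    fallback = cluster[(row_id + 1) % len(cluster)]
    return primary, fallback


def read_from(row_id, cluster, down):
    """Pick the AMP that serves the row, given the set of AMPs that are down."""
    primary, fallback = place(row_id, cluster)
    if primary not in down:
        return primary
    if fallback not in down:
        return fallback
    raise RuntimeError("both copies of the row are unavailable")


cluster = [0, 1, 2, 3]
normal = read_from(5, cluster, down=set())   # served by the primary AMP
degraded = read_from(5, cluster, down={1})   # primary AMP 1 is down
```

With one AMP down, every row is still readable from its fallback copy. Only the loss of both AMPs holding a row's copies makes that row unavailable.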
Clique:
- A set of SMP nodes that share a common set of disk arrays.
- Provides protection from node failure.
- If a node fails, all of its vprocs migrate to the remaining nodes in the clique (vproc migration).
- A clique can support up to 128 vprocs.
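Vproc migration can be pictured with a short sketch. It assumes a three-node clique and spreads the failed node's vprocs round-robin over the survivors; the real placement policy is not described in the slides, so the round-robin rule, the node and vproc names, and the function `migrate_vprocs` are all assumptions.

```python
def migrate_vprocs(clique, failed_node):
    """Reassign the failed node's vprocs to the surviving nodes, round-robin."""
    survivors = [node for node in clique if node != failed_node]
    if not survivors:
        raise RuntimeError("no surviving node in the clique")
    after = {node: list(clique[node]) for node in survivors}
    for i, vproc in enumerate(clique[failed_node]):
        after[survivors[i % len(survivors)]].append(vproc)
    return after


clique = {
    "node1": ["AMP0", "AMP1", "PE0"],
    "node2": ["AMP2", "AMP3", "PE1"],
    "node3": ["AMP4", "AMP5", "PE2"],
}
after = migrate_vprocs(clique, "node2")
```

All nine vprocs are still running after the failure, just on two nodes instead of three. This works because the nodes in a clique share the same disk arrays, so a migrated AMP can still reach its data.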
Journal:
Teradata journals are used for specific types of data recovery or process recovery.
1. Recovery Journals: Automatically activated when an AMP is taken offline.
2. Transaction Journal: A journal of transaction "before images"; provides automatic rollback in the event of a transaction failure.
3. Permanent Journal: A user-specified, system-maintained journal, used for recovery from unexpected software and hardware disasters.
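The before-image mechanism of the transaction journal can be sketched in Python. This is a toy in-memory model, assuming a table is a dictionary of rows and a transaction is a series of updates; the `Table` class and its methods are hypothetical and do not correspond to any Teradata interface.

```python
_MISSING = object()  # marks "row did not exist before the change"


class Table:
    def __init__(self):
        self.rows = {}
        self.journal = []  # before-images for the open transaction

    def update(self, key, value):
        # Record the before-image first, then apply the change.
        self.journal.append((key, self.rows.get(key, _MISSING)))
        self.rows[key] = value

    def commit(self):
        # Changes are final, so the before-images are no longer needed.
        self.journal.clear()

    def rollback(self):
        # Restore before-images in reverse order of the changes.
        while self.journal:
            key, before = self.journal.pop()
            if before is _MISSING:
                self.rows.pop(key, None)
            else:
                self.rows[key] = before


t = Table()
t.update("a", 1)
t.commit()
t.update("a", 2)   # transaction that will fail
t.update("b", 3)
t.rollback()       # table returns to its state at the last commit
```

After the rollback the table holds only the committed row, which is the effect of the automatic rollback described for the transaction journal.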
Thank you