integration * intelligence * insight
Dynamic Partitioning
AGENDA
• High Availability
• Grid Computing
• Dynamic Partitioning
Introduction
PowerCenter Domains
PowerCenter introduces a service-oriented architecture.
PowerCenter introduces a domain, which serves as the primary unit of administration for the PowerCenter environment.
A domain is a collection of nodes and services in the PowerCenter environment.
The first time you install Informatica Services, you create a domain and add a node to the domain.
Administration Console
• The Administration Console is a browser-based utility that enables you to view domain properties and perform basic domain administration tasks.
• The Navigator displays the following types of objects:
• Domain. You can view one domain in the Administration Console.
• Node. A node represents a machine in the domain.
• Grid. Create a grid to run the Integration Service on multiple nodes.
High Availability
• High availability is the PowerCenter option that eliminates a single point of failure in the PowerCenter environment.
• High availability provides the following functionality:
• Resilience.
• Failover.
• Recovery.
The Partitioning Option
• The Partitioning Option increases PowerCenter’s performance through parallel data processing.
• When the Integration Service runs the session, it can achieve higher performance by partitioning the pipeline and performing the extract, transformation, and load for each partition in parallel.
• Partition types:
• Database partitioning.
• Hash auto-keys.
• Hash user keys.
• Key range.
• Pass-through.
• Round-robin.
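The idea of running extract, transformation, and load for each partition in parallel can be sketched in Python. This is only an illustration of the structure, not PowerCenter code: the function names `run_partition` and `run_session` are hypothetical, the "transformation" is a placeholder that doubles each value, and a thread pool is used purely as a stand-in for whatever the Integration Service does internally.

```python
from concurrent.futures import ThreadPoolExecutor


def run_partition(rows):
    """Extract, transform, and load one partition (hypothetical helper)."""
    extracted = list(rows)                             # extract: read the rows
    transformed = [value * 2 for value in extracted]   # transform: placeholder logic
    return transformed                                 # load: here we just return the rows


def run_session(partitions):
    """Process every partition concurrently and return the results in order."""
    with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
        return list(pool.map(run_partition, partitions))


if __name__ == "__main__":
    print(run_session([[1, 2], [3, 4], [5]]))
```

Each partition runs through the same three stages independently, which is what allows the work to overlap.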
Configuring Partitioning
• Create or edit a session.
• Update partitioning information using the Partitions view on the Mapping tab of session properties.
• Add, delete, or edit partition points on the Partitions view of session properties.
Configuring a Partition Point
• You can configure the following information when you edit or add a partition point:
• Specify the partition type at the partition point.
• Add and delete partitions.
• Enter a description for each partition.
Hash user keys
• The Integration Service uses a hash function to group rows of data among partitions.
• A numeric hash key can improve session performance, because the hash function usually processes numerical data more quickly than string data.
• With hash user keys partitioning, you specify the hash key yourself.
• In our sample mapping (m_orders_scd3), the session run time is 37 seconds when partitioning is not configured.
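A minimal Python sketch of hash partitioning on a user-chosen key follows. The slides do not say which hash function the Integration Service uses, so `zlib.crc32` is an assumption chosen only because it is deterministic; `hash_partition`, `order_id`, and `customer_id` are hypothetical names.

```python
import zlib


def hash_partition(rows, key_fields, num_partitions):
    """Assign each row to a partition based on a hash of the chosen key fields."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # Build one string from the key fields, then hash it deterministically.
        key = "|".join(str(row[field]) for field in key_fields)
        index = zlib.crc32(key.encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


if __name__ == "__main__":
    orders = [{"order_id": i, "customer_id": i % 4} for i in range(20)]
    for number, part in enumerate(hash_partition(orders, ["customer_id"], 3)):
        print(number, sorted({row["customer_id"] for row in part}))
```

Rows that share the same key value always land in the same partition, which is the grouping property the hash function provides.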
Hash user keys (contd.)
• With hash user keys partitioning, the session completes in 22 seconds, as shown in the figure below.
Key range partition
• With key range partitioning, the Integration Service distributes rows of data based on a port that you define as the partition key.
• You define a range of values for each partition.
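Key range routing can be sketched in Python with a sorted list of boundaries. This is an illustration under stated assumptions, not the Integration Service's implementation: `key_range_partition` is a hypothetical helper, and treating each upper bound as exclusive is a choice made for this example.

```python
from bisect import bisect_right


def key_range_partition(rows, port, upper_bounds):
    """Route each row by comparing row[port] with sorted, exclusive upper bounds.

    upper_bounds=[100, 200] defines three partitions:
    values < 100, 100 <= values < 200, and values >= 200.
    """
    partitions = [[] for _ in range(len(upper_bounds) + 1)]
    for row in rows:
        # bisect_right counts how many bounds are <= the value,
        # which is exactly the index of the partition that owns it.
        partitions[bisect_right(upper_bounds, row[port])].append(row)
    return partitions


if __name__ == "__main__":
    rows = [{"id": value} for value in (5, 99, 100, 199, 200, 350)]
    for number, part in enumerate(key_range_partition(rows, "id", [100, 200])):
        print(number, [row["id"] for row in part])
```

Unlike hash partitioning, the distribution here depends entirely on how well the chosen ranges match the data, so skewed ranges give skewed partitions.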
Key range partition (contd.)
• With key range partitioning, the session completes in 33 seconds, as shown in the figure below.
Partition details
• Source/target statistics
Hash auto-keys
• Use hash auto-keys partitioning at or before Rank, Sorter, Joiner, and unsorted Aggregator transformations.
• The Integration Service distributes rows to each partition according to group before they enter the Sorter and Aggregator transformations.
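Why rows must be distributed by group before an aggregation can be shown with a small Python sketch. It is illustrative only: `partition_by_group` and `aggregate_partition` are hypothetical helpers, the `item` and `qty` fields are made up, and `zlib.crc32` stands in for the unspecified hash function.

```python
import zlib
from collections import defaultdict


def partition_by_group(rows, group_field, num_partitions):
    """Send every row of a group to the same partition by hashing the group key."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        index = zlib.crc32(str(row[group_field]).encode("utf-8")) % num_partitions
        partitions[index].append(row)
    return partitions


def aggregate_partition(rows, group_field, value_field):
    """Sum value_field per group within a single partition."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[group_field]] += row[value_field]
    return dict(totals)


if __name__ == "__main__":
    data = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
    rows = [{"item": name, "qty": qty} for name, qty in data]
    for part in partition_by_group(rows, "item", 2):
        print(aggregate_partition(part, "item", "qty"))
```

Because no group is split across partitions, each partition's totals are already final and the per-partition results can simply be combined.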
Pass-Through Partition Type
• In pass-through partitioning, the Integration Service processes data without redistributing rows among partitions.
• Increases data throughput without increasing the number of partitions.
Round-Robin Partition Type
• In round-robin partitioning, the Integration Service distributes rows of data evenly to all partitions.
• The session based on this mapping reads item information from three flat files of different sizes:
• Source file 1: 80,000 rows
• Source file 2: 5,000 rows
• Source file 3: 15,000 rows
• When the Integration Service reads the source data, the first partition begins processing 80% of the data, the second partition processes 5% of the data, and the third partition processes 15% of the data.
• To distribute the workload more evenly, set a partition point at the Filter transformation and set the partition type to round-robin. The Integration Service distributes the data so that each partition processes approximately one-third of the data.
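The effect of round-robin redistribution on the three source files above can be checked with a short Python sketch. It is a simplified model, assuming a single stream of rows dealt out in turn; `round_robin_counts` is a hypothetical helper and does not reflect how the Integration Service schedules rows internally.

```python
from itertools import cycle


def round_robin_counts(source_row_counts, num_partitions):
    """Count how many rows each partition receives when rows are dealt out in turn."""
    counts = [0] * num_partitions
    targets = cycle(range(num_partitions))
    for file_rows in source_row_counts:
        for _ in range(file_rows):
            counts[next(targets)] += 1
    return counts


if __name__ == "__main__":
    # Source files of 80,000, 5,000, and 15,000 rows spread over three partitions.
    print(round_robin_counts([80_000, 5_000, 15_000], 3))  # → [33334, 33333, 33333]
```

The 80/5/15 split at the source becomes roughly one-third per partition after the round-robin partition point.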
Dynamic Partitioning
• If the volume of data grows or you add more CPUs, you might need to adjust partitioning so the session run time does not increase.
• When you use dynamic partitioning, you can configure the partition information so the Integration Service determines the number of partitions to create at run time.
• The Integration Service scales the number of session partitions at run time based on factors such as source database partitions or the number of nodes in a grid.
Configuring Dynamic Partitioning
• Configure dynamic partitioning using one of the following methods:
• Disabled. Do not use dynamic partitioning. Defines the number of partitions on the Mapping tab.
• Based on number of partitions. Sets the partitions to a number that you define in the Number of Partitions attribute. Use the $DynamicPartitionCount session parameter, or enter a number greater than 1.
• Based on number of nodes in grid. Sets the partitions to the number of nodes in the grid running the session. If you configure this option for sessions that do not run on a grid, the session runs in one partition and logs a message in the session log.
• Based on source partitioning. Determines the number of partitions using database partition information. The number of partitions is the maximum of the number of partitions at the source.
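The four methods above can be summarized as a small decision function. This Python sketch only restates the rules listed on this slide; `resolve_partition_count`, its parameter names, and the method strings are hypothetical, and falling back to one partition when no source partition information is available is an assumption of this example.

```python
def resolve_partition_count(method, static_count=1, configured_count=None,
                            grid_nodes=None, source_partition_counts=()):
    """Return the number of partitions implied by a dynamic partitioning method."""
    if method == "disabled":
        # Use the number of partitions defined on the Mapping tab.
        return static_count
    if method == "number_of_partitions":
        if configured_count is None or configured_count <= 1:
            raise ValueError("enter a number greater than 1")
        return configured_count
    if method == "nodes_in_grid":
        # A session that does not run on a grid runs in one partition.
        return grid_nodes if grid_nodes else 1
    if method == "source_partitioning":
        # Maximum of the partition counts found at the sources (assumed 1 if none).
        return max(source_partition_counts, default=1)
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    print(resolve_partition_count("number_of_partitions", configured_count=3))
    print(resolve_partition_count("nodes_in_grid"))
    print(resolve_partition_count("source_partitioning", source_partition_counts=[2, 5, 3]))
```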
Based on number of partitions
• Edit the session task and go to the Config Object tab. Set dynamic partitioning to Based on number of partitions, and set the number of partitions to 3.
Based on number of partitions (contd.)
• With dynamic partitioning based on number of partitions, the session completes in 32 seconds, as shown in the figure below.
Partition details
• Source/target statistics
Based on number of nodes in grid
• Edit the session task and go to the Config Object tab. Set dynamic partitioning to Based on number of nodes in grid.
Based on number of nodes in grid (contd.)
• With dynamic partitioning based on number of nodes in grid, the session completes in 25 seconds, as shown in the figure below.
Based on source partitioning
• Edit the session task and go to the Config Object tab. Set dynamic partitioning to Based on source partitioning.
Based on source partitioning (contd.)
• With dynamic partitioning based on source partitioning, the session completes in 20 seconds, as shown in the figure below.
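The run times reported in these slides can be compared against the 37-second run without partitioning. The Python sketch below only does that arithmetic; it assumes all timings come from the same sample mapping (m_orders_scd3) and environment, which the slides suggest but do not state, and `speedup` is a hypothetical helper.

```python
BASELINE_SECONDS = 37  # run time without partitioning, as reported in the slides

RUN_TIMES = {
    "Hash user keys": 22,
    "Key range": 33,
    "Dynamic: number of partitions": 32,
    "Dynamic: nodes in grid": 25,
    "Dynamic: source partitioning": 20,
}


def speedup(baseline, seconds):
    """Return how many times faster a run is than the baseline, to two decimals."""
    return round(baseline / seconds, 2)


if __name__ == "__main__":
    for name, seconds in RUN_TIMES.items():
        print(f"{name}: {seconds}s, {speedup(BASELINE_SECONDS, seconds)}x faster")
```

These are single measurements, so the ratios are best read as rough indications, not benchmarks.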
Advantages of Dynamic Partition
• Session run time does not increase when the volume of data grows or when you add more CPUs.
• Scales cost-effectively to handle large data volumes.
• Enhances developer productivity.
• Optimizes system performance in response to changing business requirements.
• Even if a system fails, the session still completes (grid computing).
LIMITATIONS OF DYNAMIC PARTITION
• You cannot use dynamic partitioning with XML sources and targets.
• You cannot use dynamic partitioning with the Debugger.
Thanks