
BI Concepts


What is BI?

OBIEE

http://www.oracle.com/technology/documentation/bi_ee.html 

http://download.oracle.com/docs/cd/E10415_01/doc/nav/portal_booklist.htm

http://zed.cisco.com/confluence/display/siebel/Home

http://zed.cisco.com/confluence/display/siebel/Enterprise+Architecture+BI+Standards

https://cisco.webex.com/ciscosales/lsr.php?AT=pb&SP=MC&rID=39544447&rKey=9C8D63F2C74ED9DA

http://informatica.techtiks.com/informatica_questions.html#RQ1

http://www.allinterview.com/showanswers/32477.html
http://www.1keydata.com/datawarehousing/glossary.html
http://www.forum9.com/
http://www.livestore.net/
http://www.kalaajkal.com/

AIM methodology documents:
BR100 - High-level requirements document
MD50 - Functional specification
MD70 - Technical specification. Development starts after MD70 approval, followed by ERMO (TCA, TCB, TCC), FPR, SIT and UAT.
MD120 - Deployment process

Iteration – the solution is delivered in short iterations, with each cycle adding more business value and implementing requested changes.

The 10 key principles of Agile software development, and how it fundamentally differs from a more traditional waterfall approach to software development, are as follows:


1. Active user involvement is imperative

2. The team must be empowered to make decisions

3. Requirements evolve but the timescale is fixed

4. Capture requirements at a high level; lightweight & visual

5. Develop small, incremental releases and iterate

6. Focus on frequent delivery of products

7. Complete each feature before moving on to the next

8. Apply the 80/20 rule

9. Testing is integrated throughout the project lifecycle – test early and often

10. A collaborative & cooperative approach between all stakeholders is essential

Unix

http://www.sikh-history.com/computers/unix/commands.html#catcommand

cat file1                display the contents of file1
cat file1 file2 > all    combine file1 and file2 into a new file 'all' (it will create the file if it doesn't exist)
cat file1 >> file2       append file1 to the end of file2

o > will redirect output from standard out (screen) to file or printer or whatever you like.

o >> filename will append output at the end of a file called filename.

o < will redirect input to a process or command.

Below line is the first line of the script

#!/usr/bin/sh

Or

#!/bin/ksh

What does #! /bin/sh mean in a shell script?

It tells the system which interpreter should run the script. Bash has some features that other shells do not have, and vice versa; the same is true of Perl, Python and other languages.

In short, the first line tells your shell which shell (or interpreter) to use when executing the statements in the script.

How to find all processes that are running:


ps -A

Crontab command. The crontab command is used to schedule jobs. You must have permission from the Unix administrator to run this command. Jobs are scheduled using five time fields, as follows.

Minutes (0-59), Hour (0-23), Day of month (1-31), Month (1-12), Day of week (0-6, where 0 is Sunday). For example, suppose you want to schedule a job that runs the script backup_jobs in the /usr/local/bin directory at 22:25 on the 15th of the month and on Sundays (day 0). The entry in the crontab file will be as below; * represents all values.

25 22 15 * 0 /usr/local/bin/backup_jobs

The * in the month field tells the system to run this every month. The syntax is: crontab filename. So create a file with the scheduled jobs as above and then type crontab filename. This will schedule the jobs.

The command below gives the total number of users logged in at this time:
who | wc -l
echo "are total number of people logged in at this time."

Below cmd will display only directories

$ ls -l | grep '^d'

Pipes:

The pipe symbol "|" is used to direct the output of one command to the input of another.

Moving, renaming, and copying files:

cp file1 file2        copy a file
mv file1 newname      move or rename a file
mv file1 ~/AAA/       move file1 into sub-directory AAA in your home directory
rm file1 [file2 ...]  remove or delete a file

Viewing and editing files:


cat filename       Dump a file to the screen in ASCII.
head filename      Show the first few lines of a file.
head -n filename   Show the first n lines of a file.
tail filename      Show the last few lines of a file.
tail -n filename   Show the last n lines of a file.

Searching for files : The find command

find . -name aaa.txt                    Finds all the files named aaa.txt in the current directory or any subdirectory tree.
find / -name vimrc                      Find all the files named 'vimrc' anywhere on the system.
find /usr/local/games -name "*xpilot*"  Find all files whose names contain the string 'xpilot' within the '/usr/local/games' directory tree.

You can find out what shell you are using by the command:

echo $SHELL

If file exists then send email with attachment.

if [[ -f $your_file ]]; then
  uuencode $your_file $your_file | mailx -s "$your_file exists..." your_email_address
fi

Interactive History

A feature of bash and tcsh (and sometimes other shells): you can use the up-arrow key to access your previous commands, edit them, and re-execute them.

Basics of the vi editor

Opening a file: vi filename

Creating text - edit modes: these keys enter editing modes and allow you to type in the text of your document.

i    Insert before current cursor position
I    Insert at beginning of current line


a       Insert (append) after current cursor position
A       Append to end of line
r       Replace 1 character
R       Replace mode
<ESC>   Terminate insertion or overwrite mode

Deletion of text

x    Delete single character
dd   Delete current line and put in buffer

:w                 Write the current file.
:w new.file        Write the file to the name 'new.file'.
:w! existing.file  Overwrite an existing file with the file currently being edited.
:wq                Write the file and quit.
:q                 Quit.
:q!                Quit with no changes.

Business Intelligence refers to a set of methods and techniques that are used by organizations for tactical and strategic decision making. It leverages methods and technologies that focus on counts, statistics and business objectives to improve business performance.

The objective of Business Intelligence is to better understand customers and improve customer service, make the supply and distribution chain more efficient, and to identify and address business problems and opportunities quickly.

A warehouse is used for high-level data analysis. It is used for predictions, time-series analysis, financial analysis, what-if simulations, etc. Basically it is used for better decision making.

OLTP is NOT used for analysis purposes. It is used for transaction and data processing, basically for storing the day-to-day transactions that take place in an organisation. The main focus of OLTP is easy and fast input of data, while the main focus of a data warehouse is easy retrieval of data. OLTP doesn't store historical data (this is the reason why it can't be used for analysis).


DW stores historical data.

What is a Data Warehouse?

Data Warehouse is a "Subject-Oriented, Integrated, Time-Variant Nonvolatile collection of data in support of decision making".

In terms of design data warehouse and data mart are almost the same.

In general a Data Warehouse is used on an enterprise level and a Data Marts is used on a business division/department level.

Subject Oriented:

Data that gives information about a particular subject instead of about a company's ongoing operations.

Integrated:

Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.

Time-variant:

All data in the data warehouse is identified with a particular time period.

Non-volatile

Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.

Informatica Transformations:

Mapping: A mapping is the Informatica object which contains a set of transformations, including source and target. It looks like a pipeline.
Session: A session is a set of instructions that tells the Informatica Server how to move data from sources to targets.
Workflow: A workflow is a set of instructions that tells the Informatica Server how to execute tasks such as sessions, email notifications and commands. In a workflow, multiple sessions can be included to run in parallel or sequentially.
Source Definition: The Source Definition is used to logically represent a database table or flat file.


Target Definition: The Target Definition is used to logically represent a database table or file in the Data Warehouse / Data Mart.
Aggregator: The Aggregator transformation is used to perform aggregate calculations on a group basis.
Expression: The Expression transformation is used to perform arithmetic calculations on a row-by-row basis, and also to convert strings to integers and concatenate two columns.
Filter: The Filter transformation is used to filter data based on a single condition and pass it to the next transformation.
Router: The Router transformation is used to route data based on multiple conditions and pass it to the next transformations. It has three groups: 1) Input group 2) User-defined group 3) Default group.
Joiner: The Joiner transformation is used to join two sources residing in different databases or different locations, such as a flat file and an Oracle source, or two relational tables in different databases.
Source Qualifier: The Source Qualifier transformation is used to describe in SQL the method by which data is to be retrieved from a source application system, and also to join two relational sources residing in the same database.
What is Incremental Aggregation?
A. Whenever a session is created for a mapping with an Aggregator transformation, the session option for Incremental Aggregation can be enabled. When PowerCenter performs incremental aggregation, it passes new source data through the mapping and uses historical cache data to perform new aggregation calculations incrementally.
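As a rough SQL analogy for some of the transformations above (this is not Informatica syntax; SALES_STG and REGION_DIM are made-up tables for illustration), the row-level and group-level operations map loosely onto familiar SQL constructs:

-- Filter ~ WHERE, Expression ~ derived columns, Aggregator ~ GROUP BY, Joiner ~ JOIN
SELECT   s.region_id,
         UPPER(s.customer_name) AS customer_name,   -- Expression-style derivation
         SUM(s.sale_amount)     AS total_sales      -- Aggregator-style calculation
FROM     sales_stg s
JOIN     region_dim r ON r.region_id = s.region_id  -- Joiner-style join
WHERE    s.sale_amount > 0                           -- Filter-style condition
GROUP BY s.region_id, UPPER(s.customer_name);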

Lookup: Lookup transformation is used in a mapping to look up data in a flat file or a relational table, view, or synonym. Two types of lookups:

1) Connected 2) Unconnected

Connected Lookup:
Connected to the pipeline and receives its input values from the pipeline.
Cannot be used more than once in a mapping (it sits in one data flow).
Can return multiple columns from the same row.
Can be configured to use a dynamic cache.
Passes multiple output values to another transformation; link the lookup/output ports to another transformation.

Unconnected Lookup:
Not connected to the pipeline; receives input values from the result of a :LKP expression in another transformation, via arguments.
Can be used more than once within the mapping (called from several expressions).
Designate one return port (R); returns one column from each row.
Cannot be configured to use a dynamic cache.
Passes one output value to another transformation; the lookup/output/return port passes the value to the transformation calling the :LKP expression.

Lookup Caches:

When configuring a lookup cache, you can specify any of the following options:

Persistent cache Recache from lookup source Static cache Dynamic cache Shared cache

Dynamic cache: When you use a dynamic cache, the PowerCenter Server updates the lookup cache as it passes rows to the target. If you configure a Lookup transformation to use a dynamic cache, you can only use the equality operator (=) in the lookup condition. The NewLookupRow port is enabled automatically.

NewLookupRow Value Description

0 The PowerCenter Server does not update or insert the row in the cache.

1 The PowerCenter Server inserts the row into the cache.

2 The PowerCenter Server updates the row in the cache.

Static cache: This is the default cache; the PowerCenter Server doesn't update the lookup cache as it passes rows to the target.
Persistent cache: If the lookup table does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The PowerCenter Server then saves and reuses cache files from session to session, eliminating the time required to read the lookup table.

Normalizer: The Normalizer transformation is used to generate multiple records from a single record and to transform structured data (such as COBOL or flat files) into relational data.
Rank: The Rank transformation allows you to select only the top or bottom rank of data. You can use a Rank transformation to return the largest or smallest numeric value in a port or group. The Designer automatically creates a RANKINDEX port for each Rank transformation.
Sequence Generator: The Sequence Generator transformation is used to generate numeric key values in sequential order.


Stored Procedure: The Stored Procedure transformation is used to execute externally stored database procedures and functions. It is used to perform database-level operations.
Sorter: The Sorter transformation is used to sort data in ascending or descending order according to a specified sort key. You can also configure the Sorter transformation for case-sensitive sorting, and specify whether the output rows should be distinct. The Sorter transformation is an active transformation. It must be connected to the data flow.

Union Transformation: The Union transformation is a multiple input group transformation that you can use to merge data from multiple pipelines or pipeline branches into one pipeline branch. It merges data from multiple sources, similar to the UNION ALL SQL statement combining the results of two or more SQL statements. Like UNION ALL, the Union transformation does not remove duplicate rows. Input groups should have a similar structure.

Update Strategy: The Update Strategy transformation is used to indicate the DML statement. We can implement update strategy at two levels: 1) Mapping level 2) Session level. Session-level properties will override the mapping-level properties.

Mapplet:

Mapplet is a set of reusable transformations. We can use this mapplet in any mapping within the Folder.

A mapplet can be active or passive depending on the transformations in the mapplet. Active mapplets contain one or more active transformations. Passive mapplets contain only passive transformations.

When you add transformations to a mapplet, keep the following restrictions in mind:

If you use a Sequence Generator transformation, you must use a reusable Sequence Generator transformation.

If you use a Stored Procedure transformation, you must configure the Stored Procedure Type to be Normal.

You cannot include the following objects in a mapplet: o Normalizer transformations o COBOL sources o XML Source Qualifier transformations o XML sources o Target definitions o Other mapplets


The mapplet contains Input transformations and/or source definitions with at least one port connected to a transformation in the mapplet.

The mapplet contains at least one Output transformation with at least one port connected to a transformation in the mapplet.

Input Transformation: Input transformations are used to create a logical interface to a mapplet in order to allow data to pass into the mapplet.Output Transformation: Output transformations are used to create a logical interface from a mapplet in order to allow data to pass out of a mapplet.

System Variables

$$$SessStartTime returns the initial system date value on the machine hosting the Integration Service when the server initializes a session. $$$SessStartTime returns the session start time as a string value. The format of the string depends on the database you are using.

Advantages of Teradata:
1. Can store billions of rows.
2. Parallel processing makes Teradata faster than other RDBMSs.
3. Can be accessed by network-attached and channel-attached systems.
4. Supports the requirements of diverse clients.
5. Automatically detects and recovers from hardware failures.
6. Allows expansion without sacrificing performance.

Datawarehouse - Concepts

Beginners

1. What is a Data Warehouse?

Data Warehouse is a "Subject-Oriented, Integrated, Time-Variant Nonvolatile collection of data in support of decision making".

2. What is a DataMart?

A data mart is usually sponsored at the department level and developed with a specific issue or subject in mind; a data mart is a data warehouse with a focused objective.

3. What is Data Mining?

Data Mining is an analytic process designed to explore hidden consistent patterns, trends and associations within data stored in a data warehouse or other large databases.


4. What do you mean by Dimension Attributes?

The Dimension Attributes are the various columns in a dimension table.

For example , attributes in a PRODUCT dimension can be product category, product type etc.

Generally, the dimension attributes are used in query filter conditions and to display other related information about a dimension.

5. What is the difference between a data warehouse and a data mart?

In terms of design data warehouse and data mart are almost the same.

In general a Data Warehouse is used on an enterprise level and a Data Marts is used on a business division/department level.

A data mart only contains data specific to a particular subject area.

6. What is the difference between OLAP, ROLAP, MOLAP and HOLAP?

ROLAP stands for Relational OLAP. Data is stored in relational tables, and the multidimensional (cube) view is built on top of them at query time.

MOLAP stands for Multidimensional OLAP. Conceptually, data is organized in cubes with dimensions and stored in a multidimensional format.

HOLAP stands for Hybrid OLAP, it is a combination of both worlds.

7. What is a star schema?

Star schema is a data warehouse schema where there is only one "fact table" and many denormalized dimension tables.

The fact table contains the primary keys of all the dimension tables (as foreign keys) and numeric columns of additive facts.
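A minimal sketch of such a star schema in SQL (SALES_FACT and the dimension tables are hypothetical, not from any particular warehouse):

CREATE TABLE date_dim    (date_key    NUMBER PRIMARY KEY, calendar_date DATE, month_name VARCHAR2(20));
CREATE TABLE product_dim (product_key NUMBER PRIMARY KEY, product_name  VARCHAR2(100), product_category VARCHAR2(50));
CREATE TABLE store_dim   (store_key   NUMBER PRIMARY KEY, store_name    VARCHAR2(100), region VARCHAR2(50));

-- Fact table: foreign keys to every dimension plus additive numeric measures
CREATE TABLE sales_fact (
  date_key      NUMBER REFERENCES date_dim(date_key),
  product_key   NUMBER REFERENCES product_dim(product_key),
  store_key     NUMBER REFERENCES store_dim(store_key),
  quantity_sold NUMBER,
  sales_amount  NUMBER(12,2)
);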

8. What does it mean by grain of the star schema?

In Data warehousing grain refers to the level of detail available in a given fact table as well as to the level of detail provided by a star schema.

It is usually given as the number of records per key within the table. In general, the grain of the fact table is the grain of the star schema.

9. What is a snowflake schema?


Unlike a star schema, a snowflake schema contains normalized dimension tables in a tree-like structure with many nesting levels.

Snowflake schema is easier to maintain but queries require more joins.

10. What is a surrogate key?

A surrogate key is a substitution for the natural primary key. It is a unique identifier or number (normally created by a database sequence generator) for each record of a dimension table that can be used as the primary key of the table.

A surrogate key is useful because natural keys may change.
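For example, a surrogate key is typically generated from a database sequence while the natural (business) key is kept as an ordinary attribute; a sketch assuming a hypothetical CUSTOMER_DIM table:

CREATE SEQUENCE customer_dim_seq START WITH 1 INCREMENT BY 1;

INSERT INTO customer_dim (customer_key, customer_natural_id, customer_name)
VALUES (customer_dim_seq.NEXTVAL, 'CUST-00123', 'Acme Corp');
-- customer_key is the surrogate primary key; 'CUST-00123' is the natural/business key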

11. What Oracle tools are available to design and build a data warehouse/data mart?

Data Warehouse Builder,

Oracle Designer,

Oracle Express,

Express Objects etc.

12. What is a Cube?

A multi-dimensional representation of data in which the cells contain measures (i.e. facts) and the edges represent data dimensions by which the data can be sliced and diced

For example:

A SALES cube can have PROFIT and COMMISSION measures and TIME, ITEM and REGION dimensions

13. What does ETL stand for?

ETL stands for "Extract, Transform and Load".

ETL tools are used to pull data from a database, transform the data so that it is compatible with a second database ( datawarehouse or datamart) and then load the data.

14. What is Aggregation?


In the data warehouse paradigm, "aggregation" is one way of improving query performance. An aggregate fact table is a new table created from an existing fact table by summing up facts for a set of associated dimensions. The grain of an aggregate fact table is higher than that of the base fact table. Aggregate tables contain fewer rows, thus making queries run faster.
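As a sketch of this idea (reusing the hypothetical SALES_FACT/DATE_DIM tables from the star schema example above), an aggregate fact table can be built by summing the base fact table to a higher grain:

-- Monthly aggregate built from the daily-grain fact table
CREATE TABLE sales_month_agg AS
SELECT   d.month_name,
         f.product_key,
         SUM(f.sales_amount) AS sales_amount
FROM     sales_fact f
JOIN     date_dim d ON d.date_key = f.date_key
GROUP BY d.month_name, f.product_key;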

15. what is Business Intelligence?

Business Intelligence is a term introduced by Howard Dresner of Gartner Group in 1989. He described Business Intelligence as a set of concepts and methodologies to improve decision making in business through use of facts and fact based systems.

16. What is transitive dependency?

When a non-key attribute determines the value of another non-key attribute, the table is said to contain a transitive dependency. For example, if ORDER(order_id, customer_id, customer_city) stores customer_city, which depends on customer_id rather than on order_id, that is a transitive dependency.

17. what is the current version of informatica?

The current version of informatica is 8.6

18. What are the tools in informatica?Why we are using that tools?

PowerMart and PowerCenter are the popular tools.

PowerCenter is generally used in the production environment.

PowerMart is generally used in the development environment.

19. What is a transformation?

It is a function (or object) that processes (or transforms) the data.

The Designer tool provides a set of transformations for different data transformations.

Transformations such as Filter, Source Qualifier, Expression, Aggregator, Joiner, etc.

20. What is a mapping?

It defines the flow of data from source to the target.


It also contains different rules to be applied to the data before the data gets loaded into the target.

21. What is a factless fact table?

A fact table that contains only primary keys from the dimension tables, and does not contain any measures, is called a factless fact table.

22. What is a Schema?

A graphical representation of the data structure. It is the first phase in the implementation of a Universe.

23. What is A Context?

A Method by which the designer can decide which path to choose when more than one path is possible from one table to another in a Universe

24. What is a BOMain key?

The BOMain.key is a file that contains the address of the repository's security domain.

Advanced

25. Who are the Data Stewards and whats their role?

Data Stewards are a group of experienced people who are responsible for planning, defining business processes and setting directions. Data Stewards are familiar with the organization's data quality, data issues and overall business processes.

26. What are the most important features of a data warehouse?

DRILL DOWN, DRILL ACROSS and TIME HANDLING

To be able to drill down/drill across is the most basic requirement of an end user in a data warehouse. Drilling down most directly addresses the natural end-user need to see more detail in a result. Drill-down should be as generic as possible because there is absolutely no good way to predict a user's drill-down path.


27. What the easiest way to build a corporate specific time dimension?

Unlike most dimensions, the time dimension does not change. You can populate it once and use it for years.

So the easiest way is to use a spreadsheet.
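Alternatively, in Oracle the rows can be generated with a simple query; a hedged sketch, assuming the hypothetical DATE_DIM used earlier (column names are illustrative):

-- Generate one row per day of 2024 into the assumed DATE_DIM table
INSERT INTO date_dim (date_key, calendar_date, month_name)
SELECT TO_NUMBER(TO_CHAR(DATE '2024-01-01' + LEVEL - 1, 'YYYYMMDD')),
       DATE '2024-01-01' + LEVEL - 1,
       TO_CHAR(DATE '2024-01-01' + LEVEL - 1, 'Month')
FROM   dual
CONNECT BY LEVEL <= 366;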

28. What is a Real-Time Data Warehouse - RTDW?

A Real-Time Data Warehouse is an analytic component of an enterprise-level data stream that supports continuous, asynchronous, multi-point delivery of data.

In an RTDW, data moves straight from the source systems to decision makers without any form of staging.

29. What is Slowly Changing Dimension?

Slowly changing dimensions refers to the change in dimensional attributes over time.

An example of slowly changing dimension is a product dimension where attributes of a given product change over time, due to change in component or ingredients or packaging details.

There are three main techniques for handling slowly changing dimensions in a data warehouse:

Type 1: Overwriting. No history is maintained; the new record replaces the original record.

Type 2: Creating another dimension record, with timestamps, so that full history is preserved.

Type 3: Creating a current/previous value field. The original record is modified to hold the new value alongside the prior value, so only limited history is kept.

Each technique handles the problem differently. The designer chooses among these techniques depending on the company's need to preserve an accurate history of the dimensional changes.
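As a minimal Type 2 sketch in plain SQL, assuming a hypothetical CUSTOMER_DIM with effective/expiry dates and a current-row flag (in Informatica the same logic is usually built with a Lookup and an Update Strategy):

-- Expire the current version of the changed customer ...
UPDATE customer_dim
SET    expiry_date  = SYSDATE,
       current_flag = 'N'
WHERE  customer_natural_id = 'CUST-00123'
AND    current_flag = 'Y';

-- ... then insert a new version carrying the changed attributes
INSERT INTO customer_dim
  (customer_key, customer_natural_id, customer_name, effective_date, expiry_date, current_flag)
VALUES
  (customer_dim_seq.NEXTVAL, 'CUST-00123', 'Acme Corporation', SYSDATE, NULL, 'Y');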

30. What is a Conformed Dimension?

Conformed dimensions are a dimensional modelling technique promoted by Ralph Kimball.

A Conformed Dimension is a dimension that has a single meaning and content throughout a data warehouse. A conformed dimension can be used in any star schema. For example, the Time/Calendar dimension is normally used in all star schemas, so it can be designed once and used with many fact tables across a data warehouse.


31. What is TL9000?

TL9000 is a quality measurement system, determined by the QuEST Forum to be vital to an organization's success. It offers a telecommunications-specific set of requirements based on ISO 9001; it defines the quality system requirements for design, development, production, delivery, installation and maintenance of telecommunication products and services.

Difference between 7.x and 8.x

Power Center 7.X Architecture.


Power Center 8.X Architecture.



Developer Changes: Java Transformation Added in 8.x

For example, in PowerCenter:

• PowerCenter Server has become a service, the Integration Service

• No more Repository Server, but PowerCenter includes a Repository Service

• Client applications are the same, but work on top of the new services framework

Below are the differences between Informatica 7.1 and 8.1:

1) PowerCenter Connect for SAP NetWeaver BW option
2) SQL transformation is added
3) Service-oriented architecture
4) Grid concept is an additional feature
5) Random file names can be generated in the target
6) Command line programs: new infacmd and infasetup commands were added
7) Java transformation is an added feature
8) Concurrent cache creation and faster index building are additional features in the Lookup transformation
9) Caches are automatic; you don't need to allocate them at the transformation level
10) Pushdown optimization techniques
11) We can append data into the flat file target


1) The difference between 8.1 and 8.5 is that we can use pushdown optimization in a mapping, which gives more flexible performance tuning.

Pushdown optimization

A session option that allows you to push transformation logic to the source or target database.

GRID

Effective in version 8.0, you create and configure a grid in the Administration Console. You configure a grid to run on multiple nodes, and you configure one Integration Service to run on the grid. The Integration Service runs processes on the nodes in the grid to distribute workflows and sessions. In addition to running a workflow on a grid, you can now run a session on a grid. When you run a session or workflow on a grid, one service process runs on each available node in the grid.

Integration Service (IS)

The key functions of the IS are:
Interpretation of the workflow and mapping metadata from the repository.
Execution of the instructions in the metadata.
Managing the data from source system to target system within memory and disk.

The three main components of the Integration Service which enable data movement are:
Integration Service Process, Load Balancer and Data Transformation Manager.

6.1 Integration Service Process (ISP) 

The Integration Service starts one or more Integration Service processes to run and monitor workflows. When we run a workflow, the ISP starts and locks the workflow, runs the workflow tasks, and starts the process to run sessions. The functions of the Integration Service Process are,

Locks and reads the workflow
Manages workflow scheduling, i.e., maintains session dependency
Reads the workflow parameter file
Creates the workflow log
Runs workflow tasks and evaluates the conditional links
Starts the DTM process to run the session
Writes historical run information to the repository
Sends post-session emails

6.2 Load Balancer

The Load Balancer dispatches tasks to achieve optimal performance. It dispatches tasks to a single node or across the nodes in a grid after performing a sequence of steps. Before understanding these steps we have to know about Resources, Resource Provision Thresholds, Dispatch mode and Service levels.

Resources – we can configure the Integration Service to check the resources available on each node and match them with the resources required to run the task. For example, if a session uses an SAP source, the Load Balancer dispatches the session only to nodes where the SAP client is installed

There are three Resource Provision Thresholds:
Maximum CPU Run Queue Length - the maximum number of runnable threads waiting for CPU resources on the node.
Maximum Memory % - the maximum percentage of virtual memory allocated on the node relative to the total physical memory size.
Maximum Processes - the maximum number of running Session and Command tasks allowed for each Integration Service process running on the node.

There are three Dispatch modes:
Round-Robin: the Load Balancer dispatches tasks to available nodes in a round-robin fashion after checking the "Maximum Processes" threshold.
Metric-based: checks all three resource provision thresholds and dispatches tasks in a round-robin fashion.
Adaptive: checks all three resource provision thresholds and also ranks nodes according to current CPU availability.

Service Levels establish priority among tasks that are waiting to be dispatched. The three components of a service level are Name, Dispatch Priority and Maximum Dispatch Wait Time. "Maximum dispatch wait time" is the amount of time a task can wait in the queue; this ensures no task waits forever.

A. Dispatching tasks on a node:
1. The Load Balancer checks different resource provision thresholds on the node depending on the dispatch mode set. If dispatching the task causes any threshold to be exceeded, the Load Balancer places the task in the dispatch queue and dispatches it later.
2. The Load Balancer dispatches all tasks to the node that runs the master Integration Service process.

B. Dispatching tasks on a grid:
1. The Load Balancer verifies which nodes are currently running and enabled.
2. The Load Balancer identifies nodes that have the PowerCenter resources required by the tasks in the workflow.
3. The Load Balancer verifies that the resource provision thresholds on each candidate node are not exceeded. If dispatching the task causes a threshold to be exceeded, the Load Balancer places the task in the dispatch queue and dispatches it later.
4. The Load Balancer selects a node based on the dispatch mode.

6.3 Data Transformation Manager (DTM) Process

When the workflow reaches a session, the Integration Service Process starts the DTM process. The DTM is the process associated with the session task. The DTM process performs the following tasks:

Retrieves and validates session information from the repository. Validates source and target code pages. Verifies connection object permissions.


Performs pushdown optimization when the session is configured for pushdown optimization.

Adds partitions to the session when the session is configured for dynamic partitioning.

Expands the service process variables, session parameters, and mapping variables and parameters.

Creates the session log.
Runs pre-session shell commands, stored procedures, and SQL.
Sends a request to start worker DTM processes on other nodes when the session is configured to run on a grid.
Creates and runs mapping, reader, writer, and transformation threads to extract, transform, and load data.
Runs post-session stored procedures, SQL, and shell commands, and sends post-session email.

After the session is complete, reports the execution result to the ISP.

Pictorial Representation of Workflow execution:

1. A PowerCenter Client requests the IS to start a workflow

2. IS starts ISP

3. ISP consults LB to select node

4. ISP starts DTM in node selected by LB

DWH ARCHITECTURE


Granularity

Principle: create fact tables with the most granular data possible to support analysis of the business process.

In Data warehousing grain refers to the level of detail available in a given fact table as well as to the level of detail provided by a star schema.

It is usually given as the number of records per key within the table. In general, the grain of the fact table is the grain of the star schema.

Facts: facts must be consistent with the grain; all facts are at a uniform grain. Watch for facts of mixed granularity, e.g. total sales for a day mixed with a monthly total.

Dimensions: each dimension associated with the fact table must take on a single value for each fact row, and each dimension attribute must take on one value. Outriggers are the exception, not the rule.


What is DM?

DM is a logical design technique that seeks to present the data in a standard, intuitive framework that allows for high-performance access. It is inherently dimensional, and it adheres to a discipline that uses the relational model with some important restrictions. Every dimensional model is composed of one table with a multipart key, called the fact table, and a set of smaller tables called dimension tables. Each dimension table has a single-part primary key that corresponds exactly to one of the components of the multipart key in the fact table.


What is Conformed Dimension?

Conformed Dimensions (CD): these dimensions are built once in your model and can be reused multiple times with different fact tables. For example, consider a model containing multiple fact tables, representing different data marts. Now look for a dimension that is common to these fact tables. In this example, let's consider that the product dimension is common and hence can be reused by creating shortcuts and joining it to the different fact tables. Some examples are the time dimension, customer dimension and product dimension.

What is Junk Dimension?

A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.

Instead of having hundreds of small dimensions that each hold only a few records, cluttering your database with these mini 'identifier' tables, all records from these small dimension tables are loaded into ONE dimension table, and we call this table the junk dimension table (since we are storing all the junk in this one table). For example, a company might have a handful of manufacturing plants, a handful of order types, and so on, and we can consolidate them in one dimension table called the junk dimension.

It’s a dimension table which is used to keep junk attributes

What is De Generated Dimension?

An item that is in the fact table but is stripped of its description, because the description belongs in a dimension table, is referred to as a degenerate dimension. Since it looks like a dimension but actually lives in the fact table and has been 'degenerated' of its description, it is called a degenerate dimension. As for slowly changing dimensions (SCD) and slowly growing dimensions (SGD), these are better thought of as behaviours of the dimensions themselves.

Degenerate Dimension: a dimension column which is located in the fact table is known as a degenerate dimension.

What is Data Mart?

A Data Mart is a subset of data from a Data Warehouse. Data Marts are built for specific user groups. They contain a subset of rows and columns that are of interest to the particular audience. By providing decision makers with only a subset of the data from the Data Warehouse, privacy, performance and clarity objectives can be attained.

What is Data Warehouse?

A Data Warehouse (DW) is simply an integrated consolidation of data from a variety of sources that is specially designed to support strategic and tactical decision making.  The main objective of a Data Warehouse is to provide an integrated environment and coherent picture of the business at a point in time. 

What is Fact Table?

A Fact Table in a dimensional model consists of one or more numeric facts of importance to a business.  Examples of facts are as follows:

1. the number of products sold 2. the value of products sold 3. the number of products produced 4. the number of service calls received

Businesses have a need to monitor these "facts" closely and to sum them using different "dimensions". For example, a business might find the following information useful:

1. the value of products sold this quarter versus last quarter 2. the value of products sold by store 3. the value of products sold by channel (e.g. phone, Internet, in-store shopping) 4. the value of products sold by product (e.g. blue widgets, red widgets)

Businesses will often need to sum facts by multiple dimensions:

1. the value of products sold by store, by product type and by day of week 2. the value of products sold by product and by channel

In addition to numeric facts, fact tables contain the "keys" of each of the dimensions related to that fact (e.g. Customer Nbr, Product ID, Store Nbr). Details about the dimensions (e.g. customer name, customer address) are stored in the dimension table (i.e. customer).

What is Fact and Dimension?

A "fact" is a numeric value that a business wishes to count or sum.  A "dimension" is essentially an entry point for getting at the facts.Dimensions are things of interest to the business.

A set of level properties that describe a specific aspect of a business, used for analyzing the factual measures


What is Factless Fact Table?

Factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They are often used to record events or coverage information.

Common examples of factless fact tables include:

Identifying product promotion events (to determine promoted products that didn’t sell)

Tracking student attendance or registration events

Tracking insurance-related accident events

Identifying building, facility, and equipment schedules for a hospital or university
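A sketch of a factless fact table for the student-attendance example (table and column names are hypothetical); note there is no numeric measure, and analysis is done by counting rows:

CREATE TABLE attendance_fact (
  date_key    NUMBER REFERENCES date_dim(date_key),
  student_key NUMBER,
  class_key   NUMBER
);

-- How many students attended each class on a given day?
SELECT class_key, COUNT(*) AS attendance_count
FROM   attendance_fact
WHERE  date_key = 20240115
GROUP BY class_key;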

Types of facts?

There are three types of facts:

Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.

Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table, but not the others (for example, an account balance can be summed across accounts but not across time).

Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

What is Incremental Aggregation?

When using incremental aggregation, you apply captured changes in the source to aggregate calculations in a session. If the source changes only incrementally and you can capture changes, you can configure the session to process only those changes. This allows the PowerCenter Server to update your target incrementally, rather than forcing it to process the entire source and recalculate the same data each time you run the session.

For example, you might have a session using a source that receives new data every day. You can capture those incremental changes because you have added a filter condition to the mapping that removes pre-existing data from the flow of data. You then enable incremental aggregation.

When the session runs with incremental aggregation enabled for the first time on March 1, you use the entire source. This allows the PowerCenter Server to read and store the necessary aggregate data. On March 2, when you run the session again, you filter out all the records except those time-stamped March 2. The PowerCenter Server then processes only the new data and updates the target accordingly.

Consider using incremental aggregation in the following circumstances:


You can capture new source data. Use incremental aggregation when you can capture new source data each time you run the session. Use a Stored Procedure or Filter transformation to process only new data.

Incremental changes do not significantly change the target. Use incremental aggregation when the changes do not significantly change the target. If processing the incrementally changed source alters more than half the existing target, the session may not benefit from using incremental aggregation. In this case, drop the table and re-create the target with complete source data.

Note: Do not use incremental aggregation if your mapping contains percentile or median functions. The PowerCenter Server uses system memory to process Percentile and Median functions in addition to the cache memory you configure in the session property sheet. As a result, the PowerCenter Server does not store incremental aggregation values for Percentile and Median functions in disk caches.

Normalization:

Some Oracle databases were modeled according to the rules of normalization that were intended to eliminate redundancy.

Obviously, the rules of normalization are required to understand your relationships and functional dependencies

First Normal Form:

A row is in first normal form (1NF) if all underlying domains contain atomic values only. Eliminate duplicative columns from the same table. Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form:

An entity is in Second Normal Form (2NF) when it meets the requirement of being in First Normal Form (1NF) and additionally:
Does not have a composite primary key, meaning that the primary key cannot be subdivided into separate logical entities.
All the non-key columns are functionally dependent on the entire primary key.

A row is in second normal form if, and only if, it is in first normal form and every non-key attribute is fully dependent on the key. 2NF eliminates functional dependencies on a partial key by putting the fields in a separate table from those that are dependent on the whole key. An example is resolving many-to-many relationships using an intersecting entity.

Third Normal Form:

An entity is in Third Normal Form (3NF) when it meets the requirement of being in Second Normal Form (2NF) and additionally:

Functional dependencies on non-key fields are eliminated by putting them in a separate table. At this level, all non-key fields are dependent on the primary key.


A row is in third normal form if and only if it is in second normal form and if attributes that do not contribute to a description of the primary key are moved into a separate table. An example is creating look-up tables.
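As a small sketch of this (hypothetical ORDERS data), the transitive dependency order -> customer -> customer_city is removed by moving the customer attributes into their own table:

-- Unnormalised: customer_city depends on customer_id, not on the key order_id
-- CREATE TABLE orders_flat (order_id NUMBER, customer_id NUMBER, customer_city VARCHAR2(50), order_total NUMBER);

-- 3NF: non-key attributes depend only on their own table's key
CREATE TABLE customers (
  customer_id   NUMBER PRIMARY KEY,
  customer_city VARCHAR2(50)
);

CREATE TABLE orders (
  order_id    NUMBER PRIMARY KEY,
  customer_id NUMBER REFERENCES customers(customer_id),
  order_total NUMBER
);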

Boyce-Codd Normal Form:

Boyce Codd Normal Form (BCNF) is a further refinement of 3NF. In his later writings Codd refers to BCNF as 3NF. A row is in Boyce Codd normal form if, and only if, every determinant is a candidate key. Most entities in 3NF are already in BCNF.

Fourth Normal Form:

An entity is in Fourth Normal Form (4NF) when it meets the requirement of being in Third Normal Form (3NF) and additionally:

Has no multiple sets of multi-valued dependencies. In other words, 4NF states that no entity can have more than a single one-to-many relationship

Why using hints

It is a perfectly valid question to ask why hints should be used. Oracle comes with an optimizer that promises to optimize a query's execution plan. When this optimizer is really doing a good job, no hints should be required at all.

Sometimes, however, the characteristics of the data in the database change rapidly, so that the optimizer (or more accurately, its statistics) is out of date. In this case, a hint could help.

You should first get the explain plan of your SQL and determine what changes can be done to make the code operate without using hints if possible. However, hints such as ORDERED, LEADING, INDEX, FULL, and the various AJ and SJ hints can tame a wild optimizer and give you optimal performance

Table analyze and update statistics: the ANALYZE statement

The ANALYZE statement can be used to gather statistics for a specific table, index or cluster. The statistics can be computed exactly, or estimated based on a specific number of rows, or a percentage of rows:

ANALYZE TABLE employees COMPUTE STATISTICS;

ANALYZE INDEX employees_pk COMPUTE STATISTICS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 100 ROWS;

ANALYZE TABLE employees ESTIMATE STATISTICS SAMPLE 15 PERCENT;


EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES');

EXEC DBMS_STATS.gather_table_stats('SCOTT', 'EMPLOYEES', estimate_percent => 15);

Automatic Optimizer Statistics Collection

By default Oracle 10g automatically gathers optimizer statistics using a scheduled job called GATHER_STATS_JOB. By default this job runs within maintenance windows between 10 P.M. and 6 A.M. on week nights and all day on weekends. The job calls the DBMS_STATS.GATHER_DATABASE_STATS_JOB_PROC internal procedure, which gathers statistics for tables with either empty or stale statistics, similar to the DBMS_STATS.GATHER_DATABASE_STATS procedure using the GATHER AUTO option. The main difference is that the internal job prioritizes the work so that tables most urgently requiring statistics updates are processed first.

Informatica Session Log shows busy percentage

***** RUN INFO FOR TGT LOAD ORDER GROUP [1], CONCURRENT SET [1] ****

Thread [READER_1_1_1] created for [the read stage] of partition point [SQ_ACW_PCBA_APPROVAL_STG] has completed: Total Run Time = [7.193083] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000]

Thread [TRANSF_1_1_1] created for [the transformation stage] of partition point [SQ_ACW_PCBA_APPROVAL_STG] has completed. The total run time was insufficient for any meaningful statistics.

Thread [WRITER_1_*_1] created for [the write stage] of partition point [ACW_PCBA_APPROVAL_F1, ACW_PCBA_APPROVAL_F] has completed: Total Run Time = [0.806521] secs, Total Idle Time = [0.000000] secs, Busy Percentage = [100.000000]

Hint categories

Hints can be categorized as follows:


Hints for Optimization Approaches and Goals
Hints for Access Paths
Hints for Query Transformations
Hints for Join Orders
Hints for Join Operations
Hints for Parallel Execution
Additional Hints

ORDERED- This hint forces tables to be joined in the order specified. If you know table X has fewer rows, then ordering it first may speed execution in a join.

PARALLEL (table, instances) - This specifies that the operation is to be done in parallel.

If an index cannot be created (or cannot be used), we go for /*+ PARALLEL(table, 8) */ on SELECT and UPDATE statements, for example when the WHERE clause uses conditions such as LIKE, NOT IN, >, <, <> that prevent index use.

[NO]APPEND-This specifies that data is to be or not to be appended to the end of a file rather than into existing free space. Use only with INSERT commands..
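A hedged sketch of how such hints are written (EMP and DEPT are used purely as sample tables, and EMP_HISTORY is hypothetical; whether a hint actually helps should always be confirmed against the explain plan):

SELECT /*+ ORDERED */ e.ename, d.dname
FROM   dept d, emp e              -- ORDERED: join in the FROM-clause order (dept first)
WHERE  d.deptno = e.deptno;

SELECT /*+ PARALLEL(e, 8) */ COUNT(*)
FROM   emp e;                     -- scan the table with up to 8 parallel slaves

INSERT /*+ APPEND */ INTO emp_history
SELECT * FROM emp;                -- direct-path insert at the end of the segment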

EXPLAIN PLAN usage. Table analyze and update statistics: DBMS_STATS.GATHER_TABLE_STATS(ownname => 'schema_name', tabname => 'table_name')

When an SQL statement is passed to the server the Cost Based Optimizer (CBO) uses database statistics to create an execution plan which it uses to navigate through the data. Once you've highlighted a problem query the first thing you should do is EXPLAIN the statement to check the execution plan that the CBO has created. This will often reveal that the query is not using the relevant indexes, or indexes to support the query are missing. Interpretation of the execution plan is beyond the scope of this article.

The explain plan process stores data in the PLAN_TABLE. This table can be located in the current schema or a shared schema and is created using in SQL*Plus as follows:

SQL> CONN sys/password AS SYSDBA
Connected
SQL> @$ORACLE_HOME/rdbms/admin/utlxplan.sql
SQL> GRANT ALL ON sys.plan_table TO public;
SQL> CREATE PUBLIC SYNONYM plan_table FOR sys.plan_table;
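With PLAN_TABLE in place, a statement can then be explained and its plan displayed; a small example (EMP is just a sample table, and DBMS_XPLAN.DISPLAY is available from Oracle 9i onwards):

EXPLAIN PLAN FOR
SELECT * FROM emp WHERE deptno = 10;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);   -- shows the execution plan just explained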

Syntax for synonym

CREATE OR REPLACE SYNONYM CTS_HZ_PARTIES FOR [email protected]


CREATE DATABASE LINK TS4EDW CONNECT TO CDW_ASA IDENTIFIED BY k0kroach USING 'TS4EDW'

Full Outer Join?

If we want all the parts (irrespective of whether they are supplied by any supplier or not), and all the suppliers (irrespective of whether they supply any part or not) listed in the same result set, we have a problem. That's because the traditional outer join (using the '+' operator) is unidirectional, and you can't put (+) on both sides in the join condition. The following will result in an error:

SQL> select p.part_id, s.supplier_name
  2  from part p, supplier s
  3  where p.supplier_id (+) = s.supplier_id (+);
where p.supplier_id (+) = s.supplier_id (+)
      *
ERROR at line 3:
ORA-01468: a predicate may reference only one outer-joined table

Up through Oracle8i, Oracle programmers have used a workaround to circumvent this limitation. The workaround involves two outer join queries combined by a UNION operator, as in the example below. From Oracle9i onwards, the ANSI FULL OUTER JOIN syntax can be used instead:

SELECT e.last_name, e.department_id, d.department_name
FROM   employees e FULL OUTER JOIN departments d
ON     (e.department_id = d.department_id);

SQL> select p.part_id, s.supplier_name
  2  from part p, supplier s
  3  where p.supplier_id = s.supplier_id (+)
  4  union
  5  select p.part_id, s.supplier_name
  6  from part p, supplier s
  7  where p.supplier_id (+) = s.supplier_id;

WTT.NAME WORK_TYPE, (CASE WHEN (PPC.CLASS_CODE = 'Subscription' AND L1.ATTRIBUTE_CATEGORY IS NOT NULL) THEN L1.ATTRIBUTE_CATEGORY ELSE PTT.TASK_TYPE END) TASK_TYPE, PEI.DENOM_CURRENCY_CODE


What’s the difference between View and Materialized View?

In a view we cannot do DML commands, whereas it is possible in a materialized view.

A view has a logical existence but a materialized view has a physical existence. Moreover, a materialized view can be indexed, analysed and so on; that is, all the things that we can do with a table can also be done with a materialized view.

We can keep aggregated data in a materialized view. We can schedule the MV to refresh, which a table can't do. An MV can be created based on multiple tables.

Materialized View: when we are working with various databases running on different systems, we sometimes need to fetch records from a remote location, and fetching data directly from the remote location can be quite expensive in terms of resources. To minimize the response time and increase throughput, we may create a copy on the local database using data from the remote database. This duplicate copy is known as a materialized view, which may be refreshed as required using the options available in Oracle, such as fast, complete and force.

CREATE MATERIALIZED VIEW EBIBDRO.HWMD_MTH_ALL_METRICS_CURR_VIEW
TABLESPACE EBIBDD
NOCACHE
LOGGING
NOCOMPRESS
NOPARALLEL
BUILD IMMEDIATE
REFRESH COMPLETE
START WITH sysdate
NEXT TRUNC(SYSDATE+1) + 4/24
WITH PRIMARY KEY
AS select * from HWMD_MTH_ALL_METRICS_CURR_VW;

Another Method to refresh:

DBMS_MVIEW.REFRESH('MV_COMPLEX', 'C');

Target Update Override

By default, the Integration Service updates target tables based on key values. However, you can override the default UPDATE statement for each target in a mapping. You might want to update the target based on non-key columns.

Overriding the WHERE Clause


You can override the WHERE clause to include non-key columns. For example, you might want to update records for employees named Mike Smith only. To do this, you edit the WHERE clause as follows:

UPDATE T_SALES SET DATE_SHIPPED = :TU.DATE_SHIPPED, TOTAL_SALES = :TU.TOTAL_SALES WHERE :TU.EMP_NAME = EMP_NAME AND EMP_NAME = 'MIKE SMITH'

If you modify the UPDATE portion of the statement, be sure to use :TU to specify ports.

What is the Difference between Delete ,Truncate and Drop?

DELETE

The DELETE command is used to remove rows from a table. A WHERE clause can be used to only remove some rows. If no WHERE condition is specified, all rows will be removed. After performing a DELETE operation you need to COMMIT or ROLLBACK the transaction to make the change permanent or to undo it.

TRUNCATE

TRUNCATE removes all rows from a table. The operation cannot be rolled back. As such, TRUNCATE is faster and doesn't use as much undo space as a DELETE.

DROP

The DROP command removes a table from the database. All the tables' rows, indexes and privileges will also be removed. The operation cannot be rolled back.

Difference between Rowid and Rownum

ROWID

A globally unique identifier for a row in a database. It is created at the time the row is inserted into a table, and destroyed when it is removed from a table. Its format is 'BBBBBBBB.RRRR.FFFF', where BBBBBBBB is the block number, RRRR is the slot (row) number, and FFFF is a file number.

ROWNUM

For each row returned by a query, the ROWNUM pseudocolumn returns a number indicating the order in which Oracle selects the row from a table or set of joined rows. The first row selected has a ROWNUM of 1, the second has 2, and so on.


You can use ROWNUM to limit the number of rows returned by a query, as in this example:

SELECT * FROM employees WHERE ROWNUM < 10;

What is the difference between sub-query & co-related sub query?

A sub query is executed once for the parent statement whereas the correlated sub query is executed once for each row of the parent query.

Example:
Select deptno, ename, sal
from emp a
where sal = (select max(sal) from emp where deptno = a.deptno);

Get dept wise max sal along with empname and emp no.

Select a.empname, a.empno, b.sal, b.deptno
From EMP a, (Select max(sal) sal, deptno from EMP group by deptno) b
Where a.sal = b.sal
and a.deptno = b.deptno

Below query transpose rows into columns.

select emp_id,
       max(decode(row_id, 0, address)) as address1,
       max(decode(row_id, 1, address)) as address2,
       max(decode(row_id, 2, address)) as address3
from (select emp_id, address, mod(rownum, 3) row_id from temp order by emp_id)
group by emp_id

other query

select emp_id,
       max(decode(rank_id, 1, address)) as add1,
       max(decode(rank_id, 2, address)) as add2,
       max(decode(rank_id, 3, address)) as add3
from (select emp_id, address,
             rank() over (partition by emp_id order by emp_id, address) rank_id
      from temp)
group by emp_id

Also below is the logic for converting columns into Rows without using Normalizer Transformation.

1) The source will contain two columns: address and id.
2) Use a Sorter to arrange the rows in ascending order.
3) Then create an expression as shown in the screenshot below.
4) Use an Aggregator transformation and check group by on port id only, as shown below.



Rank query:

Select empno, ename, sal, r from (select empno, ename, sal, rank () over (order by sal desc) r from EMP);

Dense rank query:

The DENSE_RANK function acts like the RANK function except that it assigns consecutive ranks:

Select empno, ename, sal, r from (select empno, ename, sal, dense_rank() over (order by sal desc) r from emp);

Top 5 salaries by using rank:

Select empno, ename, sal,r from (select empno,ename,sal,dense_rank() over (order by sal desc) r from emp) where r<=5;

Or

Select * from (select * from EMP order by sal desc) where rownum<=5;

2 nd highest Sal:

Select empno, ename, sal, r from (select empno, ename, sal, dense_rank () over (order by sal desc) r from EMP) where r=2;

Top sal:

Select * from EMP where sal= (select max (sal) from EMP);

Second highest sal

Select * from EMP where sal= (Select max (sal) from EMP where sal< (select max (sal) from EMP));


Or

Select max (sal) from emp where sal < (select max (sal) from emp)

Remove duplicates in the table:

Delete from EMP where rowid not in (select max (rowid) from EMP group by deptno);

Get duplicate rows from the table:

Select deptno, count (*) from EMP group by deptno having count (*) > 1;

SELECT column, group_function
FROM table
[WHERE condition]
[GROUP BY group_by_expression]
[HAVING group_condition]
[ORDER BY column];

The WHERE clause cannot be used to restrict groups; you use the HAVING clause to restrict groups.

Oracle set of statements:

DATA DEFINITION LANGUAGE (DDL): Create, Alter, Drop
DATA MANIPULATION LANGUAGE (DML): Insert, Update, Delete
DATA QUERYING LANGUAGE (DQL): Select
DATA CONTROL LANGUAGE (DCL): Grant, Revoke
TRANSACTION CONTROL LANGUAGE (TCL): Commit, Rollback, Savepoint

Query to delete duplicate records:

DELETE FROM temp
WHERE rowid NOT IN (SELECT MAX(rowid) FROM temp GROUP BY empno);

How to find duplicate rows in the table

SELECT WIN_NR, CT_NR, SEQ_NR, COUNT(*)
FROM intsmdm.V289U_SAP_CT_HA
GROUP BY WIN_NR, CT_NR, SEQ_NR
HAVING COUNT(*) > 1;

How to find the second highest sal from the table:

Select max (sal) from EMP where sal < (select max (sal) from emp);

If you want to have only duplicate records, then write the following query in the Source Qualifier SQL Override,

Select distinct(deptno), deptname
from dept_test a
where deptno in (select deptno from dept_test b group by deptno having count(1) > 1);

Hierarchical queries

Starting at the root, walk from the top down, and eliminate employee Higgins in the result, but process the child rows.

SELECT department_id, employee_id, last_name, job_id, salary
FROM employees
WHERE last_name != 'Higgins'
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id;
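For illustration only, the LEVEL pseudocolumn can be added to show each row's depth in the hierarchy; a minimal sketch against the same employees table:

SELECT LEVEL, employee_id, last_name, manager_id
FROM employees
START WITH manager_id IS NULL
CONNECT BY PRIOR employee_id = manager_id
ORDER SIBLINGS BY last_name;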

Block in PL/SQL:

The basic unit in PL/SQL is called a block, which is made up of three parts: a declarative part, an executable part, and an exception-handling part.

PL/SQL blocks can be compiled once and stored in executable form to increase response time.
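A minimal anonymous block illustrating the three parts (the table and message text are just placeholders):

DECLARE
   v_count NUMBER;                                      -- declarative part
BEGIN
   SELECT COUNT(*) INTO v_count FROM emp;               -- executable part
   DBMS_OUTPUT.PUT_LINE('Rows in emp: ' || v_count);
EXCEPTION
   WHEN OTHERS THEN                                     -- exception-handling part
      DBMS_OUTPUT.PUT_LINE('Error: ' || SQLERRM);
END;
/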


A stored procedure is a PL/SQL program that is stored in the database in compiled form. A PL/SQL stored procedure that is implicitly started when an INSERT, UPDATE or DELETE statement is issued against an associated table is called a trigger.
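A minimal sketch of such a trigger, assuming a hypothetical audit column last_update_date on emp:

CREATE OR REPLACE TRIGGER emp_audit_trg
BEFORE INSERT OR UPDATE ON emp
FOR EACH ROW
BEGIN
   :NEW.last_update_date := SYSDATE;   -- fires implicitly whenever the DML statement touches emp
END;
/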

Difference between Procedure and Function?

A procedure or function is a schema object that logically groups a set of SQL and other PL/SQL programming language statements together to perform a specific task.

A package is a group of related procedures and functions, together with the cursors and variables they use. Packages provide a method of encapsulating related procedures, functions, and associated cursors and variables together as a unit in the database.

There are three main differences between a procedure and a function.

1. A procedure is used to perform an action; a function is used to compute a value.

2. A procedure cannot be part of an expression, i.e. we cannot call a procedure from an expression, whereas a function can be called.

3. A function must return a value; a procedure may return zero or more values (through OUT parameters).

A function returns a value, and a function can be called in a SQL statement. No other differences.
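A minimal sketch of the two (names and logic are illustrative only):

CREATE OR REPLACE PROCEDURE raise_sal (p_empno IN NUMBER, p_pct IN NUMBER) IS
BEGIN
   UPDATE emp SET sal = sal * (1 + p_pct / 100) WHERE empno = p_empno;   -- performs an action
END;
/

CREATE OR REPLACE FUNCTION annual_sal (p_empno IN NUMBER) RETURN NUMBER IS
   v_sal emp.sal%TYPE;
BEGIN
   SELECT sal INTO v_sal FROM emp WHERE empno = p_empno;
   RETURN v_sal * 12;                                                    -- computes and returns a value
END;
/

-- A function can be called from SQL; a procedure cannot:
SELECT empno, annual_sal(empno) FROM emp;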

Indexes:

Bitmap indexes are most appropriate for columns having low distinct values—such as GENDER, MARITAL_STATUS, and RELATION. This assumption is not completely accurate, however. In reality, a bitmap index is always advisable for systems in which data is not frequently updated by many concurrent systems. In fact, as I'll demonstrate here, a bitmap index on a column with 100-percent unique values (a column candidate for primary key) is as efficient as a B-tree index.
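As a point of reference, the two index types mentioned above are created as follows (table and column names are only examples):

CREATE INDEX emp_deptno_idx ON emp (deptno);           -- B-tree index (the default)
CREATE BITMAP INDEX emp_gender_bix ON emp (gender);    -- bitmap index, typical for low-cardinality columns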

When to Create an Index


You should create an index if:

• A column contains a wide range of values

• A column contains a large number of null values

• One or more columns are frequently used together in a WHERE clause or a join condition

• The table is large and most queries are expected to retrieve less than 2 to 4 percent of the rows

Datafiles Overview

A tablespace in an Oracle database consists of one or more physical datafiles. A datafile can be associated with only one tablespace and only one database.

Tablespaces Overview

Oracle stores data logically in tablespaces and physically in datafiles associated with the corresponding tablespace.

A database is divided into one or more logical storage units called tablespaces. Tablespaces are divided into logical units of storage called segments.

Control File Contents

A control file contains information about the associated database that is required for access by an instance, both at startup and during normal operation. Control file information can be modified only by Oracle; no database administrator or user can edit a control file.

What is incremental aggregation and how it is done?

Gradually to synchronize the target data with source data,

There are further 2 techniques:-

Refresh load – the existing data is truncated and reloaded completely.
Incremental load – only the delta, or difference between target and source data, is loaded at regular intervals. The timestamp of the previous delta load has to be maintained.

Incremental aggregation performs aggregation only on the incremented data, so, based on the requirements, if incremental aggregation can be used it will definitely improve performance. Keep this factor in mind while developing mappings.
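A minimal sketch of the incremental technique, assuming the source table carries a LAST_UPDATE_DATE column and the previous load timestamp is kept in a control table (all names are illustrative):

SELECT *
FROM   src_orders s
WHERE  s.last_update_date >
       (SELECT last_load_ts FROM etl_control WHERE table_name = 'SRC_ORDERS');

-- After a successful load, advance the watermark:
UPDATE etl_control
SET    last_load_ts = SYSDATE
WHERE  table_name = 'SRC_ORDERS';
COMMIT;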

Dimensional Model: A type of data modeling suited for data warehousing. In a dimensional model there are two types of tables: dimension tables and fact tables. A dimension table records information on each dimension, and a fact table records the facts, or measures.

Data modeling

There are three levels of data modeling. They are conceptual, logical, and physical. This section will explain the difference among the three, the order with which each one is created, and how to go from one level to the other.

Conceptual Data Model

Features of conceptual data model include:

Includes the important entities and the relationships among them. No attribute is specified. No primary key is specified.

At this level, the data modeler attempts to identify the highest-level relationships among the different entities.

Logical Data Model

Features of logical data model include:

Includes all entities and relationships among them. All attributes for each entity are specified. The primary key for each entity is specified. Foreign keys (keys identifying the relationship between different entities) are specified. Normalization occurs at this level.

At this level, the data modeler attempts to describe the data in as much detail as possible, without regard to how they will be physically implemented in the database.

In data warehousing, it is common for the conceptual data model and the logical data model to be combined into a single step (deliverable).

The steps for designing the logical data model are as follows:

1. Identify all entities.
2. Specify primary keys for all entities.
3. Find the relationships between different entities.
4. Find all attributes for each entity.
5. Resolve many-to-many relationships.
6. Normalization.


Physical Data Model

Features of physical data model include:

Specification of all tables and columns. Foreign keys are used to identify relationships between tables. Denormalization may occur based on user requirements. Physical considerations may cause the physical data model to be quite different from the logical data model.

At this level, the data modeler will specify how the logical data model will be realized in the database schema.

The steps for physical data model design are as follows:

1. Convert entities into tables.
2. Convert relationships into foreign keys.
3. Convert attributes into columns.
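A minimal sketch of these steps, turning a Customer entity and its relationship to Orders into physical tables (names and types are illustrative):

CREATE TABLE customer_d (
    customer_key  NUMBER(10)   PRIMARY KEY,      -- entity -> table, attribute -> column
    customer_name VARCHAR2(60),
    state         VARCHAR2(30)
);

CREATE TABLE order_f (
    order_key     NUMBER(10)   PRIMARY KEY,
    customer_key  NUMBER(10)   REFERENCES customer_d (customer_key),   -- relationship -> foreign key
    order_amount  NUMBER(12,2)
);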

http://www.learndatamodeling.com/dm_standard.htm

Modeling is an efficient and effective way to represent the organization’s needs; It provides information in a graphical way to the members of an organization to understand and communicate the business rules and processes. Business Modeling and Data Modeling are the two important types of modeling.

The differences between a logical data model and physical data model is shown below.

Logical vs Physical Data Modeling

Logical Data Model                                  Physical Data Model
Represents business information and defines         Represents the physical implementation of the
business rules                                       model in a database
Entity                                               Table
Attribute                                            Column
Primary Key                                          Primary Key Constraint
Alternate Key                                        Unique Constraint or Unique Index
Inversion Key Entry                                  Non-Unique Index
Rule                                                 Check Constraint, Default Value
Relationship                                         Foreign Key
Definition                                           Comment

Type 1 Slowly Changing Dimension

In Type 1 Slowly Changing Dimension, the new information simply overwrites the original information. In other words, no history is kept.

In our example, recall we originally have the following table:


Customer Key   Name        State
1001           Christina   Illinois

After Christina moved from Illinois to California, the new information replaces the original record, and we have the following table:

Customer Key   Name        State
1001           Christina   California

Advantages:

- This is the easiest way to handle the Slowly Changing Dimension problem, since there is no need to keep track of the old information.

Disadvantages:

- All history is lost. By applying this methodology, it is not possible to trace back in history. For example, in this case, the company would not be able to know that Christina lived in Illinois before.

Usage:

About 50% of the time.

When to use Type 1:

Type 1 slowly changing dimension should be used when it is not necessary for the data warehouse to keep track of historical changes.
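A Type 1 change is just an in-place update; a minimal sketch against a hypothetical customer_dim table holding the example row:

UPDATE customer_dim
SET    state = 'California'
WHERE  customer_key = 1001;   -- the old value (Illinois) is overwritten, no history kept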

Type 2 Slowly Changing Dimension

In Type 2 Slowly Changing Dimension, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key.

In our example, recall we originally have the following table:

Customer Key   Name        State
1001           Christina   Illinois

After Christina moved from Illinois to California, we add the new information as a new row into the table:

Customer Key   Name        State
1001           Christina   Illinois
1005           Christina   California


Advantages:

- This allows us to accurately keep all historical information.

Disadvantages:

- This will cause the size of the table to grow fast. In cases where the number of rows for the table is very high to start with, storage and performance can become a concern.

- This necessarily complicates the ETL process.

Usage:

About 50% of the time.

When to use Type 2:

Type 2 slowly changing dimension should be used when it is necessary for the data warehouse to track historical changes.
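A minimal sketch of the Type 2 change above, assuming a customer_dim table and a surrogate-key sequence (both names are illustrative):

-- Add a new row with a new surrogate key; the old row stays for history.
INSERT INTO customer_dim (customer_key, name, state)
VALUES (customer_dim_seq.NEXTVAL, 'Christina', 'California');

-- A common variant also keeps effective/expiry dates and a current-row flag, e.g.:
-- UPDATE customer_dim SET expiry_date = SYSDATE, current_flag = 'N'
-- WHERE  name = 'Christina' AND current_flag = 'Y';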

Type 3 Slowly Changing Dimension

In Type 3 Slowly Changing Dimension, there will be two columns to indicate the particular attribute of interest, one indicating the original value, and one indicating the current value. There will also be a column that indicates when the current value becomes active.

In our example, recall we originally have the following table:

Customer Key   Name        State
1001           Christina   Illinois

To accommodate Type 3 Slowly Changing Dimension, we will now have the following columns:

Customer Key   Name   Original State   Current State   Effective Date

After Christina moved from Illinois to California, the original information gets updated, and we have the following table (assuming the effective date of change is January 15, 2003):

Customer Key   Name        Original State   Current State   Effective Date
1001           Christina   Illinois         California      15-JAN-2003

Advantages:


- This does not increase the size of the table, since new information is updated.

- This allows us to keep some part of history.

Disadvantages:

- Type 3 will not be able to keep all history where an attribute is changed more than once. For example, if Christina later moves to Texas on December 15, 2003, the California information will be lost.

Usage:

Type 3 is rarely used in actual practice.

When to use Type 3:

Type 3 slowly changing dimension should only be used when it is necessary for the data warehouse to track historical changes, and when such changes will only occur a finite number of times.

Product

Product ID (PK)   Effective DateTime (PK)   Year   Product Name   Product Price   Expiry DateTime
1                 01-01-2004 12.00AM        2004   Product1       $150            12-31-2004 11.59PM
1                 01-01-2005 12.00AM        2005   Product1       $250

Type 3: Creating new fields. In Type 3, only the latest update to the changed values can be seen. The example below illustrates how to add new columns and keep track of the changes. From it, we are able to see the current price and the previous price of the product, Product1.

Product

Product ID (PK)   Current Year   Product Name   Current Product Price   Old Product Price   Old Year
1                 2005           Product1       $250                    $150                2004

The problem with the Type 3 approach is that, over the years, if the product price changes repeatedly, the complete history may not be stored; only the latest change will be kept. For example, in 2006, if Product1's price changes to $350, then we would not be able to see the 2004 price, since the old values would have been overwritten with the 2005 product information.

Product

Product ID (PK)   Year   Product Name   Product Price   Old Product Price   Old Year
1                 2006   Product1       $350            $250                2005
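A minimal sketch of the Type 3 change above, assuming a product_dim table with the extra columns (all names are illustrative):

-- One-time schema change to hold the previous value:
ALTER TABLE product_dim ADD (old_product_price NUMBER(10,2), old_year NUMBER(4));

-- Each change shifts the current value into the "old" columns and overwrites the current ones:
UPDATE product_dim
SET    old_product_price = product_price,
       old_year          = year,
       product_price     = 350,
       year              = 2006
WHERE  product_id = 1;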

Star Schemas


A star schema is a database design where there is one central table, the fact table, that participates in many one-to-many relationships with dimension tables.

The fact table contains measures: sales quantity, cost dollar amount, sales dollar amount, gross profit dollar amount.

The dimensions are date, product, store, and promotion. The dimensions are said to describe the measurements appearing in the fact table.

The star schema is the simplest data warehouse schema. It is called a star schema because the diagram resembles a star, with points radiating from a center. The center of the star consists of one or more fact tables and the points of the star are the dimension tables, as shown in Figure 2-1.

Figure 2-1 Star Schema


The most natural way to model a data warehouse is as a star schema; only one join establishes the relationship between the fact table and any one of the dimension tables.

A star schema optimizes performance by keeping queries simple and providing fast response time. All the information about each level is stored in one row.
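A minimal sketch of a star-schema query, joining a hypothetical sales fact to two of its dimensions with one join each:

SELECT d.calendar_month, p.product_name, SUM(f.sales_amount) AS sales
FROM   sales_fact f
JOIN   date_dim    d ON f.date_key    = d.date_key        -- one join per dimension
JOIN   product_dim p ON f.product_key = p.product_key
GROUP BY d.calendar_month, p.product_name;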

Snow flake Schema

If a dimension is normalized, we say it is a snowflaked design. 

Consider the Product dimension, and suppose we have the following attribute hierarchy:

SKU -> brand -> category -> department

For a given SKU, there is one brand; for a given brand, there is one category; for a given category, there is one department.


A department has many categories; a category has many brands; a brand has many products.

Snowflake schema, which is a star schema with normalized dimensions in a tree structure.

Snowflake schema is a more complex data warehouse model than a star schema, and is a type of star schema. It is called a snowflake schema because the diagram of the schema resembles a snowflake.

Snowflake schemas normalize dimensions to eliminate redundancy. That is, the dimension data has been grouped into multiple tables instead of one large table. For example, a product dimension table in a star schema might be normalized into a products table, a product_category table, and a product_manufacturer table in a snowflake schema. While this saves space, it increases the number of dimension tables and requires more foreign key joins. The result is more complex queries and reduced query performance. Figure 17-3 presents a graphical representation of a snowflake schema.

Figure 17-3 Snowflake Schema


Below is the simple data model

Below is the SQ (Source Qualifier) for the project dimension.


1.ACW – Logical Design

The logical model contains the following entities (each fact and dimension table also carries the audit columns D_CREATED_BY, D_CREATION_DATE, D_LAST_UPDATED_BY, D_LAST_UPDATE_DATE):

Fact tables: ACW_PCBA_APPROVAL_F (PK PCBA_APPROVAL_KEY), ACW_DF_APPROVAL_F (PK DF_APPROVAL_KEY), ACW_DF_FEES_F (PK ACW_DF_FEES_KEY)
Dimension tables: ACW_USERS_D (PK USER_KEY), ACW_SUPPLY_CHANNEL_D (PK SUPPLY_CHANNEL_KEY), ACW_PRODUCTS_D (PK PRODUCT_KEY), ACW_PART_TO_PID_D (PK PART_TO_PID_KEY), ACW_ORGANIZATION_D (PK ORG_KEY)
Staging tables: ACW_PCBA_APPROVAL_STG, ACW_DF_FEES_STG, ACW_DF_APPROVAL_STG
Reference: EDW_TIME_HIERARCHY, PID for DF Fees, Users


2.ACW – Physical Design

The physical design implements the same tables: ACW_PCBA_APPROVAL_F, ACW_DF_APPROVAL_F, ACW_DF_FEES_F, ACW_USERS_D, ACW_SUPPLY_CHANNEL_D, ACW_PRODUCTS_D, ACW_PART_TO_PID_D and ACW_ORGANIZATION_D, plus the staging tables ACW_PCBA_APPROVAL_STG, ACW_DF_FEES_STG and ACW_DF_APPROVAL_STG, along with EDW_TIME_HIERARCHY, PID_for_DF_Fees and Users. Surrogate keys are NUMBER(10), flags are CHAR(1), dates are DATE, descriptive attributes are CHAR/VARCHAR2, and amounts such as DF_FEES, ADJUSTMENT_AMT and SPEND_BY_ASSEMBLY are FLOAT(12).


3. ACW - Data Flow Diagram

The data flow: source data from the ODSPROD, SJPROD and EDWPROD databases (SJ and ODS data, ACW data, time dimension) is extracted by the ETL (CDB) into the staging tables ACW_DF_FEES_STG, ACW_PCBA_APPROVAL_STG and ACW_DF_APPROVAL_STG, together with the reference schemas NRTREF, REFADM, CMERO and MFGISRO. The dimensional layer in ESMPRD - DWRPT holds ACW_PRODUCTS_D, ACW_ORGANIZATIONS_D, ACW_USERS_D, ACW_PART_TO_PID_D, ACW_SUPPLY_CHANNEL_D and EDW_TIME_HIERARCHY, along with the fact tables ACW_PCBA_APPROVAL_F, ACW_DF_APPROVAL_F and ACW_DF_FEES_F covering DF Fees, DF Approval and PCBA Approval. ACW BO Reports are built on top of this layer.

Logic for getting header for the target flat file


Implementation for Incremental Load

Method I


Logic in the mapping variable is


Logic in the SQ is

Logic in the expression is to set max value for mapping var is below
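As an illustration only (not the actual mapping logic, which is in the screenshot), a typical pattern for this method uses a mapping variable such as $$LAST_EXTRACT_DATE:

-- Expression port (Informatica expression language), capturing the high-water mark per row:
--   SETMAXVARIABLE($$LAST_EXTRACT_DATE, LAST_UPDATE_DATE)
--
-- Source Qualifier SQL override, reading only rows newer than the last run (names are illustrative):
SELECT *
FROM   src_table
WHERE  last_update_date > TO_DATE('$$LAST_EXTRACT_DATE', 'MM/DD/YYYY HH24:MI:SS');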


Logic in the update strategy is below


Method II

Updating parameter File

Logic in the expression


Main mapping

Sql override in SQ Transformation


Workflow Design

Parameter file


It is a text file; below is the format for a parameter file. We place this file on the Unix box where the Informatica server is installed.

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_AUSTRIA]
$InputFileName_BAAN_SALE_HIST=/interface/dev/etl/apo/srcfiles/HS_025_20070921
$DBConnection_Target=DMD2_GEMS_ETL
$$CountryCode=AT
$$CustomerNumber=120165

[GEHC_APO_DEV.WF:w_GEHC_APO_WEEKLY_HIST_LOAD.WT:wl_GEHC_APO_WEEKLY_HIST_BAAN.ST:s_m_GEHC_APO_BAAN_SALES_HIST_BELGIUM]
$DBConnection_Sourcet=DEVL1C1_GEMS_ETL
$OutputFileName_BAAN_SALES=/interface/dev/etl/apo/trgfiles/HS_002_20070921
$$CountryCode=BE
$$CustomerNumber=101495

Transformation…

Mapping: Mappings are the highest-level object in the Informatica object hierarchy, containing all objects necessary to support the movement of data.
Session: A session is a set of instructions that tells the Informatica Server how to move data from sources to targets.
Workflow: A workflow is a set of instructions that tells the Informatica Server how to execute tasks such as sessions and email notifications. In a workflow, multiple sessions can be included to run in parallel or sequential manner.
Source Definition: The Source Definition is used to logically represent an application database table.
Target Definition: The Target Definition is used to logically represent a database table or file in the Data Warehouse / Data Mart.
Aggregator: The Aggregator transformation is used to perform calculations on additive data. It can reduce performance.
Expression: The Expression transformation is used to evaluate, create or modify data, or to set and create variables.
Filter: The Filter transformation is used as a True/False gateway for passing data through a given path in the mapping. It should be used as early as possible to stop unwanted data from passing further.
Joiner: The Joiner transformation is used to join two related heterogeneous data sources residing in different physical locations or file systems.
Lookup: The Lookup transformation is used to retrieve a value from a database and apply the retrieved values against the values passed in from another transformation.
Normalizer: The Normalizer transformation is used to transform structured data (such as COBOL or flat files) into relational data.
Rank: The Rank transformation is used to order data within a certain data set so that only the top or bottom n records are retrieved.


Sequence Generator: The Sequence Generator transformation is used to generate numeric key values in sequential order.
Source Qualifier: The Source Qualifier transformation is used to describe in SQL the method by which data is to be retrieved from a source application system.
Stored Procedure: The Stored Procedure transformation is used to execute externally stored database procedures and functions.
Update Strategy: The Update Strategy transformation is used to indicate the DML statement to apply (insert, update, delete).
Input Transformation: Input transformations are used to create a logical interface to a mapplet in order to allow data to pass into the mapplet.
Output Transformation: Output transformations are used to create a logical interface from a mapplet in order to allow data to pass out of a mapplet.

Advantages of Teradata:
1. Can store billions of rows.
2. Parallel processing makes Teradata faster than other RDBMSs.
3. Can be accessed by network-attached and channel-attached systems.
4. Supports the requirements of diverse clients.
5. Automatically detects and recovers from hardware failure.
6. Allows expansion without sacrificing performance.

Introduction

This document is intended to provide a uniform approach for developers in building Informatica mappings and sessions.

Informatica Overview

Informatica is a powerful Extraction, Transformation, and Loading tool and has been deployed at GE Medical Systems for data warehouse development in the Business Intelligence Team. Informatica comes with the following clients to perform various tasks.

Designer – used to develop transformations/mappings.

Workflow Manager / Workflow Monitor – replace the Server Manager; used to create sessions/workflows/worklets and to run, schedule, and monitor mappings for data movement.

Repository Manager – used to maintain folders, users, permissions, locks, and repositories.

Server – the “workhorse” of the domain. The Informatica Server is the component responsible for the actual work of moving data according to the mappings developed and placed into operation. It contains several distinct parts such as the Load Manager, Data Transformation Manager, Reader and Writer.

Repository Server – Informatica client tools and the Informatica Server connect to the repository database over the network through the Repository Server.


Informatica Architecture at GE Medical Systems

DEVELOPMENT ENVIRONMENT: source data (IFDEV) -> development database DWDEV; Informatica Development Repository (Americas Development Repository); Informatica server GEMSDW1 (3.231.200.74).

TESTING ENVIRONMENT: source data (IFDEV) -> test database DWTEST; Informatica Development Repository (Americas Development Repository); Informatica server GEMSDW1 (3.231.200.74).

PRODUCTION ENVIRONMENT: source data (IFMAR, FIN2) -> stage database DWSTAGE, mirrored to the reporting database; Informatica Production Repository (Americas Production Repository); Informatica server GEMSDW2 (3.231.200.69).

General Development Guidelines

The starting point of development is the logical model created by the Data Architect. This logical model forms the foundation for metadata, which will be continuously maintained throughout the Data Warehouse Development Life Cycle (DWDLC). The logical model is derived from the requirements of the project. At the completion of the logical model, technical documentation is produced defining the sources, targets, requisite business-rule transformations, mappings and filters. This documentation serves as the basis for the creation of the Extraction, Transformation and Loading routines that actually move the data from the application sources into the Data Warehouse/Data Mart.

To start development on any data mart you should have the following things set up by the Informatica Load Administrator:

Informatica folder. The development team, in consultation with the BI Support Group, can decide a three-letter code for the project, which would be used to create the Informatica folder as well as the Unix directory structure.
Informatica user IDs for the developers.
Unix directory structure for the data mart.
A schema XXXLOAD on the DWDEV database.

The best way to get the Informatica set-up done is to put a request on the following website:
http://uswaudom02medge.med.ge.com/GEDSS/prod/BIDevelopmentSupport.nsf/

Transformation Specifications

Before developing the mappings you need to prepare the specifications document for the mappings you need to develop. A good template is placed in the templates folder (\\3.231.100.33\GEDSS_All\QuickPlace_Home\Tools\Informatica\Installation_&_Development\Templates). You can use your own template as long as it has as much detail or more than that which is in this template.


While estimating the time required to develop mappings, the rule of thumb is as follows: simple mapping – 1 person-day; medium-complexity mapping – 3 person-days; complex mapping – 5 person-days.

Usually the mapping for the fact table is most complex and should be allotted as much time for development as possible.

Data Loading from Flat Files

It's an accepted best practice to always load a flat file into a staging table before any transformations are done on the data in the flat file.

Always use the LTRIM and RTRIM functions on string columns before loading data into a stage table. You can also use the UPPER function on string columns, but before using it you need to ensure that the data is not case sensitive (e.g. ABC is different from Abc).

If you are loading data from a delimited file then make sure the delimiter is not a character which could appear in the data itself. Avoid using comma-separated files. Tilde (~) is a good delimiter to use.

We have a screen door program, which can check for common errors in flat files. You should work with the BI support team to set up the screen door program to check the flat files from which your data mart extracts data. For more information you can go to this link: \\3.231.100.33\GEDSS_All\QuickPlace_Home\Projects\DataQA

Data Loading from Database Tables

Mappings which run on a regular basis should be designed in such a way that you query just the data from the source table which has changed since the last time you extracted data from it. If you are extracting data from more than one table in the same database by joining them, then you can have multiple source definitions and a single source qualifier instead of having a Joiner transformation to join them, as shown in the figure below. You can put the join conditions in the source qualifier. If the tables exist in different databases you can make use of synonyms to query them from the same database.

Preferred: T1, T2, T3 -> a single SQ1 (join done in the source qualifier)
INSTEAD OF
T1 -> SQ1, T2 -> SQ1, T3 -> SQ1 -> JNR1 (separate source qualifiers joined with a Joiner)
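A minimal sketch of the single-source-qualifier approach: the join is expressed in the SQ's SQL override (table and column names are illustrative):

SELECT t1.order_id, t2.customer_name, t3.product_name
FROM   t1, t2, t3
WHERE  t1.customer_id = t2.customer_id
AND    t1.product_id  = t3.product_id
AND    t1.last_update_date > SYSDATE - 1;   -- pull only recently changed rows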


Data Loading from Tables in Oracle Apps

When you try to import the source definition of a table in Oracle Apps using the Source Analyzer in Designer you might face problems, as Informatica cannot open up so many schemas at the same time. The best way to import the source definition of a table in Oracle Apps is to take the creation script of the table you want to import, create it in a test schema, and import the definition from there.

Commenting

Any experienced developer would agree that a good piece of code is not just a script which runs efficiently and does what it is required to do, but also one that is commented properly. So, in keeping with good coding practices, Informatica mappings, sessions and other objects involved in the mappings need to be commented properly as well. This not only helps the production support team to debug the mappings in case they throw errors while running in production, but also stores the maximum metadata in the Informatica repository, which might be useful when we build a central metadata repository in the near future.

Each folder should have the Project name and Project leader name in the comments box.

Each mapping should have a comment, which tells what the mapping does at a very high level.

Each transformation should have a comment in the description box, which tells the purpose of the transformation.

If the transformation is taking care of a business rule then that business rule should be mentioned in the comment.

Each port should have its purpose documented in the description box.

Log Files

A session log is created for each session that runs. The verbosity of the logs can be tailored to specific performance or troubleshooting needs. By default, the session log name is the name of the mapping with the .log extension. This should not normally be overridden. The Session Wizard has two options for modifying the session name, by appending either the ordinal (if saving multiple sessions is enabled) or the time (if saving sessions by timestamp is enabled). Be aware that when saving session logs by timestamp, Informatica does not perform any deletion or archiving of the session logs.

Whenever using the VERBOSE DATA option of Informatica logging, use a condition to load just a few records rather than doing a full load. This conserves space on the Unix box. You should also remove the verbose option as soon as you are done with the troubleshooting. Configure your Informatica sessions to create the log files in the /ftp/oracle/xxx/logs/ directory and the bad files in the /ftp/oracle/xxx/errors/ directory, where xxx stands for the three-letter code of the data mart.


Triggering Sessions and Batches

The standard methodology to schedule Informatica sessions and batches is through Cronacle scripts.

Failure Notification

Once in production, your sessions and batches need to send out a notification to the load administrator when they fail. You can do this by calling the script /i01/proc/admin/intimate_load_failure.sh xxx in the failure post-session commands. The script intimate_load_failure.sh takes the three-letter data mart code as its argument.


Naming Conventions and usage of Transformations

Quick Reference

Object Type              Syntax
Folder                   XXX_<Data Mart Name>
Mapping                  m_fXY_ZZZ_<Target Table Name>_x.x
Session                  s_fXY_ZZZ_<Target Table Name>_x.x
Batch                    b_<Meaningful name representing the sessions inside>
Source Definition        <Source Table Name>
Target Definition        <Target Table Name>
Aggregator               AGG_<Purpose>
Expression               EXP_<Purpose>
Filter                   FLT_<Purpose>
Joiner                   JNR_<Names of Joined Tables>
Lookup                   LKP_<Lookup Table Name>
Normalizer               Norm_<Source Name>
Rank                     RNK_<Purpose>
Router                   RTR_<Purpose>
Sequence Generator       SEQ_<Target Column Name>
Source Qualifier         SQ_<Source Table Name>
Stored Procedure         STP_<Database Name>_<Procedure Name>
Update Strategy          UPD_<Target Table Name>_xxx
Mapplet                  MPP_<Purpose>
Input Transformation     INP_<Description of Data being funneled in>
Output Transformation    OUT_<Description of Data being funneled out>
Database Connections     XXX_<Database Name>_<Schema Name>

General Conventions

The name of the transformation should be as self-explanatory as possible regarding its purpose in the mapping. Wherever possible use short forms of long words. Use change of case as word separators instead of underscores to conserve characters, e.g. FLT_TransAmtGreaterThan0 instead of FLT_Trans_Amt_Greater_Than_0. Preferably use all UPPER CASE letters in naming the transformations.

Folder

XXX_<Data Mart Name>

XXX stands for the three-letter code for that specific data mart. Make sure you put in the project leader name and a brief note on what the data mart is all about in the comments box of the folder.


Example: BIO_Biomed_Datamart

Mapping

A mapping is the Informatica object which contains a set of transformations including source and target; it looks like a pipeline.

m_fXY_ZZZ_<Target Table Name or Meaningful name>_x.x

f = frequency: d=daily, w=weekly, m=monthly, h=hourly
X = L for Load or U for Update
Y = S for Stage or P for Production
ZZZ = three-letter data mart code
x.x = version no., e.g. 1.1

Example

Name of a mapping which just inserts data into the stage table Contract_Coverages on a daily basis:
m_dLS_BIO_Contract_Coverages_1.1

Session

A session is a set of instructions that tells the Informatica Server how to move data from sources to targets.

s_fXY_ZZZ_<Target Table Name or Meaningful name>_x.x

f = frequency: d=daily, w=weekly, m=monthly, h=hourly
X = L for Load or U for Update
Y = S for Stage or P for Production
ZZZ = three-letter data mart code
x.x = version no., e.g. 1.1

Example

Name of a session which just inserts data into the stage table Contract_Coverages on a daily basis:

s_dLS_BIO_Contract_Coverages_1.1



WorkFlow

A workflow is a set of instructions that tells Informatica Server how to execute tasks such as sessions, email notifications and commands. In a workflow multiple sessions can be included to run in parallel or sequential manner.

Wkf_<Meaningful name representing the sessions inside>

Example

Name of a workflow which contains sessions that run daily to load US sales data:
wkf_US_SALES_DAILY_LOAD

Source Definition

The Source Definition is used to logically represent an application database table. The Source Definition is associated to a single table or file and is created through the Import Source process. It is usually the left-most object in a mapping. Naming convention is as follows:
<Source Table Name>

The Source Definition could also be named as follows:
<Source File Name>
In case there are multiple files, use the common part of all the file names.

OR
<Database name>_<Table Name>

OR
XY_<Db Name>_<Table Name>
XY = TB if it's a table, FF if it's a flat file, CB if it's a COBOL file

Example

Name of a source table from the dwdev database:
TB_DWDEV_TRANSACTION, FF_OPERATING_PLAN

Target Definition

The Target Definition is used to logically represent a database table or file in the Data Warehouse / Data Mart. The Target Definition is associated to a single table and is created in the Warehouse Designer. The target transformation is usually the right-most transformation in a mapping. Naming convention is as follows:
<Target Table Name>

The Target Definition could also be named as follows:
<Database name>_<Table Name>

OR
XY_<Db Name>_<Table Name>
XY = TB if it's a table, FF if it's a flat file, CB if it's a COBOL file

Example

Name of a target table in the dwdev database:


TB_DWDEV_TRANSACTION, FF_OPERATING_PLAN

Aggregator

The Aggregator transformation is used to perform aggregate calculations on a group basis.

AGG_<Purpose>

Example

Name of an aggregator which sums the transaction amount: AGG_SUM_OF_TRANS_AMT
Name of an aggregator which is used to find distinct records: AGG_DISTINCT_ORDERS

Expression

The Expression transformation is used to perform arithmetic calculations on a row-by-row basis and is also used to convert strings to integers and to concatenate two columns.

1. Variable names should begin with the letters “v_’ followed by the datatype and name.

o Character data – v_char/v_vchar/v_vchar2/v_texto Numeric data – v_num/v_float/v_dec/v_realo Integer data – v_int/v_sinto Date data – v_dateo Sequential data – v_seq

2. Manipulations of string should be indicated in the name of the new port. For example, conc_CustomerName.

3. Manipulations of numeric data should be indicated in the name of the new port. For example, sum_AllTaxes.

Naming convention of the transformation itself is as follows:
EXP_<Purpose>

Example

Name of an expression which is used to trim columns: EXP_TRIM_COLS
Name of an expression which is used to decode geography identifiers to geography descriptions: EXP_DECODE_GEOG_ID

Filter

The Filter transformation is used as a True/False gateway for passing data through a given path in the mapping. Filters are almost always used in tandem to provide a path for both possibilities. The Filter transformation should be used as early as possible in a mapping in order to preserve performance. Naming convention is as follows:

FLT_<Purpose>

Filters could also be named as follows:

FLT_<Column in Condition>

Example

Name of a filter which filters out records which already exist in the target table: FLT_STOP_OLD_RECS


Name of a filter which filters out records with geography identifiers less than zero: FLT_GEO_ID or FLT_GeoidGreaterThan0

Joiner

The Joiner transformation is used to join two related heterogeneous data sources residing in different physical locations or file systems. One of the most common uses of the Joiner is to join data from a relational table to a flat file. The sources or tables joined should be annotated in the Description field of the Transformation tab for the Joiner transformation.

JNR_<Names of Joined Tables>

Example

Name of a joiner which joins the TRANSACTION and GEOGRAPHY tables: JNR_TRANX

Lookup

The Lookup transformation is used to retrieve a value(s) from a database and apply the retrieved value(s) against value(s) passed in from another transformation. The existence of the retrieved value(s) can then be used in other transformations to satisfy a condition. Lookup transformations can be used in either a connected or unconnected state. Where possible, the unconnected state should be used to enhance performance. However, it must be noted that only one return value can be passed out of an unconnected lookup. The ports needed for the Lookup should be suffixed with the letters "_in" for the input ports and "_out" for the output ports. Port data types should not normally be modified in a Lookup transformation, but instead should be modified in a prior transformation.

Often lookups fail and developers are left to wonder why. The datatype of a port is absolutely essential in validating data through a lookup. For example, a decimal(19,2) and a money datatype will not match.

When overriding the Lookup SQL, always ensure to put a valid ORDER BY or ORDER BY 1 statement in the SQL. This will cause the database to perform the ordering rather than the Informatica Server as it builds the cache.

Naming convention is as follows:
LKP_<Lookup Table Name>

Example

Name of a lookup transformation looking up on the transaction table: LKP_TRANSACTION

Normalizer

The Normalizer transformation is used to transform structured data (such as COBOL or flat files) into relational data. The Normalizer works by having the file header and detail information identified by the developer in the transformation, and then looping through the structured file according to the transformation definition.

Norm_<Source Name>

Example

Name of a Normalizer normalizing data in the OS_TRANS_DAILY file: Norm_OS_TRANS_DAILY


Rank

The Rank transformation is used to order data within a certain data set so that only the top or bottom n records are retrieved. For example, you can order stores by quarterly sales and then filter only the top 10 store records. The reference to the business rule governing the ranking should be annotated in the Description field of the Transformation tab for the Rank transformation.

RNK_<Purpose>

Example

Name of a Rank which picks the top 10 customers by sales amounts: RNK_TopTenCustbySales

Router

RTR_<Purpose>

Example

Name of a router which routes data based on the value of the Geography Identifier: RTR_GeoidGreaterThan0 or RTR_GEO_ID

Sequence Generator

The Sequence Generator transformation is used to generate numeric key values in sequential order. This is normally done to produce surrogate primary keys. It has been observed that reusable sequence generators don't work as efficiently as stand-alone sequence generators. To overcome this there are two options:

1) Use the procedure described in Appendix A of this document.
2) Use a trigger on the target table to populate the primary key automatically when a record is inserted.

SEQ_<Target Column Name>

Example

Name of a sequence generator feeding the primary key column of the transaction table: SEQ_TRANSACTION

Source Qualifier

The Source Qualifier transformation is used to describe in SQL (or in the native script of the DBMS platform, e.g. SQL for Oracle) the method by which data is to be retrieved from a source application system. The Source Qualifier describes any joins, join types, order or group clauses, and any filters of the data. Care should be exercised in the use of filters in the Source Qualifier or in overriding the default SQL or native script. The amount of data can be greatly affected using this option, such that a mapping can become invalid. Use this option only when it is known that the data excluded will not be needed in the mapping. Naming convention is as follows:


SQ_<Source Table Name>

Example

Name of the source qualifier of the Transaction table: SQ_TRANSACTION

Stored Procedure

The Stored Procedure transformation is used to execute externally stored database procedures and functions. The transformation can execute any required functionality as needed, from truncating a table to complex business logic. Avoid using stored procedures as far as possible, as they make the mappings difficult to debug and also reduce the readability of the code. Informatica doesn't have the LIKE operator, so you can use a small stored procedure which does the LIKE test and sends out a flag. Similarly, any operator or function which is not available in Informatica but is available in the database server can be used through a small stored procedure. You should resist the temptation of putting all the logic in a stored procedure. Naming convention is as follows:

STP_<Database Name>_<Procedure Name>

Example

Name of a stored procedure to calculate commission in the dwdev database: STP_DWDEV_Calc_Commission

Update Strategy

The Update Strategy transformation is used to indicate the type of data modification (DML) that will occur to a table in the database. The transformation can provide INSERT, UPDATE, or DELETE functionality to the data. As far as possible don't use the REJECT option of the Update Strategy, as details of the rejected records are written to the log file by Informatica and this may lead to the creation of a very big log file. Naming convention is as follows:

UPD_<Target Table Name>_xxx
xxx = _ins for INSERT
      _dlt for DELETE
      _upd for UPDATE
      _dyn for dynamic (the strategy type is decided by an algorithm inside the Update Strategy transformation)

When using an Update Strategy transformation, do not allow the numeric representation of the strategy to remain in the expression. Instead, replace the numeric with the corresponding constant:
0 - DD_INSERT
1 - DD_UPDATE
2 - DD_DELETE

Example

Name of an update strategy which updates the TRANSACTION table: UPD_TRANSACTION_upd


Mapplet

Mapplets are a way of capturing complex transformation logic and storing the logic for reuse. They may also be designed to pre-configure transformations that are redundant, thus saving development time. Mapplets usually contain several transformations configured to meet a specific transformation need.

In order for mapplets to be reusable, input and output transformation ports are required. These ports provide a logical interface from a mapping to the mapplet. As with all interface designs, mapplets require careful design to ensure their maximum efficiency and reusability.

All transformations contained within the mapplet should be named in accordance with the Transformation Naming Convention listed above. The exception is that if the target data mart name is required, it should not be included unless the mapplet is specific to a single data mart project. If the mapplet is specific to a data mart project, make sure it is documented as such. It is important to ensure the Description field of the mapplet is completed. Additionally, the functionality and reuse of the mapplet should be defined as well.

MPP_<Purpose>

Example

Name of a mapplet which splits monthly estimates to weekly estimates: MPP_SPLIT_ESTIMATES

Input Transformation (Mapplet Only)

Input transformations are used to create a logical interface to a mapplet in order to allow data to pass into the mapplet. This interface may represent the next step in a mapping from the output port side of any transformation.

INP_<Description of Data being funnelled in>

Example: INP_APJournalEntries

Output Transformation (Mapplet Only)

Output transformations are used to create a logical interface from a mapplet in order to allow data to pass out of a mapplet. This interface may represent the next step in a mapping to the input port side of any transformation.

OUT_<Description of Data being funnelled out>

Example: OUT_ValidJournalEntries

Database Connections

When creating database connections in the Server Manager to access source databases and the target database, follow the naming convention below to avoid confusion and make production migration easy:
XXX_<Database Name>_<Schema_Name>, where XXX stands for the three-letter data mart code.

Example


The database connection for the cfdload schema on the dwdev database for ORS data mart sessions would be ORS_dwdev_cfdload.

Version Control

The version control feature provided by Informatica is not mapping specific but folder specific. You cannot version individual mappings separately; you need to save the whole contents of the folder as a different version whenever you want to change the version of a single mapping. Hence we have proposed to do version control through PVCS. You need to have the following structure set up in PVCS before you start development:

PVCS
 |-----Project Name
        |-----Informatica_Folder_Name_1
        |-----Informatica_Folder_Name_2
               |-----Mappings
               |      |----Mapping1
               |      |----Mapping2
               |-----Sessions
                      |----Session1
                      |----Session2

You can start using PVCS right from day one when development begins. This way PVCS can serve as the central repository for all scripts, including Informatica scripts, which enables the developers to access the production scripts at any time. The name of the mapping should reflect the version number; refer to the naming conventions for mappings. The first-cut mappings should be named with the suffix "_1.0"; whenever you want to make changes you should first make a copy of the mapping and change the suffix to "_1.2".

Testing Guidelines

Testing a New Data Mart

Get a fresh schema XXXLOAD created in the DWTEST database. Test your mappings by loading data into the XXXLOAD schema on the DWTEST database. You can do a partial load or a full load depending on the amount of data you have. This schema will later serve the purpose of testing any changes you make to the data mart once it has moved into production. For testing the sessions and mappings you can use the template in the templates folder, or your own improvised template in case this one doesn't meet your requirements.


Testing Changes to a Data Mart in Production

First develop the changes in the Informatica Development Repository and test them by loading data into the XXXLOAD schema on the DWDEV database. Next, make sure the schema XXXLOAD in the DWTEST database has exactly the same structure and data as XXXLOAD on DWSTAGE. Now test all your changes by loading data into the XXXLOAD schema on the DWTEST database. After you are satisfied with the test results you can move the changes to production. Make sure you follow the same process to move your changes from DWDEV to DWTEST as you would follow to move the changes to DWSTAGE.

Production Migration

You should first go through the BI change control procedure described in the document at the following link:
\\uswaufs03medge.med.ge.com\GEDSS_All\QuickPlace_Home\Processes\Change_Control

Moving a New Data Mart to Production

The following documentation needs to be completed before you can move the Informatica scripts of a new data mart to production. You can find all the templates at the following link:

\\3.231.100.33\GEDSS_All\QuickPlace_Home\Tools\Informatica\Installation_&_Development\Templates

1) Production Migration Document for Informatica (MD120). This is in addition to the one you fill out for moving the Oracle scripts of the new data mart to production.
2) Load Process Document. This document should explain, through diagrams and text, the whole load process.
3) Some data marts might need more documents to adequately document the loads; this can be discussed with the BI Support team, to decide which documents need to be created.

You need to place all these documents on the file server, also known as the QuickPlace, for the BI Support Team to review.

Moving a Change to Production

The following documentation needs to be completed before you can move a change to Informatica scripts which are already in production.

1) Production Migration Document for Informatica (MD120). This document needs to be updated with the change.
2) Change Document. This document needs to be filled out and placed in the QuickPlace under the support folder of your project name. Rename the change document file name to CC_99999.doc, where 99999 stands for the global change id given by the change control application. Make sure you analyze the impact of the change you are making and document it in the change document. The impact could be changes in load time, load strategy etc. If possible attach an email from the user that says that the user has validated the change in the test environment.


Performance Tuning

The goal of performance tuning is to optimize session performance by eliminating performance bottlenecks. To tune the performance of a session, first identify a performance bottleneck, eliminate it, and then identify the next performance bottleneck until you are satisfied with the session performance. You can use the test load option to run sessions while you tune session performance. The most common performance bottleneck occurs when the Informatica Server writes to a target database. You can identify performance bottlenecks by the following methods:

Running test sessions. You can configure a test session to read from a flat file source or to write to a flat file target to identify source and target bottlenecks.

Studying performance details. You can create a set of information called performance details to identify session bottlenecks. Performance details provide information such as buffer input and output efficiency.

Monitoring system performance. You can use system-monitoring tools to view percent CPU usage, I/O waits, and paging to identify system bottlenecks.

Once you determine the location of a performance bottleneck, you can eliminate the bottleneck by following these guidelines:

Eliminate source and target database bottlenecks. Have the database administrator optimize database performance by optimizing the query, increasing the database network packet size, or configuring index and key constraints.

Eliminate mapping bottlenecks. Fine tune the pipeline logic and transformation settings and options in mappings to eliminate mapping bottlenecks.

Eliminate session bottlenecks. You can optimize the session strategy and use performance details to help tune session configuration.

Eliminate system bottlenecks. Have the system administrator analyze information from system monitoring tools and improve CPU and network performance.

If you tune all the bottlenecks above, you can further optimize session performance by partitioning the session. Adding partitions can improve performance by utilizing more of the system hardware while processing the session.

Because determining the best way to improve performance can be complex, change only one variable at a time and time the session both before and after the change. If session performance does not improve, you might want to return to your original configuration.

For more information, see the Informatica Help available from any of the three Informatica client tools.


Performance Tips

Suppose I have to load 40 lakh (4 million) records into the target table and the workflow takes about 10-11 hours to finish. I have already increased the cache size to 128 MB. There are no Joiners, just Lookups and Expression transformations. How can I improve the performance?

Ans:

(1) If the lookups have many records, try creating indexes on the columns used in the lookup condition, and try increasing the lookup cache. If this does not improve the performance and the target has any indexes, disable them in the target pre-load and re-enable them in the target post-load.

(2) Three things you can do:
1. Increase the commit interval (by default it is 10,000).
2. Use bulk mode instead of normal mode if your target does not have primary keys, or use pre- and post-session SQL to achieve the same effect (depending on the business requirement).
3. Use key partitioning to load the data faster.

(3) If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes (a SQL sketch follows below).
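As a rough illustration of points (1) and (3) above, the statements below are the kind of pre-session and post-session SQL that could be attached to the session. This is only a sketch: the table, index and constraint names are invented for the example and would need to match your own target.

-- Hypothetical pre-session SQL: disable the primary key constraint and drop the
-- non-unique index so the load does not have to maintain them row by row.
ALTER TABLE sales_fact DISABLE CONSTRAINT pk_sales_fact;
DROP INDEX idx_sales_fact_cust;

-- Hypothetical post-session SQL: rebuild the index and re-enable the constraint
-- once the load has completed.
CREATE INDEX idx_sales_fact_cust ON sales_fact (customer_id);
ALTER TABLE sales_fact ENABLE CONSTRAINT pk_sales_fact;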

What is performance tuning in Informatica?

The aim of performance tuning is to optimize session performance so that sessions run within the available load window for the Informatica Server.

You can increase session performance in the following ways:

The performance of the Informatica Server is related to network connections. Data generally moves across a network at less than 1 MB per second, whereas a local disk moves data five to twenty times faster. Network connections therefore often affect session performance, so avoid unnecessary network connections.

Flat files: If your flat files are stored on a machine other than the Informatica Server, move them to the machine on which the Informatica Server runs.

Relational data sources: Minimize the connections to sources, targets and the Informatica Server to improve session performance. Moving the target database onto the server system may improve session performance.


Staging areas: If you use staging areas, you force the Informatica Server to perform multiple data passes. Removing staging areas may improve session performance.

You can run multiple Informatica Servers against the same repository. Distributing the session load across multiple Informatica Servers may improve session performance.

Running the Informatica Server in ASCII data movement mode improves session performance, because ASCII data movement mode stores a character value in one byte whereas Unicode mode takes two bytes to store a character.

If a session joins multiple source tables in one Source Qualifier, optimizing the query may improve performance. Also, single-table select statements with an ORDER BY or GROUP BY clause may benefit from optimization such as adding indexes.

You can improve session performance by configuring the network packet size, which controls how much data crosses the network at one time. To do this, go to the Server Manager and choose Server Configure Database Connections.

If your target has key constraints and indexes, they slow the loading of data. To improve session performance in this case, drop the constraints and indexes before you run the session and rebuild them after the session completes.

Running parallel sessions by using concurrent batches will also reduce the time taken to load the data, so concurrent batches may also increase session performance.

Partitioning the session improves session performance by creating multiple connections to sources and targets and loading data in parallel pipelines.

In some cases, if a session contains an Aggregator transformation, you can use incremental aggregation to improve session performance.

Avoid transformation errors to improve session performance.

If the session contains a Lookup transformation, you can improve session performance by enabling the lookup cache.

If your session contains a Filter transformation, place it as close to the sources as possible, or use a filter condition in the Source Qualifier (see the Source Qualifier sketch after this list).

Aggregator, Rank and Joiner transformations often decrease session performance because they must group data before processing it. To improve session performance in this case, use the sorted ports option.
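For example, instead of filtering rows in a downstream Filter transformation, the condition can be pushed into the Source Qualifier's SQL override so that unwanted rows never enter the pipeline, and an ORDER BY on the grouping key feeds sorted data to an Aggregator or Joiner. This is only a sketch; the table and column names below are made up for illustration.

-- Hypothetical Source Qualifier override: filter at the source and return the
-- rows already sorted on the grouping key.
SELECT order_id,
       customer_id,
       order_amount,
       order_date
FROM   orders
WHERE  order_status = 'ACTIVE'
AND    order_date >= TO_DATE('01-JAN-2009', 'DD-MON-YYYY')
ORDER BY customer_id;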


1. Filter as soon as possible (left most in mapping). Process only the data necessary and eliminate as much extra unnecessary data as possible. Use Source Qualifier to filter data since the Source Qualifier transformation limits the row set extracted from a source while the Filter transformation limits the row set sent to a target.

2. Only pass data through an Expression Transformation if some type of manipulation is being done. If no manipulations are being done using that field then push the data to the farthest active transformation possible. Turn off output ports where the field isn’t passed on.

3. Cache lookups if source table is under 500,000 rows and DON’T cache for tables over 500,000 rows.

4. Reduce the number of transformations. Don’t use an Expression Transformation to collect fields. Don’t use an Update Transformation if only inserting. Insert mode is the default.

5. If a value is used in multiple ports, calculate the value once (in a variable) and reuse the result instead of recalculating it for multiple ports.

6. Reuse objects where possible.

7. Delete unused ports particularly in the Source Qualifier and Lookups.

8. Prefer operators over functions in expressions.

9. Avoid using Stored Procedures, and call them only once during the mapping if possible.

10. Remember to turn off Verbose logging after you have finished debugging.

11. Use default values where possible instead of using IIF (ISNULL(X),,) in Expression port.

12. When overriding the Lookup SQL, always include a valid ORDER BY clause in the SQL. This causes the database to perform the ordering, rather than the Informatica Server, while building the cache (see the override sketch after tip 26).

13. Improve session performance by using sorted data with the Joiner transformation. When the Joiner transformation is configured to use sorted data, the Informatica Server improves performance by minimizing disk input and output.

14. Improve session performance by using sorted input with the Aggregator Transformation since it reduces the amount of data cached during the session.


15. Improve session performance by using limited number of connected input/output or output ports to reduce the amount of data the Aggregator transformation stores in the data cache.

16. Use a Filter transformation prior to Aggregator transformation to reduce unnecessary aggregation.

17. Performing a join in the database is faster than performing the join in the session, so use the Source Qualifier to perform the join where possible.

18. In Joiner transformations, define the source with fewer rows as the master source, since this reduces the search time and the cache size.

19. When using multiple conditions in a lookup, specify the conditions with the equality operator first.

20. Improve session performance by caching small lookup tables.

21. If the lookup table is on the same database as the source table, instead of using a Lookup transformation, join the tables in the Source Qualifier Transformation itself if possible.

22. If the lookup table does not change between sessions, configure the Lookup transformation to use a persistent lookup cache. The Informatica Server saves and reuses cache files from session to session, eliminating the time required to read the lookup table.

23. Use :LKP reference qualifier in expressions only when calling unconnected Lookup Transformations.

24. Informatica Server generates an ORDER BY statement for a cached lookup that contains all lookup ports. By providing an override ORDER BY clause with fewer columns, session performance can be improved.

25. Eliminate unnecessary data type conversions from mappings.

26. Reduce the number of rows being cached by using the Lookup SQL Override option to add a WHERE clause to the default SQL statement, as in the sketch below.
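Tips 12, 24 and 26 can be combined in a single Lookup SQL override, sketched below with invented table and column names: the WHERE clause limits the rows that are cached, and the ORDER BY names only the lookup condition and return columns. The trailing comment marker is commonly used to suppress the ORDER BY the server appends to an override; verify this behaviour against your Informatica version.

-- Hypothetical Lookup SQL override: cache only active customers and order by
-- the lookup condition column; the trailing '--' comments out the ORDER BY
-- that would otherwise be generated over all lookup ports.
SELECT cust_id,
       cust_name
FROM   customer_dim
WHERE  active_flag = 'Y'
ORDER BY cust_id --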

UNIT TEST CASES TEMPLATE:


SAP-CMS Interfaces

Step# | Description | Test Conditions | Expected Results | Actual Results | Pass or Fail (P or F) | Tested By
1 | CMS database down | Run the Informatica Interface | Interface sends an email notification and stops | As expected | P | Madhava
2 | Check the no. of records loaded in the table | Run the Informatica Interface | Count of records in the flat file and in the table is the same | As expected | P | Madhava
3 | Call the SP for getting a unique sequence no. | Run the Informatica Interface | Get the unique number | As expected | P | Madhava
4 | Run the interface when no flat file is present on the SAP server | Run the Informatica Interface | Interface stops after finding no flat files | As expected | P | Madhava
5 | Check for flat files when files are present on the SAP server | Run the Informatica Interface | Interface loads the data into CMS | As expected | P | Madhava
6 | SAP host name changed in the SCP script | Run the Informatica Interface | Interface fails to SCP the files onto the SAP server and sends an error email | As expected | P | Madhava
7 | SAP unix user changed in the SCP script | Run the Informatica Interface | Interface fails to SCP the files onto the SAP server and sends an error email | As expected | P | Madhava
8 | CMS database down after files are retrieved from the SAP server | Run the Informatica Interface | Data is not loaded and files are sent to the errored directory | As expected | P | Madhava
9 | Stored procedure throws an error | Run the Informatica Interface | Interface stops and sends an error email | As expected | P | Madhava
10 | Check the value of DA_LD_NR in the control table | Run the Informatica Interface | Value of DA_LD_NR in the control table is the same as that loaded in the table for that interface | As expected | P | Madhava
11 | Error while loading the data into the CMS tables | Run the Informatica Interface | Interface stops, files are moved to the errored directory and an email notification is sent | As expected | P | Madhava
12 | Error while updating the control table in CMS | Run the Informatica Interface | Interface sends an email notification and stops | As expected | P | Madhava

CMS-SAP Interfaces

Step# | Description | Test Conditions | Expected Results | Actual Results | Pass or Fail (P or F) | Tested By
1 | Value in the CMS control table is not set to "STAGED" | Run the Informatica Interface | Interface does not generate any flat file | As expected | P | Madhava
2 | Value in the CMS control table is set to "STAGED" | Run the Informatica Interface | Interface does not generate a flat file | As expected | P | Madhava
3 | SAP host name changed in the SCP script | Run the Informatica Interface | Interface fails to SCP the files onto the SAP server and sends an error email | As expected | P | Madhava
4 | SAP unix user changed in the SCP script | Run the Informatica Interface | Interface fails to SCP the files onto the SAP server and sends an error email | As expected | P | Madhava
5 | Value in the CMS control table is set to "STAGED" | Run the Informatica Interface | Status in the control table is updated to "TRANSFORMED" | As expected | P | Madhava
6 | Value in the CMS control table is set to "STAGED" and record status is "UNPROCESSED" | Run the Informatica Interface | Status of each record is updated to "UNPROCESSED" | As expected | P | Madhava
7 | File generated with no records | Run the Informatica Interface | No files are sent to the SAP server | As expected | P | Madhava
8 | CMS database down | Run the Informatica Interface | Interface sends an email notification and stops | As expected | P | Madhava
9 | SCP of files failed | Run the Informatica Interface | Flat files are moved to the error directory and an email is sent | As expected | P | Madhava
10 | SCP of files is successful | Run the Informatica Interface | Flat files are moved to the processed directory | As expected | P | Madhava
11 | Check the no. of records updated in the CMS table | Run the Informatica Interface | Count of records updated is the same as the count of records in the flat file | As expected | P | Madhava
12 | Check the no. of records present in the flat file | Run the Informatica Interface | Count of records in the flat file is the same as the count of records with status "UNPROCESSED" | As expected | P | Madhava

SLM-CMS Interfaces

Step# | Description | Test Conditions | Expected Results | Actual Results | Pass or Fail (P or F) | Tested By
1 | Value in the SLM control table is not set to "STAGED" | Run the Informatica Interface | Interface does not generate any flat file | As expected | P | Madhava
2 | Value in the SLM control table is set to "STAGED" | Run the Informatica Interface | Interface does not generate a flat file | As expected | P | Madhava
3 | CMS database down | Run the Informatica Interface | Interface sends an email notification and stops | As expected | P | Madhava
4 | Check the no. of records loaded in the table | Run the Informatica Interface | Count of records in the CMS and SLM tables is the same | As expected | P | Madhava
5 | Call the SP for getting a unique sequence no. | Run the Informatica Interface | Get the unique number | As expected | P | Madhava
6 | Stored procedure throws an error | Run the Informatica Interface | Interface stops and sends an error email | As expected | P | Madhava
7 | Check the value of DA_LD_NR in the control table | Run the Informatica Interface | Value of DA_LD_NR in the control table is the same as that loaded in the table for that interface | As expected | P | Madhava
8 | Error while loading the data into the CMS table | Run the Informatica Interface | Interface sends an email notification and stops | As expected | P | Madhava
9 | Error while updating the control table in CMS | Run the Informatica Interface | Interface sends an email notification and stops | As expected | P | Madhava
10 | Error while retrieving data from the SLM database | Run the Informatica Interface | Interface sends an email notification and stops | As expected | P | Madhava
11 | Error while updating the control table in the SLM database | Run the Informatica Interface | Interface sends an email notification and stops | As expected | P | Madhava

What is QA philosophy?

The inherent philosophy of Quality Assurance for software systems development is to ensure the system meets or exceeds the agreed upon requirements of the end-users; thus creating a high-quality, fully-functional and user-friendly application.

What is 'Software Quality Assurance'?
Software QA involves the entire software development PROCESS - monitoring and improving the process, making sure that any agreed-upon standards and procedures are followed, and ensuring that problems are found and dealt with. It is oriented to 'prevention'.

What is 'Software Testing'?
Testing involves operation of a system or application under controlled conditions and evaluating the results (e.g., 'if the user is in interface A of the application while using hardware B, and does C, then D should happen'). The controlled conditions should include both normal and abnormal conditions. Testing should intentionally attempt to make things go wrong, to determine whether things happen when they shouldn't or don't happen when they should. It is oriented to 'detection'.

Organizations vary considerably in how they assign responsibility for QA and testing. Sometimes they're the combined responsibility of one group or individual; other arrangements are also common.

Why does software have bugs?
• Miscommunication or no communication - as to the specifics of what an application should or shouldn't do (the application's requirements).
• Software complexity - the complexity of current software applications can be difficult to comprehend for anyone without experience in modern-day software development. Multi-tiered applications, client-server and distributed applications, data communications, enormous relational databases, and the sheer size of applications have all contributed to the exponential growth in software/system complexity.
• Programming errors - programmers, like anyone else, can make mistakes.
• Changing requirements (whether documented or undocumented) - the end-user may not understand the effects of changes, or may understand and request them anyway.

What is verification? What is validation?
Verification typically involves reviews and meetings to evaluate documents, plans, code, requirements, and specifications. This can be done with checklists, issues lists, walkthroughs, and inspection meetings. Validation typically involves actual testing and takes place after verifications are completed. The term 'IV & V' refers to Independent Verification and Validation.

What is a 'walkthrough'?
A 'walkthrough' is an informal meeting for evaluation or informational purposes. Little or no preparation is usually required.

What is an 'inspection'?
An inspection is more formalized than a 'walkthrough', typically with 3-8 people including a moderator, a reader, and a recorder to take notes. The subject of the inspection is typically a document such as a requirements spec or a test plan, and the purpose is to find problems and see what's missing, not to fix anything. Attendees should prepare for this type of meeting by reading through the document; most problems will be found during this preparation. The result of the inspection meeting should be a written report. Thorough preparation for inspections is difficult, painstaking work, but it is one of the most cost-effective methods of ensuring quality. Employees who are most skilled at inspections are like the 'eldest brother' in the parable in 'Why is it often hard for ...'

What is software 'quality'?
Quality software is reasonably bug-free, delivered on time and within budget, meets requirements and/or expectations, and is maintainable. However, quality is obviously a subjective term; it depends on who the 'customer' is and their overall influence in the scheme of things. A wide-angle view of the 'customers' of a software development project might include end-users, customer acceptance testers, customer contract officers, customer management, the development organization's management/accountants/testers/salespeople, future software maintenance engineers, stockholders, magazine columnists, etc. Each type of 'customer' will have their own slant on 'quality' - the accounting department might define quality in terms of profits, while an end-user might define it quite differently.

What is SEI? CMM? CMMI? ISO? IEEE? ANSI? Will it help?
SEI = 'Software Engineering Institute' at Carnegie Mellon University; initiated by the U.S. Defense Department to help improve software development processes.
CMM = 'Capability Maturity Model', now called CMMI ('Capability Maturity Model Integration'), developed by the SEI. It is a model of five levels of process 'maturity' that determine effectiveness in delivering quality software. It is geared to large organizations such as large U.S. Defense Department contractors; however, many of the QA processes involved are appropriate to any organization, and if reasonably applied can be helpful. Organizations can receive CMMI ratings by undergoing assessments by qualified auditors.

Level 1 - characterized by chaos, periodic panics, and heroic efforts required by individuals to successfully complete projects. Few if any processes are in place; successes may not be repeatable.
Level 2 - software project tracking, requirements management, realistic planning, and configuration management processes are in place; successful practices can be repeated.
Level 3 - standard software development and maintenance processes are integrated throughout an organization; a Software Engineering Process Group is in place to oversee software processes, and training programs are used to ensure understanding and compliance.
Level 4 - metrics are used to track productivity, processes, and products. Project performance is predictable, and quality is consistently high.
Level 5 - the focus is on continuous process improvement. The impact of new processes and technologies can be predicted and effectively implemented when required.

What is the 'software life cycle'?
The life cycle begins when an application is first conceived and ends when it is no longer in use. It includes aspects such as initial concept, requirements analysis, functional design, internal design, documentation planning, test planning, coding, document preparation, integration, testing, maintenance, updates, retesting, and phase-out.

Defect Life Cycle: when a defect is found by the tester, the bug is logged with status NEW. The Test Lead then analyses the bug and assigns it to a developer (OPEN status). The developer fixes the bug (FIX status). The tester then tests the new build again to check whether the same error occurs; if not, the bug is CLOSED. Defect Life Cycle: NEW -> OPEN -> FIX -> CLOSED. A revalidation cycle means testing whether the new version or build still has the same defect by executing the same test cases, much like regression testing.

Bug Reporting and Tracking
Using the testing methodology listed above, our QA engineers, translators and language specialists log issues, or bugs, in our online bug tracking system. Localization issues that can be fixed by our engineers will be fixed accordingly. Internationalization and source code issues will also be logged and reported to the client with suggestions on how to fix them. The bug tracking process is as follows:

1. New Bugs are submitted in the Bug tracking system account by the QA.

When a bug is logged, our QA engineers include all information relevant to that bug, such as:

• Date/time logged

• Language

• Operating System


• Bug Type – e.g. functional, UI, installation, translation

• Priority – Low\Medium\High\Urgent

• Possible Screenshot of Problem

The QA engineer also analyses the error and describes, in a minimum number of steps, how to reproduce the problem for the benefit of the engineer. At this stage the bug is labelled "Open". Each issue must pass through at least four states:

Open: opened by QA during testing
Pending: fixed by the engineer but not yet verified
Fixed: fix verified by QA
Closed: fix re-verified before sign-off

QA Process Cycle:

Software QA involves the entire software development PROCESS - monitoring and improving the process.

The philosophy of Quality Assurance for software systems development is to ensure the system meets or exceeds the agreed upon requirements of the end-users; thus creating a high-quality, fully-functional and user-friendly application.

Phase I: Requirements Gathering, Documentation and Agreement

Phase II: Establishing Project Standards

Phase III: Test Planning

Phase IV: Test Case Development

Phase V: QA Testing

Phase VI: User Acceptance Testing

Phase VII: System Validation

The QA life cycle consists of five types of testing regimens:

1. Unit Testing
2. Functional Testing
3. System Integration Testing
4. Regression Testing
5. User Acceptance Testing


Unit testing: The testing, by development, of the application modules to verify each unit (module) itself meets the accepted user requirements and design and development standards

Functional Testing: The testing of all the application’s modules individually to ensure the modules, as released from development to QA, work together as designed and meet the accepted user requirements and system standards

System Integration Testing: Testing of all of the application modules in the same environment, database instance, network and inter-related applications, as it would function in production. This includes security, volume and stress testing.

Regression Testing: This is the testing of each of the application’s system builds to confirm that all aspects of a system remain functionally correct after program modifications. Using automated regression testing tools is the preferred method.

User Acceptance Testing: The testing of the entire application by the end-users ensuring the application functions as set forth in the system requirements documents and that the system meets the business needs
