Page 1: IDE 5.0_Basics_20061025

1

Using Informatica Data Explorer 5

Informatica Corporation, 2005-2006. All rights reserved.

Education Services

Version IDE-25102006

Page 2: IDE 5.0_Basics_20061025

2

Agenda

• Overview of Informatica Data Explorer

• Importing Metadata and Accessing Source Data

• Column Profiling

• Data Rules

• Single Table Structural Analysis

• Cross Table Profiling

• Validating Table and Cross Table Analysis

• Normalization

• Repository

• Using the Repository Navigator

• Using Repository Reports

• Integration with PowerCenter

Page 3: IDE 5.0_Basics_20061025

3

Introduction

Page 4: IDE 5.0_Basics_20061025

4

Introduction Objectives

• Identify the components of the Informatica Data Explorer product suite

• Describe the Informatica Data Explorer process flow

Page 5: IDE 5.0_Basics_20061025

Informatica Data Explorer Product Suite

[Architecture diagram: the IDE Client (Windows XP, 2000) works against the IDE Server (Unix or Windows), which manages the IDE Project and IDE Repository. IDE Import components bring in metadata and data from relational sources via ODBC (UDB, Informix, Sybase, Oracle, MS SQL Server), from flat files and DDL, from OS/390 DB2 unload files, and from VSAM and IMS sources (via generated COBOL programs, command files, and JCL). IDE FTM / XML produces DDL, XML, and DTDs, and the Repository Navigator browses the IDE Repository.]

Page 6: IDE 5.0_Basics_20061025

6

Product Architecture

[Architecture diagram: source data (RDBMS via ODBC, flat files, and mainframe sources through the importers/connectors) is loaded into data tables for IDE Profiling: Content Profiling, Single Table Profiling, and Cross Table Profiling. Profiling builds a Source Data Knowledge Base of structure, content, and quality information. IDE Design then produces a Consolidated Schema, with Fixed Target Mapping, the Repository Navigator, and ports to Informatica CWM downstream.]

* The IMS and VSAM importers actually use a GUI (Source Profiler) to read a Copybook and generate a program to extract mainframe data into Flat Files for use by IDE

Page 7: IDE 5.0_Basics_20061025

7

Technical Diagram

• IDE Client – Workstation: Windows 2000 or Windows XP; connects to the IDE Server over TCP/IP

• IDE Server – performs the actual profiling; Project files are the initial store for profiling results
• Server platforms: Windows (2000/2003/XP), Sun Solaris (7, 8, 9), HP-UX (11 or later), AIX (4.3, 5L)

• Relational Importers for: IBM DB2 UDB 5.2, 6.1, 7.1, 7.2, 8.1; Informix 7.24, 7.31, 9.1, 9.2, 9.3; Oracle 7.3, 8, 8i, 9i; Sybase 10, 11, 12; ODBC (SQL Server, etc.) – connectivity via an ODBC driver (API 3.x, conformance level 2)

• Flat File Importer for: Fixed Length, Delimited, DB2 Unload (data / header files)

• Repository – stores completed profiling results; accessed via ODBC/JDBC; the Repository does not need to be on the same server as IDE
• Repository DBMS and server platform: IBM DB2 UDB 7.2, 8.1; Informix 7.31, 9.2, 9.3; Microsoft SQL Server 7 and 2000; Oracle 8i, 9i; Sybase 12 and 12.5

• Repository Navigator – Workstation: Windows 2000/XP

• FTM/XML – Workstation: Windows 2000/XP; ports XML format files

Page 8: IDE 5.0_Basics_20061025

IDE Process Flow

[Process flow diagram: documented metadata and data (Relational, Flat Files, VSAM, IMS, ODBC) feed IDE Data Prep / Import, then IDE Data Profiling and IDE Schema Development, producing a target design and specifications for data extraction, cleansing, and transformation. FTM / XML metadata mapping then delivers to the target via DB load or messaging (Target DB or Message). The IDE Repository and Navigator provide metadata management throughout.]

Page 9: IDE 5.0_Basics_20061025

9

Introduction Review

• The Informatica Data Explorer product line:
• Informatica Data Explorer
• Importer for Flat Files
• Import for Relational Databases
• Import for VSAM
• Import for IMS
• DDL Generators
• Source Profiler
• Repository Navigator
• Repository
• FTM

Page 10: IDE 5.0_Basics_20061025

10

Introduction Review (cont.)

• The IDE Process Flow consists of five major processes:
• Data Preparation and Import

• Data Profiling

• Schema Validation and Development

• Metadata Development

• Metadata Management

Page 11: IDE 5.0_Basics_20061025

11

Lesson 1

Importing Metadata and Accessing Source Data

Page 12: IDE 5.0_Basics_20061025

12

Lesson 1 Objectives

• Explain what an Informatica Data Explorer Project is, and how it is used

• Create and set up Informatica Data Explorer Projects

• Define the term “metadata” as used by Informatica Software

• Explain the importance of metadata in Data Profiling using Informatica Data Explorer

Page 13: IDE 5.0_Basics_20061025

13

Lesson 1 Objectives (cont.)

• Explain what source data are, and the ways in which they may be imported into Informatica Data Explorer

• Explain what the Informatica Data Explorer Flat File Importer does

• Describe the format of an Informatica Data Explorer Flat File, including the minimum requirements for Informatica Data Explorer to use it to access source data

Page 14: IDE 5.0_Basics_20061025

14

Case Study Description

• The Customer Order system is a mainframe application accessed through a CICS user interface

• It was developed 10 years ago

• The Employee Identification system is an Oracle database created 2 years ago

• Business users are sure they know the data

• Senior executives suspect the quality of the data is bad

Page 15: IDE 5.0_Basics_20061025

[Architecture diagram (as on the Product Suite slide) with the IDE Project highlighted as the working store that the IDE Client and IDE Server read and write.]

Informatica Data Explorer Project

Page 16: IDE 5.0_Basics_20061025

16

Informatica Data Explorer Projects

• The persistent data store used by IDE

• A Project is a UNIX or Windows NT container (directory, folder etc.)

• Projects contain:
• Metadata

• Data

• Profiling as well as Mapping information

• Projects are opened and closed by the IDE Server

Page 17: IDE 5.0_Basics_20061025

17

What is Metadata?

• Informatica Data Explorer defines metadata as:
• Data that describes data

• Information about the characteristics of source data

• In Informatica Data Explorer, metadata is the information used to create:
• Schemas

• Tables

• Columns

• Other objects

Page 18: IDE 5.0_Basics_20061025

18

Why Import Metadata?

• Metadata must be imported into an Informatica Data Explorer Project before any subsequent tasks or activities can be started

• Informatica Data Explorer needs to know the names of the Columns in order to store Data Profiling results

• Informatica Data Explorer needs to know how to interpret the source data (Fixed vs. Delimited)

• Provides basis for automated quality assessments in data profiling

Page 19: IDE 5.0_Basics_20061025

IDE Data Sources

[Architecture diagram (as on the Product Suite slide) with the data sources highlighted: relational databases (UDB, Informix, Sybase, Oracle, MS SQL Server via ODBC), flat files and DDL, and mainframe sources (VSAM, IMS, sequential, PDS, OS/390 DB2 unload) brought in through command files and JCL.]

Page 20: IDE 5.0_Basics_20061025

20

IDE Flat Files

• Consist of two components

• Header File
• Contains metadata describing the contents of a data file

• Data File
• Data in either delimited or fixed column format, as well as DB2 Load format

Page 21: IDE 5.0_Basics_20061025

21

IDE Flat Files (cont.)

• Header and Data files may be:
• Separate files, or

• Combined into one file

• A header file should not contain duplicate column names (IDE will automatically rename them)

• IDE Flat Files may not contain Arrays (repeating groups or occurs)

Page 22: IDE 5.0_Basics_20061025

Informatica Data Explorer Flat File Components

Header File:

header:
file=empinfo.dat

attribute:EMPID
data_type=INTEGER
null_rule=NOT NULL
min_value=1000
max_value=9999

attribute:LAST_NAME
data_type=CHAR(20)
null_rule=NOT NULL

attribute:FIRST_NAME
data_type=CHAR(20)
null_rule=NOT NULL

attribute:GENDER
data_type=CHAR(1)
null_rule=NOT NULL

attribute:DEPTID
data_type=CHAR(4)
null_rule=NOT NULL
min_value=100

Data File:

149,Francis,Lynn,3,200,MIS,Database Administrator,"120 Co
249,Venkatachalam,Nagarajan,3,200,MIS,Project Leader,"300
289,Kim,Suk,3,200,MIS,Staff Consultant,"4040 N Fairfax Dr
216,Masood,Airaj,,200,MIS,MIS Analyst,"300 N Wakefield Dr
134,Swenson,Allison,F,200,MIS,Database Administrator,"900
164,Park,Allison,F,200,MIS,Database Analyst,"PO BOX 1471"
323,Blaskiewicz,Allison,F,200,MIS,Technical Specialist,"3
255,Barbles,Amy,F,100,Sales,Sales Executive,"4019 Rice Bl
273,Karneh,Anna,1,200,MIS,Sr Prog Analyst,"12601 Fair Lak

The data file example shows the associated comma delimited file to which this header file refers

The header file example shows some of the documented information that can be loaded into Informatica Data Explorer

Page 23: IDE 5.0_Basics_20061025

23

Header and Data Files

• Header and data can be in one file

• We recommend creating two separate files when building them manually

• The more information that is provided in the header file, the more automatic comparisons Informatica Data Explorer can make

Page 24: IDE 5.0_Basics_20061025

24

Login

Page 25: IDE 5.0_Basics_20061025

25

Open Project

Page 26: IDE 5.0_Basics_20061025

26

Import Metadata

Page 27: IDE 5.0_Basics_20061025

27

Lab Exercises 1.1–1.6

Page 28: IDE 5.0_Basics_20061025

28

Lesson 1 Review

• An Informatica Data Explorer Project is:
• The persistent data store used by Informatica Data Explorer

• Used to organize and partition the work effort

• Metadata describes the data source and is used by Informatica Data Explorer to access the source data

• A structure that contains:
• Metadata

• Data

• Profiling and Mapping information

Page 29: IDE 5.0_Basics_20061025

29

Lesson 1 Review (cont.)

• Informatica Data Explorer can import data from:
• Relational Databases

• Oracle 7.3, 8, 8i, 9i or 10g

• Informix 7, 9.1, or 9.2

• Sybase 10, 11, 12 or 12.5

• IBM DB2 UDB 5.2, 6.1, 7.1 or 7.2

• Microsoft SQL Server 7 and 2000 (using an ODBC driver)

• Flat Files
• Delimited Format

• Fixed Length Format

• DB2 Load Format

Page 30: IDE 5.0_Basics_20061025

30

Lesson 1 Review (cont.)

• Informatica Data Explorer Flat Files must be:
• ASCII or EBCDIC character format (no binary data)

• Binary data is supported via the DB2 Load Utility format

• Informatica Data Explorer Flat Files may not contain:
• Arrays (repeating groups or occurs)

• Duplicate column names

Page 31: IDE 5.0_Basics_20061025

31

Lesson 1 Review (cont.)

• Informatica Data Explorer Flat Files must have a header file along with the data file

• Additional information on data preparation is available in the Using Informatica Data Explorer Source Profiler course and the documentation

Page 32: IDE 5.0_Basics_20061025

32

Lesson 2

Column Profiling

Page 33: IDE 5.0_Basics_20061025

33

Lesson 2 Objectives

• Explain what Column Profiling is, and why it should be performed.

• Execute the Column Profile function of Informatica Data Explorer.

• Navigate and review the results of Column Profiling.

• Explain informational Tags.

• Describe when and how to apply informational Tags to Informatica Data Explorer objects.

Page 34: IDE 5.0_Basics_20061025

34

What is Column Profiling?

• A process of discovering physical characteristics of each column in a file

• Comparing documented Metadata against Metadata inferred from the data source

• Column Profiling is done against data in the form of:
• ASCII flat files

• DB2 Load Utility files

• RDBMS tables

Page 35: IDE 5.0_Basics_20061025

35

Why Profile Columns?

• Not all database metadata and documentation are accurate pictures of the data source

• Documented descriptions of data elements may be inconsistent with the way the element is actually used

• Informatica Data Explorer Column Profiling builds a description of a column (its metadata) based on the data it contains

Page 36: IDE 5.0_Basics_20061025

36

Column Lists

• The results of Column Profiling are stored with the Columns in a Table

• Column List viewers can be opened from the Navigation Tree

• Column List viewers provide information about Documented and Inferred Metadata

• Documented Metadata are supplied from the header file or source table

• Inferred Metadata are those that Informatica Data Explorer determined from examining the data
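For illustration only (IDE gathers these automatically during the Profile Columns task), the kind of Inferred Metadata shown in a Column List could be approximated for a single column with a query like the following; the empinfo table and EMPID column are taken from the case study:

-- Inferred characteristics of the EMPID column (illustrative sketch, not an IDE function)
SELECT MIN(EMPID)              AS inferred_min_value,
       MAX(EMPID)              AS inferred_max_value,
       COUNT(*) - COUNT(EMPID) AS null_count,        -- 0 suggests an inferred NOT NULL rule
       COUNT(DISTINCT EMPID)   AS distinct_values,   -- equal to rows_profiled suggests uniqueness
       COUNT(*)                AS rows_profiled
FROM empinfo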

Page 37: IDE 5.0_Basics_20061025

37

Column Profiling

Page 38: IDE 5.0_Basics_20061025

38

Column Viewer

Page 39: IDE 5.0_Basics_20061025

39

Lab Exercises 2.1–2.4

Page 40: IDE 5.0_Basics_20061025

40

Drill Down

• Allows you to perform ad hoc drill downs through data presented in the Informatica Data Explorer viewers.

• Used to interrogate any data sources that can be accessed via an ODBC connection or Informatica Data Explorer Importer.

• Searches are issued against the selected data, and rows are returned for the specified search.

Drill Downs
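Conceptually, a drill down issues a search like the following against the selected source and returns the matching rows (an illustrative sketch only; the table and column names come from the case study):

-- Return the rows behind a selected value, for example all MIS employees
SELECT *
FROM empinfo
WHERE DEPTNM = 'MIS'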

Page 41: IDE 5.0_Basics_20061025

41

Column Details

• Lists of properties about a Column that have been inferred by Informatica Data Explorer

• Columns can have several potential sets of characteristics

• The potential sets of characteristics are dependent on the physical view that is chosen

Page 42: IDE 5.0_Basics_20061025

42

Drill Down

Page 43: IDE 5.0_Basics_20061025

43

Drill Down Results

Page 44: IDE 5.0_Basics_20061025

44

Lab Exercises 2.5–2.7

Page 45: IDE 5.0_Basics_20061025

45

Column Value Pairs

• Informatica Data Explorer will store, per Column:
• Up to 16,000 distinct values
• These are the most frequently occurring values from the set of all values that were observed during the Column Profile execution
• The frequency with which each value was observed

• Informatica Data Explorer will calculate:
• % Distribution for each distinct value, based on the frequency divided by the total rows profiled
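As an illustrative sketch of what the stored value pairs represent (not the IDE implementation), the frequency and % distribution for a column such as DEPTID in the case-study empinfo table could be expressed as:

-- Value/frequency pairs with % distribution for DEPTID (illustrative only)
SELECT DEPTID,
       COUNT(*)                                          AS frequency,
       COUNT(*) * 100.0 / (SELECT COUNT(*) FROM empinfo) AS pct_distribution
FROM empinfo
GROUP BY DEPTID
ORDER BY frequency DESC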

Page 46: IDE 5.0_Basics_20061025

46

Value Pair Review

• Issues to evaluate during Column Value Pair analysis:
• Are the values/range of values correct?

• Is the data type correct?

• Is there a pattern or format to the data for this Column? Do all of the values match this pattern/format?

• Is there a difference in case for alpha characters? Are some values mixed case and others all upper or lower case? Is this an issue?

• Are there different representations (different abbreviations/misspellings) of the same data?

• Are there duplicate values in a field that should be unique?

Page 47: IDE 5.0_Basics_20061025

47

Sorting Viewers

• It is possible to sort any of the tables displayed in Informatica Data Explorer

• By clicking on the column header, the results will be sorted in ascending order. Double-clicking again will sort the list in descending order

Value Pair Review

Page 48: IDE 5.0_Basics_20061025

48

Sort Order

• Sorting is based on the character codes of the values in the data:
• Spaces sort to the top of an ascending sort. When the caret (^) symbol is displayed, the sort is based on the actual “space” character, not the caret (^)

• Special characters (i.e. #, &, ‘)

• Nulls

• Numbers

• Alpha characters

Page 49: IDE 5.0_Basics_20061025

49

Lab Exercises 2.8–2.10

Page 50: IDE 5.0_Basics_20061025

50

Tags

• Informatica Data Explorer Tags come in various forms, depending on the type of information you want to convey:
• Notes – general text

• Action Items – things that need to be done

• Rules – business rules defining the nature of the object

• Transformations – requirements to change the data to fit the object

Page 51: IDE 5.0_Basics_20061025

51

Tags (cont.)

• Think of Tags as high-tech Post-Its™ that you can attach to many types of objects in an Informatica Data Explorer Project

• Note: All of the pull down menu items in Tags can be configured through server configuration files

Page 52: IDE 5.0_Basics_20061025

52

Action Tag

Page 53: IDE 5.0_Basics_20061025

53

Note Tag

Page 54: IDE 5.0_Basics_20061025

54

Rule Tag

Page 55: IDE 5.0_Basics_20061025

55

Lab Exercises 2.11–2.14

Page 56: IDE 5.0_Basics_20061025

56

Content Presentation

• Constant Analysis

• Empty Column Analysis

• Inferred Data Type Analysis

• Null Rule Analysis

• Source Data Type Analysis

• Unique Analysis

• Frequency Analysis

• Pattern Analysis

• Domain Analysis
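For illustration, several of these content analyses reduce to simple aggregate checks against a column; a sketch against the case-study empinfo table (not an IDE function):

-- Constant, empty, and unique analysis for the GENDER column (illustrative only)
SELECT COUNT(*)               AS rows_profiled,
       COUNT(GENDER)          AS non_null_values,  -- 0 flags an empty column
       COUNT(DISTINCT GENDER) AS distinct_values   -- 1 flags a constant; equal to rows_profiled flags a unique column
FROM empinfo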

Page 57: IDE 5.0_Basics_20061025

57

Content Presentations

Page 58: IDE 5.0_Basics_20061025

58

Content Presentation (Continued)

Page 59: IDE 5.0_Basics_20061025

59

Constant Analysis

Page 60: IDE 5.0_Basics_20061025

60

Lesson 2 Review

• Column Profiling is about the analysis of column content and format

• Column Profiling scans data files and stores the resulting profile information in an Informatica Data Explorer Project

• Column Profiling information can be viewed by opening a Column List for a Table

Page 61: IDE 5.0_Basics_20061025

61

Lesson 2 Review

• The results of Column Profiling are stored with a Column

• The results of Column Profiling include:
• Primary and Alternate Data Types
• Null Rules
• Minimum/Maximum Value ranges
• Value Pairs
• Patterns

• Tags can be added to Columns or Tables to convey additional information or instructions about the Column

Page 62: IDE 5.0_Basics_20061025

62

Lesson 3

Data Rules

Page 63: IDE 5.0_Basics_20061025

63

Data Rules - Objectives

• What is a Data Rule?

• Using Data Rules in Informatica Data Explorer

• How to test for Data Rules

• Execute Data Rules tasks

• When to apply Data Rules in the data discovery process.

Page 64: IDE 5.0_Basics_20061025

64

Define Data Rules

• What is a Business Rule?
• A Business Rule describes a main characteristic of the data

• What is a Data Rule?
• A Data Rule is a constraint written against one or more Tables that is used to find incorrect data
• Data Rules can be viewed as business rules for data

Page 65: IDE 5.0_Basics_20061025

65

Define Data Rules (cont.)

• Data Rules are often embedded in application programs

• The Informatica Data Explorer Practitioner can discover, document and test Data Rules against the initial source.

Page 66: IDE 5.0_Basics_20061025

66

Using Data Rules in Informatica Data Explorer

• Applying Data Rules is the process of using Informatica Data Explorer to determine whether externally proposed data relationships are fully supported by the source data.

• Discover if the source data supports the relationships and business needs.

• Data Rules are tested against the initial source, stored and then can be re-run after the data has been cleansed or moved.

Page 67: IDE 5.0_Basics_20061025

67

Business Rules and Data Rules

• Employees with 2 or more years of service are paid 3 weeks vacation.

• Fulltime employees are assigned to a salary band.

• Employees in Dept C – salaries cannot be greater than $40,000.

• Department number contained in the employee record must correspond to an existing Department number.

• Does the Column contain a particular string of characters?

Page 68: IDE 5.0_Basics_20061025

68

Business Rules and Data Rules (cont.)

• Does one Column include the full contents of another Column?

• In an address, is there a line of blanks followed by a line of non-blanks?

• Are all three fields of a key null?

• Is the date Column in the wrong format?

• Does the Column contain the right type of data for this type of record?
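As an illustrative sketch, the Dept C salary rule above could be written as a Data Rule that selects the violating rows, in the same style as the loan rule shown later in this lesson (the SALARY column here is hypothetical and not part of the case-study tables):

RULE DeptCSalaryLimit
SELECT "EMPID","DEPTNM","SALARY"
FROM <Use Table in Data Source>
WHERE UPPER(DEPTNM) = 'C' and SALARY > 40000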

Page 69: IDE 5.0_Basics_20061025

69

Create and Execute Data Rules

• Data Rules can be created from two locations:
• Rules Tag

• Drill down

• Execute Data Rules from the Rules Tag viewer or Data Rules Management.

Page 70: IDE 5.0_Basics_20061025

70

Drill Down

Page 71: IDE 5.0_Basics_20061025

71

New Rule Tag

Page 72: IDE 5.0_Basics_20061025

72

Lab Exercises 3.1–3.6

Page 73: IDE 5.0_Basics_20061025

73

When to Apply Data Rules

• Tightly coupled to Drill Down

• Data Rules can be executed against different sources.

• Data Rules can be applied at any point in time during the data discovery process.

• Data Rules can be saved and re-run:
• After a data load has occurred, or
• A new feed is supplied, or
• Data has changed for any reason

Page 74: IDE 5.0_Basics_20061025

74

Complex Data Rules

RULE LoanTypeAmtTerm

SELECT "Loan_ID","Loan_Type","Loan_Amt","Loan_Term"

FROM <Use Table in Data Source>

WHERE (UPPER(LOAN_TYPE) = 'AUTO' and

(LOAN_AMT not between 3000 and 50000 or

LOAN_TERM not between 12 and 60)) or

(UPPER(LOAN_TYPE) = 'REAL' and

(LOAN_AMT not between 10000 and 500000 or

LOAN_TERM not between 36 and 360)) or

LOAN_TYPE is null or LOAN_AMT is null or LOAN_TERM is null

Page 75: IDE 5.0_Basics_20061025

75

Data Rule Management

Page 76: IDE 5.0_Basics_20061025

76

Lesson 3 Review

• Data Rules can be created on Columns that we think are volatile.

• Data Rules can be created, saved, and run on different data sources.

• Data Rules can be created from two locations:
• Rules Tag
• Drill down

• Execute Data Rules from the Rules Tag viewer or Data Rules Management.

Page 77: IDE 5.0_Basics_20061025

77

Lesson 4

Single Table Structural Analysis

Page 78: IDE 5.0_Basics_20061025

78

Lesson 4 Objectives

• Explain what Table Structural Profiling is, and why it should be performed

• Define the term “Functional Dependency” as used by Informatica Data Explorer, and explain the significance

• Contrast a Single-Column Determinant to a Multiple-Column (or compound) Determinant as used by Informatica Data Explorer

Page 79: IDE 5.0_Basics_20061025

79

Lesson 4 Objectives (cont.)

• Define the terms “Inferred Dependencies” and “Model Dependencies” as used by Informatica Data Explorer

• Explain why and when an Inferred Dependency should be added to the set of Model Dependencies

• Define the term “Sample Data” as used by Informatica Data Explorer, and explain the use of Sample Data in Dependency Profiling

• Understand when and how to apply Informational Tags in Dependency Profiling

Page 80: IDE 5.0_Basics_20061025

80

What is Table Structural Profiling?

• A process that discovers the interrelationships between columns in your source data

• Is performed against samples of data that you have imported into Informatica Data Explorer

• It identifies Columns that determine the value of other Columns

Page 81: IDE 5.0_Basics_20061025

81

Why Profile Table Structure?

• Functional Dependencies determine the structure of a data model and/or database design

• Functional Dependencies can be equated to an elementary form of Business Rule

• Dependencies between data items suggest organization of data storage that is both natural and efficient

Page 82: IDE 5.0_Basics_20061025

82

Why Profile Table Structure? (cont.)

• Quickly validate expected Dependencies (Keys)

• If data does not conform to expected or required dependency rules, you most likely have a data integrity problem

Page 83: IDE 5.0_Basics_20061025

83

[Diagram: the IDE Server imports sample data either from an RDBMS or from flat files / DB2 Load Utility files.]

What is Sample Data?

• Sample Data is actual data that you import into an Informatica Data Explorer Table either from:
• Downloaded flat files, or
• Directly from a relational database

• Sample Data is a subset of the data in the source database:
• Multiple data samples can be loaded into Informatica Data Explorer
• Each data sample is stored in the Project

• Sample Data is associated with a particular Table

Page 84: IDE 5.0_Basics_20061025

84

Why Import Sample Data?

• Sample data is used in Table Structural Profiling to examine relationships of all columns of a given record

[Diagram: Column Profiling reads the source data and stores results only, while Import Sample Data loads one or more data samples (Data Sample #1, Data Sample #2, ...) into the Project; Table Structural Profiling examines these entire records.]

Page 85: IDE 5.0_Basics_20061025

85

A value of EMPNO always determines the same value of ENAME throughout the sample data

[Sample data: an EMPNO column (values 123, 456, 789, 012, 345, 789) and an ENAME column (John Doe, Jane Smith, Eduardo Sanchez, Jane Smith, John Doe, Eduardo Sanchez); the EMPNO value 789 appears twice and maps to Eduardo Sanchez both times.]

Functional Dependencies

• A Column is functionally dependent on other Columns that determine its value

Page 86: IDE 5.0_Basics_20061025

86

Functional Dependencies (cont.)

• A Functional Dependency is written as:
• A → B

• ‘A’ is the Determinant Column

• ‘B’ is the Dependent Column

• The statement is ALWAYS read left to right:
• ‘A functionally determines B’, or

• “If I know a value for A, I can determine the value for B” or

• For each distinct value of ‘A’ there can only be one value of ‘B’

Page 87: IDE 5.0_Basics_20061025

87

Functional Dependencies (cont.)

• The determinant side can be compound:
• A + B → C

• ‘A’ and ‘B’ together are the Determinant Column

• ‘C’ is the Dependent Column

• The determinant side can be Null:
• Ø → C

• Nothing is the Determinant Column

• ‘C’ is the Dependent Column

• ‘C’ has only one value, or one value and nulls, in the whole sample
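For illustration only (Informatica Data Explorer infers dependencies automatically from Sample Data), a proposed dependency such as EMPNO → ENAME can be checked with a query that looks for determinant values mapping to more than one dependent value; the sample table name here is assumed:

-- Determinant values that violate EMPNO -> ENAME (an empty result supports the dependency)
SELECT EMPNO, COUNT(DISTINCT ENAME) AS ename_values
FROM emp_sample
GROUP BY EMPNO
HAVING COUNT(DISTINCT ENAME) > 1
-- For a compound determinant such as A + B -> C, group by both determinant Columns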

Page 88: IDE 5.0_Basics_20061025

88

Reviewing Inferred Dependencies

• You must review the set of Inferred Dependencies

• The Dependencies inferred by Informatica Data Explorer exist implicitly in the data

• You must make decisions as to which of the Inferred Dependencies explicitly represent the current use of the data

• The review process is to determine:
• Which dependencies should be included in the set of dependencies from which the Normalized Schema will be generated

Page 89: IDE 5.0_Basics_20061025

89

Sample Data

Page 90: IDE 5.0_Basics_20061025

90

Exercises 4.1 - 4.4

Page 91: IDE 5.0_Basics_20061025

91

Adding an Inferred Dependency to the Model

• Inferred Dependencies added to the model establish the Tables (tables) that will be created in Normalization
• Normalization breaks a single Table (table) into multiple Tables (tables)

• For example, “Employee” Table in the source system represents two Tables (Employee and Department) once the dependencies are created and the model is normalized

Page 92: IDE 5.0_Basics_20061025

92

Adding an Inferred Dependency to the Model (cont.)

• Columns that do not participate as a Dependent are automatically included in the Primary Key
• Informatica Data Explorer considers all Columns as part of the key until a relationship is established

• Dependency Profiling is an iterative process

Page 93: IDE 5.0_Basics_20061025

93

Exercises 4.5 - 4.6

Page 94: IDE 5.0_Basics_20061025

94

Dependency Subject Area

• Inferred Dependencies
• The set of dependencies that are inferred from a sample of data for a Table

• Table Dependencies
• A subset of the Model Dependencies that are wholly contained in a Table

Page 95: IDE 5.0_Basics_20061025

95

Dependency Subject Area (cont.)

• Model Dependencies
• The set of dependencies that you determine fit into your design and are supported by the data
• Model Dependencies are associated at the schema level
• Model Dependencies are the set of all dependencies across all Tables
• Model Dependencies are used to create the normalized schema

Page 96: IDE 5.0_Basics_20061025

96

Dependencies

Page 97: IDE 5.0_Basics_20061025

97

Inferred Dependencies

Page 98: IDE 5.0_Basics_20061025

98

Key Dependencies

Page 99: IDE 5.0_Basics_20061025

99

Model Dependencies

Page 100: IDE 5.0_Basics_20061025

100

Filter Dependencies

Page 101: IDE 5.0_Basics_20061025

101

Add Dependencies to Model or Filter

Page 102: IDE 5.0_Basics_20061025

102

When to Add an Inferred Dependency

• Review each Inferred Dependency and add to the model only those that have an explicit reason for existing:
• Is the application enforcing the dependency?

• Is the user/business enforcing the dependency?

• Is some outside source enforcing the dependency?

Page 103: IDE 5.0_Basics_20061025

103

Types of Dependencies

• True
• The dependency is true for 100% of the data analyzed
• Example: Every time a unique value is known for EMPID, additional information is available (i.e. Employee Name, Address, Phone, etc.)

• Gray
• The dependency is almost, but not quite, 100% true for the data analyzed
• One row causes the violation

Page 104: IDE 5.0_Basics_20061025

104

Types of Dependencies (cont.)

• Unsupported

• Two or more rows in the sample data do not support the dependency

• Unknown

• The dependency has not yet been validated against the sample data (Basis dependencies appear as Unknown until they are validated)

Page 105: IDE 5.0_Basics_20061025

105

• Questions to Ask:
• What caused the dependency to be gray?
• Should another sample be imported for verification?

• Review each Inferred Gray Dependency and add to the model only those that have an explicit reason for existing:
• Is the application supposed to be enforcing the dependency?
• Is the user/business supposed to be enforcing the dependency?
• Is some outside source supposed to be enforcing the dependency?

When to Add an Inferred Gray Dependency

Page 106: IDE 5.0_Basics_20061025

106

Lab Exercise 4.7

Page 107: IDE 5.0_Basics_20061025

107

Tagging Dependencies

• You cannot tag an Inferred or Model Dependency

• You add Tags to the Column that is causing the problem

Page 108: IDE 5.0_Basics_20061025

108

Compound Determinants

• Two or more Columns that uniquely identify the Dependent Column

• This often represents a many-to-one (M:1) relationship in the data

• This happens quite often in older file-based systems

Page 109: IDE 5.0_Basics_20061025

109

Lab Exercise 4.8 – 4.9

Page 110: IDE 5.0_Basics_20061025

110

Lesson 4 Review

• Importing Sample Data stores the data inside an Informatica Data Explorer Project

• Sample Data is used as input to Dependency Profiling

• You must import Sample Data before you can perform the Profile Dependencies task using Informatica Data Explorer:
• Data samples are imported using the Import Sample Data feature
• Data samples can be retained from doing a Drill Down or executing a Data Rule

Page 111: IDE 5.0_Basics_20061025

111

Lesson 4 Review (cont.)

• Dependency Profiling finds the relationships between Columns in the same source file or table

• All Inferred Dependencies are associated with sets of Sample Data

• Table Dependencies are dependencies that have been added to the model, and are associated with a specific Table

Page 112: IDE 5.0_Basics_20061025

112

Lesson 4 Review (cont.)

• Model Dependencies are the set of dependencies from all Tables in the schema

• Only Model Dependencies are used as input to the generation of a Normalized Schema

• All Dependencies inferred by Informatica Data Explorer exist implicitly in the data

Page 113: IDE 5.0_Basics_20061025

113

Lesson 4 Review (cont.)

• You will find many Inferred Dependencies that have no meaning in context of the application or business use of the data

• These are Implicit Dependencies that have no explicit meaning

• Dependency Profiling is an iterative process

Page 114: IDE 5.0_Basics_20061025

114

Lesson 5

Cross Table Profiling

Page 115: IDE 5.0_Basics_20061025

115

Lesson 5 Objectives

• Explain what Cross Table Profiling is, and why it should be performed

• Execute the Cross Table Profiling function in Informatica Data Explorer

• Navigate and review the results of Cross Table Profiling

• Define the terms “Synonym” and “Homonym” as used in Informatica Data Explorer

Page 116: IDE 5.0_Basics_20061025

116

Lesson 5 Objectives (cont.)

• Understand what data is used for Cross Table Profiling, and how potential Synonyms are identified

• Describe why and when a Synonym should be created

• Create a Synonym

• Understand the significance of creating Synonyms

Page 117: IDE 5.0_Basics_20061025

117

Cross Table Profiling

• The process that identifies similarity between the values in different Columns

• Performed using the value sets associated with the Column objects inside Informatica Data Explorer
• These are the Value Frequency Lists that were created by Column Profiling

Page 118: IDE 5.0_Basics_20061025

118

Why Profile Redundancies?

• To uncover Columns that actually represent the same business facts

• Informatica Data Explorer can uncover two types of redundancies:

• Synonyms
• Redundant data that you would like to eliminate through the creation of Synonyms
• Redundant data that is intended to improve database performance

• Homonyms
• Data that looks redundant but actually represents quite different business facts

Page 119: IDE 5.0_Basics_20061025

119

Comparing Value Sets

[Diagram: Value Set 1 contains A, B, C and Value Set 2 contains B, C, D, E; the value overlap between the two sets is B and C.]

Page 120: IDE 5.0_Basics_20061025

120

Inferred Redundancies

Page 121: IDE 5.0_Basics_20061025

121

Exercise 5.1 - 5.2

Page 122: IDE 5.0_Basics_20061025

122

• Two or more Columns having the same business meaning

• Comparing common values between columns can identify candidate Synonyms

[Diagram: the SP_NO value set and the EMPID value set overlap by 28%.]

Synonyms
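As an illustrative sketch (not how IDE computes it), the overlap between the SP_NO and EMPID value sets could be estimated from the distinct value lists of the case-study tables:

-- Distinct SP_NO values that also occur as EMPID values
SELECT COUNT(*) AS overlapping_values
FROM (SELECT DISTINCT SP_NO FROM custord) s
JOIN (SELECT DISTINCT EMPID FROM empinfo) e ON s.SP_NO = e.EMPID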

Page 123: IDE 5.0_Basics_20061025

123

Effect of Synonyms

• If the Primary Keys of two Tables are synonyms, they will collapse into a single Table in the Normalized Schema

[Diagram: two source tables that share TransactionID as their Primary Key (one carrying ProductID, ProductName, and inventory/supplier information, the other carrying SupplierName and SupplierAddress) collapse into a single TransactionID Table in the Normalized Schema.]

Page 124: IDE 5.0_Basics_20061025

124

Effect of Synonyms (cont.)

• If two Columns that are synonyms represent a parent-child relationship, they will result in two Columns in two Tables with one Column participating in a Primary Key and the other in the corresponding Foreign Key

[Diagram: an Order table (OrderNumber, ProductID, ProductName) and a Payment table (PaymentID, OrderID, CheckNumber); after the synonym is made, OrderNumber is the Primary Key of the Order Table and appears as the corresponding Foreign Key in the Payment Table.]

Page 125: IDE 5.0_Basics_20061025

125

Homonyms Defined

• Two or more Columns having the same name yet different business meanings

[Diagram: the SHIPPING_STATE value set and the STATE value set overlap by 70%, even though the two Columns represent different business facts.]

Page 126: IDE 5.0_Basics_20061025

126

Making Synonyms

Page 127: IDE 5.0_Basics_20061025

127

Synonyms

Page 128: IDE 5.0_Basics_20061025

128

Exercise 5.3 - 5.4

Page 129: IDE 5.0_Basics_20061025

129

Lesson 5 Review

• Cross Table Profiling is about data integration between sets of data

• Cross Table Profiling comprises two activities:
• Comparing value lists
• Use Foreign Key or Join analysis to compare value lists greater than 16,000 values
• Assigning Synonyms

• Rule of Thumb:
• Be conservative about making Synonyms
• You can always come back after you’ve normalized the schema and make more

Page 130: IDE 5.0_Basics_20061025

130

Lesson 5 Review

• You cannot make intra-table Synonyms, only inter-table Synonyms

• You must have built Value Lists either during the Profile Columns task, or during the Import Sample Data task, before you can perform Cross Table Profiling

• Creation of Synonyms participates in Normalization

Page 131: IDE 5.0_Basics_20061025

131

Lesson 6

Validating Table and Cross Table Analysis

Page 132: IDE 5.0_Basics_20061025

132

• Understand how Validation differs from Cross Table Profiling

• Define and discuss the term Referential Integrity

• Explain various methods of validation and how they can be used

• Execute Validation tasks

• View Validation results

Lesson 6 Objectives

Page 133: IDE 5.0_Basics_20061025

133

• Validation can be used to:
• Define the exact overlap characteristics of two redundant Columns
• Validate a single or multi-Column foreign key
• Validate that the keys of two tables do not overlap (Vertical Merge)
• Validate single or multiple Column keys (Validate Keys)
• Validate a Join
• Validate against a reference table
• Validate against Domain values

• Execute Validation from the Single Table Structural Analysis and Cross Table Structural Analysis

Validation

Page 134: IDE 5.0_Basics_20061025

134

Referential Integrity

• Example A: An Order File contains an OrderID that uniquely identifies each customer order. There should be no OrderID values in the Order or Detail file that do not exist in the other.

Example A

Page 135: IDE 5.0_Basics_20061025

135

Referential Integrity (cont.)

• Example B: An Order file may have OrderID values that do not exist in the Payment file (outstanding payments or unbilled customers). The Payment file should not have any OrderID values that do not occur in the Order file.

Example B

Page 136: IDE 5.0_Basics_20061025

136

• Validation compares sets of Columns between two relations to discover the quality of the overlap.

• Validation exhaustively tests all the data.

• Cross Table Profiling discovers potential overlap between Columns.

• Cross Table Profiling estimates overlap.

• Results of Validation – sets of statistics about the overlapping and non-overlapping values

Validation and Cross Table Profiling

Page 137: IDE 5.0_Basics_20061025

137

• To understand the exact overlap:
• Execute Validation from the Cross Table Profiling
• Create a relationship (Primary Key / Foreign Key, Join, …) between the two Columns and choose Validate

Profile Redundant Columns

Page 138: IDE 5.0_Basics_20061025

138

Exercise 6.1-6.2

Page 139: IDE 5.0_Basics_20061025

139

• Validate a Single or Multi-Column Foreign Key

• Primary use – test the Referential Integrity of primary and foreign key relationships.

• Each row in a child table must reference a row in the parent table.

• Every order detail record must reference an order.

• Information discovered can be used to help write logic to perform the data integration.

Foreign Key Analysis
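An illustrative orphan check for the order example above (the table and column names are assumed, not from the case study):

-- Orphan children: order detail rows whose ORDER_ID has no matching parent order
SELECT d.*
FROM order_detail d
LEFT JOIN orders o ON d.ORDER_ID = o.ORDER_ID
WHERE o.ORDER_ID IS NULL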

Page 140: IDE 5.0_Basics_20061025

140

Foreign Key Analysis Results

Page 141: IDE 5.0_Basics_20061025

141

Parents Without Children

Page 142: IDE 5.0_Basics_20061025

142

Exercise 6.3-6.5

Page 143: IDE 5.0_Basics_20061025

143

• Primary use – when two similar systems are merged together.

• Company A merges with Company B: payroll master records are merged.

• It is expected that all rows in the parent and child tables are orphans
• Employees of Company A are not on Company B’s payroll master file
• Employees of Company B are not on Company A’s payroll master file

Vertical Merge Analysis
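An illustrative check for the payroll merge example: keys that appear on both payroll master files violate the vertical merge expectation (table and column names are assumed):

-- Expected to return no rows if all parent and child rows are orphans
SELECT a.EMPID
FROM company_a_payroll a
JOIN company_b_payroll b ON a.EMPID = b.EMPID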

Page 144: IDE 5.0_Basics_20061025

144

Vertical Merge

Page 145: IDE 5.0_Basics_20061025

145

Vertical Merge Analysis Results

Page 146: IDE 5.0_Basics_20061025

146

Exercise 6.6-6.8

Page 147: IDE 5.0_Basics_20061025

147

• Primary use – validate keys in a single Table

• Validation looks at the table and checks to make sure that every row is unique.

• Use this feature to find any duplicate rows for keys discovered in Single Table Structural Analysis.

Validate Key Analysis
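An illustrative duplicate-key check of the same kind, assuming EMPID is the key being validated in the case-study empinfo table:

-- Key values that occur on more than one row (an empty result validates the key)
SELECT EMPID, COUNT(*) AS row_count
FROM empinfo
GROUP BY EMPID
HAVING COUNT(*) > 1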

Page 148: IDE 5.0_Basics_20061025

148

Validate Key

Page 149: IDE 5.0_Basics_20061025

149

Validate Key Analysis Results

Page 150: IDE 5.0_Basics_20061025

150

New Alternate Key

Page 151: IDE 5.0_Basics_20061025

151

Validate Alternate Key

Page 152: IDE 5.0_Basics_20061025

152

Exercise 6.9 - 6.12

Page 153: IDE 5.0_Basics_20061025

153

Lesson 7

Normalization

Page 154: IDE 5.0_Basics_20061025

154

Lesson 7 Objectives

• Explain what Normalization is and when it should be performed

• Execute the Normalization function of Informatica Data Explorer

• Navigate and review the results of Normalization

Page 155: IDE 5.0_Basics_20061025

155

Lesson 7 Objectives (cont.)

• Describe what a Column Trace is, and how it is used

• Understand how to modify the Normalized Schema by making changes to the Source Schema

• Explain the iterative nature of Normalization

Page 156: IDE 5.0_Basics_20061025

156

Normalization

• A process that transforms an initial schema into a schema with greater integrity

• A process of transforming the Source Schema into a:
• Non-redundant
• Anomaly-free
• Third Normal Form model

• Normalization is based upon:
• Dependencies added to the model in Single Table Structural Analysis, and
• Synonyms made in Cross Table Structural Analysis

Page 157: IDE 5.0_Basics_20061025

157

Why Normalize?

• A Third Normal Form (3NF) schema has no:
• Redundant Columns other than Foreign Keys
• Columns that are only partially dependent on the key

• Transitive Dependencies

• The Normalized Schema provides a checkpoint for the completeness and accuracy of the decisions you made during the profiling tasks
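For illustration, in the case-study data the transitive dependency EMPID → DEPTID → DEPTNM is resolved by moving DEPTNM into its own DEPTID Table; expressed as DDL, the generated Table would look roughly like this (a sketch only, since IDE generates the Normalized Schema automatically):

-- DEPTNM depends on DEPTID, not directly on the employee key
CREATE TABLE DEPTID (
    DEPTID smallint PRIMARY KEY,
    DEPTNM varchar(14)
)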

Page 158: IDE 5.0_Basics_20061025

158

Exercise 7.1

Page 159: IDE 5.0_Basics_20061025

Normalized Schema v. Source Schema

[Schema diagram: the source tables custord (order, item, customer, shipping, and billing columns) and empinfo (employee, department, and contact columns) are normalized into the tables All_Constant_Attributes, ITEM_NO, ITEM_NO_ORDER_DATE, ORDER_NO, ITEM_NO_ORDER_NO, DEPTID, and EmployeeID.]

Page 160: IDE 5.0_Basics_20061025

160

Normalized Schema Anomalies

• Observable normalization anomalies may include:
• Unexpected Tables

• Duplicate Tables

• Tables with strange/unexpected keys

• Columns in the wrong locations

Page 161: IDE 5.0_Basics_20061025

161

Column Tracing

• Allows you to find the origin of a Column in another schema

• Used to determine the Source Model Dependencies and Synonyms (or the lack thereof) which may be causing the anomaly

Page 162: IDE 5.0_Basics_20061025

162

Schema Locking

• The existence of a Normalized Schema causes Informatica Data Explorer to lock various objects in the Source Schema

• In order to modify Dependencies in the Source Schema, you must remove the Normalized Schema

Page 163: IDE 5.0_Basics_20061025

163

Re-Normalizing

• In order to change the Normalized Schema, you must Remove the Normalized Schema

• Modify the Source Schema, then Re-run Normalization

• The next exercises:
• Remove a dependency

• Add another Table

• Renormalize schema

• Review the new Normalized Schema

Page 164: IDE 5.0_Basics_20061025

164

Lab Exercises 7.2 – 7.3

Page 165: IDE 5.0_Basics_20061025

165

New Normalized Schema

[Schema diagram: the new Normalized Schema contains the tables All_Constant_Attributes, ITEM_NO, SHIPPING_ZIP, ORDER_NO, ITEM_NO_ORDER_NO, DEPTID, and EmployeeID.]

Page 166: IDE 5.0_Basics_20061025

166

Lesson 7 Review

• Normalization is a 100% automated process

• The only inputs to the normalization process are:
• Dependencies added to the Model

• Column Synonyms

• Refinement of the Normalized Schema is an iterative process

Page 167: IDE 5.0_Basics_20061025

167

Lesson 7 Review (cont.)

• The Normalized Schema is most often used as a basis for:
• Baseline view

• Review for anomalies

• Comparison to business requirements

• Staging Area

• The Normalized Schema is not a business model

Page 168: IDE 5.0_Basics_20061025

168

Lesson 7 Review (cont.)

• Normalized Schema anomalies stem from either:
• Dependencies added to the model

• Dependencies not added to the model

• Incorrect (or unmade) Synonyms

• You can Normalize the Source Schema as soon as you have added dependencies to the model during Single Table Structural Analysis
• Actually, you can do it at any time, but it will just make a copy of your existing schema if you have not added any dependencies

Page 169: IDE 5.0_Basics_20061025

169

Lesson 7 Review (cont.)

• If you have not established inter-relational Synonyms, you will get duplicate Tables and/or Columns in the Normalized Schema
• Duplicate Tables will appear in the Normalized Model with an extension, such as:
• EmployeeID
• EmployeeID_1

• Suggestions:
• Make only one change at a time and then renormalize
• Often making one change in the Source Schema can result in several changes in the Normalized Schema

Page 170: IDE 5.0_Basics_20061025

170

Lesson 8

Exporting to the Repository

Page 171: IDE 5.0_Basics_20061025

171

Lesson 8 Objective

• Export Projects to the IDE Repository

Page 172: IDE 5.0_Basics_20061025

172

What is the Repository?

• A series of relational database tables that store the results from the Informatica Data Explorer Product Suite

Page 173: IDE 5.0_Basics_20061025

173

Repository Export

• The Repository Export dialog box enables you to export an IDE catalog to the Repository

• The Repository Export dialog box provides the ability to limit some of the data that is exported to the Repository

• Once in the Repository, the Catalog becomes available to a variety of DBMS tools, such as SQL, report generators, and so on

• All schemas in the Catalog will be exported to the Repository

Page 174: IDE 5.0_Basics_20061025

IDE Repository Architecture

[Architecture diagram: the IDE Client (Windows XP, 2000) connects client/server to the IDE Server (UNIX or Windows NT), which holds the Project and uses ODBC drivers to reach the Repository RDBMS (UNIX or Windows NT).]

Page 175: IDE 5.0_Basics_20061025

175

Exporting to Repository

Page 176: IDE 5.0_Basics_20061025

176

Lab Exercise 8.1

Page 177: IDE 5.0_Basics_20061025

177

Lesson 8 Review

• You control what information from a Project is included in the Export process

• The more you export, the longer the process will take

• Information exported to the IDE Repository becomes available to:
• Informatica Data Explorer Repository Navigator

• Report Writing tools

• SQL tools

Page 178: IDE 5.0_Basics_20061025

178

Lesson 9

Using the Repository Navigator

Page 179: IDE 5.0_Basics_20061025

179

Lesson 9 Objectives

• Understand use of the Repository Navigator

• Access the IDE Repository and browse its contents using the Navigator

• Explain Tags

• Understand how to share information among departments

Page 180: IDE 5.0_Basics_20061025

180

IDE Repository Navigator

• A browser for the contents of the IDE Repository

• Can be used by anyone in your enterprise

[Diagram: the Repository holds knowledge about corporate systems: their structure, content, and quality.]

Page 181: IDE 5.0_Basics_20061025

181

IDE Repository Architecture

[Architecture diagram: the IDE Client, Source Profiler, and FTM/XML run on Windows XP, 2000 and connect client/server (via ODBC drivers) to the IDE Server on UNIX or Windows NT, which manages the Project; results are exported to the IDE Repository RDBMS, which the Repository Navigator browses.]

Page 182: IDE 5.0_Basics_20061025

182

Schema Viewer

• The Schema Viewer functions similarly to the Navigation Tree in Informatica Data Explorer:
• You expand/contract objects
• You use a right-click of the mouse to view properties

• The Schema Viewer provides users with the ability to query profiling information for Tables and Columns (Properties, Tags, Sample Data, Value Frequency Lists) within each schema

Page 183: IDE 5.0_Basics_20061025

183

Exercise 9.1-9.3

Page 184: IDE 5.0_Basics_20061025

184

The Link Viewer

• The Link Viewer shows links between any two schemas in the current project.

• Link Viewer uses:

• View Links between Columns

• Find information on compatibility problems

• Access Tags associated with Links

Link Viewer

Page 185: IDE 5.0_Basics_20061025

185

Table Viewer

• Provides SQL access to the IDE Repository

• Has several pre-built SQL queries

• Allows you to run your own custom queries

Page 186: IDE 5.0_Basics_20061025

186

Exercise 9.4-9.6

Page 187: IDE 5.0_Basics_20061025

187

Lesson 9 Review

• The IDE Repository provides:
• Rapid access to source data knowledge

• Team collaboration

• Enhanced communication

• Flexible ad hoc reporting

Page 188: IDE 5.0_Basics_20061025

188

Lesson 10

Repository Reports

Page 189: IDE 5.0_Basics_20061025

189

Lesson 10 Objectives

• Understand what IDE Repository Reports are

• Demonstrate how to use Repository Reports

• Create a report using a Crystal Reports template

• Export a report using Crystal Reports

Page 190: IDE 5.0_Basics_20061025

190

What are Repository Reports

• IDE Repository Reports are a series of reports that provide specific management information from the IDE Repository.

• Reports are written with Crystal Reports.

Page 191: IDE 5.0_Basics_20061025

191

Why use Crystal Reports?

• Provides a user interface to guide the design of reports that are stored in a relational database

• Can export data to other programs such as Excel, Word or HTML pages

• Provides the flexibility to create custom or ad hoc reports. The user is not limited to the reports provided in the Informatica Data Explorer Product Suite

• Accesses the IDE Repository through an ODBC connection

Page 192: IDE 5.0_Basics_20061025

192

Report Templates

• A series of reports are provided as an easy means of obtaining documentation from the IDE Repository

• The Report Templates can be modified to meet individual needs

Page 193: IDE 5.0_Basics_20061025

193

List of Reports

Column Profile - By File: Column Profiling results sorted by File
Column Profile - By Field: Column Profiling results sorted by Field
Null Rule Exceptions: List of Attributes with Null, Zero or Blanks
Value Frequency: Value Frequency Lists for Attributes
Supported Relationships: Inferred Dependencies for each Data Sample
Model Relationships: Dependencies that have been added to the Model
Overlapping Data: Redundancy Profiling Overlap Report
Notes: Note Tag Report
Action Items: Action Item Tag Report
Rules: Rule Tag Report
Transformations: Transformation Tag Report
Attribute Links: Reports Links between Attributes

Page 194: IDE 5.0_Basics_20061025

194

Selection Criteria

• Allow users to select values for certain fields within the templates

• Limit the amount of data reported from the IDE Repository

• Each Template provides selection on ProjectName and SchemaName at a minimum

Page 195: IDE 5.0_Basics_20061025

195

Exercises 10.1 – 10.4

Page 196: IDE 5.0_Basics_20061025

196

Exporting Reports

• Crystal Reports provides an option to export report data into other file formats

• Useful for sharing data with individuals that do not have access to Crystal Reports

Page 197: IDE 5.0_Basics_20061025

197

Exercises 10.5

Page 198: IDE 5.0_Basics_20061025

198

Lesson 10 Review

• IDE Repository Reports provide reporting capability from the IDE Repository

• Additional reports can be created to meet business needs

Page 199: IDE 5.0_Basics_20061025

199

Lesson 11

Integration with PowerCenter

Page 200: IDE 5.0_Basics_20061025

200

PowerCenter Integration

• Informatica Data Explorer has the ability to share metadata with PowerCenter. This allows the business users to share knowledge that was found during the data discovery process with the PowerCenter developers.

• Objects that can be shared are:
• Source and target schemas

• Filters

• Expressions (transformation tags in IDE).

Page 201: IDE 5.0_Basics_20061025

201

Create a Transformation Tag

Page 202: IDE 5.0_Basics_20061025

202

Transformation Tag

Page 203: IDE 5.0_Basics_20061025

203

Set Physical Properties

Page 204: IDE 5.0_Basics_20061025

204

Export to Repository

Page 205: IDE 5.0_Basics_20061025

205

Open Fixed Target Mapping (FTM)

Page 206: IDE 5.0_Basics_20061025

206

Open Your Project

Page 207: IDE 5.0_Basics_20061025

207

Export to PowerCenter

Page 208: IDE 5.0_Basics_20061025

208

Import Object into PowerCenter

Page 209: IDE 5.0_Basics_20061025

209

Open Customer in Source Analyzer

Page 210: IDE 5.0_Basics_20061025

210

Open a new Transform

Page 211: IDE 5.0_Basics_20061025

211

Open Ports Tab

Page 212: IDE 5.0_Basics_20061025

212

Informatica Resources

Page 213: IDE 5.0_Basics_20061025

213

Informatica – The Data Integration Company

Informatica provides data integration tools for both batch and real-time applications:

• Data Migration
• Data Synchronization
• Data Warehousing
• Data Hubs
• Business Activity Monitoring

Page 214: IDE 5.0_Basics_20061025

214

• Founded in 1993

• Leader in enterprise solution products

• Headquarters in Redwood City, CA

• Public company since April 1999 (INFA)

• 2000+ customers, including over 80% of Fortune 100

• Strategic partnerships with IBM Global Services, HP, Accenture, SAP, and many others

• Technology partnership with Composite Software for Enterprise Information Integration (EII) – real-time federated views and reporting across multiple data sources

• Worldwide distribution

Informatica – Company Information

Page 215: IDE 5.0_Basics_20061025

215

Informatica Affiliations

Page 216: IDE 5.0_Basics_20061025

216

Informatica Resources

www.informatica.com – provides information (under Services) on:
• Professional Services
• Education Services

my.informatica.com – customers and contractual partners can sign up to access:
• Technical Support
• Product documentation (under Tools – online documentation)
• Velocity Methodology (under Services)
• Knowledgebase
• Mapping templates

devnet.informatica.com – sign up for Informatica Developers Network:
• Discussion forums
• Web seminars
• Technical papers

Page 217: IDE 5.0_Basics_20061025

217