
ETL Example

Page 1: ETL Example

8/18/2019 ETL Example

http://slidepdf.com/reader/full/etl-example 1/17

4/9/2007

1

Control Data Flow Without Mapping Changes

Greg Whitaker 

ETL Lead Developer 

Choicepoint

Environment:

The Boca Raton, FL location utilizes Informatica in a 32-bit RedHat Linux/Intel environment. We have version 7.1.1 installed on 4 boxes, 3 production and 1 development.

Corporate headquarters in Alpharetta, GA has an IBM P690 running AIX 5.3. This is a 64-bit environment and is running version 8.1.1 SP2.

Major Points:

• How to reference column names as port values.
• How to make data “generic” using a simple mapplet, creating a write-once, use-everywhere data flow that can be used on any source data.
• How to route data to transformations at runtime using Lookups and Routers.
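
The first two points can be illustrated outside Informatica. What follows is a minimal Python sketch, not the mapplet from the demo: it assumes each source row is available as a dictionary and turns it into (column name, value) pairs, so that downstream logic never depends on a particular source layout. The function name `normalize_row` and the sample row are hypothetical.

```python
from typing import Dict, Iterator, Tuple


def normalize_row(row: Dict[str, str]) -> Iterator[Tuple[str, str]]:
    """Yield one (column_name, value) pair per field of a source row.

    The column name travels with the data as an ordinary value, which is
    the idea behind "referencing column names as port values".
    """
    for column_name, value in row.items():
        yield column_name, value


# Hypothetical source row; the field names are borrowed from the sample lookup data.
row = {"ASSN_CITY": "ZOLFO", "ASSN_ZIP": "01202"}
pairs = list(normalize_row(row))
```

Once every field is a (name, value) pair, the same checks can run against any file, whatever its columns are called.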

Brief History

• DataQA mappings were needed for the hundreds of files we process.
• A dynamic, data-driven quality grading solution was developed, but each column needed to be mapped specifically for that dataset.
• We wanted to create a generic solution that would work for any source data.

Demo Only

• This is only a Demo.
• This is not a robust, out-of-the-box solution.
• We’re hoping that you can use ideas from this Demo in your own mappings.

Demo Mapping:

Mapping Source:

Mapping Target:

Mapplet:

Mapplet Input:

Mapplet Union Input:

Sample Lookup Data:

• "COLUMN_NAME", "QA_FUNCTION"
• "ASSN_NAME_1", "|IS_ZERO1|IS_NUMERIC|NOT_NUMERIC|"
• "ASSN_NAME_2", "|IS_ZERO1|"
• "ASSN_STREET", "|IS_ZERO|"
• "ASSN_CITY", "|IS_ZERO|NOT_NUMERIC|"
• "ASSN_STATE", "|IS_ZERO|NOT_NUMERIC|"
• "ASSN_ZIP", "|IS_ZERO|IS_NUMERIC|"
• "ASSN_ZIP4", "|IS_ZERO|IS_NUMERIC|"
• "ASSN_BOX_NUM", "|IS_ZERO|IS_NUMERIC|"
• "ASSN_NUMBER", "|IS_ZERO|"
• "ORIG_MAN_DT", "|IS_ZERO|VALID_DATE_YYYYMMDD|"
• "PROJ_NAME_1", "|IS_ZERO|"
• "PROJ_NAME_2", "|IS_ZERO|"
• "PROJ_STREET", "|IS_ZERO|"
• "PROJ_CITY", "|IS_ZERO|NOT_NUMERIC|"
• "PROJ_STATE", "|IS_ZERO|NOT_NUMERIC|"
• "PROJ_ZIP", "|IS_ZERO|NOT_NUMERIC|"
• "DECLARE_DATE", "|VALID_DATE_YYYYMMDD|"
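
The lookup data above drives the routing: each column name maps to a pipe-delimited string of QA function names. The Python sketch below is one way to read such a file and decide which router groups a column belongs to. It is an illustration under assumptions, not the Informatica Lookup or Router configuration from the demo: `load_lookup`, `route`, and `ROUTER_GROUPS` are hypothetical names, and the test for membership (searching for the name wrapped in pipes) is a guess at why the values carry leading and trailing pipes.

```python
import csv
import io
from typing import Dict, List

# A few rows copied from the sample lookup data.
LOOKUP_CSV = '''"COLUMN_NAME", "QA_FUNCTION"
"ASSN_NAME_2", "|IS_ZERO1|"
"ASSN_CITY", "|IS_ZERO|NOT_NUMERIC|"
"ASSN_ZIP", "|IS_ZERO|IS_NUMERIC|"
"DECLARE_DATE", "|VALID_DATE_YYYYMMDD|"
'''

# Assumed router groups, one per QA function seen in the sample output.
ROUTER_GROUPS = ["IS_ZERO", "IS_NUMERIC", "NOT_NUMERIC", "VALID_DATE_YYYYMMDD"]


def load_lookup(text: str) -> Dict[str, str]:
    """Map COLUMN_NAME to its raw pipe-delimited QA_FUNCTION string."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # skip the header row
    return {row[0]: row[1] for row in reader if row}


def route(column_name: str, lookup: Dict[str, str],
          groups: List[str] = ROUTER_GROUPS) -> List[str]:
    """Return the groups whose |NAME| token occurs in the column's lookup value."""
    functions = lookup.get(column_name, "")
    return [group for group in groups if f"|{group}|" in functions]


lookup = load_lookup(LOOKUP_CSV)
zip_groups = route("ASSN_ZIP", lookup)      # IS_ZERO and IS_NUMERIC
name_groups = route("ASSN_NAME_2", lookup)  # empty: IS_ZERO1 is not a group
```

Searching for the name with both delimiters keeps `IS_ZERO` from matching `IS_ZERO1`. Changing which checks run on a column then only requires editing the lookup file, not the routing logic.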

Mapplet Data Router:

Mapplet Function Call:

Mapplet Union & Output:

Sample Output Data:

• Column Name Max Min Function T/F Count
• ASSN_NAME_1 ZURICH 100 IS_NUMERIC FALSE 22069
• ASSN_ZIP N7V2G IS_NUMERIC FALSE 13
• ASSN_ZIP4 R5 IS_NUMERIC FALSE 12502
• ASSN_BOX_NUM 999 014059 IS_NUMERIC TRUE 2642
• ASSN_ZIP 99999 01202 IS_NUMERIC TRUE 22056
• ASSN_BOX_NUM 999 IS_ZERO FALSE 22069
• ASSN_CITY ZOLFO AA IS_ZERO FALSE 22069
• ASSN_STATE WI AL IS_ZERO FALSE 22069
• PROJ_CITY ZOLFO AA IS_ZERO FALSE 22069
• PROJ_NAME_1 ZURICH 100 IS_ZERO FALSE 22069
• PROJ_STATE FL FL IS_ZERO FALSE 22069
• PROJ_STREET ZARRAGO #1 IS_ZERO FALSE 22069
• PROJ_ZIP 99999 22308 NOT_NUMERIC FALSE 22033
• ASSN_CITY ZOLFO AA NOT_NUMERIC TRUE 22069
• ASSN_NAME_1 ZURICH 100 NOT_NUMERIC TRUE 22069
• ASSN_STATE WI AL NOT_NUMERIC TRUE 22069
• PROJ_CITY ZOLFO AA NOT_NUMERIC TRUE 22069
• PROJ_STATE FL FL NOT_NUMERIC TRUE 22069
• PROJ_ZIP 3302 NOT_NUMERIC TRUE 36
• DECLARE_DATE 20021231 19051217 VALID_DATE_YYYYMMDD TRUE 22060
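
The output above has one row per column name, function, and result, with the maximum value, minimum value, and row count for that group. The Python sketch below shows how such a summary could be computed from individually graded values. It is an assumption about the shape of the aggregation, not the demo's actual transformation: `summarize` and the `Graded` tuple are hypothetical, values are compared as strings, and the three input rows are a toy sample built from values that appear in the output.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (column name, value, function, result of the function for that value)
Graded = Tuple[str, str, str, bool]
Summary = Tuple[str, str, str, str, bool, int]


def summarize(graded: Iterable[Graded]) -> List[Summary]:
    """Group by (column, function, result) and report max, min, and count."""
    groups: Dict[Tuple[str, str, bool], List[str]] = defaultdict(list)
    for column, value, function, result in graded:
        groups[(column, function, result)].append(value)
    return [
        (column, max(values), min(values), function, result, len(values))
        for (column, function, result), values in sorted(groups.items())
    ]


# Toy input: three graded ASSN_ZIP values.
graded_rows = [
    ("ASSN_ZIP", "01202", "IS_NUMERIC", True),
    ("ASSN_ZIP", "99999", "IS_NUMERIC", True),
    ("ASSN_ZIP", "N7V2G", "IS_NUMERIC", False),
]
summary = summarize(graded_rows)
```

Keeping the maximum and minimum next to each count gives a reviewer a concrete example of what passed or failed a check without opening the source file.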