24
INTRODUCTION e muaz k o lass BSIT 5 th semester

I ntroduction

  • Upload
    winda

  • View
    33

  • Download
    0

Embed Size (px)

DESCRIPTION

Name muaz khan Roll no 2498 Class BSIT 5 th semester . I ntroduction. The process of updating the data warehouse. Data are moved from source to target data bases A very costly, time consuming part of data warehousing - PowerPoint PPT Presentation

Citation preview

Page 1: I ntroduction

INTRODUCTION

Name muaz khanRoll no 2498Class BSIT 5th semester

Page 2: I ntroduction

The process of updating the data warehouse.

Data are moved from source to target data basesA very costly, time consuming part of data warehousingETL stand for

ExtractionTransformationLoading

Page 3: I ntroduction

DATA EXTRACTION Often performed by COBOL routines

(not recommended because of high program maintenance and no automatically generated meta data)

Sometimes source data is copied to the target database using the replication capabilities of standard RDMS (not recommended because of “dirty data” in the source systems)

Increasing performed by specialized ETL software

Page 4: I ntroduction

DATA CLEANSING Source systems contain “dirty data” that

must be cleansed ETL software contains rudimentary data

cleansing capabilities Specialized data cleansing software is often

used. Important for performing name and address correction and house holding functions

Leading data cleansing vendors include Vanity (Integrity), Harte-Hanks (Trillium), and First logic (i.d.Centric)

Page 5: I ntroduction

DATA CLEANSING· Parsing· Correcting· Standardizing· Matching· Consolidating

Page 6: I ntroduction

Parsing locates and identifies individual data elements in the source files and then isolates these data elements in the target files.

Examples include parsing the first, middle, and last name; street number and street name; and city and state

Page 7: I ntroduction

CORRECTING Corrects parsed individual data

components using sophisticated data algorithms and secondary data sources.

Example include replacing a vanity address and adding a zip code

Page 8: I ntroduction

STANDARDIZING Standardizing applies conversion

routines to transform data into its preferred (and consistent) format using both standard and custom business rules.

Examples include adding a pre name, replacing a nickname, and using a preferred street name.

Page 9: I ntroduction

MATCHING Searching and matching records within

and across the parsed, corrected and standardized data based on predefined business rules to eliminate duplications.

Examples include identifying similar names and addresses.

Page 10: I ntroduction

CONSOLIDATING Analyzing and identifying relationships

between matched records and consolidating/merging them into ONE representation.

Page 11: I ntroduction

INTRODUCTION Name Umair ilyas

Roll no 2809 Class BSIT 5th semester

Page 12: I ntroduction

Transformation

Page 13: I ntroduction

Transforms the data in accordance with the business rules and standards that have been established

Example include: format changes, deduplication, splitting up fields, replacement of codes, derived values, and aggregates

Page 14: I ntroduction

TRANSFORMATIONData Quality paradigm Correct Unambiguous Consistent Complete Data quality checks are run at 2 places

- after extraction and after cleaning and confirming additional check are run at this point

Page 15: I ntroduction

TRANSFORMATION - CLEANING DATA Anomaly Detection

Data sampling – count(*) of the rows for a department column

Column Property Enforcement Null Values in read columns Numeric values that fall outside of expected high

and lows Cols whose lengths are exceptionally short/long Cols with certain values outside of discrete valid

value sets Adherence to a read pattern/ member of a set of

pattern

Page 16: I ntroduction
Page 17: I ntroduction

TRANSFORMATION - CONFIRMING Structure Enforcement

Tables have proper primary and foreign keys

Obey referential integrity

Data and Rule value enforcement Simple business rules Logical data checks

Page 18: I ntroduction

INTRODUCTION

Name waqas ahmad

Roll no 2481

Class BSIT 5th semester

Page 19: I ntroduction

LOADING Loading Dimensions Loading Facts

Page 20: I ntroduction

LOADING DIMENSIONS Physically built to have the minimal sets of

components The primary key is a single field containing

meaningless unique integer ns these keys and never allows any other entity to assign them

De-normalized flat tables – all attributes in a dimension must take on a single value in the presence of a dimension primary key.

Should possess one or more other fields that compose the natural key of the dimension

Page 21: I ntroduction

The data loading module consists of all the steps required to administer slowly changing dimensions (SCD) and write the dimension to disk as a physical table in the proper dimensional format with correct primary keys, correct natural keys, and final descriptive attributes.

Creating and assigning the surrogate keys occur in this module.

The table is definitely staged, since it is the object to be loaded into the presentation system of the data warehouse.

Page 22: I ntroduction

LOADING FACTS Facts

Fact tables hold the measurements of an enterprise. The relationship between fact tables and measurements is extremely simple. If a measurement exists, it can be modeled as a fact table row. If a fact table row exists, it is a measurement

Page 23: I ntroduction

LOADING FACT TABLES Managing Indexes

Performance Killers at load time Drop all indexes in pre-load time Segregate Updates from inserts Load updates Rebuild indexes

Page 24: I ntroduction

THANKS