Data Modeling DW concepts.docx


  • 8/19/2019 Data Modeling DW concepts.docx

    1/25

    Data Modeling Overview

    A data model is a conceptual representation of the data structures (tables) required for a database, and it is very powerful in expressing and communicating the business requirements.

    A data model visually represents the nature of the data, the business rules governing the data, and how the data will be organized in the database. A data model comprises two parts: logical design and physical design.

    A data model helps the functional and technical teams in designing the database. Functional team normally refers to one or more Business Analysts, Business Managers, Smart Management Experts, End Users etc., and technical team refers to one or more programmers, DBAs etc. Data modelers are responsible for designing the data model; they communicate with the functional team to get the business requirements and with the technical teams to implement the database.

    The concept of data modeling can be better understood if we compare the development cycle of a data model to the construction of a house. For example, Company ABC is planning to build a guest house (database), so it calls the building architect (data modeler) and projects its building requirements (business requirements). The building architect (data modeler) develops the plan (data model) and gives it to Company ABC. Finally, Company ABC calls civil engineers (DBA) to construct the guest house (database).

    Data Modeling Tools

    There are a number of data modeling tools that transform business requirements into a logical data model, and a logical data model into a physical data model. From the physical data model, these tools can be instructed to generate SQL code for creating the database.

    Popular Data Modeling Tools

    Tool Name          Company Name
    Rational Rose      IBM Corporation
    PowerDesigner      Sybase Corporation
    Oracle Designer    Oracle Corporation

    Data Modeler Role

    Business Requirement Analysis:

    » Interact with Business Analysts to get the functional requirements.

    » Interact with end users and find out the reporting needs.

    » Conduct interviews and brainstorming discussions with the project team to get additional requirements.

    » Gather accurate data by data analysis and functional analysis.

    Development of data model:

    » Create standard abbreviation document for logical, physical and dimensional data models.

    » Create logical, physical and dimensional data models (data warehouse data modelling).

    » Document logical, physical and dimensional data models (data warehouse data modelling).

    Reports:

    » Generate reports from data model.

    Review:

    » Review the data model with functional and technical team.

    Creation of database:

    » Create SQL code from data model and co-ordinate with DBAs to create database.

    » Check to see data models and databases are in synch.

    Support & Maintenance:

    » Assist developers, ETL, BI team and end users to understand the data model.

    » Maintain change log for each data model.

    Steps to create a Data Model

    These are the general guidelines to create a standard data model; in practice, a data model may not be created in the same sequential manner as shown below. Based on the enterprise's requirements, some of the steps may be excluded, or other steps included in addition to these.

    Sometimes a data modeler may be asked to develop a data model based on an existing database. In that situation, the data modeler has to reverse engineer the database and create a data model.

    1» Get Business requirements.

    2» Create High Level Conceptual Data Model.

    3» Create Logical Data Model.

    4» Select target DBMS where data modeling tool creates the physical schema.

    5» Create standard abbreviation document according to business standard.

    6» Create domain.

    7» Create Entity and add definitions.

    8» Create attribute and add definitions.

    9» Based on the analysis, try to create surrogate keys, super types and sub types.

    10» Assign datatype to attribute. If a domain is already present then the attribute should be attached to the domain.

    11» Create primary or unique keys to attribute.

    12» Create check constraint or default to attribute.

    13» Create unique index or bitmap index to attribute.

    14» Create foreign key relationship between entities.

    15» Create Physical Data Model.

    16» Add database properties to physical data model.

    17» Create S…

    18» For each release (version of the data model), try to compare the present version with the previous version of the data model. Similarly, try to compare the data model with the database to find out the differences.

    19» Create a change log document for differences between the current version and previous version of the data model.
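The key, constraint, index and foreign-key steps in the list above can be illustrated with a small sketch. It uses Python's built-in sqlite3 module as a stand-in for whatever target DBMS is selected; the customer and customer_order tables, their columns and the index name are hypothetical and are not taken from the original material.

```python
import sqlite3

# Hypothetical two-entity model; every table, column and index name is illustrative.
DDL = """
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,                      -- primary key
    email         TEXT NOT NULL UNIQUE,                     -- unique key
    status        TEXT NOT NULL DEFAULT 'ACTIVE'            -- default value
                  CHECK (status IN ('ACTIVE', 'INACTIVE'))  -- check constraint
);
CREATE TABLE customer_order (
    order_id      INTEGER PRIMARY KEY,
    order_no      TEXT NOT NULL,
    customer_id   INTEGER NOT NULL
                  REFERENCES customer (customer_id)         -- foreign key relationship
);
CREATE UNIQUE INDEX ux_customer_order_no ON customer_order (order_no);  -- unique index
"""


def build_database():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
    conn.executescript(DDL)
    return conn


def try_insert(conn, sql, params):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_database()
ok_customer = try_insert(conn, "INSERT INTO customer (email) VALUES (?)", ("a@example.com",))
bad_status = try_insert(
    conn, "INSERT INTO customer (email, status) VALUES (?, ?)", ("b@example.com", "UNKNOWN")
)
orphan_order = try_insert(
    conn, "INSERT INTO customer_order (order_no, customer_id) VALUES (?, ?)", ("SO-1", 999)
)
```

In this sketch the first insert is accepted and picks up the default status, while the other two are rejected by the check constraint and the foreign key respectively. Bitmap indexes are not shown because SQLite does not provide them.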

    Conceptual Data Modeling

    A conceptual data model includes all major entities and relationships, does not contain much detailed information about attributes, and is often used in the INITIAL PLANNING PHASE.

    A conceptual data model is created by gathering business requirements from various sources like business documents and discussions with functional teams, business analysts, smart management experts and the end users who do the reporting on the database. Data modelers create the conceptual data model and forward it to the functional team for their review.

    A conceptual data model (CDM) comprises entity types and relationships. The relationships between the subject areas, and between the entities within a subject area, are drawn using a symbolic notation (IDEF1X or IE). In a data model, cardinality represents the relationship between two entities, i.e. a one-to-one, one-to-many or many-to-many relationship between the entities. A CDM contains data structures that have not been implemented in the database.

    Logical Data Modeling

    This is the actual implementation and extension of a conceptual data model. A logical data model is the version of a data model that represents the business requirements (entire or part) of an organization and is developed before the physical data model.

    As soon as the conceptual data model is accepted by the functional team, development of the logical data model gets started. Once the logical data model is completed, it is forwarded to the functional teams for review. A sound logical design should streamline the physical design process by clearly defining data structures and the relationships between them. A good data model is created by clearly thinking about the current and future business requirements. A logical data model includes all required entities, attributes, key groups, and relationships that represent business information and define business rules.

    In the example, we have identified the entity names, attribute names, and relationship. For detailed explanation, refer to relational data modeling.

    Physical Data Modeling

    A physical data model includes all required tables, columns, relationships and database properties for the physical implementation of databases. Database performance, indexing strategy, physical storage and denormalization are important parameters of a physical model.

    Once the logical data model is approved by the functional team, development of the physical data model gets started. Once the physical data model is completed, it is forwarded to the technical teams (developer, group lead, DBA) for review. The transformations from logical model to physical model include imposing database rules, implementation of referential integrity, super types and sub types etc.

    In the example, the entity names have been changed to table names and the attribute names to column names, and nulls, not nulls and a datatype have been assigned to each column.
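The logical-to-physical transformation described above (entity names to table names, attribute names to column names, null rules and datatypes) can be sketched in a few lines. This is only an illustration under stated assumptions: the abbreviation list, the type mapping and the Customer entity are hypothetical, and a real project would take them from its own abbreviation document and target DBMS.

```python
import sqlite3
from dataclasses import dataclass

# Assumed abbreviation standard and type mapping (both hypothetical).
ABBREVIATIONS = {"customer": "cust", "number": "nbr", "date": "dt"}
TYPE_MAP = {"string": "TEXT", "integer": "INTEGER", "date": "TEXT"}


@dataclass
class Attribute:
    name: str            # logical attribute name, e.g. "Customer Number"
    logical_type: str    # key into TYPE_MAP
    required: bool = True


def physical_name(logical_name: str) -> str:
    """Lower-case the words, apply abbreviations and join them with underscores."""
    words = logical_name.lower().split()
    return "_".join(ABBREVIATIONS.get(word, word) for word in words)


def to_create_table(entity: str, attributes: list) -> str:
    """Render one logical entity as a CREATE TABLE statement."""
    columns = []
    for attr in attributes:
        null_rule = " NOT NULL" if attr.required else ""
        columns.append(f"    {physical_name(attr.name)} {TYPE_MAP[attr.logical_type]}{null_rule}")
    return f"CREATE TABLE {physical_name(entity)} (\n" + ",\n".join(columns) + "\n)"


ddl = to_create_table(
    "Customer",
    [
        Attribute("Customer Number", "integer"),
        Attribute("Customer Name", "string"),
        Attribute("Birth Date", "date", required=False),
    ],
)

# Executing the generated statement checks that it is valid SQL for SQLite.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
column_names = [row[1] for row in conn.execute("PRAGMA table_info(cust)")]
```

Data modeling tools perform this kind of mapping internally; the sketch only shows the idea of deriving physical names and null rules from logical definitions.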


    Logical vs Physical Data Modeling

    When a data modeler works with the client, his title may be logical data modeler, physical data modeler, or a combination of both. A logical data modeler designs the data model to suit business requirements, creates and maintains the lookup data, compares the versions of the data model, maintains the change log and generates reports from the data model, whereas a physical data modeler has to know about the properties of the source and target databases.

    A physical data modeler should have the technical know-how to create data models from existing databases and to tune the data models with referential integrity, alternate keys and indexes, and should know how to match indexes to S…

    Rule            Check Constraint, Default Value
    Relationship    Foreign Key
    Definition      Comment

    Extract, transform, and load (ETL) in database usage, and especially in data warehousing, involves:

    ● Extracting data from outside sources

    ● Transforming it to fit operational needs (which can include quality levels)

    ● Loading it into the end target (database or data warehouse)

    The advantages of efficient and consistent databases make ETL very important as the way data actually gets loaded.

    This article discusses ETL in the context of a data warehouse, whereas the term ETL can in fact refer to a process that loads any database.

    The typical real-life ETL cycle consists of the following execution steps:

    1. Cycle initiation

    2. Build reference data

    3. Extract (from sources)

    4. Validate

    5. Transform (clean, apply business rules, check for data integrity, create aggregates)

    6. Stage (load into staging tables, if used)

    7. Audit reports (for example, on compliance with business rules; in case of failure, they also help to diagnose/repair)

    8. Publish (to target tables)

    9. Archive

    10. Clean up
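A few of the execution steps above (extract, validate, transform, stage, publish, clean up and a simple audit report) can be sketched as follows. This is a toy illustration using Python's built-in sqlite3 module, not a description of any particular ETL tool; the source rows, the table names stg_orders and orders, and the validation rule are all assumptions made for the example.

```python
import sqlite3

# Hypothetical source extract; in practice this would come from files or an operational system.
SOURCE_ROWS = [
    {"order_id": "1", "amount": "10.50", "country": "us"},
    {"order_id": "2", "amount": "abc", "country": "de"},  # fails validation
    {"order_id": "3", "amount": "7.25", "country": "US"},
]


def extract():
    return list(SOURCE_ROWS)


def validate(rows):
    """Split rows into valid and rejected, based on a numeric amount."""
    valid, rejected = [], []
    for row in rows:
        try:
            float(row["amount"])
            valid.append(row)
        except ValueError:
            rejected.append(row)
    return valid, rejected


def transform(rows):
    """Apply simple cleaning rules: typed values and upper-case country codes."""
    return [(int(r["order_id"]), float(r["amount"]), r["country"].upper()) for r in rows]


def run_etl(conn):
    conn.execute("CREATE TABLE stg_orders (order_id INTEGER, amount REAL, country TEXT)")
    conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, country TEXT)")
    extracted = extract()
    valid, rejected = validate(extracted)
    with conn:  # stage: load into the staging table
        conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", transform(valid))
    with conn:  # publish: move staged rows to the target table
        conn.execute("INSERT INTO orders SELECT order_id, amount, country FROM stg_orders")
    with conn:  # clean up: empty the staging table
        conn.execute("DELETE FROM stg_orders")
    published = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    # audit report: simple row counts for reconciliation
    return {"extracted": len(extracted), "rejected": len(rejected), "published": published}


conn = sqlite3.connect(":memory:")
audit = run_etl(conn)
```

The audit dictionary gives the row counts needed to check that every extracted row was either published or rejected.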


    A data warehouse is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis[1].

    This definition of the data warehouse focuses on data storage. However, the means to retrieve and analyze data, to extract, transform and load data, and to manage the data dictionary are also considered essential components of a data warehousing system. Many references to data warehousing use this broader context. Thus, an expanded definition for data warehousing includes business intelligence tools, tools to extract, transform, and load data into the repository, and tools to manage and retrieve metadata.


    The data warehouse staging area is the place where all transformation, cleansing and enrichment is done before data can flow further.

    The data is extracted from the source system by various methods (typically called Extraction) and is placed in normalized form into the Staging Area. Once in the Staging Area, data is cleansed, standardized and re-formatted to make it ready for loading into the Data Warehouse loaded area. We are going to cover the broad details here. The details of staging can be referred to in Data Extraction and Transformation Design in Data Warehouse.

    The Staging Area is important not only for Data Warehousing, but for a host of other applications as well. Therefore, it has to be seen from a wider perspective. Staging is an area where sanitized, integrated & detailed data in normalized form exists.

    With the advent of the Data Warehouse, the concept of Transformation has gained ground, which provides a high degree of quality & uniformity to the data. The conventional (pre-data warehouse) Staging Areas used to be plain dumps of the production data. Therefore a Staging Area with Extraction & Transformation is the best of both worlds for generating quality transaction-level information.

    DW vs DataMart

    De-normalized DW - Data Warehouse vs. Data Mart

    Data Warehouse/Data Mart form the sanitized repository of data which can be accessed for various purposes.

    Data Warehouse

    Data Warehouse is the area where the information is loaded in under-normalized Dimensional Modeling form. This subject has been dealt with in a fair degree of detail in the Data Warehousing/Marting section. A Data Warehouse is a repository of data, which contains data in an under-normalized dimensional form ACROSS the enterprise. Following are the features of a Data Warehouse:

    ● The Data Warehouse is the source for most of the end user tools for Data Analysis, Data Mining, and strategic planning.

    ● It is supposed to be an enterprise-wide repository and open to all possible applications of information delivery.

    ● It contains uniform & standard dimensions and measures. The details of this can be referred to in Dimensional Modeling Concepts.


    ● It contains historical as well as current information. Whereas most transaction systems get the information updated, the data warehouse concept is based upon 'adding' the information. For example, if a Customer in a field system undergoes a change in marital status, the system may contain only the latest marital status. However, a Data Warehouse will have two records containing the previous and the current marital status. Time-based analysis is one of the most important applications of a data warehouse. The methods of doing this are detailed in special situations in Dimensional Modeling.

    ● It is an offline repository. It is not used or accessed for online business transaction processing.

    ● It is read-only. A Data Warehouse platform should not allow write-back by the users. It is essentially a read-only platform. The write-back facility is reserved for the OLAP server, which sits between the Data Warehouse and the End-user platform.

    ● It contains only the actuals data. This is linked to 'read-only'. As a best practice, all the non-actual data (like standards, future projections, what-if scenarios) should be managed and maintained in OLAP and End-user tools.
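The marital status example in the list above, where the warehouse keeps both the previous and the current record, can be sketched as follows. The approach shown (closing the old row and adding a new one with validity dates) is commonly called a type 2 slowly changing dimension; that name, the CustomerVersion class and the dates are assumptions made for illustration and do not come from the text above.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerVersion:
    customer_id: int
    marital_status: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current version


def apply_change(history, customer_id, new_status, change_date):
    """Close the current version (if any) and append a new one; never overwrite."""
    for version in history:
        if version.customer_id == customer_id and version.valid_to is None:
            if version.marital_status == new_status:
                return history  # nothing changed, keep the history as it is
            version.valid_to = change_date
    history.append(CustomerVersion(customer_id, new_status, change_date))
    return history


def status_on(history, customer_id, as_of):
    """Return the marital status that was valid on a given date, or None."""
    for version in history:
        if (
            version.customer_id == customer_id
            and version.valid_from <= as_of
            and (version.valid_to is None or as_of < version.valid_to)
        ):
            return version.marital_status
    return None


history = []
apply_change(history, 42, "Single", date(2015, 1, 1))
apply_change(history, 42, "Married", date(2018, 6, 30))
```

Because old versions are closed rather than overwritten, a time-based question such as "what was the status in 2016?" can still be answered after the change.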

    Data Marts

    Data Marts are smaller, specific-purpose data warehouses. A Data Warehouse is a big, strategic platform, which needs considerable planning. The difference between a Data Warehouse and a Data Mart is like that between planning a city and planning a township. A Data Warehouse is a medium-to-long term effort to integrate and create a single-point system of record for virtually all applications and needs for data. A Data Mart is a short-to-medium term effort to build a repository for a specific analysis. The differences between a Data Warehouse and a Data Mart are as follows:

    Scope & Application

    Data Warehouse - Application Independent: Data Warehouse is a single point repository and its data can be used for any foreseeable application.

    Data Mart - Specific Application: A Data Mart is created for a specific purpose. For example, you will have a data mart created to analyze customer value. This means that the designer of the data mart is aware that the data will be used for OLAP and what kind of broad queries could be placed.

    Data Warehouse - Domain Independent: The Data Warehouse can be used for any domain including Sales, Customer, operations, finance etc.

    Data Mart - Specific Domain: A Data Mart is specific to a given domain. You will generally not find a data mart which serves the Sales as well as the operations domain at the same time.


    Data Warehouse - Centralized, Independent: The control and management of the data warehouse is centralized.

    Data Mart - Decentralized by User Area: Typically a data mart is owned by a specific function/sub-function.

    Data Warehouse - Planned: Data Warehouse is a strategic initiative, which comes out of a blueprint. It is not an immediate response to an immediate problem. It has many foundation elements, which cannot be developed in an ad-hoc manner, for example the standard sets of dimensions & measures.

    Data Mart - Organic, possibly not planned: A Data Mart is a response to a critical business need. It is developed to provide gratification to the users, and given that it is owned & managed at a functional level, it grows with time.

    Data

    Data Warehouse - Historical, Detailed & Summarized: A good data warehouse will capture the history of transactions by default, even if there is no immediate need. This is because a data warehouse always tries to be future proof.

    Data Mart - Some History, Detailed and Summarized: It is the same as with the Data Warehouse. However, the level of history that is captured is governed by the business need. For example, a data warehouse will capture the changes in the Customer marital status by default. A Data Mart may not do it, if the Data Mart is created to profile/segment a Customer on the basis of his spending patterns only.

    Sources

    Data Warehouse - Many Internal & External Sources: This is an obvious outcome of the Data Warehouse being a generic resource. That is also the reason why the staging design for a data warehouse takes much more time compared to that of a data mart.

    Data Mart - Few Internal & External Sources: Self-explanatory - limited purpose leads to limited sources.


    Life Cycle

    Data Warehouse - Stand-Alone Strategic Initiative: Data Warehouse is an outcome of a company's strategy to make data an enterprise resource. If there is any other trigger, chances are that it may not achieve its objectives.

    Data Mart - Typically part of a Business Project: A Data Mart comes into being due to a business need. For example, a Risk Portfolio Analysis data mart could be a part of an Enhancing Risk Management Initiative.

    Data Warehouse - Long life: Data Warehouse is a long-term foundation of an enterprise.

    Data Mart - Can have any life span: A Data Mart starts with a given objective, and it can have a life span ranging from one year to endless. This is because some applications are core and business as usual to an enterprise. The life of a data mart could be shortened if a Data Warehouse comes into being.

    Data - Modeling

    What is Dimensional Modeling?

    Dimensional Modeling is a design concept used by many data warehouse designers to build their data warehouse. In this design model all the data is stored in two types of tables - Facts table and Dimension table. The Fact table contains the facts/measurements of the business, and the Dimension table contains the context of the measurements, i.e., the dimensions on which the facts are calculated.
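To make the fact/dimension split concrete, here is a minimal sketch of a star-shaped model built with Python's built-in sqlite3 module. The tables dim_product, dim_date and fact_sales, their columns and the sample rows are all hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables hold the context of the measurements.
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, calendar_year INTEGER);

-- The fact table holds the measurements plus one key per dimension.
CREATE TABLE fact_sales (
    product_key  INTEGER REFERENCES dim_product (product_key),
    date_key     INTEGER REFERENCES dim_date (date_key),
    quantity     INTEGER,
    sales_amount REAL
);

INSERT INTO dim_product VALUES (1, 'Pen', 'Stationery'), (2, 'Notebook', 'Stationery'), (3, 'Mug', 'Kitchen');
INSERT INTO dim_date VALUES (20190101, 2019), (20200101, 2020);
INSERT INTO fact_sales VALUES (1, 20190101, 10, 15.0), (2, 20190101, 2, 8.0),
                              (3, 20190101, 1, 6.5),  (1, 20200101, 4, 6.0);
""")


def sales_by_category(year):
    """Sum the sales_amount fact, using the dimensions to filter and group."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.sales_amount)
        FROM fact_sales f
        JOIN dim_product p ON p.product_key = f.product_key
        JOIN dim_date d ON d.date_key = f.date_key
        WHERE d.calendar_year = ?
        GROUP BY p.category
        ORDER BY p.category
        """,
        (year,),
    ).fetchall()
    return dict(rows)


totals_2019 = sales_by_category(2019)
```

The query shows the typical usage pattern: facts are summed, while dimension attributes such as category and calendar_year supply the filter and grouping.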

    What is the Difference between OLTP and OLAP?

    The main differences between OLTP and OLAP are:

    1. User and System Orientation

    OLTP: customer-oriented, used for transaction processing and querying by clerks, clients and IT professionals.

    OLAP: market-oriented, used for data analysis by knowledge workers (managers, executives, analysts).

    2. Data Contents

    OLTP: manages current data, very detail-oriented.

    OLAP: manages large amounts of historical data, provides facilities for summarization and aggregation, and stores information at different levels of granularity to support the decision making process.

    3. Database Design

    OLTP: adopts an entity relationship (ER) model and an application-oriented database design.

    OLAP: adopts a star, snowflake or fact constellation model and a subject-oriented database design.

    4. View

    OLTP: focuses on the current data within an enterprise or department.

    OLAP: spans multiple versions of a database schema due to the evolutionary process of an organization; integrates information from many organizational locations and data stores.

    ETL tools are used to extract, transform and load the data into a data warehouse / data mart.

    OLAP tools are used to create cubes/reports for business analysis from a data warehouse / data mart.

    An ETL tool is used to extract the data and to perform operations on it as per our needs (e.g. Informatica, data mart), but OLAP is completely di…

    The Dimension Attributes are the various columns in a dimension table. For example, attributes in a PRODUCT dimension can be product category, product type etc. Generally the Dimension Attributes are used in query filter conditions and to display other related information about a dimension.

    What is a surrogate key?

    A surrogate key is a substitution for the natural primary key. It is a unique identifier or number (normally created by a database sequence generator) for each record of a dimension table that can be used for the primary key to the table.

    A surrogate key is useful because natural keys may change.
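The point that natural keys may change can be shown with a short sketch. It uses Python's built-in sqlite3 module, with an auto-generated integer standing in for a database sequence; the dim_product and fact_sales tables and the product codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, generated by the database
        product_code TEXT NOT NULL UNIQUE,               -- natural key from the source system
        product_name TEXT NOT NULL
    )
""")
conn.execute(
    "CREATE TABLE fact_sales (product_key INTEGER REFERENCES dim_product (product_key), quantity INTEGER)"
)


def add_product(code, name):
    """Insert a dimension row and return the surrogate key assigned to it."""
    cur = conn.execute(
        "INSERT INTO dim_product (product_code, product_name) VALUES (?, ?)", (code, name)
    )
    return cur.lastrowid


pen_key = add_product("PEN-001", "Pen")
conn.execute("INSERT INTO fact_sales VALUES (?, ?)", (pen_key, 10))

# The source system renames its product code; only the dimension row changes.
conn.execute(
    "UPDATE dim_product SET product_code = ? WHERE product_key = ?", ("STAT-PEN-1", pen_key)
)

joined = conn.execute("""
    SELECT p.product_code, f.quantity
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
""").fetchall()
conn.commit()
```

Because the fact row refers to product_key rather than product_code, the rename needs no change to the fact table and the join still resolves.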


    What is BI?

    Business Intelligence is a term introduced by Howard Dresner of Gartner Group in 1989. He described Business Intelligence as a set of concepts and methodologies to improve decision making in business through use of facts and fact based systems.

    What is aggregation?

    In a data warehouse paradigm, aggregation is one way of improving query performance. An aggregate fact table is a new table created off of an existing fact table by summing up facts for a set of associated dimensions. The grain of an aggregate fact table is higher than that of the base fact table. Aggregate tables contain fewer rows, thus making queries run faster.
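As a small illustration of the idea, the sketch below builds an aggregate from detail-level fact rows by summing the sales amount per month and store. The rows, the column choice and the names FACT_SALES and AGG_SALES_MONTH_STORE are hypothetical.

```python
from collections import defaultdict

# Hypothetical detail-level fact rows: (date, store, product, sales_amount)
FACT_SALES = [
    ("2019-01-01", "S1", "Pen", 5.0),
    ("2019-01-01", "S1", "Mug", 6.5),
    ("2019-01-02", "S1", "Pen", 2.5),
    ("2019-01-15", "S2", "Pen", 4.0),
    ("2019-02-03", "S2", "Mug", 13.0),
]


def aggregate_by_month_and_store(rows):
    """Roll the day/product grain up to a month/store grain by summing the fact."""
    totals = defaultdict(float)
    for day, store, _product, amount in rows:
        month = day[:7]  # "YYYY-MM"
        totals[(month, store)] += amount
    return dict(totals)


AGG_SALES_MONTH_STORE = aggregate_by_month_and_store(FACT_SALES)
```

The aggregate has three rows where the detail table has five; a monthly report by store can read the smaller table instead of scanning the detail rows.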

    What are the different approaches for making a Data warehouse?

    This is a generic question. From a business perspective, it is very important to first get clarity on the end user requirements and a system study before commencing any Data warehousing project. From a technical perspective, it is important to first understand the dimensions and measures, determine the quality and structure of source data from the OLTP systems and then decide which dimensional model to apply, i.e. whether we do a star or snowflake or a combination of both. From a conceptual perspective, we can either go the Ralph Kimball method (build data marts and then consolidate at the end to form an enterprise Data warehouse) or the Bill Inmon method (build a large Data warehouse and derive data marts from the same). In order to decide on the method, a strong understanding of the business requirement and data structure is needed, as also consensus with the customer.

    What is staging area?

    Staging area is sometimes also called 'Operational Data Store' (ODS). It is a data holding place where the data which is extracted from all the data sources is stored. From the Staging area, data is loaded to the data warehouse. Data cleansing takes place in this stage.

    What is the difference between star and snowflake schema?

    The main di…

    … data warehouse design methodology the data warehouse is created from the union of organizational data marts.

    Normalized versus dimensional approach for storage of data

    There are two leading approaches to storing data in a data warehouse - the dimensional approach and the normalized approach.

    In the dimensional approach, transaction data are partitioned into either "facts", which are generally numeric transaction data, or "dimensions", which are the reference information that gives context to the facts. For example, a sales transaction can be broken up into facts such as the number of products ordered and the price paid for the products, and into dimensions such as order date, customer name, product number, order ship-to and bill-to locations, and salesperson responsible for receiving the order. A key advantage of a dimensional approach is that the data warehouse is easier for the user to understand and to use. Also, the retrieval of data from the data warehouse tends to operate very quickly. The main disadvantages of the dimensional approach are: 1) In order to maintain the integrity of facts and dimensions, loading the data warehouse with data from different operational systems is complicated, and 2) It is difficult to modify the data warehouse structure if the organization adopting the dimensional approach changes the way in which it does business.

    In the normalized approach, the data in the data warehouse are stored following, to a degree, database normalization rules. Tables are grouped together by subject areas that reflect general data categories (e.g., data on customers, products, finance, etc.). The main advantage of this approach is that it is straightforward to add information into the database. A disadvantage of this approach is that, because of the number of tables involved, it can be difficult for users both to 1) join data from different sources into meaningful information and then 2) access the information without a precise understanding of the sources of data and of the data structure of the data warehouse.

    These approaches are not mutually exclusive. Dimensional approaches can involve normalizing data to a degree.

    Benefits of data warehousing

    Some of the benefits that a data warehouse provides are as follows:[7][8]

    ● A data warehouse provides a common data model for all data of interest regardless of the data's source. This makes it easier to report and analyze information than it would be if multiple data models were used to retrieve information such as sales invoices, order receipts, general ledger charges, etc.

    ● Prior to loading data into the data warehouse, inconsistencies are identified and resolved. This greatly simplifies reporting and analysis.

    ● Information in the data warehouse is under the control of data warehouse users so that, even if the source system data is purged over time, the information in the warehouse can be stored safely for extended periods of time.

    ● Because they are separate from operational systems, data warehouses provide retrieval of data without slowing down operational systems.

    ● Data warehouses can work in conjunction with and, hence, enhance the value of operational business applications, notably customer relationship management (CRM) systems.

    ● Data warehouses facilitate decision support system applications such as trend reports (e.g., the items with the most sales in a particular area within the last two years), exception reports, and reports that show actual performance versus goals.

    Cube can )an$ arguably shoul$* mean something "uite speci!ic + O.P arti!acts presente$

    through an O.P ser%er  such as MS nalysis Ser%ices or Oracle )nee 3yperion* -ssbase#

    3owe%er& it also gets use$ much more loosely# O.P cubes o! this sort use cube+aware "uerytools which use a $i!!erent PI to a stan$ar$ relational $atabase# Typically O.P ser%ers

    maintain their own optimi?e$ $ata structures ),nown as MO.P*& although they can beimplemente$ as a !ront+en$ to a relational $ata source ),nown as RO.P* or in %arious hybri$mo$es ),nown as 3O.P*

    I try to be speci!ic an$ use FcubeF speci!ically to re!er to cubes on O.P ser%ers such as SSS#

    Business Ob'ects wor,s by "uerying $ata through one or more sources )which coul$ be relational

    $atabases& O.P cubes& or !lat !iles* an$ creating an in+memory $ata structure calle$ a

    MicroCube which it uses to support interacti%e slice+an$+$ice acti%ities# nalysis Ser%ices an$MS

  • 8/19/2019 Data Modeling DW concepts.docx

    15/25

    cube, on the other hand, tends to imply that data is presented using a multi-dimensional nomenclature (typically an OLAP technology) and that the data is generally summarized as intersections of multiple hierarchies (i.e. the net worth of your family vs. your personal net worth and everything in between). Generally, cube implies something very specific whereas data mart tends to be a little more general.

    I suppose in OOP speak you could accurately say that a data mart has-a cube, has-a relational database, has-a nifty reporting interface, etc., but it would be less correct to say that any one of those individually is-a data mart. The term data mart is more inclusive.
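To make "summarized as intersections of multiple hierarchies" concrete, here is a minimal Python sketch. It is not how any particular OLAP server stores its cubes; the fact rows, the two hierarchies (region > city and year > quarter), and the names `build_cube`, `geo_levels`, and `time_levels` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, city, year, quarter, sales)
FACTS = [
    ("East", "Boston", 2018, "Q1", 100),
    ("East", "Boston", 2018, "Q2", 150),
    ("East", "New York", 2018, "Q1", 200),
    ("West", "Seattle", 2018, "Q1", 50),
    ("West", "Seattle", 2019, "Q1", 75),
]

ALL = "ALL"  # marker meaning "rolled up past this level"


def geo_levels(region, city):
    """Every level of the (assumed) geography hierarchy a row belongs to."""
    return [(ALL, ALL), (region, ALL), (region, city)]


def time_levels(year, quarter):
    """Every level of the (assumed) time hierarchy a row belongs to."""
    return [(ALL, ALL), (year, ALL), (year, quarter)]


def build_cube(facts):
    """Pre-aggregate sales at every intersection of the two hierarchies."""
    cube = defaultdict(int)
    for region, city, year, quarter, sales in facts:
        for geo, time in product(geo_levels(region, city), time_levels(year, quarter)):
            cube[(geo, time)] += sales
    return dict(cube)


cube = build_cube(FACTS)
print(cube[((ALL, ALL), (ALL, ALL))])            # grand total: 575
print(cube[(("East", ALL), (2018, ALL))])        # East region, all of 2018: 450
print(cube[(("East", "Boston"), (2018, "Q1"))])  # most detailed cell: 100
```

Each key in `cube` is one intersection of a geography level with a time level, which is the sense in which a cube holds "everything in between" the grand total and the most detailed cell.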

    Figure 1-1 Contrasting OLTP and Data Warehousing Environments

    online transaction processing (OLTP)

    Online transaction processing. OLTP systems are optimized for fast and reliable transaction handling. Compared to data warehouse systems, most OLTP interactions will involve a relatively small number of rows, but a larger group of tables.
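The contrast in access patterns can be sketched with Python's built-in `sqlite3` module. The `customers` and `orders` tables and their contents are hypothetical; the point is only the shape of the two queries, not a recommended schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        order_year INTEGER,
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES
        (10, 1, 2018, 120.0),
        (11, 1, 2019, 80.0),
        (12, 2, 2019, 200.0);
""")

# OLTP-style access: look up one order by key, touching very few rows.
oltp_row = conn.execute(
    "SELECT o.order_id, c.name, o.amount "
    "FROM orders o JOIN customers c ON c.customer_id = o.customer_id "
    "WHERE o.order_id = ?",
    (11,),
).fetchone()

# Warehouse-style access: scan and summarize many rows at once.
yearly_totals = conn.execute(
    "SELECT order_year, SUM(amount) FROM orders "
    "GROUP BY order_year ORDER BY order_year"
).fetchall()

print(oltp_row)       # (11, 'Acme', 80.0)
print(yearly_totals)  # [(2018, 120.0), (2019, 280.0)]
```

The first query is the kind an operational application issues thousands of times a day; the second is the kind a report issues occasionally but over a large share of the table.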

    This illustrates five things:

    ● Data Sources (operational systems and flat files)

    ● Staging Area (where data sources go before the warehouse)

    ● Warehouse (metadata, summary data, and raw data)

    ● Data Marts (purchasing, sales, and inventory)

    ● Users (analysis, reporting, and mining)

    OLAP and Data Mining

    In large data warehouse environments, many di

    How can PL/SQL be best used for the ETL process?

    PL/SQL

    So I think that PL/SQL

    with a "Designer" piece, where the data warehouse administrator can specify the relationship between the relational tables, as well as how dimensions, attributes, and hierarchies map to the underlying database tables.

    Right now, there is a convergence between the traditional ROLAP and MOLAP vendors. ROLAP vendors recognize that users want their reports fast, so they are implementing MOLAP functionalities in their tools; MOLAP vendors recognize that many times it is necessary to drill down to the most detail level information, levels where the traditional cubes do not get to for performance and size reasons.

    So what are the criteria for evaluating OLAP vendors? Here they are:

    Ability to leverage parallelism supplied by RDBMS and hardware: This would greatly increase the tool's performance, and help load the data into the cubes as quickly as possible.

    Performance: In addition to leveraging parallelism, the tool itself should be quick both in terms of loading the data into the cube and reading the data from the cube.

    Customization efforts: More and more, OLAP tools are used as an advanced reporting tool. This is because in many cases, especially for ROLAP implementations, OLAP tools often can be used as a reporting tool. In such cases, the ease of front-end customization becomes an important factor in the tool selection process.

    Security Features: Because OLAP tools are geared towards a number of users, making sure people see only what they are supposed to see is important. By and large, all established OLAP tools have a security layer that can interact with the common corporate login protocols. There are, however, cases where large corporations have developed their own user authentication mechanism and have a single sign-on policy. For these cases, having a seamless integration between the tool and the in-house authentication can require some work. I would recommend that you have the tool vendor team come in and make sure that the two are compatible.

    Metadata support: Because OLAP tools aggregate the data into the cube and sometimes serve as the front-end tool, it is essential that they work with the metadata strategy/tool you have selected.

    Popular Tools

    ● Business Objects

    ● Cognos

    ● Hyperion

    ● Microsoft Analysis Services

    ● MicroStrategy

    Conceptual, Logical, And Physical Data Models

    There are three levels of data modeling. They are conceptual, logical, and physical. This section will explain the di

    ● All attributes for each entity are specified.

    ● The primary key for each entity is specified.

    ● Foreign keys (keys identifying the relationship between di

    Hierarchy: The specification of levels that represents relationship between di

    Dimensional Model: A type of data modeling suited for data warehousing. In a dimensional model, there are two types of tables: dimensional tables and fact tables. A dimensional table records information on each dimension, and a fact table records all the facts, or measures.

    Dimensional Table: Dimension tables store records related to this particular dimension. No facts are stored in a dimensional table.

    ETL: Stands for Extraction, Transformation, and Loading. The movement of data from one area to another.

    Fact Table: A type of table in the dimensional model. A fact table typically includes two types of columns: fact columns and foreign keys to the dimensions.

    Hierarchy: A hierarchy defines the navigating path for drilling up and drilling down. All attributes in a hierarchy belong to the same dimension.

    Metadata: Data about data. For example, the number of tables in the database is a type of metadata.

    Metric: A measured value. For example, total sales is a metric.

    MOLAP: Multidimensional OLAP. MOLAP systems store data in the multidimensional cubes.

    OLAP: On-Line Analytical Processing. OLAP should be designed to provide end users a quick way of slicing and dicing the data.

    ROLAP: Relational OLAP. ROLAP systems store data in the relational database.

    Snowflake Schema: A common form of dimensional model. In a snowflake schema, di

    In a data warehousing project, the logical data model is built based on user requirements, and then it is translated into the physical data model. The detailed steps can be found in the Conceptual, Logical, and Physical Data Modeling section.
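As a rough illustration of what the physical model for a dimensional design might look like, the sketch below creates two dimension tables and one fact table with Python's built-in `sqlite3` module. The table names (`dim_date`, `dim_product`, `fact_sales`), columns, and sample rows are assumptions made up for this example; a real project would derive them from its own logical model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables: descriptive attributes only, no facts.
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        year     INTEGER NOT NULL,
        month    INTEGER NOT NULL
    );
    CREATE TABLE dim_product (
        product_key  INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category     TEXT NOT NULL
    );
    -- Fact table: foreign keys to the dimensions plus the measure.
    CREATE TABLE fact_sales (
        date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
        product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
        sales_amount REAL NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO dim_date VALUES (?, ?, ?)",
    [(20180101, 2018, 1), (20180201, 2018, 2)],
)
conn.executemany(
    "INSERT INTO dim_product VALUES (?, ?, ?)",
    [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
)
conn.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(20180101, 1, 10.0), (20180101, 3, 5.0), (20180201, 2, 20.0), (20180201, 3, 7.5)],
)

# Join the fact table to a dimension and aggregate the measure.
sales_by_category = conn.execute(
    "SELECT p.category, SUM(f.sales_amount) "
    "FROM fact_sales f JOIN dim_product p ON p.product_key = f.product_key "
    "GROUP BY p.category ORDER BY p.category"
).fetchall()
print(sales_by_category)  # [('Books', 12.5), ('Hardware', 30.0)]
```

The fact table carries only keys and measures, while everything a user would filter or group by lives in the dimension tables.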

    Part of the data modeling exercise is often the identification of data sources. Sometimes this step is deferred until the ETL step. However, my feeling is that it is better to find out where the data exists, or, better yet, whether it even exists anywhere in the enterprise at all. Should the data not be available, this is a good time to raise the alarm. If this is delayed until the ETL phase, rectifying it becomes a much tougher and more complex process.

    The ETL (Extraction, Transformation, Loading) process typically takes the longest to develop, and this can easily take up to 50% of the data warehouse implementation cycle or longer. The reason for this is that it takes time to get the source data, understand the necessary columns, understand the business rules, and understand the logical and physical data models.
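The three ETL steps can be sketched in a few lines of Python using only the standard library. Everything specific here is an assumption for illustration: the inline CSV standing in for a source extract, the `stg_orders` target table, and the single business rule (reject rows with a missing amount, normalize the currency code).

```python
import csv
import io
import sqlite3

# Hypothetical source extract; in practice this would come from an
# operational system or a flat file.
SOURCE_CSV = """order_id,order_date,amount,currency
1,2019-08-01,100.50,usd
2,2019-08-02,,usd
3,2019-08-02,20.00,USD
"""


def extract(text):
    """Extraction: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transformation: apply business rules and set aside rows that fail them."""
    clean, rejected = [], []
    for row in rows:
        if not row["amount"]:  # assumed rule: amount is mandatory
            rejected.append(row)
            continue
        clean.append(
            (int(row["order_id"]), row["order_date"],
             float(row["amount"]), row["currency"].upper())
        )
    return clean, rejected


def load(conn, rows):
    """Loading: write the cleaned rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS stg_orders ("
        "order_id INTEGER PRIMARY KEY, order_date TEXT, amount REAL, currency TEXT)"
    )
    conn.executemany("INSERT INTO stg_orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract(SOURCE_CSV))
load(conn, clean_rows)
loaded = conn.execute("SELECT COUNT(*), SUM(amount) FROM stg_orders").fetchone()
print(loaded, len(rejected_rows))  # (2, 120.5) 1
```

Even in this toy version, most of the code is in `transform`, which is consistent with the point that understanding the columns and business rules is where the time goes.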

    Possible Pitfalls 

    There is a tendency to give this particular phase too little development time. This can prove suicidal to the project because end users will usually tolerate less formatting, longer time to run reports, less functionality (slicing and dicing), or fewer delivered reports; one thing that they will not tolerate is wrong information.

    A second common problem is that some people make the ETL process more complicated than necessary. In ETL design, the primary goal should be to optimize load speed without sacrificing on quality. This is, however, sometimes not followed. There are cases where the design goal is to cover all possible future uses, whether they are practical or just a figment of someone's imagination. When this happens, ETL performance su

    Once the development team declares that everything is ready for further testing, the QA team takes over. The QA team is always from the client. Usually the QA team members will know little about data warehousing, and some of them may even resent the need to have to learn another tool or tools. This makes the QA process a tricky one.

    Sometimes the QA process is overlooked. On my very first data warehousing project, the project team worked very hard to get everything ready for Phase 1, and everyone thought that we had met the deadline. There was one mistake, though: the project managers failed to recognize that it is necessary to go through the client QA process before the project can go into production. As a result, it took five extra months to bring the project to production (the original development time had been only 2 1/2

    In the OLAP world, there are mainly two di

    ● Limited by SQL functionalities: Because ROLAP technology mainly relies on generating SQL statements to query the relational database, and SQL statements do not fit all needs (for example, it is difficult to perform complex calculations using SQL), ROLAP technologies are therefore traditionally limited by what SQL can do. ROLAP vendors have mitigated this risk by building into the tool out-of-the-box complex functions as well as the ability to allow users to define their own functions.
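The two ideas in this last point, generating SQL from a user's slice request and extending SQL with user-defined functions, can be sketched with Python's built-in `sqlite3` module. This is not how any specific ROLAP product works; `build_rollup_sql`, the `sales` table, and the `growth_pct` function are hypothetical names invented for the example.

```python
import sqlite3


def build_rollup_sql(table, dimensions, measure):
    """Generate a GROUP BY statement for the requested dimensions.

    Assumes at least one dimension and trusted identifiers; a real tool
    would validate them against its metadata layer.
    """
    cols = ", ".join(dimensions)
    return (
        f"SELECT {cols}, SUM({measure}) AS total "
        f"FROM {table} GROUP BY {cols} ORDER BY {cols}"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, year INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", 2018, 100.0), ("East", 2019, 150.0), ("West", 2018, 50.0)],
)

# A "slice by region" request becomes a generated SQL statement.
sql = build_rollup_sql("sales", ["region"], "amount")
by_region = conn.execute(sql).fetchall()
print(sql)
print(by_region)  # [('East', 250.0), ('West', 50.0)]

# A calculation that is awkward in plain SQL can be registered as a
# user-defined function and then called from generated statements.
conn.create_function("growth_pct", 2, lambda old, new: (new - old) / old * 100.0)
growth = conn.execute("SELECT growth_pct(?, ?)", (100.0, 150.0)).fetchone()[0]
print(growth)  # 50.0
```

Changing the `dimensions` list changes the generated `GROUP BY`, which is essentially what slicing and dicing means on the relational side.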

    HOLAP

    HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP. For summary-type information, HOLAP leverages cube technology for faster performance. When detail information is needed, HOLAP can drill through from the cube into the underlying relational data.
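The hybrid idea can be sketched as a simple routing decision: answer summary requests from a pre-aggregated structure and send detail requests to the relational rows. The Python below is only a toy under stated assumptions; the `sales` table, the in-memory `summary_cube` dictionary, and the functions `region_total` and `drill_through` are hypothetical and do not reflect any vendor's implementation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("East", "Widget", 100.0), ("East", "Gadget", 40.0), ("West", "Widget", 60.0)],
)

# "Cube" side: summaries computed once up front and held in memory.
summary_cube = dict(
    conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region").fetchall()
)


def region_total(region):
    """Summary request: answered from the pre-aggregated cube."""
    return summary_cube[region]


def drill_through(region):
    """Detail request: falls through to the underlying relational rows."""
    return conn.execute(
        "SELECT product, amount FROM sales WHERE region = ? ORDER BY product",
        (region,),
    ).fetchall()


print(region_total("East"))   # 140.0
print(drill_through("East"))  # [('Gadget', 40.0), ('Widget', 100.0)]
```

The summary lookup never touches the database after the cube is built, while the drill-through always does, which mirrors the performance trade-off described above.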

    OLAP : Online Analytical Processing : Tools

    OLAP (online analytical processing) is a function of business intelligence software that enables a user to easily and selectively extract and view data from different points of view. Designed for managers looking to make sense of their information, OLAP tools structure data hierarchically, the way managers think of their enterprises, but also allow business analysts to rotate that data, changing the relationships to get more detailed insight into corporate information.

    WebFOCUS OLAP combines all the functionality of query tools, reporting tools, and OLAP into a single powerful solution with one common interface so business analysts can slice and dice the data and see business processes in a new way. WebFOCUS makes data part of an organization's natural culture by giving developers the premier design environments for automated ad hoc and parameter-driven reporting and giving everyone else the ability to receive and retrieve data in any format, performing analysis using whatever device or application is part of the daily working life.

    WebFOCUS ad hoc reporting and OLAP features allow users to slice and dice data in an almost unlimited number of ways. Satisfying the broadest range of analytical needs, business intelligence application developers can easily enhance reports with extensive data-analysis functionality so that end users can dynamically interact with the information. WebFOCUS also supports the real-time creation of Excel spreadsheets and Excel PivotTables with full styling, drill-downs, and formula capabilities so that Excel power users can analyze their corporate data in a tool with which they are already familiar.

    Business intelligence (BI) tools empower organizations to facilitate improved business decisions. BI tools enable users throughout the extended enterprise not only to access company information but also to report and analyze that critical data in an efficient and intuitive manner. It is not just about delivering reports from a data warehouse; it's about providing large numbers of people (executives, analysts, customers, partners, and everyone else) secure and simple access to the right information so they can make better decisions. The best BI tools allow employees to enhance their productivity while maintaining a high degree of self-sufficiency.

    http://www.informationbuilders.com/products/webfocus/index.html

    http://www.informationbuilders.com/products/webfocus/ad_hoc.html