All About Data Warehouse


  • 7/23/2019 All About Data Warehouse

    1/22

    All about Data Warehouse

    Project/Service Identifier: Data warehouse boot up session

    Prepared By: Internal Project Team

    Document Version: 0.1

    Published Date:


    Revision History

    Version Number | Revision Date | Summary of Changes | Modified by
    0.1            | 26/02/2013    | Initial Draft      | Mamta Bhanwar

    Contributors

    Name               | Topics Owned
    Mamta Bhanwar      | Introduction to DWH, Goals of a data warehouse, Components of DWH, DWH lifecycle, Facts, Dimensions, Dimensional Modeling
    Saurabh Srivastava | Conversion from ER to DM, F…
    Saurabh Sharma     | Loading techniques of dimension and fact tables, Dimensional modeling phases


    Contents

    1 Introduction

    2 Goals of a Data Warehouse

    3 Components of DWH

    3.1 Source Systems

    3.2 Data Staging Area

    3.3 Data Presentation Area

    3.4 Business Intelligence Tools

    4 DWH Lifecycle

    4.1 Project Planning

    4.2 Business Requirement Definition

    4.3 Technical Architecture Design

    4.4 Product Selection and Installation

    4.5 Dimensional Modelling

    4.6 Physical Design

    4.7 Data Staging Design and Development

    4.8 Deployment

    4.9 Maintenance and Growth

    4.10 Analytic Application Specification

    4.11 Analytic Application Development

    5 Normalization

    6 Facts

    6.1 Types of Facts

    7 Dimensions

    7.1 Types of Dimension

    8 Dimensional Modeling

    8.1 Importance of Dimensional Modeling

    8.2 Types of Dimensional Models

    9 Loading Techniques of Dimension and Fact Tables

    10 Dimensional Modelling Phases

    11 Conversion from ER to DM

    12 FAQ

    13 References


    1 Introduction

    A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.

    Subject-Oriented: A data warehouse can be used to analyze a particular subject area. For example, "sales" can be a particular subject.

    Integrated: A data warehouse integrates data from multiple data sources. For example, source A and source B may have different ways of identifying a product, but in a data warehouse, there will be only a single way of identifying a product.

    Time-Variant: Historical data is kept in a data warehouse. For example, one can retrieve data from 3 months, 6 months, 12 months, or even older data from a data warehouse. This contrasts with a transaction system, where often only the most recent data is kept. For example, a transaction system may hold the most recent address of a customer, whereas a data warehouse can hold all addresses associated with a customer.

    Non-volatile: Once data is in the data warehouse, it will not change. So, historical data in a data warehouse should never be altered.
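    The contrast between a transaction system and a time-variant, non-volatile warehouse can be sketched in a few lines of Python. This is only an illustration: the class names, the customer id and the addresses are assumptions, not part of any real system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class AddressRecord:
    customer_id: int
    address: str
    valid_from: date


class TransactionSystem:
    """Keeps only the most recent address per customer."""

    def __init__(self):
        self.current = {}

    def update_address(self, customer_id, address):
        self.current[customer_id] = address  # the old value is lost


class Warehouse:
    """Append-only store: loaded rows are never altered (non-volatile)."""

    def __init__(self):
        self.history = []

    def load(self, record):
        self.history.append(record)

    def addresses_of(self, customer_id):
        return [r.address for r in self.history if r.customer_id == customer_id]


oltp = TransactionSystem()
dwh = Warehouse()
changes = [(date(2012, 1, 5), "12 Park Road"), (date(2013, 2, 1), "7 Lake View")]
for when, addr in changes:
    oltp.update_address(42, addr)
    dwh.load(AddressRecord(42, addr, when))
```

    After both changes, the transaction system knows only the latest address, while the warehouse can still return every address the customer has had.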

    2 Goals of a Data Warehouse

    Be easily accessible, with minimal query retrieval time, and be understandable

    Be assembled, consistent, cleaned and quality assured

    Be adaptive and resilient to change, without invalidating the existing data

    Be secure

    Serve as the foundation for improved business decision making


    3 Components of DWH

    Figure 1 Components of Data Warehouse

    3.1 Source Systems

    The source systems for a data warehouse are typically transaction processing applications.

    The source systems maintain little historical data.

    They are not optimized for reporting, and thus it is difficult to share the data.

    There is no control over the content and format of the data.

    Their main priorities are processing performance and availability.

    3.2 Data Staging Area

    The data staging area of the data warehouse is both a storage area and a set of processes commonly referred to as extract-transformation-load (ETL). No direct querying of this component is allowed.

    Extraction: Extracting means reading and understanding the source data and copying the data needed for the data warehouse into the staging area for further manipulation. The source systems might be very complex and poorly documented, and thus determining which data needs to be extracted can be difficult. The data normally has to be extracted not only once, but several times in a periodic manner to supply all changed data to the warehouse and keep it up-to-date. Moreover, the source system typically cannot be modified, nor can its performance or availability be adjusted, to accommodate the needs of the data warehouse extraction process.

    Transformation: Once the data is extracted to the staging area, there are numerous potential transformations, such as cleansing the data (correcting misspellings, resolving domain conflicts, dealing with missing elements, or parsing into standard formats), combining data from multiple sources, deduplicating data, and assigning warehouse keys. These transformations are all precursors to loading the data into the data warehouse presentation area.

    Loading: This is the final step of the ETL process, which usually takes the form of presenting the quality-assured dimensional tables to the bulk loading facilities of each data mart.
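    The three ETL steps can be sketched as three small Python functions. This is a minimal illustration under stated assumptions: the CSV layout, the column names (cust_code, name, country) and the in-memory list standing in for a data mart table are all hypothetical.

```python
import csv
import io

# Hypothetical source extract; the column names are assumptions for illustration.
SOURCE_CSV = """cust_code,name,country
C1, alice ,IN
C2,Bob,in
C1, alice ,IN
C3,,US
"""


def extract(raw):
    """Copy the needed source rows into the staging area (a list of dicts)."""
    return list(csv.DictReader(io.StringIO(raw)))


def transform(rows):
    """Cleanse, deduplicate, and assign warehouse (surrogate) keys."""
    seen = set()
    staged = []
    for row in rows:
        natural_key = row["cust_code"].strip()
        if natural_key in seen:  # deduplicate on the natural key
            continue
        seen.add(natural_key)
        staged.append({
            "customer_key": len(staged) + 1,                   # warehouse key
            "cust_code": natural_key,
            "name": row["name"].strip().title() or "Unknown",  # missing element
            "country": row["country"].strip().upper(),         # standard format
        })
    return staged


def load(rows, presentation_table):
    """Hand the quality-assured rows to the presentation area in bulk."""
    presentation_table.extend(rows)
    return len(rows)


d_customer = []
loaded = load(transform(extract(SOURCE_CSV)), d_customer)
```

    In a real warehouse each step would be far more involved (change detection, error handling, a database bulk loader), but the order of responsibilities is the same.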

    3.3 Data Presentation Area

    The data presentation area is where data is organized, stored, and made available for direct querying by users, report writers, and other analytical applications. We typically refer to the presentation area as a series of integrated data marts.

    The following considerations should be kept in mind for this area:

    The data must be presented, stored, and accessed in dimensional schemas.

    The data marts must contain detailed, atomic data. Atomic data is required to withstand assaults from unpredictable ad hoc user queries. While the data marts also may contain performance-enhancing summary data, or aggregates, it is not sufficient to deliver these summaries without the underlying granular data in a dimensional form. Dimensional modeling emphasizes simplicity and query performance.

    All the data marts must be built using common dimensions and facts, which we refer to as conformed; this is the basis of the data warehouse bus architecture.

    3.4 Business Intelligence Tools

    Data access tools query the data in the data warehouse's presentation area. A data access tool can be as simple as an ad hoc query tool or as complex as a sophisticated data mining or modelling application. Ad hoc query tools, as powerful as they are, can be understood and used effectively only by a small percentage of the potential data warehouse business user population. The majority of the business user base will likely access the data via prebuilt, parameter-driven analytic applications.


    4 DWH lifecycle

    Figure 2 Data Warehouse Life Cycle

    4.1 Project Planning

    Assess the organization's readiness for a DW initiative

    Establish the preliminary scope and justification

    Obtain resources

    Launch the project

    4.2 Business Requirement Definition

    Understand the needs of the business and translate them into design considerations


    Figure 3 Prioritization Quadrant Analysis

    4.3 Technical Architecture Design

    It is a blueprint for the warehouse's technical services and elements.

    Serves as an organizing framework to support integration of multiple technologies.

    It consists of a series of models that delve into greater detail regarding each component.

    Helps in identifying problems and supports the coordination of parallel efforts while speeding development through the reuse of modular components.

    Eight-step process:

    o Establish an architecture task force

    o Collect architecture-related requirements using purely technology-focused sessions

    o Document architecture requirements

    o Develop a high-level architectural model

    o Design and specify the subsystems, considering security requirements as well as physical infrastructure and configuration needs

    o Determine architecture implementation phases

    o Document the technical architecture

    4.4 Product Selection and Installation

    Understand the corporate purchasing process

    Develop a product evaluation matrix

    Conduct market research through the internet, industry publications, colleagues, conferences, vendors and analysts

    Narrow the options to a short list and perform detailed evaluations

    Conduct a prototype, install on trial, and negotiate

    4.5 Dimensional Modelling

    The following steps are taken in this phase:

    We generate an impressive list of potential dimensions and then mark the intersections.

    Data analysis is done to evaluate granularity, historical consistency, valid values, and attribute availability.

    Conduct design workshops to create the dimensional schema.

    4.6 Physical Design

    We recommend a B-tree index on high-cardinality attribute columns used for constraints. Bit-mapped indexes should be placed on all medium- and low-cardinality attributes. The primary key of the fact table is almost always a subset of the foreign keys. A single, concatenated index is placed on the primary dimensions of the fact table. Since many dimensional queries are constrained on the date dimension, the date foreign key should be the leading index term. In addition, having the date key in the first position speeds the data loading process where incremental data is clumped by date. Since most optimizers now permit more than one index to be used at the same time in resolving a query, we can build separate indexes on the other independent dimension foreign keys in the fact table. Much less frequently, indexes are placed on the facts if they are used for range or banding constraints.
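    As an illustration of the fact table indexing advice, the sketch below uses Python's built-in sqlite3 module. The table and index names are assumptions. SQLite only provides B-tree indexes, so the bit-mapped indexes recommended for low-cardinality attributes are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE f_sales (
    date_key     INTEGER NOT NULL,
    product_key  INTEGER NOT NULL,
    customer_key INTEGER NOT NULL,
    quantity     INTEGER,
    total_price  REAL
);
-- Concatenated index on the primary dimensions, with the date key leading.
CREATE INDEX idx_f_sales_date_product_customer
    ON f_sales (date_key, product_key, customer_key);
-- Separate index on another independent dimension foreign key.
CREATE INDEX idx_f_sales_customer ON f_sales (customer_key);
""")

# Ask the optimizer how it would resolve a query constrained on the date key.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT SUM(quantity) FROM f_sales WHERE date_key = ?",
    (20130226,),
).fetchall()
plan_detail = " ".join(row[3] for row in plan)
```

    Because the date key is the leading term of the concatenated index, a date-constrained query can be resolved through that index rather than by scanning the whole fact table.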

    4.7 Data Staging Design and Development

    ETL processes are designed and developed in this phase.

    Dimension table staging: Since dimensions need to conform and be reused across dimensional models, they are typically the responsibility of a more centralized authority that defines, maintains and publishes a particular dimension for the appropriate data marts. The following steps are executed:

    o Extract dimensional data from the operational source system.

    o Cleanse attribute values.

    o Manage surrogate key assignments

    Figure 4 Dimension table surrogate key management

    o Build dimension row load images and publish the revised dimension

    Fact table staging: The following steps are executed:

    o Extract fact data from the operational source system.

    o Quality assure the fact table data

    o Construct or update aggregation data

    o Bulk load the data

    o Alert the users
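    Surrogate key management during staging might look like the following sketch. The SurrogateKeyMap class, the product columns (sku, name) and the sample rows are hypothetical. The fact-side key lookup is an assumption added for illustration; it does not appear in the step lists above.

```python
class SurrogateKeyMap:
    """Maps natural (source) keys to warehouse surrogate keys."""

    def __init__(self):
        self._keys = {}
        self._next_key = 1

    def key_for(self, natural_key):
        """Return the existing surrogate key, or assign the next one."""
        if natural_key not in self._keys:
            self._keys[natural_key] = self._next_key
            self._next_key += 1
        return self._keys[natural_key]


def stage_dimension(source_rows, key_map):
    """Cleanse attribute values and build dimension row load images."""
    return [
        {
            "product_key": key_map.key_for(row["sku"]),
            "sku": row["sku"],
            "product_name": row["name"].strip().title(),
        }
        for row in source_rows
    ]


def stage_facts(source_rows, key_map):
    """Assumed step: swap the source key in each fact row for its surrogate key."""
    return [
        {"product_key": key_map.key_for(row["sku"]), "quantity": row["quantity"]}
        for row in source_rows
    ]


product_keys = SurrogateKeyMap()
d_product = stage_dimension(
    [{"sku": "AB-1", "name": " desk lamp "}, {"sku": "ZX-9", "name": "office chair"}],
    product_keys,
)
f_sales = stage_facts(
    [{"sku": "ZX-9", "quantity": 2}, {"sku": "AB-1", "quantity": 5}],
    product_keys,
)
```

    Keeping the key map in one place is what lets a centrally maintained dimension hand out the same surrogate keys to every data mart that uses it.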

    4.8 Deployment

    The technology, data, and analytic application tracks converge at deployment. Unfortunately, this convergence does not happen naturally but requires substantial preplanning. Support is often organized into a two-tier structure: the first line of expertise resides within the business area, whereas centralized support provides a secondary line of defence. The alpha test phase consists of the core project team performing an end-to-end system test. As with any system test, you're bound to encounter problems, so make sure there's adequate time in the schedule for the inevitable rework. With the beta test, we involve a limited set of business users to perform a user acceptance test, especially as it applies to the business relevance and quality of the warehouse deliverables. Finally, the data warehouse is released for general availability.

    4.9 Maintenance and Growth

    Support: User support is crucial immediately following the deployment in order to ensure that the business community gets hooked.

    Education: We need to provide a continuing education program for the data warehouse, which should include formal refresher and advanced courses, as well as repeats of the introductory courses.

    Technical support: Technical support should proactively monitor performance and system capacity trends to identify performance issues.

    Program support: We need to continue monitoring progress against the agreed-on success criteria. Ongoing checkpoint reviews are a key tool to assess and identify opportunities for improvement with prior deliverables.

    4.10 Analytic Application Specification

    Before we start designing the initial applications, it is helpful to establish standards for the applications, such as common pull-down menus and a consistent output look and feel.

    Using the standards, we specify each application template, capturing sufficient information about the layout, input variables, calculations and breaks so that both the business representatives and application developers share a common understanding.

    Identify structured navigational paths to access the applications.

    4.11 Analytic Application Development

    Focus on standards for naming conventions, calculation libraries and coding. Developers armed with a robust data access tool will quickly find needling problems in the data haystack despite the quality assurance performed by the staging application.

    5 Normalization

    http://oracle-surya.blogspot.in/2012/05/normalization-with-example.html

    6 Facts

    A fact is a value or measurement, which represents a fact about the managed entity or system. It represents the business measure.

    6.1 Types of Facts

    Additive: Additive facts are those which can be added (aggregated) across all dimensions. For example, sales amount.

    Semi-Additive: Can be added (aggregated) across some of the dimensions only. For example, inventory levels, which can be summed across products but not across time.

    Non-Additive: Cannot be added (aggregated) across any dimension. For example, counts, averages, age.
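    The difference between the three fact types shows up as soon as the measures are summed. The snapshot rows below are made-up sample data, and the column layout is an assumption for illustration.

```python
# Hypothetical daily rows: (day, product, units_sold, stock_on_hand, unit_price)
rows = [
    ("2013-02-25", "lamp", 3, 10, 20.0),
    ("2013-02-25", "chair", 1, 4, 50.0),
    ("2013-02-26", "lamp", 2, 8, 20.0),
    ("2013-02-26", "chair", 2, 2, 50.0),
]

# Additive: units sold can be summed across every dimension.
total_units = sum(r[2] for r in rows)

# Semi-additive: stock can be summed across products for a single day...
stock_on_26th = sum(r[3] for r in rows if r[0] == "2013-02-26")
# ...but summing it across days counts the same items more than once.
misleading_stock = sum(r[3] for r in rows)

# Non-additive: a plain average of per-row prices ignores volume,
# so the average is recomputed from additive parts instead.
naive_avg_price = sum(r[4] for r in rows) / len(rows)
revenue = sum(r[2] * r[4] for r in rows)
weighted_avg_price = revenue / total_units
```

    A common design choice is therefore to store the additive components (here, units and revenue) in the fact table and derive ratios and averages at query time.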

    7 Dimensions

    Dimensions provide structured labelling information to otherwise unordered numeric measures. The dimension is a data set composed of individual, non-overlapping data elements. The primary functions of dimensions are threefold: to provide filtering, grouping and labelling.

    These functions are often described as "slice and dice". Slicing refers to filtering data. Dicing refers to grouping data. A common data warehouse example involves sales as the measure, with customer and product as dimensions. In each sale a customer buys a product. The data can be sliced by removing all customers except for a group under study, and then diced by grouping by product.
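    The sales example can be sketched directly: slicing is a filter on the customer dimension and dicing is a group-by on the product dimension. The rows and customer names are invented sample data.

```python
from collections import defaultdict

# Hypothetical sales rows: (customer, product, sales_amount)
sales = [
    ("alice", "lamp", 20.0),
    ("bob", "chair", 50.0),
    ("alice", "chair", 50.0),
    ("carol", "lamp", 20.0),
    ("bob", "lamp", 40.0),
]

study_group = {"alice", "bob"}

# Slice: remove every customer except the group under study.
sliced = [row for row in sales if row[0] in study_group]

# Dice: group the remaining rows by product and total the measure.
sales_by_product = defaultdict(float)
for _customer, product, amount in sliced:
    sales_by_product[product] += amount
```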

    7.1 Types of Dimension

    Degenerated dimension: A degenerate dimension is a key, such as a transaction number, invoice number, ticket number, or bill-of-lading number, that has no attributes and hence does not join to an actual dimension table. Degenerate dimensions are very common when the grain of a fact table represents a single transaction item or line item, because the degenerate dimension represents the unique identifier of the parent. Degenerate dimensions often play an integral role in the fact table's primary key.

    Conformed dimension: A conformed dimension is a set of data attributes that have been physically implemented in multiple database tables using the same structure, attributes, domain values, definitions and concepts in each implementation. A conformed dimension cuts across many facts. Dimensions are conformed when they are either exactly the same (including keys) or one is a perfect subset of the other. Most important, the row headers produced in the answer sets from two different conformed dimensions must be able to match perfectly.

    Conformed dimensions are either identical or strict mathematical subsets of the most granular, detailed dimension. Dimension tables are not conformed if the attributes are labeled differently or contain different values. Conformed dimensions come in several different flavors. At the most basic level, conformed dimensions mean exactly the same thing with every possible fact table to which they are joined. The date dimension table connected to the sales facts is identical to the date dimension connected to the inventory facts.

    Junk Dimension: A junk dimension is a convenient grouping of typically low-cardinality flags and indicators. By creating an abstract dimension, these flags and indicators are removed from the fact table while placing them into a useful dimensional framework. A junk dimension is a dimension table consisting of attributes that do not belong in the fact table or in any of the existing dimension tables. The nature of these attributes is usually text or various flags, e.g. non-generic comments or just simple yes/no or true/false indicators. These kinds of attributes typically remain when all the obvious dimensions in the business process have been identified, and thus the designer is faced with the challenge of where to put these attributes that do not belong in the other dimensions.

    One solution is to create a new dimension for each of the remaining attributes, but due to their nature, it could be necessary to create a vast number of new dimensions, resulting in a fact table with a very large number of foreign keys. The designer could also decide to leave the remaining attributes in the fact table, but this could make the row length of the table unnecessarily large if, for example, the attribute is a long text string.

    The solution to this challenge is to identify all the attributes and then put them into one or several junk dimensions. One junk dimension can hold several true/false or yes/no indicators that have no correlation with each other, so it would be convenient to convert the indicators into a more descriptive attribute. An example would be an indicator about whether a package had arrived: instead of indicating this as "yes" or "no", it would be converted into "arrived" or "pending" in the junk dimension. The designer can choose to build the dimension table so it ends up holding all the indicators occurring with every other indicator so that all combinations are covered. This sets up a fixed size for the table itself, which would be 2^x rows, where x is the number of indicators. This solution is appropriate in situations where the designer would expect to encounter a lot of different combinations and where the possible combinations are limited to an acceptable level. In a situation where the number of indicators is large, thus creating a very big table, or where the designer only expects to encounter a few of the possible combinations, it would be more appropriate to build each row in the junk dimension as new combinations are encountered. To limit the size of the tables, multiple junk dimensions might be appropriate in other situations, depending on the correlation between various indicators.

    Junk dimensions are also appropriate for placing attributes like non-generic comments from the fact table. Such attributes might consist of data from an optional comment field when a customer places an order and as a result will probably be blank in many cases. Therefore the junk dimension should contain a single row representing the blanks, with a surrogate key that will be used in the fact table for every row returned with a blank comment field.
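    Pre-building every combination of indicators, the fixed-size 2^x approach, can be sketched with itertools.product. The three indicators and their descriptive values are hypothetical; with three two-valued indicators the table has 2^3 = 8 rows.

```python
from itertools import product

# Hypothetical indicators and their descriptive values.
INDICATORS = {
    "delivery_status": ("arrived", "pending"),
    "payment_type": ("prepaid", "on delivery"),
    "gift_wrap": ("wrapped", "not wrapped"),
}


def build_junk_dimension(indicators):
    """Pre-build one row per combination of indicator values."""
    names = list(indicators)
    rows = []
    for key, combo in enumerate(product(*indicators.values()), start=1):
        rows.append({"junk_key": key, **dict(zip(names, combo))})
    return rows


def lookup_junk_key(rows, **values):
    """Find the surrogate key a fact row should carry for its indicators."""
    for row in rows:
        if all(row[name] == value for name, value in values.items()):
            return row["junk_key"]
    raise KeyError(values)


d_junk = build_junk_dimension(INDICATORS)
```

    The fact table then carries a single junk_key foreign key instead of three separate flag columns. When only a few combinations ever occur, rows would instead be added as new combinations are encountered.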

    Role-playing dimension: Dimensions are often recycled for multiple applications within the same database. For instance, a "Date" dimension can be used for "Date of Sale", as well as "Date of Delivery", or "Date of Hire". Similarly, Employee in the feedback survey might act as the "feedback requester" as well as the "feedback provider". This is often referred to as a "role-playing dimension".

    Mini Dimension / Outrigger (Fast Changing Dimension): Mobile No, Age

    8 Dimensional Modeling

    Dimensional modeling has been broadly accepted as the dominant technique for data warehouse presentation. Dimensional modeling also has emerged as the only coherent architecture for building distributed data warehouse systems. Both 3NF and dimensional models can be represented in t…

    … consists, typically, of a large table of facts (known as a fact table), with a number of other tables surrounding it that contain descriptive data, called dimensions. When it is drawn, it resembles the shape of a star, therefore the name.


    8.1 Importance of Dimensional Modeling

    It addresses the problem of overly complex schemas in the presentation area. A dimensional model contains the same information as a normalized model but packages the data in a format whose design goals are user understandability, query performance, and resilience to change.

    8.2 Types of Dimensional Models

    There are three basic types of dimensional models, and they are:

    Star model: Star schemas have one fact table and several dimension tables. The dimension tables are denormalized.

    Snowflake model: Further normalization and expansion of the dimension tables in a star schema result in the implementation of a snowflake design. A dimension is said to be snowflaked when the low-cardinality columns in the dimension have been removed to separate normalized tables that then link back into the original dimension table.

    Multi-star model: A multi-star model is a dimensional model that consists of multiple fact tables, joined together through dimensions.

    9 Loading techniques of dimension and fact tables

    Loading techniques of dimensions: The following are the types of loading techniques:

    TYPE 0: The Type 0 method is passive. When dimensional values change, no action is performed. Values remain as they were at the time the dimension record was first inserted.

    TYPE 1: Here the new information simply overwrites the original information. As this methodology overwrites old data with new data, it does not track historical data. A common use is correcting misspelled names. Technically, the surrogate key is not necessary, since the table will be unique by the natural key. However, to optimize performance on joins one should use integer rather than character keys; hence the surrogate key is used in case the natural key is character.

    TYPE 2: Here the complete history of a record in a dimension is preserved. To do so, a new record is added to the table to represent the new information. Therefore, both the original and the new record will be present. The new record gets its own primary key (a surrogate key is used, as the natural key will be repeated for any change). To determine the status of record information, version numbers can be assigned to the records using a new column. Otherwise, 'Effective date' columns can be used like STA…

    In Type 3, limited history is preserved. This is implemented by using separate columns; e.g., to maintain only the last change of address (any change older than the last one will not be tracked), two columns can be used, one for the latest address and one for the previous address. An effective date column can also be kept to track the date of the last update.

    TYPE 4: Type 4 is implemented by using a history table to maintain partial or complete history. Here, one table keeps the current data, and an additional table is used to keep a record of some or all changes, with a date column to determine the date of creation of that record.
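    The difference between Type 1 and Type 2 handling can be sketched on an in-memory customer dimension. This is a minimal illustration: the column names (version, start_date, end_date, is_current) and the sample customer are assumptions, and a real implementation would run against database tables.

```python
from datetime import date


def apply_type1(dimension, natural_key, **changes):
    """Type 1: overwrite attributes in place; no history is kept."""
    for row in dimension:
        if row["cust_id"] == natural_key:
            row.update(changes)


def apply_type2(dimension, natural_key, effective, **changes):
    """Type 2: expire the current row and add a new versioned row."""
    current = next(
        r for r in dimension if r["cust_id"] == natural_key and r["is_current"]
    )
    current["is_current"] = False
    current["end_date"] = effective
    new_row = {**current, **changes}
    new_row.update(
        customer_key=max(r["customer_key"] for r in dimension) + 1,  # new surrogate key
        version=current["version"] + 1,
        start_date=effective,
        end_date=None,
        is_current=True,
    )
    dimension.append(new_row)


d_customer = [{
    "customer_key": 1, "cust_id": "C1", "name": "Jon Smith", "address": "12 Park Road",
    "version": 1, "start_date": date(2012, 1, 5), "end_date": None, "is_current": True,
}]

apply_type1(d_customer, "C1", name="John Smith")  # fix a misspelled name
apply_type2(d_customer, "C1", date(2013, 2, 26), address="7 Lake View")
```

    After these two calls the dimension holds two rows for the same natural key: the expired row with the old address and a current row with a new surrogate key, while the corrected name appears on both.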

    10 Dimensional modelling phases

    11 Conversion from ER to DM

    Here we take a very simple example of conversion of an ER model into a dimensional model for the sales invoicing business process (as identification of the business process is the first step towards the creation of a dimensional model).

    Figure 5 Sample Invoice

    Given below is the ER model to capture the information from the above invoice.

    Order            | Customer      | Item          | Order_Item
    Invoice_Num (PK) | Cust_Id (PK)  | Item_Id (PK)  | Invoice_Num (FK)
    Invoice_Date     | Cust_Name     | Item_Name     | Item_Id (FK)
    Cust_Id (FK)     | Cust_Add      | Item_Desc     | Quantity
                     |               | Item_Price    |

    We first need to identify the grain here. In order to capture the highest level of detail available in the invoice, we select each line in the invoice as the grain for our design.

    The measures that we are capturing here are the quantity of an item sold per invoice and its corresponding total price. So these would be our two facts in the fact table (both these facts are at the grain level that we have selected).

    Now the parameters that define our facts here are Invoice Number, Date, Customer, and Item. So these become our four dimensions.

    If we observe closely, after separation of the date, our invoice dimension would be left with just one attribute, which is Invoice_Num. Hence we treat it as a degenerated dimension. Given below is the dimensional model for the above scenario.

    Dimensions

    D_Customer   | D_Item       | D_Date        | D_Invoice (Degenerated)
    Cust_Id (PK) | Item_Id (PK) | Date_Id (PK)  | Invoice_Num
    Cust_Name    | Item_Name    | Day_of_month  |
    Cust_Add     | Item_Desc    |               |
                 | Item_Price   |               |

    Facts

    F_Sales
    Cust_Id (FK)
    Item_Id (FK)
    Date_Id (FK)
    Invoice_Num (Degenerated)
    Quantity
    Total_Price
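    The conversion can be sketched by deriving F_Sales rows, one per invoice line, from the ER tables. The table and column names follow the model above; the sample invoice data is invented, and Total_Price is assumed to be Quantity multiplied by Item_Price.

```python
# ER-model rows (hypothetical sample data for one invoice).
orders = [{"Invoice_Num": 1001, "Invoice_Date": "2013-02-26", "Cust_Id": 7}]
items = {
    1: {"Item_Name": "Lamp", "Item_Price": 20.0},
    2: {"Item_Name": "Chair", "Item_Price": 50.0},
}
order_items = [
    {"Invoice_Num": 1001, "Item_Id": 1, "Quantity": 3},
    {"Invoice_Num": 1001, "Item_Id": 2, "Quantity": 1},
]

# D_Date: one row per distinct invoice date, keyed by a surrogate Date_Id.
d_date = {}
for order in orders:
    d_date.setdefault(order["Invoice_Date"], len(d_date) + 1)

orders_by_invoice = {o["Invoice_Num"]: o for o in orders}

# F_Sales: one row per invoice line (the chosen grain).
f_sales = []
for line in order_items:
    order = orders_by_invoice[line["Invoice_Num"]]
    f_sales.append({
        "Cust_Id": order["Cust_Id"],
        "Item_Id": line["Item_Id"],
        "Date_Id": d_date[order["Invoice_Date"]],
        "Invoice_Num": line["Invoice_Num"],  # degenerate dimension, no table behind it
        "Quantity": line["Quantity"],
        "Total_Price": line["Quantity"] * items[line["Item_Id"]]["Item_Price"],
    })
```

    The invoice number travels with each fact row but joins to no dimension table, which is what makes it a degenerate dimension.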


    12 FAQ

    Q1. What is an ODS? Explain its role with an example. [Kimball page 40…]

    A1. A physical set of tables sitting between the operational systems and the data warehouse, or a specially administered hot partition of the data warehouse itself. The main reason for an ODS is to provide immediate reporting of operational results if neither the operational system nor the regular data warehouse can provide satisfactory access. Because an ODS is necessarily an extract of the operational data, it also may play the role of source for the data warehouse.

    Q2. Explain Normalization and … If not, then how is the relation maintained between the mini dimension and the dimension from which it is segregated? Explain with an example to support the same. (Kimball)

    A3. A dimension formed using a few segregated attributes out of the parent dimension is known as a mini dimension only when its key is a part of the fact table; if the key is a foreign key in the parent dimension, we refer to it as an outrigger.

    If we embed the most recent segregated attributes key in the parent dimension, we must treat it as a type 1 attribute. If we tracked all the segregated attributes changes over time as a type 2 slowly changing dimension, we would have reintroduced the rapidly changing monster dimension problem that we have been working to avoid! With a type 1 change, we overwrite the segregated attributes key in the parent dimension row whenever it changes instead of creating a new row. It is also recommended that this key be labeled as most recent or current values to minimize confusion. Even with unique labeling, be aware that presenting users with two avenues for accessing segregated attributes data, through either the mini dimension or the outrigger, can deliver more functionality and complexity than some users can handle.

    Q4. What is a Conformed Fact? (IBM…

    Q6. When is Snowflaking recommended? (…


    To help understand when and why we snowflake, consider the sample customer table shown in the figure below.

    Customer dimension

    The customer table shows two kinds of attributes. These are attributes relating to the customer and attributes relating to the customer's country. Both sets of attributes represent a different grain (level of detail), and both sets of attributes are populated by a different source system.

    Such a customer dimension table is a perfect candidate for snowflaking for two primary reasons:

    The customer table represents two different sets of attributes. One set shows customer attributes and the other set shows the customer's country attributes. Both of these sets of attributes represent a different level of detail of granularity. One set describes the customer and the other defines more about the country. Also, the country data for all customers residing in a country is identical.

    On detailed analysis, we observe that the customer attributes are populated by the company … It may also be possible that another source system in the company is also supplying some of these attributes.

    The snowflaked customer table is shown in the figure below. The customer dimension table is said to be snowflaked when the low-cardinality attributes (the customer's country attributes) in the dimension have been removed to a separate normalized table called country, and this normalized table is then joined back into the original customer dimension table.

    Snowflaked customer table
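    Snowflaking the customer dimension amounts to moving the country attributes into their own table and leaving a key behind. The sketch below assumes hypothetical column names (country, currency) and sample rows.

```python
# Hypothetical denormalized customer dimension (star form).
d_customer_star = [
    {"cust_id": 1, "name": "Alice", "country": "India", "currency": "INR"},
    {"cust_id": 2, "name": "Bob", "country": "India", "currency": "INR"},
    {"cust_id": 3, "name": "Carol", "country": "France", "currency": "EUR"},
]


def snowflake_customer(rows):
    """Move the low-cardinality country attributes to their own table."""
    country_keys = {}
    d_country, d_customer = [], []
    for row in rows:
        if row["country"] not in country_keys:
            country_keys[row["country"]] = len(country_keys) + 1
            d_country.append({
                "country_key": country_keys[row["country"]],
                "country": row["country"],
                "currency": row["currency"],
            })
        d_customer.append({
            "cust_id": row["cust_id"],
            "name": row["name"],
            "country_key": country_keys[row["country"]],  # link back to d_country
        })
    return d_customer, d_country


d_customer, d_country = snowflake_customer(d_customer_star)
```

    The country details are now stored once per country instead of once per customer, at the cost of an extra join whenever a query needs them.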

    Please feel free to add in case anything has been missed out or captured incorrectly.

    13 References

    Case study: Dimensional model development [chapter … in