Integrated EDW Kimball


  • 8/12/2019 Integrated EDW Kimball


Essential Steps for the Integrated EDW

A Kimball Group White Paper

By Ralph Kimball


Table of Contents

Executive Summary

About the Author

What Does an Integrated Enterprise Data Warehouse (EDW) Deliver?

Drilling Across is the Ultimate Litmus Test for Integration

The Organizational Challenges of Providing an Integrated EDW

Conformed Dimensions and Facts

Using the Bus Matrix as a Way to Communicate with Executives

Managing the Backbone of the Integrated EDW

The Dimension Manager

The Fact Provider

Configuring BI Tools to Use the Integrated EDW

Advanced Topics

Conclusion


Essential Steps for the Integrated EDW. Copyright 2008 by Kimball Group. All rights reserved.

Executive Summary

In this white paper, we propose a specific architecture for building an integrated enterprise data warehouse (EDW). This architecture directly supports master data management efforts and provides the platform for consistent business analysis across the enterprise. We describe the scope and challenges of building an integrated enterprise data warehouse, and we provide detailed guidance for designing and administering the processes that support integration. This white paper has been written in response to a lack of specific guidance in the industry as to what an integrated EDW actually is and which design elements are needed to achieve integration.

About the Author

Ralph Kimball founded the Kimball Group. Since the mid-1980s, he has been the data warehouse/business intelligence (DW/BI) industry's thought leader on the dimensional approach and has trained more than 10,000 IT professionals. Prior to working at Metaphor and founding Red Brick Systems, Ralph co-invented the Star workstation at Xerox's Palo Alto Research Center (PARC). Ralph holds a Ph.D. in Electrical Engineering from Stanford University.

The Kimball Group is the source for dimensional DW/BI consulting and education, consistent with our best-selling Toolkit book series, Design Tips, and award-winning articles. Visit www.kimballgroup.com for more information.


What Does an Integrated Enterprise Data Warehouse (EDW) Deliver?

The mission statement for the integrated EDW is to provide the platform for business analysis to be applied consistently across the enterprise. Above all, this mission statement demands consistency across business process subject areas and their associated databases.

Consistency requires detailed textual descriptions of entities such as customers, products, locations, and calendars to be applied uniformly across subject areas, using standardized data values. Of course, this is a fundamental tenet of master data management (MDM).

Consistency requires aggregated groupings such as types, categories, flavors, colors, and zones defined within entities to have the same interpretations across subject areas. This can be viewed as a higher-level requirement on the textual descriptions described in the previous paragraph.

Consistency requires that constraints posed by BI applications, which attempt to harvest the value of consistent text descriptions and groupings, be applied with identical application logic across subject areas. For instance, constraining on a product category should always be driven from a field named Category found in the Product dimension.

Consistency requires that numeric facts be represented consistently across subject areas so that it makes sense to combine them in computations and compare them to each other, perhaps with ratios or differences. For instance, if Revenue is a numeric fact reported from multiple subject areas, then the definitions of each of these revenue instances must be the same.

Consistency requires that international differences in languages, location descriptions, time zones, currencies, and business rules be resolved to allow all of the above consistency requirements to be achieved!

Consistency requires that auditing, compliance, authentication, and authorization functions be applied in the same way across subject areas.

Finally, consistency implies coordination with industry standards for data content, data exchange, and reporting, where those standards impact the enterprise. Typical standards include ACORD (insurance), MISMO (mortgages), SWIFT and NACHA (financial services), HIPAA and HL7 (healthcare), RosettaNet (manufacturing), and EDI (procurement).

Drilling Across is the Ultimate Litmus Test for Integration

Even an EDW that meets all of the consistency requirements described above must additionally provide a mechanism for delivering integrated reports and analyses from BI tools attached to many database instances, possibly hosted on remote, incompatible systems. We call this drilling across. Drilling across is the essential act of the integrated EDW. When we drill across, we gather results from separate business process subject areas and then align or combine these results into a single analysis.

For example, suppose our integrated EDW spans manufacturing, distribution, and retail sales in a business that sells audio/visual systems. We'll assume that each of these subject areas is supported by a separate transaction processing system. A properly constructed drill-across report could look like Figure 1.

Figure 1. A Three-Fact-Table Drill-Across Report

The first two columns are row headers from the Product and Calendar conformed dimensions, respectively. The remaining three fact columns each come from separate databases, namely manufacturing, distribution, and retail sales. This deceptively simple report can only be produced in a properly integrated EDW. In particular, the Product and Calendar dimensions must be available in all three separate databases, and the Category and Period attributes within those dimensions must have identical contents and interpretations. Although the metrics in the three fact columns are different, the meaning of the metrics must be consistent across product categories and times.

You must understand and appreciate the tight constraints on the integrated EDW environment demanded by the above report. If you don't, you won't understand this white paper, and you won't have the patience to study the detailed steps described below. Or, to put the design challenge in other terms, if you eventually build a successful integrated EDW, you will have visited every issue in this paper. So, with those warnings, read on!

The Organizational Challenges of Providing an Integrated EDW

The integrated EDW deliverables described above are a daunting list indeed. But for these deliverables even to be possible, the enterprise must make a profound commitment, starting from the executive suite. The separate divisions of the enterprise must have a shared vision of the value of data integration, and they must anticipate the steps of compromise and decision making that will be required. This vision can only come from the senior executives of the enterprise, who must speak very clearly on the value of data integration.

Existing master data management projects provide an enormous boost for the integrated EDW, since presumably the executive team already understands and approves the commitment to building and maintaining master data. A good MDM resource greatly simplifies, but does not eliminate, the need for the EDW team to build the structures necessary for data warehouse integration.

In many organizations, a chicken-and-egg dilemma exists as to whether MDM is required before an integrated EDW is possible, or whether the EDW team creates the MDM resources. Often, a low-profile EDW effort to build conformed dimensions solely for data warehouse purposes morphs into a full-fledged MDM effort that is on the critical path to supporting mainline operational systems. In our classes since 1993, we have shown a backward-pointing arrow leading from cleaned data warehouse data to operational systems. In the early days, we sighed wistfully and wished that the source systems cared about clean, consistent data. Now, more than fifteen years later, we seem to be getting our wish!

Conformed Dimensions and Facts

Since the earliest days of data warehousing, conformed dimensions have been used to consistently label and constrain separate data sources. We learned about conformed dimensions from A.C. Nielsen in 1983 when, at Metaphor Computer Systems, we brought Nielsen's syndicated scanner data together with product shipments data at consumer packaged goods companies. The idea behind conformed dimensions is very simple: two dimensions are conformed if they contain one or more common fields whose contents are drawn from the same domains. As a result, constraints and labels have the same content and meaning when applied against separate data sources.
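The definition above can be checked mechanically. The following is a minimal Python sketch, assuming each dimension instance is a list of dictionaries with uniform fields and that the agreed domains are supplied as a hypothetical `master_domains` mapping; the function name and sample data are illustrative, not part of any Kimball Group tool.

```python
from typing import Dict, List, Set

Row = Dict[str, str]


def conformed_fields(dim_a: List[Row], dim_b: List[Row],
                     master_domains: Dict[str, Set[str]]) -> Set[str]:
    """Return the fields on which two dimension instances are conformed.

    A field qualifies if both instances carry it and every value found in
    either instance is drawn from the agreed (hypothetical) master domain.
    """
    if not dim_a or not dim_b:
        return set()
    common = set(dim_a[0]) & set(dim_b[0])  # assumes uniform rows
    result = set()
    for field in common:
        domain = master_domains.get(field)
        if domain is None:
            continue  # no agreed domain, so this field cannot be conformed
        values = {row[field] for row in dim_a} | {row[field] for row in dim_b}
        if values <= domain:
            result.add(field)
    return result


# Two product dimension instances, each keeping its own private attribute.
mfg_product = [{"Category": "Audio", "PlantCode": "P1"},
               {"Category": "Video", "PlantCode": "P2"}]
retail_product = [{"Category": "Audio", "ShelfLocation": "A3"},
                  {"Category": "Video", "ShelfLocation": "B7"}]
domains = {"Category": {"Audio", "Video", "Accessories"}}

print(conformed_fields(mfg_product, retail_product, domains))  # {'Category'}
```

Note that the private attributes (PlantCode, ShelfLocation) do not prevent conformance; only the shared field has to agree.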

Conformed facts are simply numeric measures that have the same business and mathematical interpretations so that they may be compared and computed against each other consistently. Using these names, we have taught the principles of conformed dimensions and conformed facts since 1993 in our books and articles.

Using the Bus Matrix as a Way to Communicate with Executives

When you combine the list of EDW subject areas with the notion of conformed dimensions, a powerful diagram emerges, which we call the enterprise data warehouse bus matrix. A typical bus matrix is shown in Figure 2.

Figure 2. A Bus Matrix for a Manufacturing EDW


The business process subject areas are shown along the left side of the matrix, and the dimensions are shown across the top. An X marks where a subject area uses the dimension. Note that a subject area, in our vocabulary, corresponds to a business process, typically revolving around a transactional data source. Thus customer is not a subject area.

At the beginning of an EDW implementation, this bus matrix is very useful as a guide, both to prioritize the development of separate subject areas and to identify the potential scope of the conformed dimensions. As we have often remarked, the columns of the bus matrix are the invitation list to the conformed dimension design meeting!
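To make the invitation-list idea concrete, here is a small sketch that represents a bus matrix as a mapping from subject areas to the dimensions they use, then lists every dimension shared by two or more subject areas together with the subject areas whose stakeholders should attend. The subject areas and dimensions are hypothetical and are not the contents of Figure 2.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical bus matrix: each subject area (row) maps to the set of
# dimensions it uses (the X marks in that row).
BUS_MATRIX: Dict[str, Set[str]] = {
    "Manufacturing Shipments": {"Date", "Product", "Plant", "Customer"},
    "Distribution": {"Date", "Product", "Warehouse", "Carrier"},
    "Retail Sales": {"Date", "Product", "Store", "Promotion", "Customer"},
}


def invitation_list(matrix: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each dimension used by two or more subject areas to those areas.

    These shared columns are the candidates for conforming, so the listed
    subject areas belong at that dimension's design meeting.
    """
    users: Dict[str, List[str]] = defaultdict(list)
    for subject_area, dimensions in matrix.items():
        for dim in dimensions:
            users[dim].append(subject_area)
    return {dim: sorted(areas) for dim, areas in users.items() if len(areas) > 1}


for dim, areas in sorted(invitation_list(BUS_MATRIX).items()):
    print(f"{dim}: {', '.join(areas)}")
```

Dimensions used by a single subject area (Plant, Store, and so on in this made-up matrix) drop out, because no cross-subject-area agreement is needed for them yet.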

Before the conformed dimension design meeting occurs, this bus matrix should be presented to senior management, perhaps in exactly the form of Figure 2. Senior management must be able to visualize why these dimensions (master entities) attach to the various business process subject areas, and they must appreciate the organizational challenges of assembling the diverse interest groups to agree on the conformed dimension content. If senior management is not interested in what the bus matrix implies, then, to make a long story short, you have no hope of building an integrated EDW.

It is worth repeating the definition of a conformed dimension at this point to take some of the pressure off the conforming challenge. Two instances of a dimension are conformed if they contain one or more common fields whose contents are drawn from the same domains. This means that the individual subject area proponents do not have to give up their cherished private descriptive attributes. It merely means that a set of master, universally agreed-upon attributes must be established. These master attributes then become the contents of the conformed dimension and the basis for drilling across.

The Kimball Group books and our article and design tip archives contain a wealth of additional material on the steps of building the bus matrix for an enterprise and establishing conformed dimensions and facts. Please see www.kimballgroup.com.

Managing the Backbone of the Integrated EDW

The backbone of the integrated EDW is the set of conformed dimensions and conformed facts. Even if the enterprise executives support the integration initiative, and the conformed dimension design meeting goes well, the operational management of this backbone involves a great deal of work. This management can be visualized most clearly by describing two personality archetypes: the dimension manager and the fact provider. Briefly, the dimension manager is a centralized authority who builds and distributes a conformed dimension to the rest of the enterprise, and the fact provider is the client who receives and uses the conformed dimension, almost always while managing one or more fact tables within a subject area.

At this point in the white paper, we must make three fundamental architectural claims to keep false arguments from arising and turning into distractions:

1) The need for dimension managers and fact providers arises solely from the natural reuse of dimensions across multiple fact tables (or OLAP cubes). Once the EDW community has committed to supporting cross-subject-area analysis, there is no way to avoid any of the steps described in this white paper!


2) Although we describe the handoff from the dimension manager to the fact provider as if it were occurring in a distributed environment where they are remote from each other, their respective roles and responsibilities are the same whether the EDW is fully centralized on a single machine or is profoundly distributed across many diverse machines in different locations.

3) The roles of dimension manager and fact provider, although obviously couched in dimensional modeling terms, do not arise from a particular modeling persuasion. All of the steps described in this white paper would be needed in a fully normalized environment. In fact, the management of primary, durable, and natural keys described later in this white paper is substantially more complicated in a normalized environment because of the need to propagate changing keys up and down the chain of linked normalized tables.

The next two sections describe the roles of the dimension manager and the fact provider.

The Dimension Manager

The dimension manager defines the content and structure of a conformed dimension and delivers that conformed dimension to downstream clients known as fact providers. This role can certainly exist within an MDM framework, but the role is much more focused than just being the keeper of the single truth about an entity. The dimension manager has a list of deliverables and responsibilities, all oriented around creating and distributing physical versions of the dimension tables that represent the major entities of the enterprise. In many enterprises, key conformed dimensions include customer, product, service, location, employee, promotion, vendor, and calendar. In the following, as we describe the dimension manager's tasks, we will use customer as the example to keep the discussion from being too abstract. Here are the tasks of the customer dimension manager:

Defines the content of the customer dimension. The dimension manager chairs the design meeting for the conformed customer dimension. At that meeting, all the stakeholders from the customer-facing transaction systems come to agreement on a set of dimensional attributes that everyone will use when drilling across separate subject areas. Remember that these attributes are used as the basis for constraining and grouping customers. Typical conformed customer attributes include Type, Category, Location (multiple fields implementing an address), Primary Contact (name, title, address), First Contact Date, Credit Worthiness, Demographic Category, and others. Every customer of the enterprise appears in the conformed customer dimension.

Receives notification of new customers. The dimension manager is the keeper of the master list of dimension members, in this case customers. The dimension manager must be notified whenever a new customer is registered. In a full-blown MDM environment, new customers should only be registered by using an MDM-supplied process which is under the direct control of the dimension manager. In a more modest data warehouse environment without a centralized MDM facility, each remote customer-facing process has the potential for registering a new customer. In these cases, the dimension manager receives notifications of new customers after the fact. Without an MDM facility, the dimension manager is forced to maintain a list of natural keys of customers from each possible source. These natural keys are the only way to reliably distinguish a new customer from an old customer.

De-duplicates the customer dimension. The dimension manager must de-duplicate the master list of customers. Customer lists in the real world are nearly impossible to de-duplicate completely. Even when customers are registered through a central MDM process, it is often possible to create duplicates, either for individual customers or for business entities. The de-duplication problem is much worse when no central MDM resource exists, since the separate customer-facing processes are by definition not well coordinated. Even worse, these separate customer-facing processes may apply different business rules and have different database structures when collecting customer identity information.
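As an illustration of why de-duplication is only ever approximate, the sketch below flags candidate duplicates by comparing a crude match key built from an accent-stripped, lower-cased, punctuation-free name plus postal code. The record layout, the field names, and the matching rule are all assumptions for illustration; production matching typically needs much richer rules and human review.

```python
import re
import unicodedata
from collections import defaultdict
from typing import Dict, List


def match_key(name: str, postal_code: str) -> str:
    """Build a crude match key: accents stripped, lower-cased, punctuation
    removed, whitespace collapsed, plus the postal code without spaces."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    cleaned = re.sub(r"[^a-z0-9 ]", "", no_accents.lower())
    return " ".join(cleaned.split()) + "|" + postal_code.replace(" ", "")


def candidate_duplicates(customers: List[Dict[str, str]]) -> List[List[str]]:
    """Group source records whose match keys collide. Each group is only a
    candidate for review, never an automatic merge."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for cust in customers:
        groups[match_key(cust["name"], cust["postal_code"])].append(cust["source_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"source_id": "CRM-17", "name": "Acme Corp.", "postal_code": "94301"},
    {"source_id": "WEB-4", "name": "ACME  Corp", "postal_code": "94301"},
    {"source_id": "CRM-18", "name": "Zenith Ltd", "postal_code": "10001"},
]
print(candidate_duplicates(records))  # [['CRM-17', 'WEB-4']]
```

A rule this simple will both miss true duplicates (a changed address) and occasionally group distinct customers, which is exactly the imperfection the paragraph above warns about.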

Assigns a unique durable key to each customer. The dimension manager must identify and keep track of a unique durable key for each customer. Many DBAs automatically assume that this is the natural key, but quickly choosing the natural key may be the wrong choice. A natural key may not be durable! Using our customer example, if there is any conceivable business rule that could change the natural key over time, then it is not durable. Also, in the absence of a formal MDM process, natural keys can arise from more than one customer-facing process. In this case, different customers could have natural keys of very different formats. Finally, a source system's natural key may be a complex, multi-field data structure. For all these reasons, the dimension manager needs to step back from literal natural keys and assign a unique durable key that is completely under the control of the dimension manager. We recommend that this unique, durable key be a simple sequentially assigned integer, with no structure or semantics embedded in the key value. Note that the creation of such a unique, durable key does not preclude carrying original natural keys in the conformed dimension record, but of course this becomes complicated when there are multiple original sources registering customers, potentially with duplications.
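A minimal sketch of the bookkeeping described above, assuming a hypothetical `DurableKeyRegistry` that ties each (source system, natural key) pair to a sequentially assigned integer durable key. Because natural keys are only remembered, never used as the durable key itself, sources with different key formats can coexist. The class, method, and source names are illustrative only.

```python
from typing import Dict, Tuple

NaturalKey = Tuple[str, str]  # (source system, natural key as text)


class DurableKeyRegistry:
    """Hypothetical dimension-manager registry: assigns a meaningless,
    sequential durable key to each customer and remembers every source
    natural key that has been tied to it."""

    def __init__(self) -> None:
        self._next_key = 1
        self._by_natural: Dict[NaturalKey, int] = {}

    def register(self, source: str, natural_key: str) -> Tuple[int, bool]:
        """Return (durable_key, is_new). A natural key seen before maps back
        to its existing durable key, which is how a new customer is told
        apart from an old one."""
        nk = (source, natural_key)
        if nk in self._by_natural:
            return self._by_natural[nk], False
        durable = self._next_key
        self._next_key += 1
        self._by_natural[nk] = durable
        return durable, True

    def link(self, source: str, natural_key: str, durable_key: int) -> None:
        """Tie another source's natural key to an existing durable key, for
        example when de-duplication decides both describe one customer."""
        self._by_natural[(source, natural_key)] = durable_key


reg = DurableKeyRegistry()
print(reg.register("CRM", "C-0042"))            # (1, True)
print(reg.register("WEB", "jane@example.com"))  # (2, True)
print(reg.register("CRM", "C-0042"))            # (1, False)
```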

Tracks time variance of customers with Type 1, 2, and 3 SCDs. The dimension manager must respond to changes in the conformed attributes describing a customer. Much has been written about tracking the time variance of dimension members using slowly changing dimensions (SCDs). A Type 1 change overwrites the changed attribute and therefore destroys history. A Type 2 change creates a new dimension record for that customer, properly timestamped as of the effective moment of the change. A Type 3 change creates a new field in the customer dimension that allows an alternate reality to be tracked. The dimension manager updates the customer dimension in response to change notifications received from various sources. See any of the Kimball Group books or our website for an extensive discussion of SCDs.
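Type 1 and Type 3 can be sketched as in-place operations on a dimension row, here a plain dictionary with hypothetical column names. The "Prior" column is one common way of realizing the Type 3 alternate reality; in a real table it would be added to the schema for every row, not just the changed one. The function names are illustrative, and Type 2, which adds rows rather than changing them, is not shown here.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def apply_type1(row: Row, attr: str, new_value: Any) -> None:
    """Type 1: overwrite in place. The old value, and thus history, is lost."""
    row[attr] = new_value


def apply_type3(row: Row, attr: str, new_value: Any) -> None:
    """Type 3 (one common form): move the old value into a 'Prior' column
    so reports can use either the current or the prior reality."""
    row[f"Prior {attr}"] = row.get(attr)
    row[attr] = new_value


cust = {"Durable Key": 1, "Category": "Retail", "Credit Worthiness": "B"}
apply_type1(cust, "Credit Worthiness", "A")  # old rating "B" is gone
apply_type3(cust, "Category", "Wholesale")   # "Retail" kept as Prior Category
print(cust)
```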

Assigns surrogate keys for the customer dimension. Type 2 is the most common and powerful of the SCD techniques, since it provides precise synchronization of a customer description with that customer's transaction history. Since Type 2 creates a new record for the same customer, the dimension manager is forced to generalize the customer dimension primary key beyond the unique, durable key. The primary key should be a simple surrogate key, sequentially assigned as needed, with no structure or semantics in the key value. This primary key is separate from the unique durable key, which simply appears in the dimension as a normal field. The unique, durable key is the glue that binds the separate Type 2 records for a single customer together. See Figure 3, showing the complete recommended set of keys for the customer dimension, including natural, durable, and surrogate keys.
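The key structure can be sketched as a small Python model in which every Type 2 change expires the current row and adds a new one with a fresh sequential surrogate key, while the durable key stays constant and the natural key is simply carried along. The class names, the `END_OF_TIME` marker, and the single tracked attribute are assumptions of this sketch, not a prescribed schema.

```python
from dataclasses import dataclass, replace
from datetime import date
from itertools import count
from typing import List, Optional

END_OF_TIME = date(9999, 12, 31)  # hypothetical "still current" expiry marker


@dataclass(frozen=True)
class CustomerRow:
    surrogate_key: int   # primary key: sequential, meaningless, one per version
    durable_key: int     # shared by every version of the same customer
    natural_key: str     # carried from the source system for reference
    category: str        # a conformed attribute tracked as Type 2
    effective: date      # first day this version applies
    expires: date        # first day it no longer applies (exclusive)


class CustomerDimension:
    """Hypothetical in-memory customer dimension with Type 2 tracking."""

    def __init__(self) -> None:
        self._surrogates = count(1)
        self.rows: List[CustomerRow] = []

    def current(self, durable_key: int) -> Optional[CustomerRow]:
        return next((r for r in self.rows
                     if r.durable_key == durable_key and r.expires == END_OF_TIME),
                    None)

    def insert(self, durable_key: int, natural_key: str,
               category: str, when: date) -> CustomerRow:
        row = CustomerRow(next(self._surrogates), durable_key, natural_key,
                          category, when, END_OF_TIME)
        self.rows.append(row)
        return row

    def type2_change(self, durable_key: int, category: str,
                     when: date) -> CustomerRow:
        """Expire the current version and add a new one with a new surrogate key."""
        old = self.current(durable_key)
        if old is None:
            raise KeyError(durable_key)
        self.rows[self.rows.index(old)] = replace(old, expires=when)
        return self.insert(durable_key, old.natural_key, category, when)


dim = CustomerDimension()
dim.insert(1, "C-0042", "Retail", date(2007, 1, 1))
dim.type2_change(1, "Wholesale", date(2008, 3, 15))
for row in dim.rows:
    print(row.surrogate_key, row.durable_key, row.category, row.effective, row.expires)
```

Fact rows would reference `surrogate_key`; queries that need a customer's whole history group on `durable_key`.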


Figure 3. Recommended Key Structure for a Customer Dimension

Handles late-arriving dimension data. When the dimension manager receives late notification of a Type 2 change affecting a customer, special processing is needed. A new dimension record must be created, and the effective dates of the changes adjusted. The changed attribute must be propagated forward in time through existing dimension records. Please see The Data Warehouse ETL Toolkit book [Wiley, 2004] for a complete description of these processing steps.
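One possible reading of these steps, as a simplified sketch: the late change splits the version that was in effect on the change date, and the new value is copied into later versions until one is reached where that attribute had already changed. The row layout, the function name, and the caller-supplied surrogate key are hypothetical, and the full procedure in the ETL Toolkit covers more cases than this.

```python
from datetime import date
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_late_type2(history: List[Row], attr: str, new_value: Any,
                     change_date: date, next_sk: int) -> Row:
    """Insert a late-arriving Type 2 change into one customer's history.

    `history` holds every version of a single durable key; each row has
    'sk', 'effective' and 'expires' (exclusive) plus attribute columns.
    Simplifying assumptions: change_date falls strictly inside an existing
    version, and the caller supplies the next free surrogate key.
    """
    history.sort(key=lambda r: r["effective"])
    idx = next((i for i, r in enumerate(history)
                if r["effective"] < change_date < r["expires"]), None)
    if idx is None:
        raise ValueError("change_date must fall strictly inside a version")
    old = history[idx]
    old_value = old[attr]
    # Split the version in effect on change_date: the copy starts on the
    # change date, keeps the old expiry, and carries the new value.
    new_row = dict(old, sk=next_sk, effective=change_date)
    new_row[attr] = new_value
    old["expires"] = change_date
    history.insert(idx + 1, new_row)
    # Propagate forward through later versions that still carry the old
    # value; stop at a version where the attribute had already changed.
    for later in history[idx + 2:]:
        if later[attr] != old_value:
            break
        later[attr] = new_value
    return new_row


history = [
    {"sk": 10, "effective": date(2007, 1, 1), "expires": date(2007, 6, 1),
     "category": "Retail", "city": "Austin"},
    {"sk": 11, "effective": date(2007, 6, 1), "expires": date(9999, 12, 31),
     "category": "Retail", "city": "Dallas"},
]
apply_late_type2(history, "category", "Wholesale", date(2007, 3, 1), next_sk=12)
for r in history:
    print(r["sk"], r["effective"], r["expires"], r["category"], r["city"])
```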

Provides version numbers for the dimension. Before releasing a changed dimension to the downstream fact providers, the dimension manager must update the dimension version number if Type 1 or Type 3 changes have occurred, or if late-arriving Type 2 changes have occurred. The dimension version number does not change if only contemporary Type 2 changes have been made since the previous release of the dimension. We recommend embedding the dimension version number as a field in the dimension itself, where every record in the dimension contains the same version number value. In this way, all query tools and report writers attempting to drill across separate instances of the dimension can include the version number in the SQL SELECT list, and thereby automatically avoid aligning incompatible data from different dimension versions.
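The release rule above is small enough to state as code. This sketch assumes hypothetical change labels ("type1", "type3", "type2", "type2_late") collected since the last release, and it stamps the resulting version into every row, as recommended above.

```python
from typing import Any, Dict, Iterable, List

# Change kinds that force a new dimension version (hypothetical labels).
VERSION_BUMPING = {"type1", "type3", "type2_late"}


def next_version(current_version: int, changes: Iterable[str]) -> int:
    """Return the version number for the next release of the dimension.

    Only contemporary Type 2 changes ('type2') leave the version unchanged.
    """
    if any(kind in VERSION_BUMPING for kind in changes):
        return current_version + 1
    return current_version


def stamp_version(rows: List[Dict[str, Any]], version: int) -> None:
    """Embed the same version number in every row of the dimension."""
    for row in rows:
        row["Dimension Version"] = version


rows = [{"sk": 1, "Category": "Audio"}, {"sk": 2, "Category": "Video"}]
version = next_version(7, ["type2", "type1"])
stamp_version(rows, version)
print(version, rows)
```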

Adds private attributes to dimensions. The dimension manager must incorporate private departmental attributes in the release of the dimensions to the fact providers. These are attributes that are of interest to only a part of the EDW community, perhaps a single department. Paradoxically, these attributes must be part of the master dimension release so that such departments can use the attributes for constraining and grouping when performing drill-across queries. If some of the private attributes have sensitive content, then other departments must be shielded from using these attributes via the authentication and authorization functions of the EDW.

Builds shrunken dimensions as needed. The dimension manager is responsible for building the various shrunken dimensions that are needed by fact tables at higher, more summarized levels of granularity. For example, a customer dimension might be rolled up to Demographic Category to support a fact table that reports sales at this level. The dimension manager is responsible for creating this shrunken dimension and assigning its keys. Such a dimension cannot be created by defining a view on the lowest-level customer dimension, since records in such a view would have to be drawn from the individual customer list, and these individual customers do not necessarily exist over all times. Thus a shrunken dimension must be a separate, independent dimension table with its own keys.
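A minimal sketch of an independently keyed shrunken dimension, assuming a hypothetical `ShrunkenDimension` class that assigns its own sequential keys to rollup members (here Demographic Category values) and never deletes a member once it has a key. That is what keeps its keys valid even after the individual customers behind a member disappear from the base dimension, which a view could not guarantee.

```python
from typing import Dict, List


class ShrunkenDimension:
    """Hypothetical independent rollup dimension with its own keys.

    Keys are assigned once per member and never reused or deleted, so fact
    rows at the rolled-up grain keep pointing at valid members."""

    def __init__(self) -> None:
        self.key_by_member: Dict[str, int] = {}

    def refresh(self, base_rows: List[Dict[str, str]], attr: str) -> None:
        # Add any new rollup members found in the base dimension; existing
        # members keep their keys even if no base row carries them anymore.
        for member in sorted({row[attr] for row in base_rows}):
            if member not in self.key_by_member:
                self.key_by_member[member] = len(self.key_by_member) + 1


sd = ShrunkenDimension()
sd.refresh([{"Demographic Category": "Urban Family"},
            {"Demographic Category": "Rural Retired"}], "Demographic Category")
sd.refresh([{"Demographic Category": "Suburban Single"}], "Demographic Category")
print(sd.key_by_member)
```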

Replicates dimensions to fact providers. The dimension manager periodically replicates the dimension and its shrunken versions to all the downstream fact providers. All the fact providers should attach the new dimensions to their fact tables at the same time, especially if the version number has changed.

Documents and communicates changes. The dimension manager maintains metadata and documentation describing all the changes made to the dimension with each release.

Coordinates with other dimension managers. Although each conformed dimension can be administered separately, it makes sense for the dimension managers to coordinate their releases to lessen the impact on the downstream fact providers.

The Fact Provider

The fact provider sits downstream from the dimension manager and responds to each release of each dimension that is attached to a fact table under the provider's control.

Avoids changes to conformed attributes. The fact provider must not alter the values of any conformed dimension attributes, or the whole logic of drilling across diverse subject areas will be corrupted.

Responds to late-arriving dimension updates. When the fact provider receives late-arriving updates to a dimension, the primary keys of the newly created dimension records must be inserted into all fact tables using that dimension whose time spans overlap the date of the change. If these newly created keys are not inserted into the affected fact tables, then the new dimension record will not tie to the transactional history. The new dimension key must overwrite existing dimension keys in the affected fact tables from the time of the dimension change up to the next dimension change that was already correctly administered. This process is described in more detail in The Data Warehouse ETL Toolkit.
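The overwrite rule can be sketched as follows, assuming hypothetical fact rows that carry a customer surrogate key and a transaction date. Rows that still point at the version that was split, and that are dated on or after the change, are moved to the new key; rows after the next correctly administered change already carry a later key and are left alone.

```python
from datetime import date
from typing import Any, Dict, List


def rekey_facts(facts: List[Dict[str, Any]], old_sk: int, new_sk: int,
                change_date: date) -> int:
    """Point fact rows at the dimension record created by a late change.

    Rows carrying the surrogate key of the version that was split, and
    dated on or after the change, receive the new key. Because the split
    version already ended at the next correctly administered change, the
    update stops there automatically. Returns the number of rows updated.
    """
    updated = 0
    for fact in facts:
        if fact["customer_sk"] == old_sk and fact["txn_date"] >= change_date:
            fact["customer_sk"] = new_sk
            updated += 1
    return updated


facts = [
    {"customer_sk": 10, "txn_date": date(2007, 2, 1), "amount": 40.0},
    {"customer_sk": 10, "txn_date": date(2007, 4, 1), "amount": 75.0},
    {"customer_sk": 11, "txn_date": date(2007, 7, 1), "amount": 20.0},
]
print(rekey_facts(facts, old_sk=10, new_sk=12, change_date=date(2007, 3, 1)))  # 1
```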

Ties conformed dimension release to local dimension. The dimension manager must provide to the fact provider a mapping that ties the fact provider's local natural key to the primary surrogate key assigned by the dimension manager. In the surrogate key pipeline (see below), the fact provider replaces the local natural keys in the relevant fact tables with the conformed dimension primary surrogate keys using this mapping.

Processes dimensions through the surrogate key pipeline. The fact provider converts the natural keys attached to contemporary transaction records into the correct primary surrogate keys, and loads the fact records into the final tables with these surrogate keys.
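A minimal sketch of the traditional, current-key form of the surrogate key pipeline, assuming hypothetical field naming (`<dim>_nk` on incoming records, `<dim>_sk` on output) and in-memory lookup tables built from the mapping the dimension manager supplies.

```python
from typing import Any, Dict, Iterable, Iterator

FactRecord = Dict[str, Any]


def surrogate_key_pipeline(incoming: Iterable[FactRecord],
                           lookups: Dict[str, Dict[str, int]]) -> Iterator[FactRecord]:
    """Swap each natural key in incoming fact records for a surrogate key.

    `lookups` maps a dimension name to a current-key lookup table
    {natural key: current surrogate key}. Incoming records carry '<dim>_nk'
    fields; output records carry '<dim>_sk' fields instead. Suitable only
    for contemporary facts, because only current keys are held.
    """
    for record in incoming:
        out = dict(record)
        for dim, table in lookups.items():
            natural = out.pop(f"{dim}_nk")
            if natural not in table:
                # A real pipeline might route this to an error or placeholder
                # process; the sketch simply fails.
                raise KeyError(f"no current {dim} key for {natural!r}")
            out[f"{dim}_sk"] = table[natural]
        yield out


lookups = {"customer": {"C-0042": 12}, "product": {"SKU-9": 301}}
incoming = [{"customer_nk": "C-0042", "product_nk": "SKU-9", "amount": 250.0}]
rows = list(surrogate_key_pipeline(incoming, lookups))
print(rows)
```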

Handles late-arriving facts. The surrogate key pipeline described in the previous paragraph can be implemented in two different ways. Traditionally, the fact provider maintains a current key lookup table for each dimension that ties the natural keys to the contemporary surrogate keys. This works for the most current fact table data, where you can be sure that the contemporary surrogate key is the one to use. But the lookup tables cannot be used for late-arriving fact data, since it is possible that one or more old surrogate keys must be used. In this traditional approach, the fact provider must revert to an inefficient dimension table lookup in order to figure out which old surrogate key applies.

A more modern approach to the surrogate key pipeline implements a dynamic cache of records looked up in the dimension table rather than a separately maintained lookup table. This cache handles contemporary fact records as well as late-arriving fact records with a single mechanism. See The Data Warehouse ETL Toolkit book for more details.
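The difference between the two approaches can be sketched with a cache that holds every version of each natural key, not just the current one, and picks the version in effect on the transaction date. This is one possible interpretation under stated assumptions: the class is hypothetical, it is preloaded rather than filled on demand, and versions are identified by effective date alone.

```python
from bisect import bisect_right
from datetime import date
from typing import Dict, List, Tuple


class DimensionCache:
    """Hypothetical date-aware lookup cache built from dimension rows.

    It keeps every version of each natural key (effective date plus
    surrogate key), so contemporary and late-arriving facts go through
    the same lookup.
    """

    def __init__(self) -> None:
        self._versions: Dict[str, List[Tuple[date, int]]] = {}

    def load(self, natural_key: str, effective: date, surrogate_key: int) -> None:
        versions = self._versions.setdefault(natural_key, [])
        versions.append((effective, surrogate_key))
        versions.sort()

    def lookup(self, natural_key: str, txn_date: date) -> int:
        versions = self._versions[natural_key]
        dates = [effective for effective, _ in versions]
        # Index just past the last version effective on or before txn_date.
        i = bisect_right(dates, txn_date)
        if i == 0:
            raise LookupError(f"{natural_key!r} has no version on {txn_date}")
        return versions[i - 1][1]


cache = DimensionCache()
for eff, sk in [(date(2007, 1, 1), 10), (date(2007, 6, 1), 11), (date(2007, 3, 1), 12)]:
    cache.load("C-0042", eff, sk)
print(cache.lookup("C-0042", date(2008, 1, 5)),   # contemporary fact
      cache.lookup("C-0042", date(2007, 4, 10)))  # late-arriving fact
# 11 12
```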

Synchronizes dimension releases with other fact providers. It is critically important for all the fact providers to respond to dimension releases at the same time. Otherwise, a client application attempting to drill across subject areas will encounter dimensions with different version numbers. See the description of using dimension version numbers in the last paragraph of this white paper.

Configuring BI Tools to Use the Integrated EDW

There is no point in going to all the trouble of setting up dimension managers, fact providers, and conformed dimensions if you aren't going to perform drill-across queries. To drill across, you need to sort-merge separate answer sets on the row headers defined by the values of the conformed dimension attributes. There are many ways to do this in standard BI tools and in straight SQL.

Mechanism for drill across. In SQL, a drill-across query bringing together data from manufacturing shipments and retail sales could be implemented as follows:

    SELECT Mfg.ProductCategory, Mfg.Year, Mfg_Amount, Sales_Amount
    FROM
      -- Subquery Mfg returns total shipments from Manufacturing
      (SELECT Category AS ProductCategory, Year, SUM(Ship_Amount) Mfg_Amount
       FROM Mfg_Shipments A
         INNER JOIN Product C ON A.Product_Key = C.Product_Key
         INNER JOIN Date D ON A.Sales_Date_Key = D.Date_Key
       GROUP BY Category, Year) Mfg
    INNER JOIN
      -- Subquery Sales returns total sales from the Sales database
      (SELECT ProdCat_Name AS ProductCategory, Year, SUM(Amount) Sales_Amount
       FROM Sales_fact F
         INNER JOIN Product C ON F.Product_Key = C.Product_Key
         INNER JOIN Date D ON F.Sales_Date_Key = D.Date_Key
       GROUP BY ProdCat_Name, Year) Sales
      -- Join condition for our small result sets
      ON Mfg.ProductCategory = Sales.ProductCategory
     AND Mfg.Year = Sales.Year


This should perform almost as fast as running the two individual queries against the separate fact tables, because the final join operates on two relatively small, already aggregated result sets that are already in memory.
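To try the drill-across pattern end to end, here is a hedged, runnable adaptation using Python's built-in sqlite3 module. The table contents are invented; one in-memory database stands in for the separate manufacturing and sales databases; both subqueries read the same conformed Category column; the date table is renamed Date_Dim and the shipment date key is called Ship_Date_Key; and an ORDER BY is added for a stable result. All of these are assumptions of the sketch, not details of the original query.

```python
import sqlite3

# Adapted from the drill-across query above (hypothetical schema and data).
DRILL_ACROSS = """
SELECT Mfg.ProductCategory, Mfg.Year, Mfg_Amount, Sales_Amount
FROM
  -- Subquery Mfg returns total shipments from Manufacturing
  (SELECT Category AS ProductCategory, Year, SUM(Ship_Amount) AS Mfg_Amount
   FROM Mfg_Shipments A
   JOIN Product C ON A.Product_Key = C.Product_Key
   JOIN Date_Dim D ON A.Ship_Date_Key = D.Date_Key
   GROUP BY Category, Year) Mfg
JOIN
  -- Subquery Sales returns total sales from the Sales database
  (SELECT Category AS ProductCategory, Year, SUM(Amount) AS Sales_Amount
   FROM Sales_Fact F
   JOIN Product C ON F.Product_Key = C.Product_Key
   JOIN Date_Dim D ON F.Sales_Date_Key = D.Date_Key
   GROUP BY Category, Year) Sales
  ON Mfg.ProductCategory = Sales.ProductCategory
 AND Mfg.Year = Sales.Year
ORDER BY Mfg.ProductCategory, Mfg.Year
"""


def run_demo() -> list:
    con = sqlite3.connect(":memory:")
    con.executescript("""
        CREATE TABLE Product (Product_Key INTEGER PRIMARY KEY, Category TEXT);
        CREATE TABLE Date_Dim (Date_Key INTEGER PRIMARY KEY, Year INTEGER);
        CREATE TABLE Mfg_Shipments (Product_Key INT, Ship_Date_Key INT, Ship_Amount REAL);
        CREATE TABLE Sales_Fact (Product_Key INT, Sales_Date_Key INT, Amount REAL);
        INSERT INTO Product VALUES (1, 'Audio'), (2, 'Audio'), (3, 'Video');
        INSERT INTO Date_Dim VALUES (20070115, 2007), (20080210, 2008);
        INSERT INTO Mfg_Shipments VALUES (1, 20070115, 100), (2, 20070115, 50), (3, 20080210, 80);
        INSERT INTO Sales_Fact VALUES (1, 20070115, 120), (3, 20080210, 95), (3, 20080210, 5);
    """)
    rows = con.execute(DRILL_ACROSS).fetchall()
    con.close()
    return rows


print(run_demo())  # [('Audio', 2007, 150.0, 120.0), ('Video', 2008, 80.0, 100.0)]
```

Because both subqueries group before the join, the final join only touches one row per category and year from each side, which is the point made in the paragraph above.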

Uses dimension version numbers where sort-merge (outer join) is supported by the BI tool in drill-across queries. A properly instrumented BI tool that sort-merges the final separate answer sets composing a drill-across query can provide valuable protection against erroneous results that come from accessing conformed dimensions with different version numbers. If the BI tool includes the version number in the SELECT list, and the results are sort-merged (outer joined), then results from fact table queries that used different dimension versions will end up on separate rows of the answer set, properly labeled by the dimension version. This isn't much consolation to the end user, but at least the problem is diagnosed in an obvious way.

In Figure 4 we show a report drilling across the same three databases as in Figure 1, but where a dimension version mismatch occurs. Perhaps the definition of certain product categories has been adjusted between product dimension version 7 and version 8. In this case, the retail sales fact table is using version 8, whereas the other two fact tables are still using version 7. By including the product dimension version attribute in the SQL SELECT list, we automatically avoid merging potentially incompatible data. Such an error would be particularly insidious because, without the rows being separated, the result would look perfectly reasonable, but it could be disastrously misleading.

Figure 4. A Drill-Across Report with a Dimension Version Mismatch
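The protection described above comes from including the version in the merge key. This sketch performs an outer-join sort-merge of three hypothetical answer sets in Python; the numbers are invented and are not the contents of Figure 4.

```python
from typing import Dict, List, Optional, Tuple

# Row header: (product category, period, product dimension version).
Header = Tuple[str, str, int]
AnswerSet = Dict[Header, float]
MergedRow = Tuple[Header, Dict[str, Optional[float]]]


def sort_merge(answer_sets: Dict[str, AnswerSet]) -> List[MergedRow]:
    """Outer-join several answer sets on their complete row headers.

    Each source's total lands on the row for its own header; sources with
    no total for that header show None. Because the dimension version is
    part of the header, totals computed against different versions of the
    Product dimension never share a row.
    """
    headers = sorted(set().union(*(s.keys() for s in answer_sets.values())))
    return [(h, {name: s.get(h) for name, s in answer_sets.items()})
            for h in headers]


# Hypothetical totals: retail sales was computed against version 8 while
# manufacturing and distribution still use version 7.
answer_sets = {
    "Manufacturing": {("Audio", "2008 Q1", 7): 1200.0},
    "Distribution": {("Audio", "2008 Q1", 7): 1100.0},
    "Retail Sales": {("Audio", "2008 Q1", 8): 950.0},
}
for header, totals in sort_merge(answer_sets):
    print(header, totals)
```

Dropping the version from the header would silently put all three totals on one row, which is the insidious case described above.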

Advanced Topics

In this section we describe special refinements to the challenge of EDW integration that are beyond the basic steps presented in the previous sections.


Fact provider implements local SCDs in addition to conformed SCDs. A tricky problem occurs when a locally provided dimension attribute undergoes a change at a different time than any changes downloaded from the dimension manager. This is logically equivalent to handling late-arriving dimensions, but it requires the fact provider to create a surrogate key for the dimension that will not be used by the dimension manager. The dimension manager may need to partition the key space to assign a band of keys to the fact provider for this purpose.
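One way to partition the key space, sketched under assumptions: the dimension manager assigns its own surrogate keys below a fixed limit and hands each fact provider a disjoint band above it, so locally created keys can never collide with conformed ones. The `KeySpace` class, the limit, and the band size are hypothetical.

```python
from typing import Dict


class KeySpace:
    """Hypothetical surrogate key allocator kept by the dimension manager.

    The manager draws its own keys from the bottom of the space and hands
    out disjoint bands above a fixed limit to fact providers that need
    local SCD keys. All sizes are illustrative."""

    def __init__(self, manager_limit: int, band_size: int) -> None:
        self._next_manager_key = 1
        self._manager_limit = manager_limit  # manager keys stay below this
        self._band_size = band_size
        self._next_band_start = manager_limit
        self.bands: Dict[str, range] = {}

    def manager_key(self) -> int:
        if self._next_manager_key >= self._manager_limit:
            raise OverflowError("dimension manager key space exhausted")
        key = self._next_manager_key
        self._next_manager_key += 1
        return key

    def band_for(self, provider: str) -> range:
        """Return the provider's private band, allocating it on first request."""
        if provider not in self.bands:
            start = self._next_band_start
            self.bands[provider] = range(start, start + self._band_size)
            self._next_band_start += self._band_size
        return self.bands[provider]


ks = KeySpace(manager_limit=1_000_000, band_size=10_000)
print(ks.manager_key(), ks.band_for("retail"), ks.band_for("distribution"))
```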

Dimension managers and fact providers resolve international representation differences. A truly international EDW presents many challenges, which are explored in significant detail in The Data Webhouse Toolkit (Kimball and Merz, Wiley 2000). These challenges include:

Foreign alphabets and character sets. Many of the display and printing problems in an international EDW require being able to represent foreign characters, including not just the accented characters from western European alphabets, but also Cyrillic, Arabic, Japanese, Chinese, and dozens of other less familiar writing systems. It is important to understand that this is not a font problem. This is a character set problem. A font is simply an artist's rendering of a set of characters. There are hundreds of fonts available for standard English, but standard English has a relatively small character set that is enough for anyone's use unless you are a professional typographer. This small character set is usually encoded in ASCII (American Standard Code for Information Interchange), which is a 7-bit encoding with 128 possible characters, commonly stored in 8-bit bytes whose upper 128 values are used by various incompatible extended character sets. Only about 100 of these characters have a standard interpretation that can be invoked from a normal English keyboard, but this is usually enough for English-speaking computer users. It should be clear, though, that ASCII is woefully inadequate for representing the thousands of characters needed for non-English writing systems. An international body of system architects, the Unicode Consortium, has defined a standard known as Unicode for representing characters and alphabets in almost all of the world's languages and cultures.

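Their work can be accessed on the web at www.unicode.org. Unicode was originally designed as a 16-bit encoding with room for 65,536 characters, and 16-bit UTF-16 remains a widely used encoding form; the standard's code space, however, now extends to more than a million code points (U+0000 through U+10FFFF). The Unicode Standard, version 5.0, which is the published version of Unicode as of the writing of this white paper, covers the principal written languages of the Americas, Europe, the Middle East, Africa, India, Asia, and Pacifica.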

Addresses and their extensions to locations and maps. Names and addresses are the most difficult and far-reaching international design problem in the international EDW. Toby Atkinson has written a remarkable book describing the intricacies of international names and addresses. In his Merriam-Webster's Guide to International Business Communications (Merriam-Webster, 1999), he gives the following example. Suppose you have a name and address like the following:

Sándor Csilla
Nemzetközi Kiadó Kft
Rákóczi u. 73
7626 PÉCS

Are you prepared to store this in a database? Is this a postally valid address? Does this represent a person or a company? Male or female? Would the recipient be insulted by anything about this? Can your system parse it to determine the precise geographic locale? What salutation would be appropriate if you were greeting this entity in a letter or on the telephone? What is going to happen to the various special characters when it is printed? Can you even enter these characters from your various keyboards? If your EDW contains information about people or businesses located in multiple countries, then you need to plan carefully for a complete system spanning data input, transaction processing, address label and mailing production, real-time customer response systems, and your marketing-oriented data warehouse.

Numbers. Numbers are represented differently in different cultures. The number 100.456 is slightly larger than one hundred in the United States, but slightly larger than one hundred thousand in Germany. In India, a large number may be written as 23,34,789, since the digits may be grouped by twos after the first group of three. In India, a lakh represents 100,000 and a crore represents 10,000,000. Other countries use periods, commas, and even apostrophes to separate the digits. An international EDW must be able to read and write numbers correctly, given an assigned cultural context.
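A minimal sketch of reading and writing numbers under an assigned cultural context. The `SEPARATORS` table is a hypothetical two-entry subset, not a locale database; a production system would rely on a maintained locale library. `format_indian` applies the lakh and crore grouping described above, and handles only non-negative integers.

```python
from decimal import Decimal

# Hypothetical, deliberately tiny table of separator conventions per cultural context.
SEPARATORS = {
    "en-US": {"group": ",", "decimal": "."},
    "de-DE": {"group": ".", "decimal": ","},
}

def parse_number(text: str, context: str) -> Decimal:
    """Read a number written under the given cultural context."""
    seps = SEPARATORS[context]
    # Remove grouping separators first, then normalize the decimal mark to ".".
    normalized = text.replace(seps["group"], "").replace(seps["decimal"], ".")
    return Decimal(normalized)

def format_indian(n: int) -> str:
    """Group digits Indian-style: the last three digits, then groups of two."""
    if n < 0:
        raise ValueError("this sketch handles non-negative integers only")
    s = str(n)
    if len(s) <= 3:
        return s
    head, tail = s[:-3], s[-3:]
    groups = []
    while len(head) > 2:
        groups.insert(0, head[-2:])
        head = head[:-2]
    if head:
        groups.insert(0, head)
    return ",".join(groups + [tail])

print(parse_number("100.456", "en-US"), parse_number("100.456", "de-DE"))
print(format_indian(2334789), format_indian(100000), format_indian(10000000))
```

The same string, `100.456`, yields two different values; only the context attached to the source tells the load process which one is meant.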

Telephone Numbers. Telephone numbers, like postal addresses, have two basic representations. One is for domestic consumption, and one is for international use. To make matters worse, the international version is often interpreted in a different way by each international observer. A telephone number (randomly created for illustrative purposes) in South Africa, for example, is written as

021-222-3333

but must be dialed from the United States as

011-27-21-222-3333.

The leading 011 is the international access prefix used in the United States, 27 is the country code for South Africa, and the domestic leading 0 is dropped. Callers in other countries use different international access prefixes.
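One way to keep the two representations consistent is to store a single culture-neutral form and derive the dialing string for each caller's country. This Python sketch assumes hypothetical lookup tables holding only the entries the example needs. The trunk-prefix rule it applies holds for South Africa but not for every country, which is why it is kept as per-country metadata rather than hard-coded.

```python
import re

# Hypothetical dialing metadata (only the entries this example needs).
# South Africa: country calling code 27, domestic trunk prefix 0.
CALLEE_COUNTRIES = {"ZA": {"country_code": "27", "trunk_prefix": "0"}}
# International access (exit) prefix dialed by callers in each country.
EXIT_PREFIXES = {"US": "011"}

def strip_trunk_prefix(domestic: str, trunk_prefix: str) -> str:
    """Drop the domestic trunk prefix, keeping the remaining digit grouping."""
    return domestic[len(trunk_prefix):] if domestic.startswith(trunk_prefix) else domestic

def canonical_e164(domestic: str, callee: str) -> str:
    """One culture-neutral form suitable for storage, digits only after the '+'."""
    info = CALLEE_COUNTRIES[callee]
    national = strip_trunk_prefix(domestic, info["trunk_prefix"])
    return "+" + info["country_code"] + re.sub(r"\D", "", national)

def dial_string(domestic: str, callee: str, caller: str) -> str:
    """The digits a caller in another country must dial, grouped with hyphens."""
    info = CALLEE_COUNTRIES[callee]
    national = strip_trunk_prefix(domestic, info["trunk_prefix"])
    return f"{EXIT_PREFIXES[caller]}-{info['country_code']}-{national}"

print(canonical_e164("021-222-3333", "ZA"))
print(dial_string("021-222-3333", "ZA", "US"))
```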

Currencies. Multinational businesses often book transactions, collect revenues, and pay expenses in many different currencies. A good basic design for all of these situations is shown in Figure 5.

Figure 5. A Multinational Fact Table

The primary amount of the transaction is represented in the local currency. In some sense, this is always the correct value of the transaction. For easy reporting purposes, a second field included in the transaction fact record expresses the same amount in a single standard currency, such as United States dollars. The conversion between the two amounts is a basic design decision for the fact table; it might be an agreed-upon daily spot rate for converting the local currency into the standard currency. Now all transactions in a single currency can be added up easily from the fact table by constraining the currency dimension to a single currency type. Transactions from around the world can easily be added up by summing the standard currency field. Note that currencies and countries are closely correlated, but they are not the same. Countries may change the identity of their currency during periods of severe inflation.
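A minimal sketch of the dual-amount fact record described above, assuming a hypothetical daily rate table (the rates are invented round numbers, not historical quotes) and Python's `decimal` module for money arithmetic. Each record keeps the local amount with its currency code, plus a standard-currency (USD) amount computed once at load time, so both kinds of totals become simple sums.

```python
from decimal import Decimal, ROUND_HALF_UP

# Hypothetical agreed-upon daily rates: USD per one unit of local currency.
DAILY_RATES = {
    ("2008-06-02", "EUR"): Decimal("1.50"),
    ("2008-06-02", "GBP"): Decimal("2.00"),
    ("2008-06-02", "USD"): Decimal("1"),
}

def make_fact(date_key: str, currency: str, local_amount: Decimal) -> dict:
    """Build a fact row carrying both the local and the standard-currency amount."""
    rate = DAILY_RATES[(date_key, currency)]
    usd = (local_amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return {"date": date_key, "currency": currency,
            "local_amount": local_amount, "usd_amount": usd}

facts = [
    make_fact("2008-06-02", "EUR", Decimal("100.00")),
    make_fact("2008-06-02", "EUR", Decimal("40.00")),
    make_fact("2008-06-02", "GBP", Decimal("10.00")),
    make_fact("2008-06-02", "USD", Decimal("25.00")),
]

def total_local(rows, currency):
    """Sum local amounts, constrained to one currency (summing across currencies is meaningless)."""
    return sum((r["local_amount"] for r in rows if r["currency"] == currency), Decimal("0"))

def total_usd(rows):
    """Sum the standard-currency field across all transactions worldwide."""
    return sum((r["usd_amount"] for r in rows), Decimal("0"))

print(total_local(facts, "EUR"), total_usd(facts))
```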

Time of Day. The calculation of the true wall clock time in a given location around the world is surprisingly complicated. Most people think there are 24 time zones, corresponding to the 24 possible hours per day. But with even a little foreign travel experience, one begins to realize that the situation is much more complex. The entire country of India, for instance, sits between these hour boundaries: it is 5.5 hours ahead of Greenwich Mean Time all year and does not observe daylight saving time. The rules of when various locations go on and off daylight saving time are amazingly intricate. Until 2006, for example, parts of Indiana observed daylight saving time and other parts did not. The dates when daylight saving time goes into effect vary by location. The time difference between London, England and Sydney, Australia can vary by as much as two hours, depending on the time of year. In reality, there are more than 500 time zones in the world, and the list is constantly changing. The complexity of time zone calculations makes it clear that one cannot embed time zone assumptions in the code of applications or fixed queries. It is also pretty clear that each IT organization should not re-invent the wheel and derive all the time zone rules independently. Fortunately, the web comes to our rescue. A number of time zone conversion services, such as www.timezoneconverter.com, are available online that have up-to-date databases reflecting all the complexities of time zone calculations.
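In the same spirit of not embedding rules in code, a load process can consult a maintained copy of the IANA time zone database instead of a web service. This sketch swaps in Python's standard `zoneinfo` module (Python 3.9+), assuming the system tz database or the `tzdata` package is installed; the sample dates are arbitrary. It shows the London to Sydney gap changing between January and July, and India's fixed half-hour offset.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # requires Python 3.9+ and an installed IANA tz database

def utc_offset(zone_name: str, instant_utc: datetime) -> timedelta:
    """UTC offset in effect in an IANA zone at a given UTC instant."""
    return instant_utc.astimezone(ZoneInfo(zone_name)).utcoffset()

def local_wall_clock(zone_name: str, instant_utc: datetime) -> datetime:
    """Wall clock time in the zone, e.g. for a local time-of-day attribute on a fact row."""
    return instant_utc.astimezone(ZoneInfo(zone_name))

def hours_ahead(zone_a: str, zone_b: str, instant_utc: datetime) -> float:
    """How many hours zone_b's clocks are ahead of zone_a's at the given instant."""
    delta = utc_offset(zone_b, instant_utc) - utc_offset(zone_a, instant_utc)
    return delta.total_seconds() / 3600

january = datetime(2008, 1, 15, 12, 0, tzinfo=timezone.utc)
july = datetime(2008, 7, 15, 12, 0, tzinfo=timezone.utc)

print(hours_ahead("Europe/London", "Australia/Sydney", january))
print(hours_ahead("Europe/London", "Australia/Sydney", july))
print(utc_offset("Asia/Kolkata", january), utc_offset("Asia/Kolkata", july))
print(local_wall_clock("Asia/Kolkata", january))
```

Because the rules live in the tz database, keeping that database current is an operational task rather than a code change.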

Calendars. Each country has a unique list of holidays. In many cases the holidays do not occur on the same day in successive years. Some holidays, such as Easter, are based on very complex rules that involve the phases of the moon or other events. Some religious holidays are not celebrated on the same day in various parts of the same country. Holidays are so complicated that it probably does not make sense to try to define them more than ten or twenty years into the future. Thus, much as with time zones, the technical definition of holidays in the EDW needs to be driven from a service. At the time of this writing, some of the best publicly available sources of international holiday definitions can be found on the web by searching Google for "international holiday calendar".
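To illustrate how much logic hides in a single rule-based holiday, here is the well-known anonymous Gregorian computus (Meeus/Jones/Butcher) for Western Easter, written in Python. It only makes the complexity concrete: it covers one holiday in one tradition (Orthodox churches use a different, Julian-based computation), and it is not a substitute for sourcing holiday definitions from a maintained service.

```python
from datetime import date

def western_easter(year: int) -> date:
    """Gregorian (Western) Easter Sunday via the anonymous Gregorian computus."""
    a = year % 19                         # position in the 19-year lunar cycle
    b, c = divmod(year, 100)              # century and year within the century
    d, e = divmod(b, 4)                   # Gregorian century leap-year terms
    f = (b + 8) // 25
    g = (b - f + 1) // 3                  # lunar correction term
    h = (19 * a + b - d - g + 15) % 30    # lunar term locating the paschal full moon
    i, k = divmod(c, 4)
    l = (32 + 2 * e + 2 * i - h - k) % 7  # weekday term that advances to a Sunday
    m = (a + 11 * h + 22 * l) // 451      # correction for late edge cases
    month, day_minus_one = divmod(h + l - 7 * m + 114, 31)
    return date(year, month, day_minus_one + 1)

for y in (2008, 2024, 2025):
    print(y, western_easter(y))
```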

Reports, printing, and collating sequences. An interesting issue in multinational reporting is how to prepare a set of consistent reports for managers across such an organization in different languages. There are three basic issues that must be dealt with simultaneously: sorting (collating), grouping, and conforming.

Many language systems sort their special characters in a unique way. Atkinson's book discusses the specific rules for sorting in Catalan, Czech, Danish, Finnish, German, Hungarian, Norwegian, Polish, Slovenian, Spanish, Swedish, and Turkish. And these are only languages using the Roman alphabet. A report could sort the same set of customer names differently in different languages.
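The following Python sketch shows the same customer list collated two ways. The rules are deliberately simplified for illustration: German dictionary-style ordering (DIN 5007, variant 1) files umlauts with their base letters, while Swedish treats å, ä, and ö as separate letters after z. The names are hypothetical, and real systems should use a full collation implementation rather than these toy keys.

```python
# Simplified German dictionary ordering: umlauts sort with their base letters.
GERMAN_FOLD = {"ä": "a", "ö": "o", "ü": "u", "ß": "ss"}

# Simplified Swedish ordering: å, ä, ö are distinct letters that follow z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"
SWEDISH_RANK = {ch: i for i, ch in enumerate(SWEDISH_ALPHABET)}

def german_key(name: str) -> tuple[str, str]:
    lowered = name.lower()
    folded = "".join(GERMAN_FOLD.get(ch, ch) for ch in lowered)
    return (folded, lowered)  # tie-break on the unfolded spelling

def swedish_key(name: str) -> list[int]:
    # Characters missing from the table sort after all known letters.
    return [SWEDISH_RANK.get(ch, len(SWEDISH_ALPHABET)) for ch in name.lower()]

customers = ["Zander", "Ärlig", "Adler", "Östberg", "Olsen"]
print(sorted(customers, key=german_key))
print(sorted(customers, key=swedish_key))
```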

Great care must be taken if a set of attributes in a dimension is translated from one language to another. For instance, if the category and department names for a large number of products are translated into more than one language, then the cardinality and the detailed many-to-many and many-to-one relationships must be identical between the two language versions of the dimension, or else the use of an attribute from the dimension as a row header (grouping criterion) will not produce the same results in the separate languages. Because the maintenance of two language versions of a large dimension table would be so subtle and difficult, we recommend against this approach.
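The cardinality requirement can be checked mechanically. In this Python sketch, the product rows and category labels are hypothetical; two distinct English categories share one French label, so grouping by the French column silently merges them.

```python
from collections import defaultdict

# Hypothetical product dimension rows with a base-language and a translated category.
products = [
    {"sku": 1, "category_en": "Cookies", "category_fr": "Biscuits"},
    {"sku": 2, "category_en": "Biscuits", "category_fr": "Biscuits"},
    {"sku": 3, "category_en": "Crackers", "category_fr": "Crackers"},
]

def translation_conflicts(rows, base_col, translated_col):
    """Return base values whose translation is not one-to-one.

    Grouping by the translated column yields the same groups as grouping by the
    base column only if every base value maps to exactly one translation and no
    two base values share a translation.
    """
    base_to_tr = defaultdict(set)
    tr_to_base = defaultdict(set)
    for row in rows:
        base_to_tr[row[base_col]].add(row[translated_col])
        tr_to_base[row[translated_col]].add(row[base_col])
    conflicts = set()
    for base, translations in base_to_tr.items():
        if len(translations) > 1 or any(len(tr_to_base[t]) > 1 for t in translations):
            conflicts.add(base)
    return sorted(conflicts)

print(translation_conflicts(products, "category_en", "category_fr"))
print(len({r["category_en"] for r in products}), len({r["category_fr"] for r in products}))
```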

If the same dimension table has several language versions in different countries, then it may be impossible to conform data sources across these versions, because at an SQL query level, the row headers of the separate answer sets in different languages could not be matched.

If we assume that we want a set of reports to span multiple languages, then we recommend implementing a two-layer architecture. In the lower layer, we store all data and produce all reports from a single base language system. In the upper layer, the finished report is augmented with translations in auxiliary reporting columns. These auxiliary reporting columns do not affect sorting, grouping, or the ability to conform reports across data sources located in different countries. If we adopt this approach, managers from different countries should be able to sit in the same room with their own versions of the same reports, but be able to understand each other's reports and compare them.
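A minimal Python sketch of the two-layer idea, with hypothetical answer sets and labels. The lower layer drills across two answer sets by matching their base-language row headers; only afterward does the upper layer attach a translated label as an auxiliary column, so sorting and matching never depend on the translation.

```python
# Lower layer: answer sets from two hypothetical data sources, keyed by
# base-language (English) row headers drawn from a conformed dimension.
sales_by_category = {"Beverages": 1200, "Snacks": 800}
returns_by_category = {"Beverages": 40, "Snacks": 25}

def drill_across(*answer_sets):
    """Merge answer sets on their shared base-language row headers."""
    headers = sorted(set().union(*answer_sets))  # sorted once, in the base language
    return [(h, *(s.get(h) for s in answer_sets)) for h in headers]

# Upper layer: hypothetical translations, used only as an auxiliary display column.
FRENCH_LABELS = {"Beverages": "Boissons", "Snacks": "En-cas"}

def add_translation(report, labels):
    """Insert a translated label next to each base-language row header."""
    return [(header, labels.get(header, header), *values) for header, *values in report]

report = drill_across(sales_by_category, returns_by_category)
for row in add_translation(report, FRENCH_LABELS):
    print(row)
```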

Dimension managers and fact providers ensure that auditing, compliance, authentication, authorization, and usage tracking functions are applied uniformly for all BI clients. This set of responsibilities is especially challenging since it lies outside the scope of the steps described in this white paper. A centralized MDM resource may standardize clients' direct access to master data, such as customer. But such direct access probably occurs over an enterprise service bus (ESB), perhaps implemented on a service-oriented architecture (SOA) framework. This direct access to the MDM resource is very different from using a customer dimension in a BI report produced by the EDW. Even when modern role-enabled authentication and authorization safeguards are in place when using the EDW, subtle differences in the definition of roles may give rise to inconsistency. For example, a role named "senior analyst" may have different interpretations at different entry points to the EDW. Logically, the challenge of conforming these role definitions is similar to conforming dimensional attributes, but the role definitions are stored and maintained entirely differently. In many cases, these role definitions are stored and enforced in local LDAP directory servers that intercept end users' login requests all across the EDW landscape. And finally, the criteria for who qualifies to be a senior analyst may depend on local administration that is tied more to the human resources function than to business responsibility. The best that can be said for this difficult design challenge is that personnel responsible for defining the LDAP-enabled roles should be invited to the original dimension conforming meetings so that they become aware of the scope of EDW integration.
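Treating role definitions like a dimension attribute suggests a simple conformance check: export each entry point's role-to-permission mapping and flag role names whose permission sets differ. In this Python sketch, the entry point names, role names, and permissions are all hypothetical, and the export from LDAP is assumed to have already happened.

```python
# Hypothetical role definitions exported from two local directories.
role_definitions = {
    "emea_portal": {"senior analyst": {"read_sales", "read_customer_pii"},
                    "viewer": {"read_sales"}},
    "us_portal": {"senior analyst": {"read_sales"},
                  "viewer": {"read_sales"}},
}

def inconsistent_roles(definitions):
    """Return role names whose permission sets differ between entry points."""
    by_role = {}
    for entry_point, roles in definitions.items():
        for role, perms in roles.items():
            by_role.setdefault(role, {})[entry_point] = frozenset(perms)
    return sorted(role for role, grants in by_role.items() if len(set(grants.values())) > 1)

print(inconsistent_roles(role_definitions))
```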

Dimension managers and fact providers coordinate with industry standards for data content, data exchange, and reporting, such as ACORD (insurance), MISMO (mortgages), SWIFT and NACHA (financial services), HIPAA and HL7 (healthcare), RosettaNet (manufacturing), and EDI (procurement). The existence of industry standards is mostly good news for the EDW, since each industry standard provides the definition of many conformed dimension attributes and facts. But often these standards are accompanied by legal restrictions on how the information is handled.

Conclusion

The integrated EDW promises a rational, consistent view of enterprise data. This promise has been repeated endlessly in the trade literature. But until now, there has been no specific design for actually implementing the integrated EDW. In this paper we have precisely identified the ability to drill across as the central deliverable of the integrated EDW. Then we have methodically described the required steps and responsibilities, which give rise to the archetypal roles of the dimension manager and the fact provider. Although this implementation of the integrated EDW surely must seem daunting, we believe that the steps and responsibilities we have described are basic and unavoidable, no matter how your data warehouse environment is organized. Finally, this architecture represents a distillation of more than two decades of experience in building data warehouses based on conformed dimensions and facts. If you carefully consider the detailed recommendations in this paper, you should avoid re-inventing the wheel when you are building your integrated EDW.