Data Warehouse Development & Schemas

  • 8/2/2019 Data Warehouse Development & Schemas

    Data Warehouse Development & Schemas

    Prof. Pawan Kumar MBA IV SEM (SEC-A) 8- 2

    Data Warehouse Development

    Data warehouse development approaches
    - Inmon Model: EDW approach (top-down)
    - Kimball Model: Data mart approach (bottom-up)

    Which model is best?
    - There is no one-size-fits-all strategy to DW
    - One alternative is the hosted warehouse

    Data warehouse structure: The Star Schema vs. Relational

    Real-time data warehousing?

    Inmon Model: The EDW Approach

    - Top-down development
    - Spiral development approach
    - ERD based
    - Inmon held that data should be organized into subject-oriented, integrated, non-volatile, and time-variant structures.
    - Detailed data is regularly extracted from the ODS and data marts and temporarily hosted in the staging area for aggregation and summarization, then extracted and loaded into the data warehouse.

    Kimball Model: The Data Mart Approach

    - Bottom-up approach that uses a bus structure.
    - Plan big, build small.
    - Subject-oriented or department-oriented data warehouses, such as marketing or sales.
    - This model strikes a good balance between centralized and localized flexibility.
    - This architecture makes the data warehouse more of a virtual reality than a physical reality.
    - All data marts could be located on one server or on different servers across the enterprise, while the data warehouse would be a virtual entity, nothing more than the sum total of all the data marts.
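The "virtual warehouse" idea can be sketched as follows, assuming two hypothetical data marts (`sales_mart` and `marketing_mart`) held in separate SQLite databases and sharing a common `region` key. The table names, columns, and figures are illustrative assumptions, not taken from the slides; the point is only that the warehouse exists as a query over the marts rather than as a physical store.

```python
import sqlite3

# One connection stands in for the enterprise; each ATTACH adds a separate
# (here in-memory) database playing the role of an independent data mart.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS sales_mart")
conn.execute("ATTACH DATABASE ':memory:' AS marketing_mart")

# Hypothetical mart tables that share the same region key.
conn.execute("CREATE TABLE sales_mart.revenue (region TEXT, amount REAL)")
conn.execute("CREATE TABLE marketing_mart.spend (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_mart.revenue VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0)],
)
conn.executemany(
    "INSERT INTO marketing_mart.spend VALUES (?, ?)",
    [("North", 30.0), ("South", 25.0)],
)

# The "warehouse" is only a query over the marts, joined on the shared key.
virtual_warehouse = conn.execute(
    """
    SELECT r.region, r.amount AS revenue, s.amount AS spend
    FROM sales_mart.revenue AS r
    JOIN marketing_mart.spend AS s ON s.region = r.region
    ORDER BY r.region
    """
).fetchall()
print(virtual_warehouse)
```

In a real deployment the marts would sit on different servers and a federation or integration layer would play the role that `ATTACH` plays here.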

    DW Development Approaches

    [Figure panels: Inmon approach; Kimball approach]

    Data Warehouse Schema Architecture

    - Star schema
    - Snowflake schema
    - Fact constellation schema

    Star schema

    A star schema can be simple or complex. A simple star consists of one fact table; a complex star can have more than one fact table.

    It contains two types of tables:

    - Fact tables: A fact table typically has two types of columns: foreign keys to dimension tables, and measures, which contain numeric facts. A fact table can hold fact data at a detailed or an aggregated level.
    - Dimension tables: A dimension is a structure, usually composed of one or more hierarchies, that categorizes data. Dimension attributes are normally descriptive, textual values. Dimension tables are generally smaller than the fact table.
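A simple star can be sketched with Python's built-in `sqlite3` module. The table names (`fact_sales`, `dim_product`, `dim_date`), their columns, and the sample rows are assumptions made for illustration; they show one fact table holding foreign keys plus numeric measures, surrounded by small descriptive dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: small, descriptive, textual attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, category TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);

    -- Fact table: foreign keys to the dimensions plus numeric measures.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        units_sold INTEGER,
        revenue    REAL
    );

    INSERT INTO dim_product VALUES (1, 'Laptop', 'Electronics'), (2, 'Desk', 'Furniture');
    INSERT INTO dim_date VALUES (20190101, 2019, 1), (20190201, 2019, 2);
    INSERT INTO fact_sales VALUES
        (1, 20190101, 2, 2000.0),
        (1, 20190201, 1, 1000.0),
        (2, 20190101, 4,  800.0);
    """
)

# A typical star query: one join from the fact table to a dimension,
# then an aggregate over a measure.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY p.category
    ORDER BY p.category
    """
).fetchall()
print(revenue_by_category)
```

Because `category` is stored directly on `dim_product`, the query needs only a single join, at the cost of repeating the category text on every product row.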

    [Figure: star schema]

    The main characteristics of star schema:

    - Simple structure -> easy to understand schema
    - Great query effectiveness -> small number of tables to join
    - Relatively long time of loading data into dimension tables -> de-normalization and redundant data mean that the tables can be large
    - The most commonly used schema in data warehouse implementations -> widely supported by a large number of business intelligence tools

    Snowflake schema

    The snowflake schema architecture is a more complex variation of the star schema used in a data warehouse, because the tables that describe the dimensions are normalized.
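As a hedged illustration of what normalizing a dimension means, the sketch below moves a product's category out of the product dimension into its own table. All table and column names (`dim_category`, `dim_product`, `fact_sales`) and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- In a star schema, the category would be a text column on dim_product.
    -- Snowflaking moves it into its own table, referenced by key.
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, category_name TEXT);
    CREATE TABLE dim_product (
        product_id  INTEGER PRIMARY KEY,
        name        TEXT,
        category_id INTEGER REFERENCES dim_category(category_id)
    );
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        revenue    REAL
    );

    INSERT INTO dim_category VALUES (10, 'Electronics'), (20, 'Furniture');
    INSERT INTO dim_product VALUES (1, 'Laptop', 10), (2, 'Phone', 10), (3, 'Desk', 20);
    INSERT INTO fact_sales VALUES (1, 1000.0), (2, 500.0), (3, 200.0);
    """
)

# Compared with a denormalized dimension, reaching the category name
# now takes one extra join (fact -> product -> category).
snowflake_revenue = conn.execute(
    """
    SELECT c.category_name, SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_category AS c ON c.category_id = p.category_id
    GROUP BY c.category_name
    ORDER BY c.category_name
    """
).fetchall()
print(snowflake_revenue)
```

The category name is stored once instead of once per product, which removes redundancy but lengthens the join path, and that is the added complexity the slide refers to.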

    Fact constellation schema

    - For each star schema it is possible to construct a fact constellation schema (for example, by splitting the original star schema into several star schemas, each of which describes facts at a different level of the dimension hierarchies).
    - The fact constellation architecture contains multiple fact tables that share many dimension tables.
    - The main shortcoming of the fact constellation schema is a more complicated design, because many variants for particular kinds of aggregation must be considered and selected. Moreover, dimension tables are still large.
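A minimal sketch of multiple fact tables sharing dimension tables, assuming two hypothetical fact tables (`fact_sales` and `fact_shipments`) that both reference the same `dim_product` and `dim_date`. The names and rows are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Shared dimension tables.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);

    -- Two fact tables, each referencing the same dimensions.
    CREATE TABLE fact_sales (product_id INTEGER, date_id INTEGER, units_sold INTEGER);
    CREATE TABLE fact_shipments (product_id INTEGER, date_id INTEGER, units_shipped INTEGER);

    INSERT INTO dim_product VALUES (1, 'Laptop'), (2, 'Desk');
    INSERT INTO dim_date VALUES (1, 2019);
    INSERT INTO fact_sales VALUES (1, 1, 5), (1, 1, 3), (2, 1, 4);
    INSERT INTO fact_shipments VALUES (1, 1, 6), (2, 1, 4);
    """
)

# Aggregate each fact table separately, then combine on the shared dimension,
# so rows from one fact table do not multiply rows from the other.
sold_vs_shipped = conn.execute(
    """
    SELECT p.name, s.sold, h.shipped
    FROM dim_product AS p
    JOIN (SELECT product_id, SUM(units_sold) AS sold
          FROM fact_sales GROUP BY product_id) AS s ON s.product_id = p.product_id
    JOIN (SELECT product_id, SUM(units_shipped) AS shipped
          FROM fact_shipments GROUP BY product_id) AS h ON h.product_id = p.product_id
    ORDER BY p.name
    """
).fetchall()
print(sold_vs_shipped)
```

Joining the two fact tables to each other directly would pair every sales row with every shipment row for the same product and inflate the sums, which is one concrete form of the design complexity noted above.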

    [Figure: fact constellation schema]

    Risks in Implementing DW

    - No mission or objective
    - Quality of source data unknown
    - Skills not in place
    - Inadequate budget
    - Lack of supporting software
    - Source data not understood
    - Weak sponsor
    - Users not computer literate
    - Political problems or turf wars
    - Unrealistic user expectations

    (Continued)

    Risks in Implementing DW (cont.)

    - Architectural and design risks
    - Scope creep and changing requirements
    - Vendors out of control
    - Multiple platforms
    - Key people leaving the project
    - Loss of the sponsor
    - Too much new technology
    - Having to fix an operational system
    - Geographically distributed environment
    - Team geography and language culture

    Things to Avoid for Successful Implementation of DW

    - Starting with the wrong sponsorship chain
    - Setting expectations that you cannot meet
    - Engaging in politically naive behavior
    - Loading the warehouse with information just because it is available
    - Believing that data warehousing database design is the same as transactional DB design
    - Choosing a data warehouse manager who is technology oriented rather than user oriented

    (see more on page 356)

    Real-time DW (a.k.a. Active Data Warehousing)

    - Enabling real-time data updates for real-time analysis and real-time decision making is growing rapidly
    - Push vs. pull (of data)
    - Concerns about real-time BI:
      - Not all data should be updated continuously
      - Mismatch of reports generated minutes apart
      - May be cost prohibitive
      - May also be infeasible
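The push vs. pull distinction can be sketched in plain Python. The `SourceSystem` and `Warehouse` classes below are hypothetical stand-ins, not any vendor's API: with push, the source notifies the warehouse on every write; with pull, the warehouse asks the source for rows newer than the last one it loaded.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Row = Tuple[int, str]  # (sequence number, payload); hypothetical row shape


@dataclass
class SourceSystem:
    """Hypothetical operational system that records changes."""
    rows: List[Row] = field(default_factory=list)
    subscribers: List[Callable[[Row], None]] = field(default_factory=list)

    def write(self, payload: str) -> None:
        row = (len(self.rows) + 1, payload)
        self.rows.append(row)
        # Push: the source notifies every subscriber as soon as a row is written.
        for notify in self.subscribers:
            notify(row)

    def rows_after(self, last_seen: int) -> List[Row]:
        return [r for r in self.rows if r[0] > last_seen]


@dataclass
class Warehouse:
    loaded: List[Row] = field(default_factory=list)

    def pull(self, source: SourceSystem) -> None:
        # Pull: the warehouse asks for everything newer than its high-water mark.
        last_seen = self.loaded[-1][0] if self.loaded else 0
        self.loaded.extend(source.rows_after(last_seen))


source = SourceSystem()
pushed = Warehouse()
pulled = Warehouse()
source.subscribers.append(pushed.loaded.append)

source.write("order 1")
source.write("order 2")
counts_before_pull = (len(pushed.loaded), len(pulled.loaded))
pulled.pull(source)
counts_after_pull = (len(pushed.loaded), len(pulled.loaded))
print(counts_before_pull, counts_after_pull)
```

Until `pull` runs, the two warehouses hold different row counts, which is a small-scale version of the "mismatch of reports generated minutes apart" concern listed above.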

    [Figure: Evolution of DSS & DW]

    [Figure: Active Data Warehousing (by Teradata Corporation)]

    [Figure: Comparing Traditional and Active DW]

    Data Warehouse Administration

    - Due to its huge size and its intrinsic nature, a DW requires especially strong monitoring in order to sustain its efficiency, productivity, and security.
    - The successful administration and management of a data warehouse entails skills and proficiency that go past what is required of a traditional database administrator.
    - Requires expertise in high-performance software, hardware, and networking technologies

    End of the Chapter

    Questions?