BI Session 6 Prof Dhruv Nath


  • 7/27/2019 BI Session 6 Prof Dhruv Nath


    Dhruv Nath

    BITech Session on Data Warehousing


    Slides on OLAP


    DW : Contents

    ER Model vs Dimensional Model

    Designing a Data Warehouse

    Starting with the ER Model

    Facts and Dimensions

    BI Products and Vendors

    Data Warehouse Optimisation

    OLAP Implementation


    OLTP Databases use the Entity Relationship Model

    Why can't we use the ER Model for Analytics / BI?

    Why are there no Many-Many relationships?


    Problems with using the ER Model / 3NF for Querying

    Complex to understand and query

    All kinds of tables are joined to all kinds of other tables

    Maybe OK when joining a few tables; not OK when many tables are involved

    Complex to visualise

    The E-R Model is very symmetric: there is no way to figure out which data are business numbers (changing) and which are constant (e.g. Regions, Products)

    The E-R Model is designed for capturing / updating detailed data, not for querying it

    A different model is required for Management to query this data


    An Easier Model to Query

    [Figure: Dimensional Model, Facts and Dimensions. Facts: Sales, Collections, Complaints; Dimensions: Model, Geography, Dealer, Product, Year]


    Benefits of the Dimensional Model

    Simple: can be used directly by the user

    Very clear which data are business numbers (changing: facts) and which are constant (e.g. Regions, Products: dimensions)


    Example : Dimensional Model of Data

    Fact table: Cust. Id, Month & Yr, Region Code, Balance
    Customer dimension: Cust. Id, Cust Name, Address, Phone
    Region dimension: Region Code, Region Name, Address, Manager
    Time dimension: Month & Yr, Quarter

    What is the primary key in each dimension?

    What is the primary key in the Fact table?

    What are the foreign keys? What relationships do they define?

    What do we call this schema? Star Schema


    Example : Dimensional Model of Data

    Fact table: Cust. Id, Month & Yr, Region Code, Balance
    Customer dimension: Cust. Id, Cust Name, Address, Phone
    Region dimension: Region Code, Region Name, Address, Manager
    Time dimension: Month & Yr, Quarter

    Each Dimension represents an entity (with attributes)

    The Star Schema can be visualised as a Data Cube. How?


    Visualising a Star Schema as a Data Cube

    Querying: OLAP (vs OLTP)
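A minimal sketch of the cube view, assuming made-up customers, months and balances: each fact row becomes one cell addressed by its dimension keys (Cust Id, Month & Yr, Region Code), so "slicing" fixes one axis and "rolling up" sums over the axes you drop. The names `slice_cube` and `roll_up` are hypothetical helpers, not part of any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: each (Cust Id, Month & Yr, Region Code) key is one cell
# of the cube, and each position in the key is one axis (dimension).
# Balance is the measure stored in the cell.
cube = {
    ("C1", "2019-01", "R1"): 500.0,
    ("C1", "2019-02", "R1"): 700.0,
    ("C2", "2019-01", "R2"): 300.0,
    ("C2", "2019-02", "R2"): 200.0,
    ("C3", "2019-01", "R1"): 100.0,
}
CUST, MONTH, REGION = 0, 1, 2

def slice_cube(cube, axis, value):
    """Fix one dimension to a single value (a 'slice' of the cube)."""
    return {k: v for k, v in cube.items() if k[axis] == value}

def roll_up(cube, keep_axes):
    """Sum the measure over every axis not listed in keep_axes."""
    out = defaultdict(float)
    for key, value in cube.items():
        out[tuple(key[i] for i in keep_axes)] += value
    return dict(out)

january = slice_cube(cube, MONTH, "2019-01")
# Balances are summed across customers only; the month axis is kept, because
# adding one customer's balances across months would not be meaningful.
by_region_month = roll_up(cube, (REGION, MONTH))
print(january)
print(by_region_month)
```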


    Dimensional Model

    Fact table: Cust. Id, Month & Yr, Region Code, Balance
    Customer dimension: Cust. Id, Cust Name, Address, Phone
    Region dimension: Region Code, Region Name, Address, Manager
    Time dimension: Month & Yr, Quarter

    Can have any number of dimensions, usually 5 to 15

    How are snapshots added on?


    Exercise : Compare the ER Model with the Dimensional Model of Data

    ER Model

    Designed for entering / storing data (transactions)

    Optimized for transactions: single-row entry and retrieval

    Thousands of concurrent users

    No way to figure out which data are business numbers (changing) and which are constant / static / near-static (e.g. Regions, Products); all of them are fields or relations. Therefore a query is tough to implement

    JOINs are needed between any combination of tables. Therefore a query is tough to implement

    Dimensional Model

    Designed for analysis / querying by the user

    Optimized for bulk loads and large, complex, unpredictable queries

    Few concurrent users

    What is constant / static / near-static (dimensions) and what are business numbers (facts) is very clear. Therefore a query is easier to implement

    JOINs only between the Fact Table and each Dimension Table. Therefore a query is easier to implement
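The "JOINs only between the Fact Table and each Dimension Table" point can be seen in a small star join. This is a sketch using Python's built-in `sqlite3` with hypothetical table and column names (`balance_fact`, `customer`, `region`, `time_dim`) and made-up rows; the composite primary key on the fact table is one plausible answer to the earlier "primary key" question, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one primary key each.
CREATE TABLE customer (cust_id TEXT PRIMARY KEY, cust_name TEXT, address TEXT, phone TEXT);
CREATE TABLE region   (region_code TEXT PRIMARY KEY, region_name TEXT, address TEXT, manager TEXT);
CREATE TABLE time_dim (month_yr TEXT PRIMARY KEY, quarter TEXT);

-- Fact table: one foreign key per dimension; together they form the primary key.
CREATE TABLE balance_fact (
    cust_id     TEXT REFERENCES customer(cust_id),
    month_yr    TEXT REFERENCES time_dim(month_yr),
    region_code TEXT REFERENCES region(region_code),
    balance     REAL,
    PRIMARY KEY (cust_id, month_yr, region_code)
);

INSERT INTO customer VALUES ('C1','Asha','Delhi','111'), ('C2','Ravi','Pune','222'),
                            ('C3','Lata','Agra','333');
INSERT INTO region   VALUES ('N','North','Delhi','Mehta'), ('W','West','Mumbai','Shah');
INSERT INTO time_dim VALUES ('2019-01','Q1'), ('2019-04','Q2');
INSERT INTO balance_fact VALUES
    ('C1','2019-01','N',500), ('C3','2019-01','N',200),
    ('C2','2019-01','W',300), ('C2','2019-04','W',450);
""")

# Star join: the fact table joins to each dimension directly, never
# dimension-to-dimension. Balances are summed across customers within a month.
rows = conn.execute("""
    SELECT r.region_name, t.quarter, f.month_yr, SUM(f.balance) AS total_balance
    FROM balance_fact f
    JOIN region   r ON r.region_code = f.region_code
    JOIN time_dim t ON t.month_yr    = f.month_yr
    GROUP BY r.region_name, t.quarter, f.month_yr
    ORDER BY r.region_name, f.month_yr
""").fetchall()
print(rows)
```

However many dimensions are added, each one contributes exactly one extra JOIN from the fact table, which is what keeps these queries predictable.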


    Data Marts

    Fact table: Cust. Id, Month & Yr, Region Code, Balance
    Customer dimension: Cust. Id, Cust Name, Address, Phone
    Region dimension: Region Code, Region Name, Address, Manager
    Time dimension: Month & Yr, Quarter

    How would Data Marts created out of such a Data Warehouse look?

    Similar. Some fields may be missing. Examples?

    Corporate customers: no personal details

    Retail customers: no organisational details

    Data Cubes are usually formed in Data Marts
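A rough illustration of "similar, with some fields missing": below, two hypothetical data marts are carved out of one customer table as SQL views in `sqlite3`. The columns `segment`, `date_of_birth` and `company_reg_no` are assumptions made for the example; a production data mart would more often be a separately loaded copy, and a view is used here only to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Warehouse-level customer dimension (hypothetical columns for illustration).
CREATE TABLE customer (
    cust_id TEXT PRIMARY KEY, cust_name TEXT, segment TEXT,
    date_of_birth TEXT, company_reg_no TEXT
);
INSERT INTO customer VALUES
    ('C1', 'Asha',     'RETAIL',    '1980-05-01', NULL),
    ('C2', 'Acme Ltd', 'CORPORATE', NULL,         'U123');

-- Corporate data mart: same shape, but personal details are left out.
CREATE VIEW corporate_customer AS
    SELECT cust_id, cust_name, company_reg_no FROM customer WHERE segment = 'CORPORATE';

-- Retail data mart: organisational details are left out.
CREATE VIEW retail_customer AS
    SELECT cust_id, cust_name, date_of_birth FROM customer WHERE segment = 'RETAIL';
""")

corporate = conn.execute("SELECT * FROM corporate_customer").fetchall()
retail = conn.execute("SELECT * FROM retail_customer").fetchall()
print(corporate)
print(retail)
```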


    DW : Contents

    ER Model vs Dimensional Model

    Designing a Data Warehouse

    Starting with the ER Model

    Facts and Dimensions

    BI Products and Vendors

    Data Warehouse Optimisation

    OLAP Implementation


    Exercise : ER-Model to Dimensional Model

    [Figure: ER Model with entities SALES_REP, ORDER, CUSTOMER, PRODUCT and Line_Item, and relationships Places_Order, Sells_to, Contains and Is_Ordered]

    Exercise : Convert this ER Model into a Dimensional Model (Star Schema)


    Dimensional Model

    Fact table LINE_ITEM: Emp Id, Cust Id, Date, Order Num, Product Code, Quantity
    SALES_REP dimension: Emp. Id, Name, Qualifications
    CUSTOMER dimension: Cust Id, Cust Name, Address
    TIME dimension: Date, Quarter
    ORDER dimension: Order Num, Credit Terms, Lead Time
    PRODUCT dimension: Product Code, Product Name, Brand, Rate

    Star: instead of keeping a relationship from Sales_Rep to Customer, the relationship is from both to the line item

    What are the Foreign Keys in the Fact Table?

    What is the primary key in the Fact Table?

    New Dimension created: Time. Time will always be a dimension in a Data Warehouse
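One way to write this star schema down as DDL, sketched with `sqlite3`. Table and column names are illustrative assumptions (ORDER is an SQL keyword, so that dimension is called `orders`, and the date column is `sale_date`). The primary key `(order_num, product_code)` is one plausible answer to the slide's question, assuming an order never has two lines for the same product; the two PRAGMA queries then list the foreign keys and the primary-key columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (ORDER is an SQL keyword, so the table is called orders here).
CREATE TABLE sales_rep (emp_id TEXT PRIMARY KEY, name TEXT, qualifications TEXT);
CREATE TABLE customer  (cust_id TEXT PRIMARY KEY, cust_name TEXT, address TEXT);
CREATE TABLE time_dim  (sale_date TEXT PRIMARY KEY, quarter TEXT);
CREATE TABLE orders    (order_num TEXT PRIMARY KEY, credit_terms TEXT, lead_time INTEGER);
CREATE TABLE product   (product_code TEXT PRIMARY KEY, product_name TEXT, brand TEXT, rate REAL);

-- Fact table: one row per line item. Sales_Rep and Customer are no longer linked
-- to each other directly; both are linked to the line item.
CREATE TABLE line_item (
    emp_id       TEXT REFERENCES sales_rep(emp_id),
    cust_id      TEXT REFERENCES customer(cust_id),
    sale_date    TEXT REFERENCES time_dim(sale_date),
    order_num    TEXT REFERENCES orders(order_num),
    product_code TEXT REFERENCES product(product_code),
    quantity     INTEGER,
    PRIMARY KEY (order_num, product_code)
);
""")

# Which dimension tables does the fact table point at?
fk_tables = sorted({row[2] for row in conn.execute("PRAGMA foreign_key_list(line_item)")})
# Which columns make up the fact table's primary key (in key order)?
info = conn.execute("PRAGMA table_info(line_item)").fetchall()
pk_cols = [row[1] for row in sorted(info, key=lambda r: r[5]) if row[5] > 0]
print(fk_tables)
print(pk_cols)
```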


    Exercise : Is this a Normalised design?

    Fact table LINE_ITEM: Emp Id, Cust Id, Date, Order Num, Product Code, Quantity
    SALES_REP dimension: Emp. Id, Name, Qualifications
    CUSTOMER dimension: Cust Id, Cust Name, Address
    TIME dimension: Date, Quarter
    ORDER dimension: Order Num, Credit Terms, Lead Time
    PRODUCT dimension: Product Code, Product Name, Brand, Rate


    Exercise : Is this a Normalised design?

    In the Fact Table, Emp Id is functionally dependent on (Cust Id + Date), not on the full primary key

    Logically, every time Customer P places an order with Salesman Q, the fact table gets a row for this Customer-Salesman combination

    So there is redundancy: Cust Id alone should have been enough

    Therefore anomalies?

    Insert: cannot insert a Customer-Salesman relationship until the Customer places an order

    Delete: if an order is cancelled, and it is the only order the salesman has from this Customer, we lose the Salesman-Customer relationship

    Does this lack of normalisation cause a problem?
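The redundancy argument can be checked mechanically on sample rows. The sketch below, with made-up LINE_ITEM rows and a hypothetical helper `holds`, tests whether one set of columns functionally determines another and counts how often the same (customer, salesman) pair is repeated. Note that a sample can only refute a functional dependency, never prove it; the dependency itself is a business rule.

```python
from collections import defaultdict

# Hypothetical LINE_ITEM rows: (emp_id, cust_id, date, order_num, product_code, quantity)
line_items = [
    ("E1", "P", "2019-07-01", "O1", "SKU1", 5),
    ("E1", "P", "2019-07-01", "O1", "SKU2", 2),
    ("E1", "P", "2019-07-15", "O2", "SKU1", 1),
    ("E2", "Q", "2019-07-02", "O3", "SKU3", 4),
]
EMP, CUST, DATE = 0, 1, 2

def holds(rows, lhs, rhs):
    """True if the columns at positions lhs map to a single value at position rhs."""
    seen = defaultdict(set)
    for row in rows:
        seen[tuple(row[i] for i in lhs)].add(row[rhs])
    return all(len(values) == 1 for values in seen.values())

cust_determines_emp = holds(line_items, lhs=(CUST,), rhs=EMP)
cust_date_determines_emp = holds(line_items, lhs=(CUST, DATE), rhs=EMP)
# The same (customer, salesman) pair is stored once per line item.
pair_repeats = sum(1 for r in line_items if (r[CUST], r[EMP]) == ("P", "E1"))
print(cust_determines_emp, cust_date_determines_emp, pair_repeats)
```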


    Does lack of normalisation cause a problem?

    A Data Warehouse has no updates, deletions or insertions of individual records

    Only snapshots are added on with time

    So there are no anomalies: lack of normalisation is not a problem

    The E-R Model tries to remove redundancy completely

    The Dimensional Model tries to simplify the schema, and therefore brings in redundancy, e.g. the relationship between sales_rep and customer is repeated in every line_item where these two are involved


    Does lack of normalisation cause a problem? (contd.)

    We cannot enter a Salesman-Customer relationship until the customer places at least one order

    Instead, it is shown as a relationship between a customer and a line item, and between a salesperson and the same line item. The relationship exists only through the line item (Fact)

    Is this a problem?

    In a DW we decide what our focus is: those are the facts.

    In this case our fact is the line items sold, not the relationship between the salesperson / customer rep and the customer

    If it is important to maintain the relationship (even without an order), we create another Star Schema around some other fact (say, Opportunity)


    Constellation

    Multiple Stars (fact tables) sharing dimension tables
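A small sketch of how two stars in a constellation can be used together, assuming a shared Customer dimension and two hypothetical fact tables: line items sold and the "Opportunity" fact suggested earlier. Each fact table is summarised on its own and the results are lined up on the shared customer key; all names and rows are made up. Customer C3 shows up through the opportunity star even though no order exists yet.

```python
from collections import Counter

# Hypothetical Customer dimension shared by both stars.
customer_dim = {"C1": "Asha", "C2": "Ravi", "C3": "Lata"}

# Star 1 fact: line items sold, as (cust_id, quantity).
sales_fact = [("C1", 5), ("C1", 2), ("C2", 4)]
# Star 2 fact: opportunities, as (cust_id, emp_id); C3 has a salesman but no order yet.
opportunity_fact = [("C1", "E1"), ("C3", "E2")]

def drill_across(customer_dim, sales_fact, opportunity_fact):
    """Summarise each fact table separately, then combine on the shared dimension."""
    quantity = Counter()
    for cust, qty in sales_fact:
        quantity[cust] += qty
    opportunities = Counter(cust for cust, _ in opportunity_fact)
    # Counter returns 0 for customers missing from a fact table.
    return {name: (quantity[cid], opportunities[cid]) for cid, name in customer_dim.items()}

report = drill_across(customer_dim, sales_fact, opportunity_fact)
print(report)
```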


    Exercise : Implementing Data Marts


    DW : Contents

    ER Model vs Dimensional Model

    Designing a Data Warehouse

    Starting with the ER Model

    Facts and Dimensions

    BI Products and Vendors

    Data Warehouse Optimisation

    OLAP Implementation


    Which of these can be facts?

    Region, Sales, No. of Complaints, Type of complaint, Outstandings, Premium paid, Salary, Colour, Cash_on_hand, Collections, Breakages, Product, Customer, Interest

    Typical characteristics of facts?


    Typical Characteristics of Facts

    Numerical

    Additive. Why?

    Querying involves scanning lots of records, but the end result of the query should be short: one or two pages / screens

    Additive facts can provide this, because many rows can be summed into a few totals

    Examples? Sales, Collections, Revenue, Expenses

    Continuously valued (even counts, e.g. no. of complaints / no. of transactions, are considered continuously valued)


    Will Facts always be additive?

    Semi-additive Facts? Explain

    Account Balance: can be added across some dimensions, not all (e.g. across customers, but not across time)

    Guidelines: what forms additive facts and what forms semi-additive facts? Flows vs Levels (e.g. Deposits vs Balance; Collections vs Current outstandings)

    Non-additive Facts? Explain

    Interest %age, %age target achievement, %age profit: cannot be added across any dimension

    Can this be converted into an additive fact? Convert interest %age to an absolute value (amount)

    When is this done? ETL (Transform stage)
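The three kinds of facts can be contrasted on a toy bank example. All figures below are invented: deposits are a flow (additive), month-end balances are a level (semi-additive), and interest rates are non-additive until converted to an absolute interest amount, as the Transform stage would do.

```python
# Hypothetical month-end balances: (cust_id, month) -> balance. A level: semi-additive.
balances = {("C1", "Jan"): 100.0, ("C1", "Feb"): 150.0,
            ("C2", "Jan"): 200.0, ("C2", "Feb"): 250.0}
# Hypothetical deposits in the same months. A flow: fully additive.
deposits = {("C1", "Jan"): 100.0, ("C1", "Feb"): 50.0,
            ("C2", "Jan"): 200.0, ("C2", "Feb"): 50.0}

# Additive: deposits can be summed across customers AND months.
total_deposits = sum(deposits.values())

# Semi-additive: balances can be summed across customers for one month...
feb_total_balance = sum(v for (cust, month), v in balances.items() if month == "Feb")
# ...but across months we take the closing (or average) value, not the sum.
c1_closing = balances[("C1", "Feb")]

# Non-additive: interest rates cannot be summed. Converting them to an absolute
# interest amount (an ETL transform step) gives an additive fact instead.
rate_pct = {"C1": 4.0, "C2": 6.0}
interest_amount = {c: balances[(c, "Feb")] * rate_pct[c] / 100 for c in rate_pct}
total_interest = sum(interest_amount.values())
# Any overall rate is then derived from the additive totals, not summed.
portfolio_rate_pct = 100 * total_interest / feb_total_balance
print(total_deposits, feb_total_balance, c1_closing, total_interest, portfolio_rate_pct)
```

Here the portfolio rate comes out at 5.25 %, whereas naively adding the two rates would give a meaningless 10 %.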


    Facts will usually be additive or semi-additive. Avoid non-additive facts

    Additive Facts : Summarise

    However, it is possible to have facts that do not satisfy some or all of these conditions

    Ultimately, the designer decides.


    Review : Facts - Guidelines

    Numerical

    Continuously valued

    Additive

    Semi-additive

    Non-additive


    Dimensions

    Determined by what you want as row and column headers in your query reports

    Usually:

    Textual

    Discrete. Could also be numeric. Where? Where they form column headers and no calculations are done on them (e.g. Age, Salary), typically as a range

    Time is always one dimension. Why? Because of snapshots

    Dimensions are an entry point into a Data Warehouse


    Exercise : Facts or Dimensions?

    Region, Sales, No. of Complaints, Type of complaint, Outstandings, Premium paid, Salary, Colour, Cash_on_hand, Collections, Breakages, Product, Customer, Interest

    The same thing can be modelled as a fact or as a dimension; it depends on the designer

    Numeric dimensions are in the form of a range
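Turning a numeric attribute such as Salary into a range-valued dimension is usually done during ETL by mapping each value to a band. The band boundaries and labels below are hypothetical, and the sketch assumes non-negative salaries; it uses the standard-library `bisect` module to find the band.

```python
import bisect

# Hypothetical salary bands, given by their lower bounds (in rupees). The band
# label, not the raw salary, becomes the dimension attribute used as a header.
BAND_LOWER_BOUNDS = [0, 300_000, 600_000, 1_200_000]
BAND_LABELS = ["below 3 lakh", "3-6 lakh", "6-12 lakh", "12 lakh and above"]

def salary_band(salary):
    """Map a non-negative salary to its band label."""
    return BAND_LABELS[bisect.bisect_right(BAND_LOWER_BOUNDS, salary) - 1]

bands = [salary_band(s) for s in (250_000, 300_000, 750_000, 2_000_000)]
print(bands)
```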


    DW : Contents

    ER Model vs Dimensional Model

    Designing a Data Warehouse

    Starting with the ER Model

    Facts and Dimensions

    BI Products and Vendors

    Data Warehouse Optimisation

    OLAP Implementation


    BI Products and Vendors

    [Figure: layers, from OLTP Databases up through Data Warehouse, Data Marts and Data Cubes to Clients]

    DBMS Vendors: Oracle, Microsoft SQL Server, IBM (DB2), ...


    BI Products and Vendors

    [Figure: layers, from OLTP Databases up through Data Warehouse, Data Marts and Data Cubes to Clients]

    BI Tool Vendors: SAS, Cognos (IBM), Business Objects (SAP), QlikView, ...

    They provide everything except the OLTP DBMS and the DW, ETL included


    Implementing a Data Warehouse: Where should the Pilot be done?

    Four Regions (represented by 4 teams):

    1. Dynamic and keen Regional Manager (RM), but very poor historical data

    2. Excellent historical data. RM interested but doesn't have much time

    3. Recently started Region. Not much historical data, but good current data. RM interested, may spend some time

    4. Small, unimportant Region, but good and interested RM. Good historical data, but not too much of it


    DW : Contents

    ER Model vs Dimensional Model

    Designing a Data Warehouse

    Starting with the ER Model

    Facts and Dimensions

    BI Products and Vendors

    Data Warehouse Optimisation

    OLAP Implementation


    Exercise : How big are the Fact and Dimension Tables? a) Number of records b) Size in bytes

    Fact table: Cust. Id, Month & Yr, Region Code, Balance
    Customer dimension: Cust. Id, Cust Name, Address, Phone
    Region dimension: Region Code, Region Name, Address, Manager
    Time dimension: Month & Yr, Quarter

    1 lakh (100,000) customers, 10 regions. Data stored for the past 10 years

    What if we store daily balances, and for each of the 1000 branches?

    Implications? Space, speed. So what do we do?

    Optimise on Fact table size. Ignore dimension tables!
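A back-of-the-envelope version of the exercise. The row counts follow from the slide's figures (1 lakh customers, 10 regions, 10 years) under the assumption of one Balance row per customer per period; the byte widths per column are pure assumptions, chosen only to show that the fact table dwarfs the dimensions. Leap years and the 1000-branch variant are ignored.

```python
# Hypothetical byte widths per column (assumptions; the slide leaves sizes to the student).
FACT_ROW_BYTES = 10 + 6 + 4 + 8          # Cust Id, Month & Yr, Region Code, Balance
CUSTOMER_ROW_BYTES = 10 + 30 + 60 + 12   # Cust Id, Cust Name, Address, Phone
REGION_ROW_BYTES = 4 + 20 + 60 + 30      # Region Code, Region Name, Address, Manager

def table_sizes(customers=100_000, regions=10, years=10, periods_per_year=12):
    """Rows and bytes per table, assuming one Balance row per customer per period."""
    periods = years * periods_per_year
    fact_rows = customers * periods
    return {
        "fact_rows": fact_rows,
        "fact_bytes": fact_rows * FACT_ROW_BYTES,
        "customer_bytes": customers * CUSTOMER_ROW_BYTES,
        "region_bytes": regions * REGION_ROW_BYTES,
        "time_rows": periods,
    }

monthly = table_sizes()                     # 12,000,000 fact rows, 336,000,000 bytes
daily = table_sizes(periods_per_year=365)   # 365,000,000 fact rows, about 10.2 GB
print(monthly)
print(daily)
```

Under these assumptions the Customer dimension stays at about 11 MB and the Region dimension at about 1 KB whichever grain is chosen, while the fact table grows roughly thirtyfold when moving from monthly to daily snapshots.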


    Optimisation : Exercise : Can we modify this Star Schema to cut down space?

    Fact table LINE_ITEM: Emp Id, Cust Id, Date, Order Num, Product Code, Quantity
    SALES_REP dimension: Emp. Id, Name, Qualifications
    CUSTOMER dimension: Cust Id, Cust Name, Address
    TIME dimension: Date, Quarter
    ORDER dimension: Order Num, Credit Terms, Lead Time
    PRODUCT dimension: Product Code, Product Name, Brand, Rate


    Star Schema Option 2

    Fact table LINE_ITEM: Emp Id, Cust Id, Date, Order Num, Product Code, Quantity
    SALES_REP dimension: Emp. Id, Name, Qualifications
    CUSTOMER dimension: Cust Id, Cust Name, Address
    TIME dimension: Date, Quarter
    ORDER dimension: Order Num, Credit Terms, Lead Time
    PRODUCT dimension: Product Code, Product Name, Brand, Rate
    Sales rep fields (Emp. Id, Name, Qualifications) also shown within a dimension table

    Advantage / Disadvantage? Fact Table space vs. Ease of Querying

    Which one would you use?

    Is the Dimension Table Normalised? Denormalised Dimension Table



    Star Schema Option 3

    Fact table LINE_ITEM: Emp Id, Cust Id, Date, Order Num, Product Code, Quantity
    SALES_REP dimension: Emp. Id, Name, Qualifications
    CUSTOMER dimension: Cust Id, Cust Name, Address
    TIME dimension: Date, Quarter
    ORDER dimension: Order Num, Credit Terms, Lead Time
    PRODUCT dimension: Product Code, Product Name, Brand, Rate
    Customer and sales rep fields (Cust Id, Cust Name, Address, Emp. Id, Name, Qualifications) shown together in one dimension table

    Advantage / Disadvantage? Fact Table space vs. Ease of Querying

    Which one would you use?

    More highly Denormalised Dimension Table


    Optimisation : What occupies the maximum space in the Fact Table?

    Fact table: Cust. Id, Month & Yr, Region Code, Balance
    Customer dimension: Cust. Id, Cust Name, Address, Phone
    Region dimension: Region Code, Region Name, Address, Manager
    Time dimension: Month & Yr, Quarter

    The keys

    How do we reduce the size of the keys? Use surrogate keys


    Optimisation : Use Surrogate keys

    Operational Keys: Disadvantage? English-like Ids occupy space

    Surrogate Keys: meaningless integers (2- or 4-byte integers are most common). Advantage? Much shorter. Disadvantage? Processing is required to transform operational keys into surrogate keys

    In any case, when the data comes from multiple sources, keys in all but one of the sources need to change
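A minimal sketch of the ETL step that assigns surrogate keys, assuming two hypothetical source systems that both use the operational id "CUST-000123". Each (source, operational key) pair gets a small integer once, and repeat loads reuse it; a common arrangement is for the dimension row to keep the operational key as an ordinary attribute while the fact table carries only the surrogate. Deciding whether two sources describe the same real customer needs separate matching logic, which this sketch does not attempt.

```python
from itertools import count

class SurrogateKeyMap:
    """Hand out small integer surrogate keys for (source system, operational key) pairs."""

    def __init__(self):
        self._next = count(1)   # 1, 2, 3, ... fits easily in a 4-byte integer
        self._keys = {}

    def key_for(self, source, operational_key):
        pair = (source, operational_key)
        if pair not in self._keys:
            self._keys[pair] = next(self._next)
        return self._keys[pair]

customers = SurrogateKeyMap()
# Two hypothetical source systems happen to use the same operational id.
k1 = customers.key_for("branch_system", "CUST-000123")
k2 = customers.key_for("card_system", "CUST-000123")
k3 = customers.key_for("branch_system", "CUST-000123")  # repeat load: same key reused
print(k1, k2, k3)
```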


    Exercise : Add surrogate keys to this schema

    Fact table LINE_ITEM: Emp Id, Cust Id, Date, Order Num, Product Code, Quantity
    SALES_REP dimension: Emp. Id, Name, Qualifications
    CUSTOMER dimension: Cust Id, Cust Name, Address
    TIME dimension: Date, Quarter
    ORDER dimension: Order Num, Credit Terms, Lead Time
    PRODUCT dimension: Product Code, Product Name, Brand, Rate
    Surrogate keys added: Emp Key (PK of SALES_REP), Cust Key (PK of CUSTOMER), Order Key (ORDER) and Product Key (PRODUCT), each also carried in the LINE_ITEM fact table

    Do we need both the original and the surrogate key in the Dimension Table? In the Fact Table?


    Designing a Data Warehouse

    Fact table LINE_ITEM: Emp Id, Cust Id, Date, Order Num, Product Code, Quantity
    SALES_REP dimension: Emp. Id, Name, Qualifications
    CUSTOMER dimension: Cust Id, Cust Name, Address
    TIME dimension: Date, Quarter
    ORDER dimension: Order Num, Credit Terms, Lead Time
    PRODUCT dimension: Product Code, Product Name, Brand, Rate
    Surrogate keys added: Emp Key (PK), Cust Key (PK), Order Key, Product Key and Date Key (PK), each also carried in the LINE_ITEM fact table

    Based on this exercise, what is the process for converting an ER Model into a Dimensional Model (Data Warehouse)?


    The DW Design Process

    Identify an association table as the central fact table

    Choose the Dimensions

    Add a date (time) dimension

    Replace all operational keys with surrogate keys

    Promote the (surrogate) key of each dimension table into the fact table as a foreign key

    Choose the Facts


    Are the Dimensions normalised?

    Fact table: Cust. Id, Month & Yr, Region Code, Balance
    Customer dimension: Cust. Id, Cust Name, Address, Phone
    Region dimension: Region Code, Region Name, Address, Manager
    Time dimension: Month & Yr, Quarter

    Add fields to each dimension to make it denormalised

    Now, what does the schema look like if we normalise each dimension table? Snowflake Schema

    Are Snowflake Schemas desirable? Why? Consider the speed of querying and the complexity of querying for the user

    Thinking question: Is there any situation where we would normalise a dimension table?
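To see the querying cost of a snowflake, the sketch below builds the same hypothetical product dimension twice in `sqlite3`: once denormalised (star) and once normalised into product, brand and category tables (snowflake). Both queries return the same answer, but the snowflake version needs a chain of three joins instead of one. All table names, the brand "Brite" and the figures are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star: one denormalised product dimension (brand and category repeated per product).
CREATE TABLE product_star (product_key INTEGER PRIMARY KEY, product_name TEXT,
                           brand TEXT, category TEXT);

-- Snowflake: the same dimension normalised into three tables.
CREATE TABLE category (category_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE brand (brand_key INTEGER PRIMARY KEY, brand TEXT,
                    category_key INTEGER REFERENCES category(category_key));
CREATE TABLE product_snow (product_key INTEGER PRIMARY KEY, product_name TEXT,
                           brand_key INTEGER REFERENCES brand(brand_key));

CREATE TABLE sales_fact (product_key INTEGER, amount REAL);

INSERT INTO product_star VALUES (1,'Soap 100g','Brite','Toiletries'), (2,'Soap 200g','Brite','Toiletries');
INSERT INTO category VALUES (10,'Toiletries');
INSERT INTO brand VALUES (20,'Brite',10);
INSERT INTO product_snow VALUES (1,'Soap 100g',20), (2,'Soap 200g',20);
INSERT INTO sales_fact VALUES (1, 50), (2, 70);
""")

# Star: a single join from the fact table to the dimension.
star = conn.execute("""
    SELECT p.category, SUM(f.amount) FROM sales_fact f
    JOIN product_star p ON p.product_key = f.product_key
    GROUP BY p.category""").fetchall()

# Snowflake: the same question needs a chain of joins through the normalised tables.
snow = conn.execute("""
    SELECT c.category, SUM(f.amount) FROM sales_fact f
    JOIN product_snow p ON p.product_key  = f.product_key
    JOIN brand b        ON b.brand_key    = p.brand_key
    JOIN category c     ON c.category_key = b.category_key
    GROUP BY c.category""").fetchall()
print(star, snow)
```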


    DW : Contents

    ER Model vs Dimensional Model

    Designing a Data Warehouse

    Starting with the ER Model

    Facts and Dimensions

    BI Products and Vendors

    Data Warehouse Optimisation

    OLAP Implementation


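    Representing dimensions

    [Figure: dimension hierarchies, each listed from its most detailed level to All]
    Product dimension: SKU, Brand, Product, Product Category, Department, All products
    Store dimension: Store, Locality, PIN Code, City, Region, All
    Date dimension: Date, Month, Quarter, Year, All
    Promotion dimension: Promotion, All

    How do we represent a query, e.g. Get Sales by SKU by Store by Date by Promotion? How do we show a Roll-up / Drill-down?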
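A roll-up moves a query up one of these hierarchies; a drill-down moves it back down. The sketch below assumes a hypothetical Store dimension in which each store row carries its Locality, PIN Code, City and Region, and sums store-level sales to any chosen level. Store names, PIN codes and amounts are made up for illustration.

```python
from collections import defaultdict

# Hypothetical Store-dimension rows: store -> (locality, pin_code, city, region).
STORE_HIERARCHY = {
    "S1": ("Saket",   "110017", "Delhi",  "North"),
    "S2": ("Dwarka",  "110075", "Delhi",  "North"),
    "S3": ("Andheri", "400053", "Mumbai", "West"),
}
LEVELS = ["locality", "pin_code", "city", "region"]

# Hypothetical facts at the finest grain: (store, sales).
sales = [("S1", 100.0), ("S2", 50.0), ("S3", 80.0), ("S1", 20.0)]

def roll_up(facts, level):
    """Aggregate store-level sales up to the given level of the Store hierarchy."""
    position = LEVELS.index(level)
    totals = defaultdict(float)
    for store, amount in facts:
        totals[STORE_HIERARCHY[store][position]] += amount
    return dict(totals)

by_city = roll_up(sales, "city")      # roll-up: Store -> City
by_region = roll_up(sales, "region")  # further roll-up: City -> Region
print(by_city, by_region)
```

Drilling down is the reverse: re-running the query at a finer level (say, `"locality"`) to see which stores make up a city's total.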


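    Representing dimensions

    [Figure: dimension hierarchies, each listed from its most detailed level to All]
    Product dimension: SKU, Brand, Product, Product Category, Department, All products
    Store dimension: Store, Locality, PIN Code, City, Region, All
    Date dimension: Date, Month, Quarter, Year, All
    Promotion dimension: Promotion, All

    For this query, we need to add fields across rows in the Fact Table. How many rows need to be summed? Problems? Speed. Solution? Pre-aggregate the sums and store them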


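    Multiple levels of aggregates

    [Figure: dimension hierarchies, each listed from its most detailed level to All]
    Product dimension: SKU, Brand, Product, Product Category, Department, All products
    Store dimension: Store, Locality, PIN Code, City, Region, All
    Date dimension: Date, Month, Quarter, Year, All
    Promotion dimension: Promotion, All

    Store multiple-level aggregates. Redundancy: to speed up querying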
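A sketch of pre-aggregation at multiple levels, assuming just two levels per dimension (SKU/Brand, Store/City, Month/Quarter) and three invented base rows. One summary table is computed for every combination of levels; the ("sku", "store", "month") combination is simply the base table again. The redundancy is deliberate: each summary can answer its queries without rescanning the base rows.

```python
import itertools
from collections import defaultdict

# Hypothetical base fact rows at the finest grain:
# (sku, brand, store, city, month, quarter, sales)
base = [
    ("SKU1", "Brite", "S1", "Delhi",  "2019-01", "Q1", 100.0),
    ("SKU2", "Brite", "S2", "Delhi",  "2019-02", "Q1",  50.0),
    ("SKU3", "Zest",  "S3", "Mumbai", "2019-04", "Q2",  80.0),
]

# Two levels per dimension, each mapped to the column holding that level.
PRODUCT_LEVELS = {"sku": 0, "brand": 1}
STORE_LEVELS = {"store": 2, "city": 3}
TIME_LEVELS = {"month": 4, "quarter": 5}
SALES = 6

def build_aggregates(rows):
    """Pre-compute one summary table per combination of hierarchy levels."""
    aggregates = {}
    for p, s, t in itertools.product(PRODUCT_LEVELS, STORE_LEVELS, TIME_LEVELS):
        table = defaultdict(float)
        for row in rows:
            key = (row[PRODUCT_LEVELS[p]], row[STORE_LEVELS[s]], row[TIME_LEVELS[t]])
            table[key] += row[SALES]
        aggregates[(p, s, t)] = dict(table)
    return aggregates

aggs = build_aggregates(base)
print(len(aggs))                             # 2 * 2 * 2 summary tables
print(aggs[("brand", "city", "quarter")])
```

With the full hierarchies on the slide (including the All levels) the number of combinations multiplies quickly, which is why the choice of which aggregates to keep becomes an issue.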


    Aggregation : Issues

    When are aggregates computed? During every update

    How do we decide what aggregates to keep?

    Frequency of usage / repeat queries

    Priority of users

    Managers / Analysts should figure out the likely frequency of queries, and therefore what aggregates to keep


    Aggregation : Issues

    Where are Aggregations stored? In a separate Fact table: Families of Stars (Constellations)

    When are they computed? During every update

    How do we decide what aggregates to keep? Frequency of usage / repeat queries; priority of users

    Users should not be aware of aggregation. The software automatically uses the aggregate Fact table to answer the query. Why?
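Keeping users unaware of aggregation means the software must pick the right table for them. A simple rule, sketched below with hypothetical table names and row counts, is to choose the smallest fact table whose level on every dimension is at least as detailed as the level the query asks for. Real BI products implement their own, more elaborate versions of this.

```python
# Hypothetical rank of each level, from most detailed (0) upwards, per dimension.
LEVEL_RANK = {
    "product": {"sku": 0, "brand": 1},
    "time": {"month": 0, "quarter": 1},
}

# Hypothetical stored fact tables: name -> (level held for each dimension, row count).
FACT_TABLES = {
    "sales_base":          ({"product": "sku",   "time": "month"},   12_000_000),
    "sales_brand_month":   ({"product": "brand", "time": "month"},       60_000),
    "sales_brand_quarter": ({"product": "brand", "time": "quarter"},     20_000),
}

def choose_table(query_levels):
    """Pick the smallest fact table at least as detailed as the query on every dimension."""
    candidates = [
        (rows, name)
        for name, (levels, rows) in FACT_TABLES.items()
        if all(LEVEL_RANK[d][levels[d]] <= LEVEL_RANK[d][q] for d, q in query_levels.items())
    ]
    return min(candidates)[1]

print(choose_table({"product": "brand", "time": "quarter"}))  # sales_brand_quarter
print(choose_table({"product": "brand", "time": "month"}))    # sales_brand_month
print(choose_table({"product": "sku",   "time": "quarter"}))  # sales_base
```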


    Implementing OLAP

    Relational OLAP (ROLAP): implemented using a regular Relational DBMS. Linked list structures

    Multi-Dimensional OLAP (MOLAP): a multi-dimensional database (MDDB) created in advance and stored for querying. Array structures

    Advantages and Disadvantages?


    ROLAP vs MOLAP

    ROLAP

    Linked List Structure, therefore slow

    Space optimised: only records that have some value are stored

    All data is available in the ROLAP. Can handle a large DW

    No pre-aggregated data, therefore slow

    MOLAP

    Array Structure, therefore fast

    All cells in the Fact Table are stored, whether they exist or not. Therefore huge space (Explain). E.g. (Bank example): a customer does not have any Account in a given branch; a customer does not perform any transaction in most of his accounts on specific days

    Therefore only a small DW can be handled. For a large DW, summarised data can be kept in the MDDB; drilling down requires going back to ROLAP (called HOLAP: Hybrid OLAP)

    Pre-aggregated data, therefore fast


    MOLAP

    Sparse Matrix techniques are used to optimise space
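The bank example shows why sparsity matters. The sketch below compares the number of cells a plain dense array would allocate for a hypothetical customers x branches x days cube with what a simple sparse store keeps. A dictionary keyed by coordinates ("dictionary of keys") is only one sparse-matrix technique; commercial MOLAP engines use their own, often proprietary, schemes. All sizes and values are invented.

```python
# Hypothetical bank cube: customers x branches x days, one transaction amount per cell.
CUSTOMERS, BRANCHES, DAYS = 1_000, 50, 365

# Only a few cells actually hold a value: most customers have no account in most
# branches, and do not transact on most days.
sparse_cells = {
    (0, 3, 10): 500.0,
    (0, 3, 11): -200.0,
    (7, 12, 100): 1_000.0,
}

dense_cell_count = CUSTOMERS * BRANCHES * DAYS   # cells a plain MOLAP array would allocate
stored_cell_count = len(sparse_cells)            # cells the sparse store actually keeps

def cell(customer, branch, day):
    """Read a cell; empty cells are implied to be 0.0 rather than stored."""
    return sparse_cells.get((customer, branch, day), 0.0)

print(dense_cell_count, stored_cell_count)
print(cell(0, 3, 10), cell(1, 1, 1))
```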


    ROLAP vs MOLAP

    DBMS vendors started off with ROLAP (the know-how already existed), but are now adding MOLAP

    Pure BI vendors are largely into MOLAP (proprietary)


    Role Play: Implementation across Multiple Locations


    Book

    The Data Warehouse Toolkit, Ralph Kimball and Margy Ross, Wiley
