Understanding Data Warehousing

Introduction

Data has always been an essential ingredient in decision-making and, in modern business, the need to obtain, store, and use data has increased dramatically as the complexity and scope of the global marketplace have expanded.

Data warehousing is an environment established for the sole purpose of gathering, integrating, and delivering data from multiple data sources for use in enterprise decision-making. However, its effectiveness can be expanded to support any person, process, or system that needs current and historical data that is consistent and relatable.

Defining Data Warehouse

A data warehouse is a computing environment composed of several technologies and products, including:

– Data Acquisition
– Data Management
– Data Modeling
– Data Quality
– Data Analysis
– Metadata Management
– Development Tools
– Storage Management
– Applications
– Administrative Functions

Defining Data Warehouse (Part 2)

Data Warehousing is about managing the data. The following data features are key reasons for having a data warehouse:

– Subject Orientation
– Data Integration
– Non-volatile
– Time Variance
– Data Granularity

Benefits of Data Warehousing

Data Warehousing provides the following benefits:

– A comprehensive and integrated perspective of the enterprise
– Availability of current and historical information for strategic decision making
– Mitigation of operational risks related to supporting the decision-making process
– A flexible and interactive source of information

Introducing Business Intelligence

Business Intelligence is a set of disciplines designed specifically to establish a consistent decision-making environment.

Business Intelligence does not replace Data Warehousing, but uses it extensively in its processes.

Business Intelligence can be described as a two-step process:

– Transforming data into information
– Transforming information into knowledge

Functional Components of a Data Warehouse

Data Acquisition

Data Storage

Information Delivery

Physical Components of a Data Warehouse

Data Sources

Data Staging

Data Storage

Information Delivery

Metadata

Source Data

Data Sources can include:

– Operational Data
– Internal Data
– Archived Data
– External Data

Data can be structured or unstructured, and prepared or raw.

Data Staging

The activities of data staging are:

– Extracting data from the data sources
– Transforming the data into usable information
– Loading the data and metadata into data storage

ETL (data extraction, transformation, and loading) is considered the most time-consuming and human-intensive activity in data warehousing.
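
The three staging activities can be illustrated with a minimal sketch. It is not a production pipeline: the sample CSV, the `orders` and `load_metadata` tables, and the function names are all hypothetical, and an in-memory SQLite database stands in for data storage.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; a real source would be a file or an operational system.
RAW_CSV = """order_id,customer,amount,order_date
1, alice ,19.99,2024-01-05
2,BOB,5.00,2024-01-06
3,carol,not-a-number,2024-01-07
"""

def extract(text):
    """Extract: read the source rows as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: tidy names, convert types, and set aside rows that fail."""
    clean, rejected = [], []
    for row in rows:
        try:
            clean.append((
                int(row["order_id"]),
                row["customer"].strip().title(),
                float(row["amount"]),
                row["order_date"],
            ))
        except ValueError:
            rejected.append(row)
    return clean, rejected

def load(conn, rows, source_name):
    """Load: store the data plus a metadata record describing the load."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS load_metadata (source TEXT, rows_loaded INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.execute("INSERT INTO load_metadata VALUES (?, ?)", (source_name, len(rows)))
    conn.commit()

conn = sqlite3.connect(":memory:")
clean_rows, rejected_rows = transform(extract(RAW_CSV))
load(conn, clean_rows, "orders.csv")
```

Rows that cannot be converted are kept in `rejected_rows` rather than silently dropped, so they can be reviewed and corrected at the source.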

Data Quality

One purpose of Data Staging is to raise the quality of the data used in decision making: bad data will lead to bad decisions.

Data Quality is influenced by:

– Inadequate database designs
– Aging of data
– Dummy or absent data
– Non-unique identifiers
– Ineffective primary keys
– Violation of business rules
– Lack of policies and procedures
– Input errors
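
Several of these factors can be detected automatically during staging. The following is a rough sketch only: the record layout, the placeholder values in `DUMMY_VALUES`, the one-year age limit, and the rule that a credit limit cannot be negative are all assumptions made for illustration.

```python
from collections import Counter
from datetime import date

# Hypothetical placeholder values that signal dummy or absent data.
DUMMY_VALUES = {"", "N/A", "UNKNOWN"}

def find_quality_issues(records, today, max_age_days=365):
    """Return (customer_id, issue) pairs for a list of customer records."""
    issues = []
    id_counts = Counter(r["customer_id"] for r in records)
    for r in records:
        cid = r["customer_id"]
        if id_counts[cid] > 1:
            issues.append((cid, "non-unique identifier"))
        if r["email"] is None or r["email"].strip().upper() in DUMMY_VALUES:
            issues.append((cid, "dummy or absent data"))
        if r["credit_limit"] < 0:  # assumed business rule
            issues.append((cid, "business rule violation"))
        if (today - r["last_updated"]).days > max_age_days:
            issues.append((cid, "aged data"))
    return issues

records = [
    {"customer_id": 1, "email": "a@example.com", "credit_limit": 500,
     "last_updated": date(2024, 5, 1)},
    {"customer_id": 2, "email": "N/A", "credit_limit": 100,
     "last_updated": date(2024, 4, 1)},
    {"customer_id": 2, "email": "b@example.com", "credit_limit": -50,
     "last_updated": date(2021, 1, 1)},
]
issues = find_quality_issues(records, today=date(2024, 6, 1))
```

Factors such as inadequate database design or missing policies and procedures cannot be caught by record-level checks like these; they need to be addressed in the source systems and the organization.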

Data Storage

Organizations must establish the storage requirements for:

– Data staging
– Corporate data warehouse
– Individual data marts
– OLAP-based multidimensional databases

Information Delivery

The requirements for Information Delivery reside in expectations related to:

– Query types and frequencies
– Report types and frequencies
– Types of analysis
– Distribution of information
– Real-time requirements
– Applications for decision support
– Potential growth and expansion

Metadata

The core of the data warehouse is its metadata.

What is a Data Mart?

A data mart is a subset of a data warehouse. A data warehouse will typically contain data relevant to the entire enterprise, while a data mart contains data relevant to a line of business or department within the enterprise.
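
The subset relationship can be shown with a small sketch. The table `warehouse_sales`, the view `marketing_mart`, and the sample rows are hypothetical, and a database view is used here only for brevity; in practice a data mart is often a separately stored and separately modeled structure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Enterprise-wide data: every department's sales live in the warehouse table.
conn.execute(
    "CREATE TABLE warehouse_sales "
    "(sale_id INTEGER PRIMARY KEY, department TEXT, region TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?, ?)",
    [
        (1, "marketing", "EMEA", 120.0),
        (2, "finance", "EMEA", 300.0),
        (3, "marketing", "APAC", 80.0),
    ],
)

# Departmental data: the mart exposes only the rows relevant to marketing.
conn.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT sale_id, region, amount FROM warehouse_sales "
    "WHERE department = 'marketing'"
)

mart_rows = conn.execute(
    "SELECT sale_id, region, amount FROM marketing_mart ORDER BY sale_id"
).fetchall()
```

Deriving the mart from the warehouse in this way corresponds to the top-down approach, in which the warehouse exists first.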

Deployment of data warehouses and data marts will usually take one of the following approaches:

– Top-down (data warehouse first, data marts second)
– Bottom-up (data marts first, data warehouse second)

Data Warehouse Architecture

There are five basic architectures in data warehousing:

– Centralized Data Warehouse – one data warehouse with no data marts.

– Independent Data Marts – several autonomous data marts with no central data warehouse.

– Federated Data Marts – several data marts operating under standardized controls with no central warehouse.

– Hub-and-Spoke – several data marts with a central data warehouse.

– Data-Mart Bus – several data marts that conform to the standards and controls of the original data mart.

Why Data Warehousing?

What does a data warehouse provide the user?

– Ability to run simple queries and reports against current and historical data
– Ability to perform “what if” scenarios
– Ability to iteratively query and analyze deeper into the data
– Ability to identify historical trends and apply them effectively to future situations.
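
As a rough illustration of these abilities, the sketch below runs a simple summary query, drills down into one year, and applies a basic "what if" growth assumption. The `sales` table, its figures, and the `what_if_total` helper are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (year INTEGER, quarter INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(2022, 1, 100.0), (2022, 2, 110.0), (2023, 1, 120.0), (2023, 2, 130.0)],
)

# Simple query against current and historical data: totals per year.
by_year = conn.execute(
    "SELECT year, SUM(amount) FROM sales GROUP BY year ORDER BY year"
).fetchall()

# Iterative drill-down: look inside one year at the quarter level.
by_quarter_2023 = conn.execute(
    "SELECT quarter, SUM(amount) FROM sales WHERE year = ? "
    "GROUP BY quarter ORDER BY quarter",
    (2023,),
).fetchall()

def what_if_total(rows, growth_rate):
    """'What if' scenario: project a total under an assumed growth rate."""
    return sum(amount for _, amount in rows) * (1 + growth_rate)

# Hypothetical scenario: next year's total if 2023 sales grow by 10%.
projected = what_if_total(by_quarter_2023, 0.10)
```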

Challenges in Data Acquisition

The typical challenges facing data acquisition activities are:

– Large number of data sources
– Disparate data sources
– External data sources
– Ongoing data feeds
– Different computing platforms
– Data replication
– Data integration
– Data cleansing
– Complex data transformations

Challenges in Data Storage

The typical challenges facing data storage activities are:

– Large data volumes
– Large data sets
– New data types
– Data storage in staging area
– Multiple index types
– Parallel processing
– Data archiving
– Tool compatibilities

Challenges in Information Delivery

The typical challenges facing information delivery activities are:

– Multiple user types
– Multiple query types
– Complex queries
– OLAP
– Multidimensional analysis
– Web-enablement
– Metadata management
– Tools from multiple vendors

Relevant Data Warehouse Standards

Relevant standards for data warehousing, specifically metadata, are provided through:

– Meta Data Coalition
– Object Management Group
– OLAP Council for Multi-dimensional Application Programmers Interface (MDAPI)

Basic Project Plan

The basic plan for a data warehouse project is:

– Planning
– Defining requirements
– Design
– Build
– Deploy
– Maintain

The Toolkit

The Toolkit is designed to address the enterprise’s whole relationship with data, not just data warehousing. As part of its scope, a second presentation is available to introduce Data Analytics and Data Mining, which relate to the second step of Business Intelligence.

The goal of the Data Warehouse/Analytics Toolkit is to define the contributing factors, major components, and their relationships, while providing the basic tools to take action based on the organization’s needs.

Moving Forward

The participant can take one of two directions in using the toolkit at this point. To continue with the data warehouse discussion, the next document of interest is Developing Warehouse Capabilities, which is intended to be a step-by-step guide to creating a Big Data foundation in your organization. To learn more about data-related activities within an enterprise, see the presentation Introduction to Data Analytics and Mining.

Multiple templates have been created to support the process and aid organizations in their efforts to improve their Data Warehouse and Data Analytic capabilities.