Data Warehouse Upasana Bhasin
Inmon vs. Kimball
W. H. Inmon’s approach
According to Bill Inmon, “A data warehouse is a subject-oriented, integrated, time-variant and
non-volatile collection of data in support of management's decision making process” (Ponniah,
2010). Inmon supports the top-down approach for building a data warehouse, in which, instead of
collecting fragments of information, a single enterprise-wide data warehouse is built. In this
approach, a data warehouse is a centralized repository of data for the entire enterprise. The data
is stored at the lowest level of granularity in the data warehouse and should be available at both
detailed and summarized levels through drill-down and drill-up operations. The
information in the data warehouse is stored in third normal form (3NF). The data warehouse feeds a
number of dependent data marts that source information from it. In the top-down approach, data
is extracted from operational data sources. The data is then loaded into the staging area where it
is validated to ensure accuracy. From there, the data is moved to the Operational Data Store (ODS). In
a parallel process, data is also transported into the data warehouse, which avoids extracting all of
it from the ODS. Data from the ODS is regularly extracted for aggregation and summarization into the
staging area and then loaded into the data warehouse. Once data is loaded into the data
warehouse, the data marts extract data from it and perform transformations on the data. After the
data marts are loaded, the Online Analytical Processing (OLAP) environment becomes
available to the users.
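The idea of keeping data at the lowest level of granularity while still serving detailed and summarized views can be illustrated with a minimal sketch. The sales rows and the function names `roll_up` and `drill_down` are hypothetical and are not taken from Inmon; they only show how summaries can be derived from atomic data and traced back to it.

```python
from collections import defaultdict

# Hypothetical atomic-grain sales rows: (date, region, product, amount).
ATOMIC_SALES = [
    ("2010-09-01", "East", "Widget", 120.0),
    ("2010-09-01", "East", "Gadget", 80.0),
    ("2010-09-02", "West", "Widget", 200.0),
    ("2010-09-02", "East", "Widget", 50.0),
]


def roll_up(rows, key_index):
    """Summarize atomic rows by one column (drilling up)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[3]  # index 3 holds the amount
    return dict(totals)


def drill_down(rows, key_index, value):
    """Return the detailed rows behind one summarized value (drilling down)."""
    return [row for row in rows if row[key_index] == value]


by_region = roll_up(ATOMIC_SALES, 1)             # summarized level
east_detail = drill_down(ATOMIC_SALES, 1, "East")  # detailed level
print(by_region)
print(east_detail)
```

Because only the atomic rows are stored, any summary can be recomputed at a different level (by date or by product, for example) without reloading data.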
R. Kimball’s approach
According to R. Kimball, “A data warehouse is nothing more than the union of all the constituent
data marts.” Kimball supports the bottom-up approach for building a data warehouse in which
data marts are created first and contain data at the lowest level of granularity. All the data marts
are then joined together by conforming the dimensions. The data marts are connected to the data
warehouse through a bus structure that contains elements common to the data marts, such as
conformed dimensions and measures. In this approach, data from the operational systems is
loaded into the staging area where it is processed and consolidated. It is then moved to the ODS.
Once the ODS is loaded with fresh data, the data is extracted to the staging area where it is
processed and moved to the data marts. The data in the data mart is then moved to the staging
area where it is summarized and loaded into the data warehouse. The end users can access this
data for analysis.
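How conformed dimensions let separate data marts be joined together can be sketched as follows. This is an illustration under assumed data: the two marts, the product dimension, and the function names are hypothetical, and the sketch is not Kimball's own implementation.

```python
# Conformed dimension shared by both marts (hypothetical data).
PRODUCT_DIM = {1: "Widget", 2: "Gadget"}

# Each mart keeps its own facts, keyed by the shared product key.
SALES_MART = [(1, 10), (2, 4), (1, 6)]   # (product_key, units_sold)
INVENTORY_MART = [(1, 30), (2, 12)]      # (product_key, units_on_hand)


def total_by_product(facts):
    """Sum a mart's measure for each product key."""
    totals = {}
    for product_key, quantity in facts:
        totals[product_key] = totals.get(product_key, 0) + quantity
    return totals


def drill_across(sales, inventory, product_dim):
    """Combine two marts through the dimension they both conform to."""
    sold = total_by_product(sales)
    on_hand = total_by_product(inventory)
    return {
        name: {"sold": sold.get(key, 0), "on_hand": on_hand.get(key, 0)}
        for key, name in product_dim.items()
    }


report = drill_across(SALES_MART, INVENTORY_MART, PRODUCT_DIM)
print(report)
```

The combination works only because both marts use the same product keys with the same meaning; if each mart defined its own product list, the results could not be lined up.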
Key differences in approach
According to Inmon, a data warehouse is a collection of data. It is inherently architected and is a
single, central store of data. He supports the top-down approach, in which traditional
relational database tools are used to develop the data warehouse. The ER modeling technique
is used in this approach, and information in the data warehouse is stored in third normal form. Data
can be made available quickly if the warehouse is implemented in iterations. His approach to data modeling is
subject-oriented. This method carries a high risk of failure. End-user accessibility is low.
This approach requires a high level of cross-functional skills. The overall process is quite complex.
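Storage in third normal form can be illustrated with a small sketch using Python's built-in sqlite3 module. The table and column names are hypothetical; the point is only that each fact is stored once and that a summary by category requires joining through the normalized tables.

```python
import sqlite3

# In-memory database with a hypothetical normalized (3NF) design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT NOT NULL
);
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category_id  INTEGER NOT NULL REFERENCES category(category_id)
);
CREATE TABLE sale (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    sale_date  TEXT NOT NULL,
    amount     REAL NOT NULL
);
INSERT INTO category VALUES (1, 'Hardware');
INSERT INTO product VALUES (10, 'Widget', 1), (11, 'Gadget', 1);
INSERT INTO sale VALUES (100, 10, '2010-09-01', 120.0),
                        (101, 11, '2010-09-01', 80.0);
""")

# The category name lives in exactly one row, so the query joins two tables
# to reach it from the sale records.
rows_3nf = conn.execute("""
    SELECT c.category_name, SUM(s.amount)
    FROM sale s
    JOIN product p ON p.product_id = s.product_id
    JOIN category c ON c.category_id = p.category_id
    GROUP BY c.category_name
""").fetchall()
print(rows_3nf)
```

Renaming the category means updating a single row, which is the benefit of normalization; the cost is the extra joins needed for analysis.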
On the other hand, Kimball defines a data warehouse as a collection of all constituent data marts
with conformed dimensions. The data warehouse bus in the bus architecture helps integrate
the data marts to create the data warehouse. He favors the bottom-up approach, which is
user-driven, comparatively easier to implement, supports multidimensional database design, and
ensures consistency of metadata. His approach to data modeling is process-oriented. In this
method, star schemas are used to create denormalized dimensional models. The concept of
‘Conformed Dimensions’ is used to avoid data replication. There is no single source of
information; each data mart has its own narrow view of the data. The data marts provide
information to the end users for business analysis. This method carries a lower risk of
failure. End-user accessibility is high. The main disadvantage of this method is that it causes
data fragmentation. The overall process is comparatively simple.
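A star schema with denormalized dimensions can be sketched with Python's built-in sqlite3 module. The table names (`fact_sales`, `dim_product`, `dim_date`) and the data are hypothetical and serve only as an illustration of the dimensional style.

```python
import sqlite3

# In-memory database with a hypothetical star schema: one fact table
# surrounded by denormalized dimension tables.
star = sqlite3.connect(":memory:")
star.executescript("""
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT NOT NULL,
    category_name TEXT NOT NULL   -- category repeated per product (denormalized)
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,
    full_date TEXT NOT NULL,
    month     TEXT NOT NULL
);
CREATE TABLE fact_sales (
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    amount      REAL NOT NULL
);
INSERT INTO dim_product VALUES (1, 'Widget', 'Hardware'),
                               (2, 'Gadget', 'Hardware');
INSERT INTO dim_date VALUES (20100901, '2010-09-01', '2010-09'),
                            (20100902, '2010-09-02', '2010-09');
INSERT INTO fact_sales VALUES (1, 20100901, 120.0),
                              (2, 20100901, 80.0),
                              (1, 20100902, 50.0);
""")

# Every dimension is one join away from the fact table.
rows_star = star.execute("""
    SELECT d.month, p.category_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_product p ON p.product_key = f.product_key
    JOIN dim_date d ON d.date_key = f.date_key
    GROUP BY d.month, p.category_name
""").fetchall()
print(rows_star)
```

The category name is repeated on every product row, which costs some storage but keeps queries short and easy for end users to write.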
Key similarities/agreements in approach
In both approaches, data is collected from various sources into the staging area, where it is
integrated, transformed, and then loaded into the data warehouse. The time attribute of data is
given importance in both approaches. Both methods use the ETL process. Both Inmon and
Kimball agree that, for an enterprise-wide data warehouse, stand-alone data marts are of minimal
use. Whether using Inmon’s approach or Kimball’s, it is necessary for the data warehouse team
to hire employees who have strong soft skills and can apply them effectively.
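The ETL process that both approaches share (collecting data from several sources into a staging area, integrating and transforming it, and loading it with a time attribute) can be sketched as follows. The source rows, field names, and functions are hypothetical; this is a simplified illustration, not a description of any particular ETL tool.

```python
from datetime import date

# Hypothetical operational sources with inconsistent formats.
CRM_ROWS = [{"cust": " Alice ", "amount": "120.50"}, {"cust": "Bob", "amount": "n/a"}]
BILLING_ROWS = [{"customer_name": "alice", "total": 30}]


def extract():
    """Pull raw rows from each source into one staging list."""
    staged = [{"customer": r["cust"], "amount": r["amount"]} for r in CRM_ROWS]
    staged += [{"customer": r["customer_name"], "amount": r["total"]} for r in BILLING_ROWS]
    return staged


def transform(staged, as_of):
    """Standardize names, drop rows with unusable amounts, add the time attribute."""
    clean = []
    for row in staged:
        try:
            amount = float(row["amount"])
        except (TypeError, ValueError):
            continue  # validation failure: the row stays out of the warehouse
        clean.append({
            "customer": row["customer"].strip().title(),
            "amount": amount,
            "as_of": as_of.isoformat(),
        })
    return clean


def load(target, rows):
    """Append rows to the target; existing rows are never updated."""
    target.extend(rows)
    return target


warehouse = load([], transform(extract(), date(2010, 9, 20)))
print(warehouse)
```

In this sketch the row with the amount "n/a" is rejected during validation, and the two spellings of the same customer name are standardized before loading.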
Find an article to discuss the Inmon vs. Kimball controversy, and write a brief critique of
the article. Your opinion as to which approach should produce a better design – with
supporting arguments. You should choose either Inmon or Kimball, Not “it depends”
approach.
The article ‘Data Warehousing Battle of the Giants: Comparing the Basics of the Kimball and
Inmon Models’ explains the nature and history of data warehousing and describes
both the Inmon and Kimball models in detail. It also provides a list of characteristics as a basis
for determining which approach is appropriate for developing a data warehouse for a particular
organization.
A number of factors have to be considered while developing a data warehouse, such as resources,
user requirements, and level of granularity. I would prefer using Kimball’s approach for
developing a data warehouse because each data mart contains information specific to a particular
business area. Managing individual data marts is much faster and easier than managing a
centralized data warehouse. The overall process is comparatively simple and involves less risk of
failure. The use of dimensional modeling in Kimball’s approach helps deliver a high level of
performance. This approach is user-driven, ensures consistency of metadata, and offers high
end-user accessibility.
REFERENCES
1. Ponniah, P. (2010). Data Warehousing Fundamentals: A Comprehensive Guide for IT Professionals. Wiley & Sons.
2. Kimball, R., & Ross, M. (2002). The Data Warehouse Toolkit. Wiley & Sons.
3. Inmon, W. (2005). Building the Data Warehouse. Wiley & Sons.
4. Retrieved September 20, 2010 from http://www.exforsys.com/tutorials/msas/data-warehouse-design-kimball-vs-inmon.html
5. Retrieved September 20, 2010 from http://www.bi-bestpractices.com/view-articles/4768
6. Retrieved September 20, 2010 from http://mydbaworld.wordpress.com/2009/07/23/bill-inmon-vs-ralph-kimball/
7. Retrieved September 20, 2010 from http://www.information-management.com/infodirect/19990901/1400-1.html