Chapter 14
The Data Warehouse
Fundamentals of Database Management Systems
by
Mark L. Gillenson, Ph.D.
University of Memphis
Presentation by: Amita Goyal Chin, Ph.D.
Virginia Commonwealth University
John Wiley & Sons, Inc.
Chapter Objectives
Compare the data needs of transaction processing systems with those of decision support systems.
Describe the data warehouse concept and list its main features.
Compare the enterprise data warehouse with the data mart.
Design a data warehouse.
Build a data warehouse, including the steps of data extraction, data cleaning, data transformation, and data loading.
Describe how to use a data warehouse with online analytic processing and data mining.
List the types of expertise needed to administer a data warehouse.
List the challenges in data warehousing.
Application Systems
Transaction Processing Systems (TPS): everyday application systems that support banking and insurance operations, manage the parts inventory on manufacturing assembly lines, keep track of airline and hotel reservations, support Web-based sales, etc.
Decision Support Systems (DSS): specifically designed to aid managers in decision-making tasks.
The Data Warehouse Concept
A data warehouse is a broad-based, shared database for management decision making that contains data that has been accumulated over time.
Formally, a data warehouse is “a subject oriented, integrated, non-volatile, and time variant collection of data in support of management’s decisions.”
Characteristics of Data Warehouse Data
The data is subject oriented
The data is integrated
The data is non-volatile
The data is time variant
The data must be high quality
The data may be aggregated
The data is often denormalized
The data is not necessarily absolutely current
The Data is Subject Oriented
Data warehouses are organized around subjects, which are really the major entities of concern in the business environment.
• Sales, customers, orders, claims, accounts, employees, and other entities that are central to the company’s business.
The Data is Integrated
Data about each of the subjects in the data warehouse is typically collected from several of the company’s transactional databases, each of which supports one or more applications that have something to do with the particular subject.
All of the data about a subject must be organized or integrated in such a way that it provides a unified, overall picture of all the important details about the subject over time.
Data from disparate application databases must be transformed into common measurements, codes, and data types.
The Data is Non-Volatile
Once data is added to the data warehouse, it doesn’t change.
It will never change. Changing it would be like going back and rewriting history.
The Data is Time Variant
Because of its historic nature, data warehouse data always includes some kind of timestamp.
If we are storing sales data on a weekly or monthly basis and have accumulated ten years of such historic data, each weekly or monthly sales figure must be accompanied by a timestamp indicating the week or month (and year!) that it represents.
The Data Must Be High Quality
Consider a section of a data warehouse in which the subject is customer.
If a customer’s address is misspelled in one transactional file, then when the data from that file is integrated with the data from the other transactional files, it will be difficult to determine whether the two different addresses represent one customer or two different customers.
This must be reconciled as the data is integrated and entered into the data warehouse.
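One way a cleaning program might flag such cases is to score how similar two address strings are. The Python sketch below uses the standard library’s difflib.SequenceMatcher as a simple stand-in for dedicated record-matching tools; the addresses and the 0.9 threshold are hypothetical values chosen for illustration. A high score marks a pair for reconciliation, not for an automatic merge.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.9  # hypothetical cutoff; would be tuned against real data

def normalize(address):
    # Ignore case and extra spaces so that only real spelling differences count.
    return " ".join(address.lower().split())

def similarity(a, b):
    # Ratio in [0, 1]; 1.0 means the normalized addresses are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

def needs_review(a, b):
    """True if two differing addresses are similar enough to be one customer."""
    return normalize(a) != normalize(b) and similarity(a, b) >= MATCH_THRESHOLD

pairs = [
    ("123 Main Street", "123 Mian Street"),  # probable misspelling
    ("123 Main Street", "45 Oak Avenue"),    # clearly different
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: similarity {similarity(a, b):.2f}, review: {needs_review(a, b)}")
```

Pairs flagged this way would typically go to an employee, since a similar spelling alone cannot prove that two records describe the same person.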
The Data May Be Aggregated
The type of data that management requires for decision making is generally summarized data.
The sheer volume of all the historic detail data would make the data warehouse unacceptably huge in many cases.
If the detail data were stored in the data warehouse, the time it would take to summarize the data for management every time a query was posed would often be unacceptable.
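The summarization described above can be sketched in Python: detail sales rows are rolled up into monthly totals, each keyed by year and month so that every summary figure keeps its time period. The rows, product numbers, and monthly grain are hypothetical; a real load would read the detail rows from the transactional databases.

```python
from collections import defaultdict
from datetime import date

# Hypothetical detail rows from a transactional database:
# (sale date, product number, quantity, dollar amount).
detail_sales = [
    (date(2003, 1, 5), "P100", 2, 19.98),
    (date(2003, 1, 19), "P100", 1, 9.99),
    (date(2003, 2, 2), "P100", 3, 29.97),
    (date(2003, 1, 7), "P200", 5, 125.00),
]

def aggregate_monthly(rows):
    """Roll detail rows up to one (quantity, dollars) total per year, month, and product."""
    totals = defaultdict(lambda: [0, 0.0])
    for sale_date, product, quantity, amount in rows:
        # The (year, month) part of the key is the timestamp each summary figure keeps.
        key = (sale_date.year, sale_date.month, product)
        totals[key][0] += quantity
        totals[key][1] += amount
    return {key: (qty, round(dollars, 2)) for key, (qty, dollars) in totals.items()}

monthly = aggregate_monthly(detail_sales)
for key in sorted(monthly):
    print(key, monthly[key])
```

Four detail rows shrink to three summary rows here; over years of data the reduction is what keeps query time acceptable.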
The Data is Often Denormalized
If a company is willing to tolerate the substantial additional space taken up by redundant, denormalized data, it can gain the improved query performance that redundancy provides without paying the usual penalties of increased update time and potential data integrity problems.
This works because the data integrity problems that redundant data can cause arise only when the data is updated, and the historic data in the data warehouse will not be updated.
The Data is Not Necessarily Absolutely Current
Data warehouse data is updated at some time interval: weekly, monthly, etc.
Any changes since the last data warehouse update are not recorded in it until the next scheduled update.
This lag is inconsequential when looking at long-term trends.
Types of Data Warehouses
Enterprise Data Warehouse (EDW)
Data Mart (DM)
Enterprise Data Warehouse
Large-scale; incorporates the data of an entire company or of a major division, site, or activity of a company.
A full-scale EDW is built around several different subjects.
Supports a wide variety of DSS applications and serves as a data resource with which company managers can explore new ways of using the company’s data to its advantage.
The Data Mart
Small-scale; designed to support a small part of an organization.
A company will often have several DMs.
DMs are based on a limited number of subjects (possibly one) and are constructed from a limited number of transactional databases.
Which to Choose: The EDW, the DM, or Both?
The answer varies from company to company.
Top-down development means that the EDW is created first and data is later extracted from the EDW to create one or more DMs.
A company that has developed a series of independent DMs, deliberately or as a matter of circumstance, may decide to build an EDW out of the existing DMs in a bottom-up development fashion.
Designing a Data Warehouse
Two characteristics of data warehouses are central to any design:
• The subject orientation.
• The historic nature of the data.
Data warehouses are often referred to as multidimensional databases because each occurrence of the subject is referenced by an occurrence of each of several dimensions or characteristics of the subject, one of which is time.
Multidimensional Databases
Two dimensions can easily be visualized on a flat piece of paper.
Three dimensions can easily be visualized on a flat piece of paper as a cube.
Four or more dimensions are more difficult to visualize.
Storing Multidimensional Data
There is much interest in storing multidimensional data in relational databases.
The star schema:
• A visual design in which the subject is in the middle and the dimensions radiate outward.
• It has a “fact table,” which represents the data warehouse “subject,” and several “dimension tables.”
General Hardware Company Data Warehouse
Here is the General Hardware transactional database.
General Hardware Company Data Warehouse
SALE is the fact table. Like any relational table, it must have a primary key.
Dimension tables: SALESPERSON, PRODUCT, TIME PERIOD
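The slides name the General Hardware star schema’s tables but not their columns, so the Python sketch below (using the standard-library sqlite3 module) fills them in with hypothetical names: SPNum, ProdNum, TimePeriodId, Quantity, and Dollars, plus made-up sample rows. It is not the book’s actual design. Making the fact table’s primary key the combination of the three dimension keys is one common way to give SALE the primary key the slide calls for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks REFERENCES only when this is on
conn.executescript("""
-- Dimension tables (hypothetical columns).
CREATE TABLE SALESPERSON (SPNum INTEGER PRIMARY KEY, SPName TEXT, Office TEXT);
CREATE TABLE PRODUCT (ProdNum TEXT PRIMARY KEY, ProdName TEXT, UnitPrice REAL);
CREATE TABLE TIME_PERIOD (TimePeriodId INTEGER PRIMARY KEY, Year INTEGER, Month INTEGER);

-- Fact table: its primary key is made up of the foreign keys to the dimensions.
CREATE TABLE SALE (
    SPNum        INTEGER REFERENCES SALESPERSON(SPNum),
    ProdNum      TEXT    REFERENCES PRODUCT(ProdNum),
    TimePeriodId INTEGER REFERENCES TIME_PERIOD(TimePeriodId),
    Quantity     INTEGER,
    Dollars      REAL,
    PRIMARY KEY (SPNum, ProdNum, TimePeriodId)
);
""")

# Made-up sample data.
conn.executemany("INSERT INTO SALESPERSON VALUES (?, ?, ?)",
                 [(137, "Baker", "Boston"), (186, "Adams", "Atlanta")])
conn.executemany("INSERT INTO PRODUCT VALUES (?, ?, ?)",
                 [("16386", "Wrench", 12.95), ("19440", "Hammer", 17.50)])
conn.executemany("INSERT INTO TIME_PERIOD VALUES (?, ?, ?)",
                 [(1, 2003, 1), (2, 2003, 2)])
conn.executemany("INSERT INTO SALE VALUES (?, ?, ?, ?, ?)", [
    (137, "16386", 1, 10, 129.50),
    (137, "19440", 1, 4, 70.00),
    (186, "16386", 2, 6, 77.70),
])

# A typical star join: the fact table is joined to the dimensions it needs, then summed.
rows = conn.execute("""
    SELECT sp.SPName, t.Year, t.Month, SUM(s.Dollars)
    FROM SALE s
    JOIN SALESPERSON sp ON sp.SPNum = s.SPNum
    JOIN TIME_PERIOD t  ON t.TimePeriodId = s.TimePeriodId
    GROUP BY sp.SPName, t.Year, t.Month
    ORDER BY sp.SPName, t.Year, t.Month
""").fetchall()
print(rows)
```

Because the key covers salesperson, product, and time period together, a second SALE row for the same combination is rejected, which is what lets each fact row stand for exactly one point in the three-dimensional space.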
Good Reading Bookstores Data Warehouse
Does Good Reading need a data warehouse, given that it already stores a date attribute?
Yes, for two reasons:
• While the transactional database performs acceptably with perhaps the last couple of months of data in it, its performance would degrade to an unacceptable level if we tried to keep ten years of data in it.
• The kinds of management decision making that require long-term historic sales data require aggregate, not daily, data.
Good Reading Bookstores Data Warehouse
SALE is the fact table. Like any relational table, it must have a primary key.
Dimension tables: BOOK, PUBLISHER, CUSTOMER, TIME PERIOD
Snowflake design: one dimension table (BOOK) leads to another dimension table (PUBLISHER).
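In a snowflake design a query reaches PUBLISHER only through BOOK. The Python sketch below, again using the sqlite3 module, totals quantity sold by publisher with a two-step join from the SALE fact table. The column names and rows are hypothetical, and the CUSTOMER and TIME PERIOD dimensions are left out (a plain Year column stands in for the time dimension) to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Outer dimension: reached only through BOOK (hypothetical columns).
CREATE TABLE PUBLISHER (PublisherName TEXT PRIMARY KEY, City TEXT);
-- Inner dimension: each book points to its publisher.
CREATE TABLE BOOK (
    BookNum TEXT PRIMARY KEY,
    Title TEXT,
    PublisherName TEXT REFERENCES PUBLISHER(PublisherName)
);
-- Simplified fact table; Year stands in for the TIME PERIOD dimension.
CREATE TABLE SALE (
    BookNum TEXT REFERENCES BOOK(BookNum),
    Year INTEGER,
    Quantity INTEGER,
    PRIMARY KEY (BookNum, Year)
);
INSERT INTO PUBLISHER VALUES ('Alpha Press', 'New York'), ('Beta Books', 'Chicago');
INSERT INTO BOOK VALUES ('B1', 'First Title', 'Alpha Press'),
                        ('B2', 'Second Title', 'Alpha Press'),
                        ('B3', 'Third Title', 'Beta Books');
INSERT INTO SALE VALUES ('B1', 2003, 5), ('B2', 2003, 3), ('B3', 2003, 4);
""")

# Fact -> BOOK -> PUBLISHER: the extra join is what makes the design a snowflake.
rows = conn.execute("""
    SELECT p.PublisherName, SUM(s.Quantity)
    FROM SALE s
    JOIN BOOK b      ON b.BookNum = s.BookNum
    JOIN PUBLISHER p ON p.PublisherName = b.PublisherName
    GROUP BY p.PublisherName
    ORDER BY p.PublisherName
""").fetchall()
print(rows)
```

Keeping publisher details in their own table avoids repeating them in every BOOK row, at the cost of one more join per publisher-level query.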
Lucky Rent-A-Car Data Warehouse
RENTAL is the fact table. It does not contain aggregated data.
Dimension tables: CAR, MANUFACTURER, CUSTOMER, TIME PERIOD
Snowflake design: one dimension table (CAR) leads to another dimension table (MANUFACTURER).
What About a World Music Association Data Warehouse?
There is already a Year attribute in the RECORDING table.
The essence of the WMA data is historic.
By its nature, the amount of data in a WMA-type transactional database is much lower than the amount of data in a Good Reading-type or Lucky-type transactional database.
Because the WMA transactional database already closely resembles what a WMA data warehouse would look like, no WMA data warehouse is needed.
Building a Data Warehouse
Data ExtractionData Extraction
Data CleaningData Cleaning
Data TransformationData Transformation
Data LoadingData Loading
Building a Data Warehouse: Data Extraction
The process of copying the data from the transactional databases in preparation for loading it into the data warehouse.
This is not a one-time event.
The data is likely to come from several transactional databases.
Some of the data entering into this process may come from outside of the company (data enrichment).
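Because extraction is repeated, each run usually copies only what changed since the previous run. The Python sketch below assumes, hypothetically, that each transactional table carries a LastModified timestamp (the slides do not say how changes are detected) and keeps a “watermark” holding the latest timestamp already extracted. Change-capture tools that read database logs are a common alternative.

```python
import sqlite3

# Hypothetical transactional table with a last-modified timestamp on each row.
source = sqlite3.connect(":memory:")
source.executescript("""
CREATE TABLE SALE (SaleId INTEGER PRIMARY KEY, BookNum TEXT, Quantity INTEGER,
                   LastModified TEXT);
INSERT INTO SALE VALUES (1, 'B1', 2, '2003-01-05T10:00:00');
INSERT INTO SALE VALUES (2, 'B2', 1, '2003-01-06T11:30:00');
INSERT INTO SALE VALUES (3, 'B1', 4, '2003-01-08T09:15:00');
""")

def extract_since(conn, watermark):
    """Copy rows changed after `watermark`; return them and the new watermark."""
    rows = conn.execute(
        "SELECT SaleId, BookNum, Quantity, LastModified FROM SALE "
        "WHERE LastModified > ? ORDER BY LastModified",  # ISO strings sort by time
        (watermark,),
    ).fetchall()
    new_watermark = rows[-1][3] if rows else watermark
    return rows, new_watermark

# The first run copies everything; a later run picks up only what changed in between.
batch1, mark = extract_since(source, "")
source.execute("INSERT INTO SALE VALUES (4, 'B3', 1, '2003-01-09T14:00:00')")
batch2, mark = extract_since(source, mark)
print(len(batch1), len(batch2), mark)
```

A strict “greater than” comparison can miss a row committed later with a timestamp equal to the watermark, so a production design would need to handle that case explicitly.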
Lucky Rent-A-Car with Enrichment Data
In the CUSTOMER table, Customer Age, Customer Income, and Customer Education are the enrichment data.
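Merging enrichment data amounts to attaching the outside attributes to each customer record. In the Python sketch below, the outside file is a hypothetical dictionary keyed by Lucky’s customer number; its two entries reuse values from the Lucky rental table in the data mining example, and customer 999999 is invented to show an unmatched record. In practice an outside provider would rarely know the company’s customer numbers, so matching is usually done on name and address instead.

```python
# Customer rows as extracted from Lucky's transactional database (other columns omitted).
customers = [
    {"CustomerNumber": 884730},
    {"CustomerNumber": 528262},
    {"CustomerNumber": 999999},  # hypothetical customer the outside source does not cover
]

# Hypothetical outside data keyed by customer number.
outside_data = {
    884730: {"Age": 54, "Income": 58000, "Education": "B.A."},
    528262: {"Age": 45, "Income": 158000, "Education": "M.B.A."},
}

NO_MATCH = {"Age": None, "Income": None, "Education": None}

def enrich(rows, external):
    """Attach the outside attributes to each row; unmatched rows get empty values."""
    return [{**row, **external.get(row["CustomerNumber"], NO_MATCH)} for row in rows]

enriched = enrich(customers, outside_data)
for row in enriched:
    print(row)
```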
Data Cleaning
Transactional data can have all kinds of errors in it.
Data warehouses are very sensitive to data errors.
• Data errors must be “cleaned,” “cleansed,” or “scrubbed” as the data is loaded into the data warehouse.
There are two steps to cleaning transactional data in preparation for loading it into a data warehouse:
Identify the problem data.
• Due to the massive volume of data, this is typically done using a program.
Fix it.
• This can be handled by sophisticated artificial intelligence programs or by creating exception reports for employees to scrutinize.
Good Reading Bookstores Before Data Cleaning
Errors in Customer:
• Missing data: in row 1, city is blank.
• Questionable data: the state for rows 2 and 6 should be the same.
• Possible misspelling: do rows 3 and 8 refer to the same person?
• Impossible data: row 10’s state, “RP,” is not a valid state code.
Errors in SALE:
• Questionable data: is the book quantity of 21 in row 2 correct?
• Impossible/out-of-range data: row 5 indicates that a single book costs $3,200.99.
• Apparently incorrect data: row 8 refers to customer number 12738, but there is no such customer.
• Impossible data: row 10 shows a negative price for a book.
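The first cleaning step, identifying problem data with a program, might look like the Python sketch below. It applies one rule per error type listed above and prints an exception report for employees to review. The rows are simplified, hypothetical versions of the Good Reading data; the state list, the $500 price ceiling, and the quantity threshold of 10 are assumptions. Cross-row checks, such as deciding whether rows 3 and 8 are the same person, need matching logic that this sketch does not include.

```python
# A small subset of valid state codes; a real check would use the full list.
VALID_STATES = {"NY", "NJ", "PA", "TN", "CA", "TX"}
MAX_BOOK_PRICE = 500.00    # assumed upper bound for a single book
MAX_USUAL_QUANTITY = 10    # assumed threshold above which a quantity is questionable

customers = [
    {"CustNum": 11111, "City": "", "State": "NY"},
    {"CustNum": 22222, "City": "Memphis", "State": "RP"},
]
sales = [
    {"Row": 2, "CustNum": 11111, "Quantity": 21, "Price": 24.95},
    {"Row": 5, "CustNum": 22222, "Quantity": 1, "Price": 3200.99},
    {"Row": 8, "CustNum": 12738, "Quantity": 1, "Price": 18.50},
    {"Row": 10, "CustNum": 11111, "Quantity": 1, "Price": -12.00},
]

def check_customers(rows):
    problems = []
    for c in rows:
        if not c["City"].strip():
            problems.append((c["CustNum"], "missing data: city is blank"))
        if c["State"] not in VALID_STATES:
            problems.append((c["CustNum"], f"impossible data: state {c['State']!r}"))
    return problems

def check_sales(rows, known_customers):
    problems = []
    for s in rows:
        if s["CustNum"] not in known_customers:
            problems.append((s["Row"], f"incorrect data: no customer {s['CustNum']}"))
        if s["Price"] < 0:
            problems.append((s["Row"], "impossible data: negative price"))
        elif s["Price"] > MAX_BOOK_PRICE:
            problems.append((s["Row"], "out-of-range data: price too high"))
        if s["Quantity"] > MAX_USUAL_QUANTITY:
            problems.append((s["Row"], "questionable data: unusually large quantity"))
    return problems

known = {c["CustNum"] for c in customers}
# The exception report: flagged rows go to employees rather than being fixed automatically.
for item in check_customers(customers) + check_sales(sales, known):
    print(item)
```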
Data Transformation
As the data is extracted from the transactional databases, it must go through several kinds of data transformations on its way to the data warehouse:
• Data from different transactional databases is merged to form the data warehouse tables.
• Data is often aggregated as it is extracted from the transactional databases and prepared for the data warehouse.
• Units of measure used for attributes in different transactional databases must be reconciled as they are being merged into common data warehouse tables.
• Coding schemes used for attributes in different transactional databases must be reconciled as they are being merged into common data warehouse tables.
• Sometimes values from different attributes in transactional databases are combined into a single attribute in the data warehouse (e.g., employee name).
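These transformations can be sketched in Python for employee records arriving from two hypothetical systems: system A stores salary in dollars and gender as “M”/“F”, while system B stores salary in thousands of dollars and gender as 1/2. All attribute names, codes, and warehouse conventions here are assumptions for illustration.

```python
# Assumed warehouse conventions for this sketch: one EmployeeName attribute
# ("Last, First"), gender coded "M"/"F", and salary in whole dollars.
SYSTEM_B_GENDER = {1: "M", 2: "F"}  # hypothetical coding scheme used by system B

def from_system_a(row):
    return {
        # Combine two source attributes into a single warehouse attribute.
        "EmployeeName": f"{row['LastName']}, {row['FirstName']}",
        "Gender": row["Sex"],     # already coded "M"/"F"
        "Salary": row["Salary"],  # already in dollars
    }

def from_system_b(row):
    return {
        "EmployeeName": f"{row['LastName']}, {row['FirstName']}",
        "Gender": SYSTEM_B_GENDER[row["GenderCode"]],    # reconcile coding schemes
        "Salary": round(row["SalaryThousands"] * 1000),  # reconcile units: thousands to dollars
    }

system_a = [{"FirstName": "Maria", "LastName": "Lopez", "Sex": "F", "Salary": 52000}]
system_b = [{"FirstName": "Wei", "LastName": "Chen", "GenderCode": 1, "SalaryThousands": 48.5}]

# Merge both sources into one set of warehouse-ready rows.
warehouse_rows = [from_system_a(r) for r in system_a] + [from_system_b(r) for r in system_b]
for r in warehouse_rows:
    print(r)
```

Writing one small conversion function per source keeps each system’s quirks in one place, so adding a third source does not touch the others.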
Data Loading
After all of the extracting, cleaning, and transforming, the data is ready to be loaded into the data warehouse.
A schedule for regularly updating the data warehouse must be put in place.
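Loading fits the non-volatile character of warehouse data: each scheduled run appends a new period’s rows and never updates existing ones. The Python sketch below (sqlite3 again, with a hypothetical MONTHLY_SALE table) loads each batch in a single transaction, so a batch that would overwrite an already-loaded period is rejected as a whole. The scheduler that would call load_period once per period is not shown.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.execute("""CREATE TABLE MONTHLY_SALE (
    Year INTEGER, Month INTEGER, ProdNum TEXT, Quantity INTEGER,
    PRIMARY KEY (Year, Month, ProdNum))""")

def load_period(conn, rows):
    """Append one period's cleaned, transformed rows in a single transaction.
    Existing rows are never updated: a duplicate key raises IntegrityError
    and the whole batch is rolled back."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.executemany("INSERT INTO MONTHLY_SALE VALUES (?, ?, ?, ?)", rows)

load_period(dw, [(2003, 1, "P100", 3), (2003, 1, "P200", 5)])
load_period(dw, [(2003, 2, "P100", 3)])
try:
    # February P100 is already loaded, so this batch must not change history.
    load_period(dw, [(2003, 2, "P200", 7), (2003, 2, "P100", 9)])
except sqlite3.IntegrityError as exc:
    print("load rejected:", exc)

print(dw.execute("SELECT COUNT(*) FROM MONTHLY_SALE").fetchone()[0])
```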
Using a Data Warehouse
Online analytic processing (OLAP)
Data Mining
Online Analytic Processing
A decision support methodology based on viewing data in multiple dimensions.
There are many OLAP systems on the market today.
The OLAP environment’s multidimensional data is very well suited for querying and for multi-time-period trend analyses.
Drill-down: going back to the database and retrieving finer levels of data detail than you have already retrieved.
Slice: a subset of the data that focuses on a single value of one of the dimensions.
Pivot or rotation: merely a matter of interchanging the data dimensions.
Figure: A slice of the patient data cube.
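The three OLAP operations can be illustrated on a tiny in-memory cube in Python. The cube’s dimensions (region, product, quarter, month) and its values are hypothetical sales data, not the patient data in the figure, and summarize is a hypothetical helper rather than any OLAP product’s API.

```python
from collections import defaultdict

# Hypothetical cube cells: (region, product, quarter, month) -> sales in dollars.
DIMS = ("region", "product", "quarter", "month")
cells = {
    ("East", "Wrench", "Q1", "Jan"): 100,
    ("East", "Wrench", "Q1", "Feb"): 120,
    ("East", "Hammer", "Q1", "Jan"): 80,
    ("West", "Wrench", "Q1", "Mar"): 90,
    ("West", "Hammer", "Q2", "Apr"): 60,
    ("East", "Hammer", "Q2", "May"): 70,
}

def summarize(cells, by, where=None):
    """Total the cells grouped by the dimensions in `by`, keeping only cells
    whose coordinates match every dimension=value pair in `where`."""
    where = where or {}
    positions = [DIMS.index(d) for d in by]
    totals = defaultdict(int)
    for coords, value in cells.items():
        if all(coords[DIMS.index(d)] == v for d, v in where.items()):
            totals[tuple(coords[i] for i in positions)] += value
    return dict(totals)

# Starting view: totals by product and quarter.
by_quarter = summarize(cells, ["product", "quarter"])
# Drill-down: go back for finer detail (months) within Q1.
q1_by_month = summarize(cells, ["product", "month"], where={"quarter": "Q1"})
# Slice: keep only one value of one dimension (region = East).
east_slice = summarize(cells, ["product", "quarter"], where={"region": "East"})
# Pivot: the same figures with the two dimensions interchanged.
pivoted = {(quarter, product): v for (product, quarter), v in by_quarter.items()}

print(by_quarter)
print(q1_by_month)
print(east_slice)
print(pivoted)
```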
Data Mining
The searching out of hidden knowledge in the company’s data that can give the company a competitive advantage in its marketplace.
Due to the massive volume of data warehouse data, data mining must be done by software, using techniques such as:
• Case-based learning
• Decision trees
• Neural networks
• Genetic algorithms
Data Mining Application: Market Basket Analysis
Consider the data collected by a supermarket as it checks out its customers by scanning the bar codes on the products they’re purchasing.
The company might have software study the collected market baskets, each of which is literally the goods that a particular customer bought in one trip to the store.
The software might try to discover whether certain items “fall into” the same market basket more frequently than would otherwise be expected.
Items often bought in the same shopping trip can then be placed next to each other in the store, reminding someone buying one that they might also need the other.
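A common way to make “more frequently than would otherwise be expected” precise is lift: the number of baskets containing both items divided by the number expected if the items were bought independently. The Python sketch below computes lift for item pairs over a handful of hypothetical baskets. Lift is one standard association-rule measure, named here as an illustration rather than as the method the chapter prescribes, and min_count is an assumed cutoff.

```python
from collections import Counter
from itertools import combinations

# Hypothetical market baskets: the items one customer bought in one trip.
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"chips", "salsa", "soda"},
    {"chips", "salsa"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def pair_lift(baskets, min_count=2):
    """Return (item_a, item_b, baskets_together, lift) for pairs seen at least min_count times."""
    n = len(baskets)
    item_counts = Counter(item for b in baskets for item in b)
    pair_counts = Counter(pair for b in baskets for pair in combinations(sorted(b), 2))
    results = []
    for (a, b), together in pair_counts.items():
        if together < min_count:
            continue
        # Baskets expected to hold both items if they were chosen independently.
        expected = item_counts[a] * item_counts[b] / n
        # Lift > 1: the pair shares a basket more often than independence predicts.
        results.append((a, b, together, together / expected))
    return sorted(results, key=lambda r: r[3], reverse=True)

for a, b, together, lift in pair_lift(baskets):
    print(f"{a} + {b}: together in {together} baskets, lift {lift:.2f}")
```

Ranking by lift rather than by raw co-occurrence count keeps very popular items from dominating the results simply because they appear in many baskets.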
Data Mining: Lucky Rent-A-Car
CAR/RENTAL/CUSTOMER
Row  Class        ManufacturerName  Cost  CustomerNumber  Age  Income   Education
1    Compact      Ford              320   884730          54   58,000   B.A.
2    Luxury       Lincoln           850   528262          45   158,000  M.B.A.
3    Full-Size    General Motors    489   109565          48   62,000   B.S.
4    Sub-Compact  Toyota            159   532277          25   34,000   High School
5    Luxury       Lincoln           675   155434          42   125,000  Ph.D.
6    Compact      Chrysler          360   965578          64   47,500   High School
7    Mid-Size     Nissan            429   688632          31   43,000   M.B.A.
8    Luxury       Lincoln           925   342786          47   95,000   M.A.
9    Full-Size    General Motors    480   385633          51   72,000   B.S.
10   Compact      Toyota            230   464367          64   200,000  M.A.
11   Luxury       Jaguar            1170  528262          45   158,000  M.B.A.
12   Sub-Compact  Nissan            89    759930          29   28,000   B.A.
13   Full-Size    Ford              335   478432          57   53,500   B.S.
14   Full-Size    Chrysler          328   207867          29   162,000  Ph.D.
A data mining application may look for patterns in the data.
• Rows 2, 5, 8, and 11 all involve rentals of luxury-class cars with high cost (revenue to the company) figures.
If, as is the case here, these similar rentals were made by people with similar demographics (a “cluster”), then future marketing can concentrate on selling this product to people with those demographics.
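The pattern in the Lucky table can be checked with a few lines of Python that compare the profile of the luxury renters with everyone else. The rows are copied from the table; the profile measures are chosen for illustration, and this is a simple group comparison rather than a clustering algorithm, which would find such groups without being told which rows to compare. Note that rows 2 and 11 belong to the same customer (528262).

```python
# (row, class, manufacturer, cost, customer number, age, income, education)
rentals = [
    (1, "Compact", "Ford", 320, 884730, 54, 58000, "B.A."),
    (2, "Luxury", "Lincoln", 850, 528262, 45, 158000, "M.B.A."),
    (3, "Full-Size", "General Motors", 489, 109565, 48, 62000, "B.S."),
    (4, "Sub-Compact", "Toyota", 159, 532277, 25, 34000, "High School"),
    (5, "Luxury", "Lincoln", 675, 155434, 42, 125000, "Ph.D."),
    (6, "Compact", "Chrysler", 360, 965578, 64, 47500, "High School"),
    (7, "Mid-Size", "Nissan", 429, 688632, 31, 43000, "M.B.A."),
    (8, "Luxury", "Lincoln", 925, 342786, 47, 95000, "M.A."),
    (9, "Full-Size", "General Motors", 480, 385633, 51, 72000, "B.S."),
    (10, "Compact", "Toyota", 230, 464367, 64, 200000, "M.A."),
    (11, "Luxury", "Jaguar", 1170, 528262, 45, 158000, "M.B.A."),
    (12, "Sub-Compact", "Nissan", 89, 759930, 29, 28000, "B.A."),
    (13, "Full-Size", "Ford", 335, 478432, 57, 53500, "B.S."),
    (14, "Full-Size", "Chrysler", 328, 207867, 29, 162000, "Ph.D."),
]
GRADUATE = {"M.A.", "M.B.A.", "Ph.D."}

def profile(rows):
    """Summarize cost and customer demographics for a group of rentals."""
    n = len(rows)
    return {
        "rentals": n,
        "avg_cost": sum(r[3] for r in rows) / n,
        "age_range": (min(r[5] for r in rows), max(r[5] for r in rows)),
        "avg_income": sum(r[6] for r in rows) / n,
        "graduate_share": sum(r[7] in GRADUATE for r in rows) / n,
    }

luxury = [r for r in rentals if r[1] == "Luxury"]
others = [r for r in rentals if r[1] != "Luxury"]
for name, group in (("luxury", luxury), ("all other", others)):
    print(name, profile(group))
```

For this table the luxury rentals average $905, with customers aged 42 to 47 who all hold graduate degrees and average $134,000 in income; the other ten rentals average $321.90, with customers aged 25 to 64 and an average income of $76,000.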
Administering a Data Warehouse
The data warehouse requires a serious level of management.
Data warehouse administrator: a personnel specialization in the management of the data warehouse.
Three kinds of employee expertise are required:
• Business expertise
• Data expertise
• Technical expertise
Administering a Data Warehouse: Business Expertise
An understanding of the company’s business processes, which underlies an understanding of the company’s transactional data and databases.
An understanding of the company’s business goals, to help in determining what data should be stored in the data warehouse for eventual OLAP and data mining purposes.
Administering a Data Warehouse: Data Expertise
An understanding of the company’s transactional data and databases for selection and integration into the data warehouse.
An understanding of the company’s transactional data and databases to design and manage data cleaning and data transformation, as necessary.
Familiarity with outside data sources for the acquisition of enrichment data.
Administering a Data Warehouse: Technical Expertise
An understanding of data warehouse design principles for the initial design.
An understanding of OLAP and data mining techniques so that the data warehouse design will properly support these processes.
An understanding of the company’s transactional databases in order to manage or coordinate the regularly scheduled appending of new data to the data warehouse.
An understanding of how to handle very large databases, with their unique requirements for security, backup and recovery, being split across multiple disk devices, etc.
Challenges in Data Warehousing
Data cleaning and finding more “dirty” data than expected.
Problems associated with coordinating the regular appending of new data from the transactional databases to the data warehouse.
Difficulties in managing very large databases.
The challenge of building and maintaining the data dictionary.
“Copyright 2004 John Wiley & Sons, Inc. All rights reserved. Reproduction or translation of this work beyond that permitted in Section 117 of the 1976 United States Copyright Act without express permission of the copyright owner is unlawful. Request for further information should be addressed to the Permissions Department, John Wiley & Sons, Inc. The purchaser may make back-up copies for his/her own use only and not for distribution or resale. The Publisher assumes no responsibility for errors, omissions, or damages caused by the use of these programs or from the use of the information contained herein.”