41
1 Data warehousing A Warehouse to Learn and Earn. An alternative for book keeping.

1 Data warehousing A Warehouse to Learn and Earn. An alternative for book keeping

Embed Size (px)

Citation preview

Page 1: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

1

Data warehousing

A Warehouse to Learn and Earn.

An alternative for book keeping.

Page 2: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

2

Starting with data warehousing

• The data warehousing market consists of tools, technologies, and methodologies that allow for the construction, usage, management, and maintenance of the hardware and software used for a data warehouse, as well as the actual data itself.

Page 3: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

3

Basic definitions

Data Warehouse: The term Data Warehouse was

coined by Bill Inmon in 1990, which he defined in the following way: "A warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process".

Page 4: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

4

Definition of the terms in the sentence are as follows:

Subject Oriented:Data that gives information about a

particular subject instead of about a company's ongoing operations.

Integrated: Data that is gathered into the data

warehouse from a variety of sources and merged into a coherent whole.

Page 5: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

5

Time-variant: All data in the data warehouse is

identified with a particular time period.

Non-volatileData is stable in a data warehouse.

More data is added but data is never removed. This enables management to gain a consistent picture of the business.

Page 6: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

6

simpler definition of a data warehouse.

a data warehouse is "a copy of transaction data specifically structured for query and analysis".

This definition provides less insight and depth but is no less accurate.

Page 7: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

7

SOME FACTS ABOUT DATA WAREHOUSE

However, a single-subject data warehouse is typically referred to as a data mart, while data warehouses are generally enterprise in scope.

Also, data warehouses can be volatile. Due to the large amount of storage required for a data warehouse, (multi-terabyte data warehouses are not uncommon), only a certain number of periods of history are kept in the warehouse. For instance, if three years of data are decided on and loaded into the warehouse, every month the oldest month will be "rolled off" the database, and the newest month added.

Page 8: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

8

COMPONENTS OF DATA WAREHOUSING

Page 9: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

9

CREATING DATAWAREHOUSING

Data warehousing is essentially what you need to do in order to create a data warehouse, and what you do with it.

It is the process of creating, populating, and then querying a data warehouse and can involve a number of discrete

technologies.

Page 10: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

10

11 steps in data warehousing

Recognize that the job is probably harder than you expect.

Understand the data in your existing systems.

Be sure to recognize equivalent entities. Use metadata to support data quality.Select the right data transformation tools. Take advantage of external resources. Use new information distribution methods.

Page 11: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

11

Focus on high-payback marketing applications.

Emphasize early wins to build support

throughout the organization.

Don't underestimate hardware requirements.

Consider outsourcing your data warehouse development and maintenance.

Page 12: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

12

1-11 Steps To Successful Data Warehousing

1. Recognize that the job is probably harder than you expect.

Experts frequently report that 30-to-50 percent of the information in a typical database is missing or incorrect.

That situation may not be noticeable -- or may even be acceptable -- in an operational system that focuses on swiftly and accurately processing current transactions.

Page 13: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

13

But that percentage of error is totally unacceptable in a data warehousing system designed to sort through millions of historical records to identify trends or select potential customers for a new product. And, even when the data is correct, it may not be usable in a data warehouse environment.

Page 14: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

14

• Recognize that the job is probably harder than you expect.

Page 15: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

15

EXAMPLES FOR THE FIRST STEP

For example, legacy system programmers often use shortcuts to save disk space or CPU cycles, such as using numbers instead of names of cities, that make the data meaningless in a generic environment. Another challenge is that database schema often change over the lifecycle of a project, yet few companies take the time to rebuild historical databases to account for those changes.

Page 16: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

16

2. Understand the data in your existing systems.The first step in any data warehousing project is to perform a detailed analysis of the status of all databases that will either contribute -- or potentially contribute -- to the data warehouse.

An important part of understanding the existing data is determining interrelationships between various systems. Interrelationships must be maintained as the data is moved into the warehouse.

Page 17: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

17

In addition, the data warehouse implementation often involves making changes to database schema. You must have a clear understanding of data relationships among heterogeneous systems to determine -- in advance -- how any change may impact the system. Otherwise, it's possible for changes to create inconsistencies that ripple across the entire enterprise, creating enormous headaches.

Page 18: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

18

Understand the data in your existing systems.

Page 19: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

19

3. Be sure to recognize equivalent entities.

One of the most important aspects of preparing for a data warehousing project is identifying equivalent entities and heterogeneous systems. The problem arises because the same essential piece of information may appear under different field names in different parts of the organization.

For example, two different divisions may be servicing the same customer but have the name entered in a slightly different manner.

Page 20: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

20

For eg: AIG or American International Group are the same.

You can use a data transformation product that's capable of fuzzy matching to identify and correct this and similar problems.

Establishing a common database structure can be just as important as merging the corporate cultures after a merger, which is crucial to obtaining the full effects of synergy.

Page 21: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

21

Be sure to recognize equivalent entities.

Page 22: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

22

4. Use metadata to support data quality.

Metadata is crucial to a successful data warehousing implementation. By definition, metadata is data about data, such as the tags that indicate the subject of a web document.

There are many types of metadata that can be associated with a database to characterize and index data, facilitate or restrict access to data, determine the source and currency of data, etc.

Page 23: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

23

One major challenge is trying to synchronize the metadata between different vendor products, different functions, and different metadata stores .

This emphasizes the importance of starting as soon as possible to create and capture metadata for interfaces, business processes, and database requirements. Several vendors have developed products that have the potential to integrate metadata from disparate sources and establish a central repository that can provide information to both administrators and users.

Page 24: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

24

Use metadata to support data quality.

Page 25: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

25

5. Select the right data transformation tools. Data transformation tools extract data from the

operational sources, clean it, and load it into the data warehouse while capturing the history of that process.

This transformation process may include the creation and population of new fields from the operational data, summarizing data to an appropriate level for analysis, performing error checking operations to validate the integrity of the data, etc.

Page 26: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

26

Look for a tool that makes it possible to map data from source to target with a simple point-and-click interface.

It's also useful to track and manage the relationships of interrelated data entities that are affected when changing database structures.

Finally, try to find a tool that can capture and store metadata during the conversion process.

Page 27: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

27

Select the right data transformation tools.

Page 28: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

28

6. Take advantage of external resources.

External sources of information, such as data from a customer's transaction processing systems, or market research data provided by a third party, can greatly increase the value of internal information.

Page 29: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

29

o Take advantage of external resources

Page 30: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

30

7. Use new information distribution methods.

The biggest technological improvements in recent years in the data warehousing field have come in the area of information delivery. In the past, highly skilled analysts were needed to prepare information in the format requested by users.

Today, information can be delivered in several different ways directly to people who need it. Users can subscribe to regular reports and have them delivered via e-mail.

Page 31: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

31

• If the report contains data that meets the subscription criteria, a customer-filtered view of the data can be delivered to the user by means of an e-mail attachment.

• The report data stays securely and economically on the server, and only the pages requested by the authorized user are sent on demand.

• Another option is to let the user log in, search for data, and open the reports that provide the needed information.

Page 32: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

32

Use new information distribution methods.

Page 33: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

33

8. Focus on high-payback marketing applications.

Most of the really hot applications in data warehousing right now involve marketing because of the potential for an immediate payback in increased revenues.

Page 34: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

34

Focus on high-payback marketing applications.

Page 35: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

35

9.Emphasize early wins to build support throughout the organization.The wide range of off-the-shelf solutions has made it possible to drastically reduce cost and lead-time requirements for data warehousing applications. Off-the-shelf solutions won't usually complete project objectives, but they often can be used to provide point solutions in a short time that serve as a training and demonstration platform and, most importantly, build momentum for full-scale implementation.

Page 36: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

36

10. Don't underestimate hardware requirements.

The hardware requirements for data warehousing database servers are high. The primary reason is the large number of CPU cycles required to slice and dice data over and over again to meet the varying needs of users throughout the organization.

Page 37: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

37

Database size

Database size also plays in server performance requirements, ranging up to terabytes and higher.

For those reasons, be sure to select a scalable platform regardless of how much headroom you have provided in your server specification. A more ambitious idea is to combine loosely-coupled systems that let companies spread the database over multiple servers, although the database still appears as a single entity to users.

Page 38: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

38

11. Consider outsourcing your data warehouse development and maintenance.

A large percentage of medium and large companies use outsourcing to avoid the difficulty of locating and the high cost of retaining skilled IT staff members.

Most data warehousing applications fit the main criteria for a good outsourcing project -- a large project that has been defined to the point that it doesn't require day-to-day interaction between business and development teams.

Page 39: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

39

There have been many cases where an outsourcing team is able to make dramatic improvements in a new or existing data warehouse.

Typically these improvements don't necessarily stem from an increased level of skill on the outsourcing team, but flow from the nature of outsourcing.

The outsourcing team brings fresh ideas and perspective to their assignment, and is often able to suggest methods and solutions that they've developed and proven successful in previous assignments.

Page 40: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

40

• Mining corporate data for valuable customer information can improve business performance. But it's not as simple as it sounds.

Page 41: 1 Data warehousing  A Warehouse to Learn and Earn.  An alternative for book keeping

41

END OF THE SEMINAR

• THANKYOU