
International Journal of Computer Engineering and Applications,

Volume XII, Issue I, Jan. 18, www.ijcea.com ISSN 2321-3469

Kuldeep Deshpande and Dr.Bhimappa Desai 100

REQUIREMENT GATHERING FOR MODEL DRIVEN DESIGN OF DATAWAREHOUSE

Kuldeep Deshpande1, Dr. Bhimappa Desai2

1Ellicium Solutions, Pune, India; 2Capgemini, Pune, India

ABSTRACT:

Datawarehouse (DWH) systems integrate data from the various operational sources in an organization for analytical use. Designing a datawarehouse poses a unique challenge, as it must meet the requirements of a diverse set of business users within the constraints imposed by various operational systems. In this paper we discuss a model driven approach to requirement gathering and design of a datawarehouse. We then introduce a case study of a user focused requirement gathering technique. Using this case study, we demonstrate how intense user involvement can lead to the successful design of a datawarehouse. The lifecycle activities of requirement gathering are discussed in detail, together with the associated tools and techniques.

Keywords: Datawarehouse, Model driven, Architecture

[1] INTRODUCTION

CWM [10] defines the Model Driven approach as a “standard framework for software development that addresses the complete life cycle of designing, deploying, integrating, and managing applications by using models in software development.”

Model Driven Architecture (MDA) is an approach to system specification and interoperability based on the use of formal models. In [10], the authors describe how MDA and CWM (Common Warehouse Metamodel) can be used for requirements gathering and design of a datawarehouse. Similarly, in [12], a datawarehouse framework (DWF) and a Unified process (2TUP) have been proposed for developing a datawarehouse using model driven architecture.

[2] NEED FOR MORE RESEARCH

Kimball [1] has stressed that requirements should determine not just what data goes into the datawarehouse, but also how it is organized and updated. However, existing requirement gathering techniques do not focus on gathering requirements for slowly changing dimensions. More generally, we found very little attention given to requirement gathering for the physical design of a datawarehouse from the end users’ perspective.

In recent years, there has been extensive research on Model Driven Architecture for datawarehouses. However, linking MDA with the traditional user driven and supply driven approaches to requirement gathering has not been given due attention.

Most of the literature focuses on the merits and demerits of requirement analysis methods, but very few studies describe the experience of implementing these approaches / methodologies in real life projects. In particular, the relationship between the requirement gathering approach and the success or failure of a DW has not been given due attention. Such a study could be an important guide for DW practitioners. It should also examine why a particular requirement gathering methodology is helpful for a particular type of organization.

[3] BUSINESS INVOLVEMENT IN REQUIREMENT GATHERING

Business involvement in datawarehousing initiatives is a much talked about topic. Everyone in the datawarehouse / BI implementation space agrees that the sponsor of a BI initiative should be a well-respected business leader in the organization. Strong support and sponsorship from business management is the most critical factor when assessing data warehouse readiness [1]. It is widely accepted that a BI program should be led by the business and implemented by IT.

However, there are many examples of BI programs that fail because the business is involved only superficially. Before we discuss effective involvement of the business in BI programs, let us look at how business users can be categorized in terms of their position in the organization, how IT savvy they are, their understanding of source systems, their role in report generation, etc.

The majority of datawarehouse initiatives are driven by a need to replace existing departmental / personal data marts / data stores with an enterprise wide decision support system. In such scenarios, each department has its own data marts in place, so a culture of data based decision making already exists in the organization. If the business community does not currently place value on information and analyses, its readiness for a data warehouse is questionable [1]. A datawarehouse program should therefore explore this existing analytic culture when executing the datawarehouse program.

In such a setup, there is a team of analysts who are responsible for extracting data from various source systems and loading it into departmental data stores. These analysts understand source systems fairly well, but are not source system experts, and they do not control changes to source systems. They are, however, responsible for understanding source data and transforming it into the format that their business users would like to see. Typically these analysts are good data programmers with a good understanding of business processes. Kimball uses the term ‘Business system analyst’ for IT resources who are user centric [1]. On the other hand, such a setup also has a team of business analysts who are responsible for report / data consumption and sometimes also for business decision making. They are good at analyzing reports and getting data into the hands of the ultimate decision makers, and they sometimes build statistical analytics using data in the departmental data marts. We will refer to this category of business users as ‘Business Information consumers’. Both categories of business users play an important role in the requirement gathering process.

[4] OBJECTIVES AND CONTRIBUTION

This paper proposes a methodology for DW requirement analysis, illustrated with a case study. Its contributions are:

• A requirement analysis framework that supports model driven design of a datawarehouse. The proposed framework builds the CIM and PIM layers of the MDA approach.

• A detailed discussion of the phases of the proposed framework, with the activities to be performed, deliverables and interdependencies between tasks.

• Detailed guidelines on the involvement of business users in the various steps of requirement analysis for a datawarehouse.

• Comprehensive guidelines for estimating the effort of the proposed framework, which should help practitioners estimate effort for their projects.

[5] INTRODUCTION TO CASE STUDY

We will discuss the datawarehouse requirement analysis methodology proposed in this paper with the help of a case study. We implemented this methodology for a leading lending organization in Asia Pacific. The organization offers various financial products such as leasing for retail customers, traditional lending and corporate leasing. Its product suite has evolved over the last 10 years through various acquisitions and in response to market needs for product advancements. As a result, the organization has developed various legacy systems. For collections, it purchased a leading collections management system.

Thus, the IT landscape of the organization had the following characteristics:


• Diverse operational systems: 23 operational systems in total across all regions.

• Isolated data stores / marts for decision making: every day, the entire production version of the operational sources was replicated in a separate database. This ‘Production mirror’ was then used by various business groups to create their own isolated decision support data stores.

• No single version of truth: definitions of key business terms such as product, contract and recovery amount were not uniform across business functions.

• A wealth of information resided with the business power users. It was therefore clear that any DW solution could succeed only if it captured this knowledge from the power users.

With this background, the organization commissioned a program to build an Enterprise Datawarehouse (EDW). The challenges in this program were as follows:

• The EDW had to be built in a very tight timeframe: the time the organization was willing to invest in the EDW program was 20-30 percent less than industry averages.

• The organization had a thin IT team, and the same team of source system experts / business analysts was dedicated to multiple programs, one of which was the EDW program.

• Although the organization had isolated decision support systems in place, business users / IT teams were not familiar with datawarehousing concepts and had to be trained in formal datawarehousing methodologies.

• Power users had built decision support systems using diverse technologies such as SAS, and these systems formed a critical source of information for any EDW effort.

The team for this requirement gathering exercise included a Datawarehouse architect, a data modeler, 3 business SMEs drawn from the risk, finance and sales departments, and 3 business data analysts from the same three departments.

[6] PROPOSED FRAMEWORK

In this section we discuss the proposed requirement gathering framework in detail. We have divided the framework into 2 parts: the process model, and model viewpoints and layers.

Process model for requirement gathering

The following process flow shows the sequence of activities that we recommend as part of the proposed requirement gathering framework:


Figure 1 – Process model for MDA driven requirement gathering

• Project Kick Off:

In this approach, the kick off workshop has three objectives: first, to understand the key stakeholders and their expectations of the DW program; second, to decide the roles and responsibilities of the various stakeholders; and third, to understand the high level IT landscape of the organization.

In this case, two workshops of 2 hours each were conducted, and the following deliverables were created from them:

o IT system landscape highlighting the flow of data
o An overview of reporting systems in the organization
o End user issues and concerns
o The business objectives for which the organization undertook the EDW program, as discussed in the workshops

The project sponsor was asked to draw up the list of participants for the kick off workshop. We advise dedicating no more than 2 days of effort to the project kick off.

• Business – IT Interviews

Interviews with the business and IT teams follow on from the project kick off. The focus of the interviews is as follows:

o Business Interviews:

To understand current state reporting needs.

To understand the user’s wish list from an information requirement perspective.

To understand which information in the current reporting setup end users do not trust.

To understand specific pain areas of the business in terms of reporting, e.g. reconciliation of data from 2 systems, month end data being reported late etc.

o IT Interviews:


To understand each source system in detail. Information captured should include the nature of data in the sources, the technology platform used, underlying database structures etc.

To understand the current report generation process.

In this case, 5 business users (including the head of credit risk, a senior sales executive and other business executives) and 5 IT employees (including the head of architecture, the head of infrastructure, the reporting lead and a reporting analyst) were interviewed.

• Explain concept of datawarehouse

In most organizations that make their first attempt at building a datawarehouse, personal decision support systems (DSS) exist in some shape or form: MS Access databases created by end users, SAS datasets or even MS Excel files. It is important that the power users are familiar with the design of the new datawarehouse being built. The traditional approach is to conduct classroom training for end users towards the end of the DW build phase. We recommend instead conducting training that introduces end users to formal datawarehousing principles even before requirement gathering begins. This helps the power users speak the same language as the DW designers. In this case, we conducted a 4 hour session for power users within the business team and a 4 hour session for IT analysts / reporting teams on these topics.

• Build Data Dictionary

This is the most critical task in the requirement analysis lifecycle. In this approach, the success of the datawarehouse program depends to a great extent on the completeness of the data dictionary. The steps for building a data dictionary are as follows:

a. Identify a team of business analysts who understand the business requirements and source data in their area of business.

b. Each business analyst goes through all the reports used in their business area and lists the business concepts and terms used by decision makers.

c. An interrelation diagram is drawn to show visually how the various terms relate to one another.

d. Each report requirement from the business is analysed, and individual data elements are linked to terms.

e. This is a business focused exercise, not a technical one: the focus should be on business concepts rather than on listing tables and columns.

f. Each element is mapped to the source systems from which it can be sourced.

g. After each business analyst creates the data dictionary for their area, a consolidation effort must be undertaken in which all individual dictionaries are merged into an enterprise wide dictionary.

h. We recommend the following structure for the data dictionary:


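Table 1 – Data Dictionary

Business Term Category | Business Term | Data Element | Definition | Source System 1: Table | Source System 1: Element | Business Rule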
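The recommended structure and the consolidation in step g can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the tooling used in the case study: the DictionaryEntry class, the rule of matching entries on (business term, data element), and the source system, table and column names in the example are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryEntry:
    """One data dictionary row, loosely following the columns of Table 1 (hypothetical class)."""
    term_category: str
    business_term: str
    data_element: str
    definition: str
    # (source system, table, element) triples; several sources per element are allowed
    sources: set = field(default_factory=set)
    business_rule: str = ""


def consolidate(*area_dictionaries):
    """Merge per-area dictionaries into one enterprise-wide dictionary (step g).

    Entries are matched on (business term, data element), ignoring case and
    surrounding spaces. Source mappings are combined; differing definitions are
    collected as conflicts for the business to resolve rather than overwritten.
    """
    merged = {}
    conflicts = []
    for area in area_dictionaries:
        for entry in area:
            key = (entry.business_term.strip().lower(), entry.data_element.strip().lower())
            if key not in merged:
                # Copy so that later merges do not mutate the caller's entry
                merged[key] = DictionaryEntry(
                    entry.term_category, entry.business_term, entry.data_element,
                    entry.definition, set(entry.sources), entry.business_rule)
                continue
            existing = merged[key]
            existing.sources |= entry.sources
            if existing.definition != entry.definition:
                conflicts.append((key, existing.definition, entry.definition))
    return merged, conflicts


# Hypothetical example: risk and finance define "Start date" differently
risk = [DictionaryEntry("Contract", "Lease contract", "Start date",
                        "Date the lease becomes active",
                        {("LEASING_SYS", "LEASE_MASTER", "START_DT")})]
finance = [DictionaryEntry("Contract", "Lease Contract", "Start date",
                           "Date of first invoice",
                           {("GL_SYS", "GL_CONTRACT", "FIRST_INV_DT")})]
enterprise, conflicts = consolidate(risk, finance)
```

Collecting conflicting definitions, instead of letting the last area win, reflects the case study situation where key business terms were not defined uniformly across business functions; the conflicts list is input for a review with the business, not something to resolve automatically.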

Data Analysis

We recommend performing data analysis in parallel with building the data dictionary. As explained above, while building the data dictionary, business analysts list business terms and the elements associated with each term, and then list the various sources against each business element.

A team of data analysts should profile the data for each business term and review the following:

Check whether data is available in all source systems for the elements being discovered

Check whether all source systems provide data at the same level of granularity for the elements being discovered

Verify the encoding used for codes and reference data, and document the business rules

Verify data quality issues (e.g. presence of null values, junk data etc.) in the source systems
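The profiling checks above can be sketched as follows, assuming each source extract is available as a list of Python dicts (one per row). The junk-value list, the column names and the sample rows are hypothetical; in practice such profiling would more often run as SQL against the source copy or through a profiling tool.

```python
from collections import Counter

# Assumption: placeholder values treated as junk; each site would maintain its own list.
JUNK_VALUES = {"", "N/A", "NA", "NULL", "?", "XXX"}


def profile_element(rows, column):
    """Profile one column of one source extract, given as a list of dicts."""
    if not rows or column not in rows[0]:
        # Availability check: the element is not present in this source
        return {"available": False}
    values = [row.get(column) for row in rows]
    return {
        "available": True,
        "rows": len(values),
        "null_count": sum(v is None for v in values),
        "junk_count": sum(isinstance(v, str) and v.strip().upper() in JUNK_VALUES
                          for v in values),
        "distinct_values": len(set(values)),
        # Most frequent values help reveal the encoding of codes / reference data
        "top_values": Counter(values).most_common(3),
    }


def is_unique_at_grain(rows, grain_columns):
    """Granularity check: True if every combination of grain columns occurs only once."""
    keys = [tuple(row[c] for c in grain_columns) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical extract from a leasing source system
leasing_extract = [
    {"CONTRACT_ID": 1, "STATUS": "A"},
    {"CONTRACT_ID": 2, "STATUS": None},
    {"CONTRACT_ID": 2, "STATUS": "N/A"},
]
status_profile = profile_element(leasing_extract, "STATUS")
```

Running the same profile for an element in every source that claims to supply it gives a like-for-like view of availability, granularity and quality before the element is committed to the data model.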

Review Data Dictionary, Build Data Model

Building the data dictionary and building the data model are iterative processes. Once business analysts start building the data dictionary, the modeller should start modelling the relationships between the various categories and terms. Following this iterative approach reduces the time required to develop the datawarehouse data model.

In this case, the iterative approach was followed throughout. The data dictionary effort took 5 weeks, whereas the first draft of the data model was completed in 6 weeks. A dimensional modelling approach was followed: each business term category was converted into a subject area, each business term into an entity / table and each data element into an attribute.

While the business analysts are putting together the data dictionary, the data analyst should analyze data in all source systems and ensure that the data belonging to a business term is available at the same granularity in each of them. If it is not, the business term can be split into multiple terms.

Thus an iterative approach across data analysis, data dictionary building and data model building is recommended for effective development of both the data model and the data dictionary.
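The mapping used in the case study (term category to subject area, business term to entity / table, data element to attribute) lends itself to generating a first-cut schema from the dictionary. The sketch below assumes the dictionary has been reduced to (category, term, element) triples and emits generic SQL DDL; the naming convention and the VARCHAR(255) placeholder type are assumptions, and keys, fact / dimension roles and real data types would still come from the modeller.

```python
from collections import defaultdict


def to_identifier(name):
    """Turn a business name such as 'Lease contract' into LEASE_CONTRACT."""
    return "_".join(name.upper().split())


def dictionary_to_ddl(entries):
    """Generate draft DDL from (term_category, business_term, data_element) triples.

    Each term category becomes a subject area (emitted as a comment), each
    business term a table and each data element a column.
    """
    model = defaultdict(lambda: defaultdict(list))
    for category, term, element in entries:
        columns = model[category][term]
        if element not in columns:  # ignore duplicate element listings
            columns.append(element)
    statements = []
    for category, terms in model.items():
        statements.append(f"-- Subject area: {category}")
        for term, elements in terms.items():
            cols = ",\n".join(f"    {to_identifier(e)} VARCHAR(255)" for e in elements)
            statements.append(f"CREATE TABLE {to_identifier(term)} (\n{cols}\n);")
    return "\n".join(statements)


# Hypothetical dictionary content
ddl = dictionary_to_ddl([
    ("Contract", "Lease contract", "Contract number"),
    ("Contract", "Lease contract", "Start date"),
    ("Customer", "Retail customer", "Customer name"),
])
print(ddl)
```

Because the draft is regenerated from the dictionary, each iteration of the dictionary can be turned into a refreshed model draft cheaply, which fits the iterative approach described above.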

1st Walkthrough of Model


In this approach, the data model walkthrough is a joint exercise between the datawarehouse technical architect, the business analyst and business end users.

We recommend making the data model walkthrough a workshop based exercise. Before the walkthrough, each participant should be given access to the data dictionary and the data model. The following checks should be performed on the model during the walkthrough session:

o Check that the granularity of each fact table is consistent (Data Analyst)
o Check the logical grouping of elements in each fact table (Business Analyst)
o Review the elements that determine “Slowly changing dimension” behaviour (Business users)
o Make sure common reports do not require too many joins (Business Analyst)
o Make sure no data elements are missed (All)
o Check that the definitions of elements in the data dictionary are correct (Business users)

The 1st walkthrough of the model should be used as a checkpoint and should be a time bound exercise rather than a thorough review of the entire dictionary and model.

In this case, 4 sessions of 4 hours each were scheduled for the walkthrough. We recommend about 4 hours of walkthrough for each subject area in the data model / business area in the data dictionary.
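The first walkthrough check, that each fact table has a consistent grain, can be backed by a simple query: rows that share the same values for the declared grain columns indicate a grain problem. The sketch uses Python's built-in sqlite3 module; the table F_COLLECTION, its columns and the sample rows are hypothetical.

```python
import sqlite3


def grain_violations(conn, fact_table, grain_columns):
    """Return grain-column combinations that occur more than once in fact_table.

    Table and column names are interpolated directly, so they must come from
    trusted model metadata, never from user input.
    """
    cols = ", ".join(grain_columns)
    sql = (f"SELECT {cols}, COUNT(*) AS row_count FROM {fact_table} "
           f"GROUP BY {cols} HAVING COUNT(*) > 1")
    return conn.execute(sql).fetchall()


# Hypothetical fact table declared at contract-by-month grain
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE F_COLLECTION (CONTRACT_ID INTEGER, MONTH_KEY INTEGER, AMOUNT REAL)")
conn.executemany("INSERT INTO F_COLLECTION VALUES (?, ?, ?)",
                 [(1, 201701, 100.0), (1, 201702, 80.0), (1, 201702, 5.0)])
violations = grain_violations(conn, "F_COLLECTION", ["CONTRACT_ID", "MONTH_KEY"])
```

Here contract 1 has two rows for month 201702, so the fact table is not at the grain the model claims; either the grain definition or the load design needs to change before the second walkthrough.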

Summary Requirements and design

Most business queries analyze a summarization or aggregation of data across one or more dimensions. Hence it is recommended that data be aggregated, by combining multiple concepts and/or combining large amounts of detailed data, to create summary tables. The main objective when designing summary tables is to minimize the amount of data accessed and the number of tables joined.

Now that business analysts and business users have a clear picture of the data dictionary and the data model after the first walkthrough, they should focus on identifying their commonly asked business questions. Business analysts should identify all commonly queried elements in the data dictionary and give examples of business queries. The data modeler uses this input to decide the design of the summary subject areas. Deciding summary requirements is an exercise owned by the business analysts, with inputs from business users and the modeler. The business analysts should follow these steps when deciding summary requirements:

Identify the business reporting requirement for aggregation (e.g. sales report)

Identify the base fact table on which the aggregation will be applied

Identify the level of aggregation, e.g. summarize by month, summarize by product etc.

Identify the dimensional data that needs to be added to the aggregated fact table

Identify the fact elements that need to be added to the aggregate subject area

The data modeller uses the requirements listed above to design the summary subject area and data marts.
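The five steps above can be illustrated with a small summary table. In this sketch, assuming hypothetical tables F_DISBURSEMENT (base fact) and D_PROD (dimension) in an in-memory SQLite database, the reporting requirement is a monthly disbursement report by product line: the base fact is aggregated by month and product line, the PROD_LINE attribute is carried into the summary, and two fact elements (total amount and count) are kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical base fact and dimension with a few sample rows
conn.executescript("""
CREATE TABLE D_PROD (PROD_KEY INTEGER PRIMARY KEY, PROD_CD TEXT, PROD_LINE TEXT);
CREATE TABLE F_DISBURSEMENT (PROD_KEY INTEGER, DISB_DATE TEXT, AMOUNT REAL);
INSERT INTO D_PROD VALUES (1, 'RL01', 'Retail leasing'), (2, 'CL01', 'Corporate leasing');
INSERT INTO F_DISBURSEMENT VALUES
  (1, '2017-01-05', 100.0), (1, '2017-01-20', 50.0),
  (2, '2017-01-11', 400.0), (1, '2017-02-02', 70.0);
""")

# Summary table: level = month x product line, dimensional data = PROD_LINE,
# fact elements = total amount and number of disbursements.
conn.executescript("""
CREATE TABLE S_DISB_MONTH_PRODLINE AS
SELECT substr(f.DISB_DATE, 1, 7) AS DISB_MONTH,
       p.PROD_LINE,
       SUM(f.AMOUNT) AS TOTAL_AMOUNT,
       COUNT(*)      AS DISB_COUNT
FROM F_DISBURSEMENT f
JOIN D_PROD p ON p.PROD_KEY = f.PROD_KEY
GROUP BY substr(f.DISB_DATE, 1, 7), p.PROD_LINE;
""")

rows = conn.execute(
    "SELECT * FROM S_DISB_MONTH_PRODLINE ORDER BY DISB_MONTH, PROD_LINE").fetchall()
```

The monthly report now reads one small table with no joins instead of scanning every disbursement and joining to the product dimension, which is the objective stated above.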


Reports Walkthrough

This task is performed jointly by the DW designer and power users. Business users provide a list of the critical reports they use. The DW designer / business analyst maps these reports to the datawarehouse and creates SQL queries that show how the reports can be generated.

When reports are reviewed against the proposed DW model, some data elements may not map to the DW, typically because the requirement for those elements was not provided to the DW design team. The project manager and business sponsor should review the list of such missed requirements and decide which of the missed elements should be included in the datawarehouse design.
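The missed-requirement list can be produced mechanically once each report element has been mapped (or not) to the proposed model. The sketch below assumes mappings are recorded as "TABLE.COLUMN" strings, with None for elements nobody could map; the report names, tables and columns are hypothetical.

```python
from collections import defaultdict


def missed_requirements(reports, dw_model):
    """List report elements that cannot be mapped to the proposed DW model.

    reports:  {report name: {element name: "TABLE.COLUMN" or None}}
    dw_model: {table name: set of column names in the proposed model}
    Returns {element name: [reports that need it]}.
    """
    missed = defaultdict(list)
    for report, elements in reports.items():
        for element, target in elements.items():
            table, _, column = (target or "").partition(".")
            if column not in dw_model.get(table, set()):
                missed[element].append(report)
    return dict(missed)


# Hypothetical proposed model and report mappings
dw_model = {"F_COLLECTION": {"CONTRACT_KEY", "RECOVERY_AMT"}, "D_CONTRACT": {"CONTRACT_NO"}}
reports = {
    "Monthly recoveries": {"Contract no": "D_CONTRACT.CONTRACT_NO",
                           "Recovery amount": "F_COLLECTION.RECOVERY_AMT",
                           "Collector name": None},
    "Write-off report": {"Write-off amount": "F_COLLECTION.WRITE_OFF_AMT",
                         "Collector name": None},
}
gaps = missed_requirements(reports, dw_model)
```

Grouping each missed element with the reports that need it gives the project manager and business sponsor a direct view of how widely each gap matters when deciding what to include.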

Gather SCD Requirements

For business dimensions such as product or customer address, the description of the dimension changes slowly rather than on a regular basis. One of the advantages of a datawarehouse is that it can track these changes, so the business can see data as it changed over a period of time. Implementing slowly changing dimensions (SCD) comes with its own challenges. Many datawarehousing projects make the mistake of leaving decisions about SCD requirements to IT teams / ETL developers.

Power users should review the list of elements in each table and categorize them into 3 categories:

Business Key: an element or group of elements that uniquely identifies a dimension record. New records coming from the source are compared with records existing in the DW on the basis of these elements.

Change Key: an element or group of elements that is business critical, such that if any of them changes we need to keep history.

Non critical elements: elements for which we need to know only the latest value and are not interested in history.

Table Name | Column Name | Business Key | Change Key | Update Element
D_PROD     | PROD_ID     | X            |            |
D_PROD     | PROD_CD     |              | X          |
D_PROD     | PROD_DES    |              |            | X

Figure 2 – SCD Requirements

The actual decision on SCD keys should be left to business users, with the DW designer providing input on best practices.
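The three categories translate fairly directly into load logic. This is a minimal sketch, assuming the dimension is held in memory as a list of dicts with hypothetical VALID_FROM / VALID_TO columns: a change in a Change Key creates a new version (Type 2 style), and a change in a non critical element overwrites the current row (Type 1 style). The column names PROD_ID, PROD_CD and PROD_DES come from the SCD requirements figure, but treating them as business key, change key and non critical element respectively is an assumption made for the example.

```python
def apply_scd(dimension, incoming, business_key, change_keys, effective_date):
    """Apply one incoming source record to a dimension held as a list of dicts.

    - business_key: columns that identify the dimension record
    - change_keys: columns whose changes must keep history (new version)
    - any other column is non critical: overwrite it in the current row
    Current rows have VALID_TO = None.
    Returns "insert", "new_version", "update" or "no_change".
    """
    key = tuple(incoming[c] for c in business_key)
    current = next((r for r in dimension
                    if r["VALID_TO"] is None
                    and tuple(r[c] for c in business_key) == key), None)
    if current is None:
        dimension.append({**incoming, "VALID_FROM": effective_date, "VALID_TO": None})
        return "insert"
    if any(current[c] != incoming[c] for c in change_keys):
        # Close the current version and open a new one to keep history
        current["VALID_TO"] = effective_date
        dimension.append({**incoming, "VALID_FROM": effective_date, "VALID_TO": None})
        return "new_version"
    others = [c for c in incoming if c not in business_key and c not in change_keys]
    if any(current[c] != incoming[c] for c in others):
        for c in others:
            current[c] = incoming[c]  # latest value only, no history
        return "update"
    return "no_change"


d_prod = []
BK, CK = ["PROD_ID"], ["PROD_CD"]
apply_scd(d_prod, {"PROD_ID": 10, "PROD_CD": "RL01", "PROD_DES": "Retail lease"},
          BK, CK, "2017-01-01")
```

Because the categorization is a per-column business decision, the same routine serves every dimension once power users have filled in the business key, change key and update element columns for it.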

Document Business Rules


The format in which the business requires data to be reported is different from the format in which data is collected and stored in source systems. Also, data capture and storage rules may differ across source systems. These 2 aspects require a comprehensive set of business rules to be developed, so that the business has a single view across multiple business systems, in a format that is easier to interpret and act on. During business rule documentation, each owner of a business term should be asked to document, in business language, the rules that will transform the data into an actionable report. Preferably, a team of a business user and a data analyst should be assigned the task of deriving business rules for the business terms within their area of business. We strongly advise against assigning this responsibility to source system owners, or to multiple business / data analysts for different sources of the same business term.
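A business rule written in business language usually ends up, at implementation time, as a translation from each source's encoding to one business vocabulary. The sketch below assumes two hypothetical source systems, LEASING_SYS and LENDING_SYS, with made-up status codes for the business term "Contract status"; keeping one rule table per business term, owned by a single business user / data analyst pair, is what produces the single view described above.

```python
# Hypothetical business rule for the term "Contract status": one mapping per
# source system, all resolving to the same business vocabulary.
STATUS_RULES = {
    "LEASING_SYS": {"A": "Active", "C": "Closed", "W": "Written off"},
    "LENDING_SYS": {"01": "Active", "02": "Active", "09": "Closed", "WO": "Written off"},
}


def contract_status(source_system, raw_code):
    """Translate a source-specific status code into the business value."""
    try:
        return STATUS_RULES[source_system][raw_code]
    except KeyError:
        # An unmapped code is a gap in the documented rule, not something to guess
        raise ValueError(
            f"No business rule for {source_system} code {raw_code!r}") from None
```

Raising an error for unmapped codes, rather than defaulting to a value, surfaces incomplete rules during testing so that they go back to the rule owner.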

2nd Walkthrough of dictionary

The second / final walkthrough is a comprehensive review of the entire data dictionary and data model. In this step of requirement gathering and design, all inputs that go into the data dictionary and all deliverables are reviewed as a complete set. As in the first walkthrough, each participant in the review should be made responsible for reviewing a particular aspect of the data dictionary and data model. In addition to the aspects reviewed in the 1st walkthrough, the following checks should be applied:

Verify that all comments from the 1st review of the data dictionary and data model have been addressed (to be verified by all participants)

Verify that data required for critical business elements is available in, and can be mapped to, all major source systems (to be verified by the data analyst and business analyst)

Verify that the business rules specified for various source systems by different business / data analysts result in a single view of the business across source systems

Verify that common business queries do not require complex joins to extract data from the datawarehouse (to be verified by data analysts)

The 2nd walkthrough of the data dictionary and data model concludes the requirement analysis process.
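The check that critical business elements are mapped to all major source systems can be run directly against the data dictionary's source mappings. The sketch assumes the dictionary has been reduced to {data element: {source system: (table, column)}}; the element and system names are hypothetical. A gap is not necessarily an error, since some elements legitimately exist in only some systems, so the output is a list for reviewers to confirm.

```python
def coverage_gaps(dictionary, critical_elements, major_sources):
    """Find critical elements that lack a mapping in one or more major sources.

    dictionary: {data element: {source system: (table, column)}}
    Returns {element: [major sources with no mapping]}.
    """
    gaps = {}
    for element in critical_elements:
        mapped = dictionary.get(element, {})
        missing = [source for source in major_sources if source not in mapped]
        if missing:
            gaps[element] = missing
    return gaps


# Hypothetical dictionary extract
dictionary = {
    "Contract number": {"LEASING_SYS": ("LEASE", "CONTRACT_NO"),
                        "LENDING_SYS": ("LOAN", "LOAN_NO"),
                        "COLLECTIONS": ("ACCOUNT", "REF_NO")},
    "Recovery amount": {"COLLECTIONS": ("PAYMENTS", "AMT")},
}
gaps = coverage_gaps(dictionary,
                     ["Contract number", "Recovery amount", "Product"],
                     ["LEASING_SYS", "LENDING_SYS", "COLLECTIONS"])
```

An element that appears in no source at all (here "Product") usually points to a dictionary entry that was never mapped in step f, which is exactly the kind of finding the second walkthrough is meant to catch.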


Figure 2 – MDA Viewpoints and Layers

[7] CONCLUSION AND NEED FOR FUTURE WORK

In this paper, a requirement gathering methodology for model driven design of a datawarehouse was proposed with the help of a case study implemented for a midsized leasing organization. The methodology builds an Enterprise Datawarehouse on the organization's existing analytical processes: existing analytical systems and personal data marts are reused in developing the datawarehouse. The methodology depends heavily on the involvement of end users in requirement gathering and datawarehouse design. We have outlined criteria for selecting end users for requirement gathering.

This methodology ensures continuous end user training on usage of the datawarehouse, which results in sustained usage of the datawarehouse by business users.

A comprehensive data dictionary is the foundation of this approach. The data dictionary accelerates and enhances the entire DW development lifecycle by enabling automation of activities such as data model design, metadata loading and generation of DW test cases.
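As one example of this kind of automation, simple DW test cases can be generated from dictionary metadata. This sketch assumes hypothetical dictionary fields (a mandatory flag and a source table / column for each target column) and generates two kinds of tests: a not-null check and a distinct-count reconciliation against the source. Each test query is expected to return 0; sqlite3 is used only to make the example runnable.

```python
import sqlite3


def generate_test_cases(entries):
    """Build (test name, SQL) pairs from dictionary metadata.

    entries: dicts with 'table', 'column', optional 'mandatory' flag and
    optional 'source_table' / 'source_column' (hypothetical fields).
    Every generated query returns 0 when the test passes.
    """
    tests = []
    for e in entries:
        t, c = e["table"], e["column"]
        if e.get("mandatory"):
            tests.append((f"{t}.{c} is never null",
                          f"SELECT COUNT(*) FROM {t} WHERE {c} IS NULL"))
        if e.get("source_table"):
            st, sc = e["source_table"], e["source_column"]
            tests.append((f"{t}.{c} reconciles with {st}.{sc}",
                          f"SELECT ABS((SELECT COUNT(DISTINCT {sc}) FROM {st}) - "
                          f"(SELECT COUNT(DISTINCT {c}) FROM {t}))"))
    return tests


def run_tests(conn, tests):
    """Run each test query and return the names of tests that did not return 0."""
    return [name for name, sql in tests if conn.execute(sql).fetchone()[0] != 0]


# Hypothetical source and target tables
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE SRC_LEASE (CONTRACT_NO TEXT);
CREATE TABLE D_CONTRACT (CONTRACT_NO TEXT, BRANCH TEXT);
INSERT INTO SRC_LEASE VALUES ('L1'), ('L2'), ('L3');
INSERT INTO D_CONTRACT VALUES ('L1', 'Pune'), ('L2', NULL);
""")
tests = generate_test_cases([
    {"table": "D_CONTRACT", "column": "CONTRACT_NO", "mandatory": True,
     "source_table": "SRC_LEASE", "source_column": "CONTRACT_NO"},
    {"table": "D_CONTRACT", "column": "BRANCH", "mandatory": True},
])
failures = run_tests(conn, tests)
```

A distinct-count comparison is deliberately crude; it catches dropped or duplicated keys cheaply, while value-level reconciliation would need the business rules documented for each term.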

Commercial tools such as Kalido have proposed DW design based on a model driven approach. However, their approach is tightly integrated with the use of specific tools. Our proposed approach is technology independent and can be implemented using any combination of design, ETL and database tools.

The case study described in this paper did not use any standard formats such as CWM. Use of such industry standard formats would facilitate exchange of metadata between the data dictionary and various ETL tools, as well as automated generation of code. This needs to be tested experimentally, and the approach needs to be integrated with a standard framework.

An empirical study of the impact of the model driven approach on reducing the datawarehouse development cycle needs to be carried out.


Not every requirement gathering approach is suitable for every scenario. Constraints such as complex source systems or an absence of data savvy business users may reduce the applicability of this approach. This needs to be studied by applying the approach to various business scenarios.

REFERENCES

[1] Michael Bergman, “The deep Web: surfacing hidden value”, Journal of Electronic Publishing 7(1), 2001.

[2] R. Kimball and J. Caserta, The Data Warehouse Lifecycle Toolkit, John Wiley & Sons, 2004.

[3] Matteo Golfarelli and Stefano Rizzi, “A Comprehensive Approach to Datawarehouse Testing”, DOLAP’09, November 6, 2009.

[4] Matteo Golfarelli, “From User Requirements to Conceptual Design in Data Warehouse Design – a Survey”, in L. Bellatreche (Ed.), Data Warehousing Design and Advanced Engineering Applications: Methods for Complex Construction, IGI Global, 2009.

[5] Robert Winter and Bernhard Strauch, “Demand-driven Information Requirements Analysis in Data Warehousing”, International Conference on Systems Sciences, IEEE, 2003.

[6] Giorgini et al., “Goal-Oriented Requirement Analysis for Data Warehouse Design”, DOLAP’05, November 4–5, 2005, Bremen, Germany.

[7] Pardillo et al., “Using Ontologies for the Design of Data Warehouses”, International Journal of Database Management Systems (IJDMS), Vol. 3, No. 2, May 2011.

[8] Yuhong Guo et al., “Triple-Driven Data Modeling Methodology in Data Warehousing: A Case Study”, DOLAP’06, November 10, 2006.

[9] William Inmon, Building the Datawarehouse, Wiley Publishing.

[10] R. Winter and B. Strauch, “Demand-driven Information Requirements Analysis in Data Warehousing”, Journal of Data Warehousing (2003), pp. 38–47.

[11] Scholl et al., “Model-Driven Architecture (MDA) and Data Warehouse Design”.

[12] Golfarelli et al., “WAND: A CASE Tool for Data Warehouse Design”.

[13] Essaidi and Osmani, “A Unified Method for Developing Data Warehouses”.