International Journal of Computer Engineering and Applications,
Volume XII, Issue I, Jan. 18, www.ijcea.com ISSN 2321-3469
Kuldeep Deshpande and Dr.Bhimappa Desai 100
REQUIREMENT GATHERING FOR MODEL DRIVEN DESIGN OF
DATAWAREHOUSE
Kuldeep Deshpande¹, Dr. Bhimappa Desai²
¹Ellicium Solutions, Pune, India; ²Capgemini, Pune, India
ABSTRACT:
Datawarehouse (DWH) systems integrate data from various operational sources in an organization for analytical use. Designing a datawarehouse poses a unique challenge, as it must meet the requirements of a diverse set of business users within the constraints imposed by various operational systems. In this paper we discuss a model driven approach to requirement gathering and design of a datawarehouse. We then introduce a case study of a user focused requirement gathering technique. Using this case study we demonstrate how intense user involvement can lead to successful design of a datawarehouse. The lifecycle activities of requirement gathering are discussed in detail with associated tools and techniques.
Keywords: Datawarehouse, Model driven, Architecture
[1] INTRODUCTION
CWM [10] defines the Model Driven approach as a “standard framework for software development
that addresses the complete life cycle of designing, deploying, integrating, and managing
applications by using models in software development.”
Model Driven Architecture (MDA) is an approach to system specification and interoperability based
on the use of formal models. In [10], the authors describe how MDA and CWM (Common Warehouse
Metamodel) can be used for requirements gathering and design of a Datawarehouse. Similarly, in [12],
a datawarehouse framework (DWF) and a Unified Process (2TUP) have been proposed for development
of a datawarehouse using model driven architecture.
[2] NEED FOR MORE RESEARCH
Kimball [1] has stressed that requirements should determine not just what data should go into the
datawarehouse, but also how it is organized and updated. However, existing requirement gathering
techniques do not focus on gathering requirements for slowly changing dimensions. In general we
found very little focus on requirement gathering for the physical design of a datawarehouse from the end
users’ perspective.
In recent years, there has been extensive research in the area of Model Driven Architecture for
Datawarehouses. However, linking MDA with traditional user driven and supply driven approaches
to requirement gathering has not been given due attention.
Most of the literature focuses on the merits and demerits of requirement analysis methods. However,
very little of it describes the experience of implementing various approaches / methodologies in real
life projects. In particular, the relationship between the approach to requirement gathering and the success / failure of
a DW has not been given due attention. Such a study can be an important guide for DW practitioners.
Such a study should also focus on why a methodology for requirement gathering can be helpful for
a particular type of organization.
[3] BUSINESS INVOLVEMENT IN REQUIREMENT GATHERING
Business involvement in datawarehousing initiatives is a much discussed topic. It is widely agreed in the
datawarehouse / BI implementation space that the sponsor of a BI initiative should be a well-respected
business leader in the organization. Strong support and sponsorship from business management is the
most critical factor when assessing data warehouse readiness [1]. It is well accepted that a BI program
should be led by business and implemented by IT.
However, there are many examples of BI programs that fail due to superficial involvement of business.
Before we discuss effective involvement of business in BI programs, let us look at how we can
categorize business users in terms of their position in the organization, IT savviness, understanding
of source systems, role in the report generation function etc.
The majority of datawarehouse initiatives are driven by a need to replace existing departmental / personal
data marts / data stores with an enterprise wide decision support system. In such scenarios, each
department has its own data marts in place. Thus there exists a culture in the organization of data
based decision making. If the business community does not currently place value on information and
analyses, its readiness for a data warehouse is questionable [1]. A datawarehouse program should
explore this analytic culture in the existing setup when executing the program.
In such a setup, there is a team of analysts who are responsible for extracting data from various
source systems and loading the data into departmental data stores. These analysts understand source
systems fairly well, but are not source system experts. They do not control any changes to source
systems. They are, however, responsible for understanding source data and transforming it into a format
that their business users would like to see. Typically these analysts are good data programmers with
good understanding of business processes. Kimball has used the term ‘Business system analyst’ for
IT resources who are user centric [1]. On the other hand in such a setup there exists a team of business
analysts who are responsible for report / data consumption and are sometimes also responsible for
business decision making. They are good at analyzing reports and getting data in the hands of
ultimate decision makers. This category of business users is sometimes responsible for building
statistical analytics by using data in departmental data marts. We will refer to this category of
business users as ‘Business Information consumers’. Both categories of business users play an
important role in the requirement gathering process.
[4] OBJECTIVES AND CONTRIBUTION
This paper proposes a methodology for DW requirement analysis with the help of a case study.
• A requirement analysis framework is proposed that supports model driven design of
datawarehouse. This proposed framework builds CIM & PIM layers of the MDA approach.
• Various phases of the proposed framework have been discussed in detail with activities to be
performed, deliverables and interdependencies between tasks.
• Detailed guidelines have been developed regarding involvement of business users in various
steps of requirement analysis for a datawarehouse.
• Guidelines for estimating the effort required by the proposed framework are
discussed. These should help practitioners estimate effort for their own
projects.
[5] INTRODUCTION TO CASE STUDY
We discuss the datawarehouse requirement analysis methodology proposed in this paper with the help
of a case study. We implemented this methodology for a leading lending organization in Asia
Pacific. The organization offers various financial products such as leasing for retail customers, traditional
lending and corporate leasing. Its suite of products has evolved over the last 10 years through various
acquisitions and in response to market needs for product advancements. As a result, the organization has
developed various legacy systems. For collections, it has purchased a leading collections management
system.
Thus the IT landscape of the organization has the following characteristics:
• Diverse operational systems – In total it has 23 operational systems across all regions.
• Isolated data stores / marts for decision making – Every day the entire production version of the operational
sources was replicated into a separate database. This ‘Production mirror’ was then used by various
business groups to create their own isolated decision support data stores.
• No single version of truth existed. Definitions of key business terms such as product, contract,
recovery amount etc. were not uniform across business functions.
• A wealth of information existed with the business power users. It was thus clear that any
DW solution could succeed only if it could extract this wealth of information from the power users.
With this background, the organization commissioned a program to build an Enterprise Datawarehouse (EDW).
Challenges in this program were as follows:
• The EDW had to be built in a very tight timeframe. The time that the organization was willing to invest
in the EDW program was 20-30 percent less than industry averages.
• The organization had a thin IT team and same team of source system experts / business analysts
was dedicated to multiple programs, one of which was the EDW program.
• Although the organization had isolated decision support systems in place, business users / IT teams
were not familiar with Datawarehousing concepts and had to be trained on formal Datawarehousing
methodologies.
• Power users had built decision support systems using diverse technologies such as SAS and these
systems formed critical source of information for any EDW effort.
The team for this requirement gathering exercise included: a Datawarehouse architect, a data modeler, 3
business SMEs from risk, finance and sales departments each and 3 business data analysts from risk,
finance and sales departments each.
[6] PROPOSED FRAMEWORK
In this section we discuss the proposed requirement gathering framework in detail. We have divided
the proposed framework into 2 parts: Process model and Model viewpoints and layers.
Process model for requirement gathering
The following process flow shows the sequence of activities that we recommend as part of the proposed
requirement gathering framework:
Figure 1 – Process model for MDA driven requirement gathering
• Project Kick Off:
In this approach the objectives of the kick off workshop are threefold: first, to understand key
stakeholders and their expectations from the DW program; second, to decide the roles and
responsibilities of the various stakeholders; and third, to understand the high level IT landscape of the
organization.
In this case two workshops of 2 hours each were conducted and the following deliverables were
created from the workshops:
o IT system landscape highlighting the flow of data
o An overview of reporting systems in the organization
o End user issues and concerns
o Business objectives for which the organization undertook the EDW program
The project sponsor was asked to draw up a list of participants for the kick off workshop. We advise
dedicating no more than 2 days of effort to the project kick off.
• Business – IT Interviews
Interviews with business and IT teams follow on from the project kick off.
The focus of the interviews is as follows:
o Business Interviews:
To understand current state reporting needs
To understand what is the wish list of the user from information requirement
perspective.
To understand what is the information in current reporting setup that the end
users don’t trust.
To understand specific pain areas of business in terms of reporting e.g.
reconciliation of data from 2 systems, month end data being reported late etc.
o IT Interviews:
To understand each source system in detail. Information captured should
include: nature of data in sources, technology platform used, underlying
database structures etc.
To understand current report generation process.
In this case, 5 business users (including head of credit risk, senior sales executive and other business
executives) and 5 IT employees (head of architecture, head of infrastructure, reporting lead and
reporting analyst) were interviewed.
• Explain concept of datawarehouse
In most organizations that make their first attempt at building a datawarehouse, personal
decision support systems (DSS) exist in some shape or form. These personal DSS take the form
of MS Access databases created by end users, SAS datasets or even MS Excel files. It is important
that the power users are familiar with the design of the new Datawarehouse being built. The traditional
approach to end user training is to conduct classroom training towards the end of the DW build phase.
We recommend conducting training for end users to introduce them to formal Datawarehousing
principles even before requirement gathering begins. This helps the power users to speak the same
language as the DW designers. In this case, we conducted a 4 hour session for power users within
the business team and a 4 hour session for IT analysts / reporting teams on these principles.
• Build Data Dictionary
This is the most critical task in the lifecycle of requirement analysis. Success of the
Datawarehouse program depends to a great extent on completeness of the dictionary in this approach.
Following are the steps for building a data dictionary:
a. Identify a team of business analysts who understand business requirements and source
data in their area of business.
b. Each business analyst goes through all the reports used in their business area and lists
the business concepts and terms used by decision makers.
c. An interrelation diagram is drawn between various terms. This diagram visually
explains the relation between various terms.
d. Each report requirement from business is analysed. Individual data elements are linked
to terms.
e. This is a business focused exercise and not a technical exercise. Focus should be more
on business concepts than listing tables and columns.
f. Each element is mapped to the source systems from which it can be sourced.
g. After each business analyst creates a data dictionary for their area, a consolidation effort
must be undertaken in which all individual dictionaries are merged and an enterprise
wide dictionary is created.
h. We recommend the following structure for the data dictionary:
Table 1 – Data Dictionary
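One way to picture the data dictionary structure, and the consolidation described in step (g), is as one record per data element with one source mapping per operational system. The sketch below is illustrative only: the names `SourceMapping`, `DictionaryEntry` and `consolidate`, and all sample values, are assumptions made for this example and are not artifacts of the case study.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SourceMapping:
    """Where a data element can be sourced from (one per source system)."""
    system: str
    table: str
    element: str
    business_rule: str = ""


@dataclass
class DictionaryEntry:
    """One row of the data dictionary (hypothetical field names)."""
    category: str      # business term category
    term: str          # business term
    element: str       # data element
    definition: str
    sources: list = field(default_factory=list)


def consolidate(dictionaries):
    """Merge per-area dictionaries into one enterprise-wide dictionary.

    Returns (merged, conflicts): entries keyed by (term, element), plus the
    keys whose definitions disagree and need a business decision.
    """
    merged, conflicts = {}, []
    for entries in dictionaries:
        for entry in entries:
            key = (entry.term.lower(), entry.element.lower())
            if key not in merged:
                merged[key] = DictionaryEntry(
                    entry.category, entry.term, entry.element,
                    entry.definition, list(entry.sources))
                continue
            existing = merged[key]
            if existing.definition != entry.definition and key not in conflicts:
                conflicts.append(key)
            for src in entry.sources:
                if src not in existing.sources:
                    existing.sources.append(src)
    return merged, conflicts


# Hypothetical sample data for two business areas.
risk = [DictionaryEntry("Contract", "Contract", "Recovery amount",
                        "Amount recovered after default",
                        [SourceMapping("Collections", "RECOVERY", "REC_AMT")])]
finance = [DictionaryEntry("Contract", "Contract", "Recovery amount",
                           "Amount recovered net of fees",
                           [SourceMapping("Ledger", "GL_TXN", "RECOV_AMT")])]

merged, conflicts = consolidate([risk, finance])
```

In this sketch, conflicting definitions are reported rather than resolved automatically, since agreeing on a single definition is a business decision.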
Data Analysis
We recommend performing data analysis in parallel with building the data dictionary. As explained
above, while building the data dictionary, business analysts list business terms and the elements
associated with each term. The sources for each business element are then listed.
A team of data analysts should profile data for each business term and review the following:
o Check whether data is available in all source systems for the elements being discovered
o Check whether all source systems provide data at the same level of granularity for the
elements being discovered
o Verify the encoding used for codes and reference data, and document business rules
o Verify data quality issues (e.g. presence of null values, junk data etc.) in the source systems
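The availability and data quality checks above can be sketched as simple profiling functions. This is a minimal illustration under stated assumptions: source records are represented as lists of dictionaries, and the function names, the `JUNK_VALUES` set and the sample data are hypothetical rather than taken from the case study.

```python
# Assumed markers for junk data; a real project would agree these with business users.
JUNK_VALUES = {"", "N/A", "NA", "?", "UNKNOWN"}


def profile_element(rows, column):
    """Summarise nulls and junk values for one source column.

    `rows` is a list of dicts, one per source record. A record that lacks
    the column altogether is counted as null.
    """
    total = len(rows)
    nulls = sum(1 for r in rows if r.get(column) is None)
    junk = sum(1 for r in rows
               if isinstance(r.get(column), str)
               and r[column].strip().upper() in JUNK_VALUES)
    return {"total": total, "nulls": nulls, "junk": junk,
            "null_ratio": nulls / total if total else 0.0}


def availability(sources, column):
    """Map each source system name to whether it supplies `column` at all."""
    return {name: any(r.get(column) is not None for r in rows)
            for name, rows in sources.items()}


# Hypothetical extracts from two source systems.
sources = {
    "leasing": [{"prod_cd": "L01"}, {"prod_cd": None}, {"prod_cd": "n/a"}],
    "lending": [{"product": "TL"}],
}

profile = profile_element(sources["leasing"], "prod_cd")
available = availability(sources, "prod_cd")
```

Here the `lending` extract does not supply `prod_cd` under that name, which is the kind of finding that would be fed back into the data dictionary.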
Review Data Dictionary, Build Data Model
Building the data dictionary and building the data model are iterative processes. Once business
analysts start building the data dictionary, the modeller should start modelling the relationships
between various categories and terms. By following this iterative approach, the time required for
developing datawarehouse data model is reduced.
In this case the iterative approach was followed throughout. The data dictionary building effort
took 5 weeks, whereas the first draft of the data model was completed in 6 weeks. A dimensional
modelling approach was followed for developing the data model. Each business term category was
converted into a subject area, each business term into an entity / table and each data element
into an attribute.
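The conversion rule just described (category to subject area, term to entity / table, element to attribute) can be sketched as a small grouping function. The function name `build_model` and the sample entries are assumptions made for illustration; the case study does not describe a specific implementation.

```python
from collections import defaultdict


def build_model(entries):
    """Group dictionary rows into subject areas, tables and attributes.

    `entries` is an iterable of (category, term, element) tuples.
    """
    model = defaultdict(lambda: defaultdict(list))
    for category, term, element in entries:
        attributes = model[category][term]
        if element not in attributes:
            attributes.append(element)
    # Convert to plain dicts so the result is easy to inspect or serialise.
    return {area: dict(tables) for area, tables in model.items()}


# Hypothetical data dictionary rows.
entries = [
    ("Product", "Product", "Product code"),
    ("Product", "Product", "Product description"),
    ("Contract", "Contract", "Contract start date"),
]
model = build_model(entries)
```

Because the model is derived from the dictionary, it can be regenerated after each iteration of the dictionary.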
While the business analysts are putting together the data dictionary, the data analyst should analyze
data in all source systems and ensure that data belonging to a business term is available at same
granularity in the source systems. If this is not the case then each business term can be split into
multiple terms.
Thus an iterative approach between data analysis, data dictionary building and data model building
is recommended for effective data model and data dictionary development.
Table 1 (column layout): Business term category | Business term | Data element | Definition | Source System 1: Table, Element, Business rule
1st Walkthrough of Model
As per this approach, the data model walkthrough is a joint exercise between the Datawarehouse
technical architect, business analysts and business end users.
We recommend that the data model walkthrough be a workshop based exercise. Before the walkthrough
each participant should be given access to the data dictionary and the data model. The following checks
should be performed on the model during the walkthrough session, each by the role indicated:
o Check that the granularity of a fact table is uniform – Data Analyst
o Check the logical grouping of elements in a fact table – Business Analyst
o Review the elements that decide “Slowly changing dimension” – Business users
o Make sure common reports do not require too many joins – Business Analyst
o Make sure no data elements are missed out – All
o Check that the definitions of elements in the data dictionary are correct – Business users
The 1st walkthrough of the model should be used as a checkpoint and should be a time bound
exercise rather than a thorough review of the entire dictionary and model.
In this case 4 sessions of 4 hours each were scheduled for the walkthrough. We recommend about 4
hours of walkthrough for each subject area in the data model / business area in the data dictionary.
Summary Requirements and design
Most business queries analyze a summarization or aggregation of data across one or more
dimensions. Hence it is recommended that data is aggregated by combining multiple concepts
together and/or combining large amounts of detailed data together to create summary tables. The
main objective when designing summary tables is to minimize the amount of data being accessed
and the number of tables being joined.
Now that business analysts and business users have a clear picture after the first walkthrough of the
data dictionary and the data model, they should focus their attention on identifying their commonly
asked business questions. Business analysts should identify all commonly
queried elements in the data dictionary and give examples of business queries. This input will be
used by the data modeler in deciding the design of summary subject areas. Deciding summary
requirements is an exercise owned by business analysts with inputs from business users and the
modeler. The business analysts should follow these steps while deciding summary
requirements:
o Identify the business reporting requirement for aggregation (e.g. sales report)
o Identify the base fact table on which aggregation will be applied
o Identify the level of aggregation – e.g. summarize by month, summarize by product etc.
o Identify dimensional data that is required to be added to the aggregated fact table
o Identify fact elements that need to be added to the aggregate subject area
The requirements listed above are used by the data modeller for designing the summary subject areas and
data marts.
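To make the idea of a summary table concrete, the sketch below aggregates detailed fact rows to a (month, product) grain. It is only an illustration: the function name `summarise`, the column names `txn_date`, `product` and `amount`, and the sample rows are assumptions, and a real summary table would normally be built by the ETL layer rather than in application code.

```python
from collections import defaultdict
from datetime import date


def summarise(fact_rows, measures):
    """Aggregate detailed fact rows to (month, product) grain."""
    summary = defaultdict(lambda: dict.fromkeys(measures, 0))
    for row in fact_rows:
        # Level of aggregation: calendar month and product.
        key = (row["txn_date"].strftime("%Y-%m"), row["product"])
        for measure in measures:
            summary[key][measure] += row[measure]
    return dict(summary)


# Hypothetical base fact rows at transaction grain.
facts = [
    {"txn_date": date(2017, 1, 5), "product": "Retail lease", "amount": 100},
    {"txn_date": date(2017, 1, 20), "product": "Retail lease", "amount": 50},
    {"txn_date": date(2017, 2, 3), "product": "Retail lease", "amount": 70},
]
monthly = summarise(facts, ["amount"])
```

Three detailed rows collapse into two summary rows here, which is the reduction in accessed data that summary tables aim for.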
Reports Walkthrough
This is a task that the DW designer and power users should perform jointly. In this task business
users provide a list of the critical reports that they use. The DW designer / business analyst maps these reports
to the Datawarehouse and creates SQL queries that show how the reports can be generated.
During the review of reports against the proposed DW model, there might be data elements that
cannot be mapped to the DW. This may be because the requirement for those elements was not provided
to the DW design team. The list of such missed requirements should be reviewed by the project manager
and business sponsor to decide which of the missed elements should be included in the Datawarehouse
design.
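Finding the elements that cannot be mapped amounts to comparing each report's elements with those in the DW model. The following sketch assumes reports and the model are available as simple lists of element names; the function name `unmapped_elements` and the sample reports are hypothetical.

```python
def unmapped_elements(reports, dw_elements):
    """Return, per report, the elements that have no home in the DW model."""
    available = {e.lower() for e in dw_elements}
    gaps = {}
    for report, elements in reports.items():
        missing = sorted(e for e in elements if e.lower() not in available)
        if missing:
            gaps[report] = missing
    return gaps


# Hypothetical critical reports and DW model elements.
reports = {
    "Monthly collections": ["Contract id", "Recovery amount", "Dunning level"],
    "Sales summary": ["Product code", "Contract id"],
}
dw_elements = ["Contract id", "Recovery amount", "Product code"]

gaps = unmapped_elements(reports, dw_elements)
```

The resulting list of gaps is what the project manager and business sponsor would review.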
Gather SCD Requirements
For business dimensions such as product or customer address, the description of the dimension
changes slowly rather than on a regular basis. One of the advantages of a datawarehouse is that it can track
these changes, so that business can see data as it changed over a period of time. Implementing slowly
changing dimensions (SCD) comes with its challenges. Many Datawarehousing projects make the mistake of
leaving decisions on SCD requirements to IT teams / ETL developers.
Power users should review the list of elements in each table and categorize them into 3 categories:
o Business key – This is an element or a group of elements that uniquely identifies a dimension
record. New records coming in from the source are compared with records existing in the DW based
on these elements.
o Change key – This is an element or a group of elements that is absolutely business critical;
if any change happens to these elements, history must be kept.
o Non critical elements – This is a group of elements for which only the latest
value is needed and history is not of interest.
Table Name   Column Name   Business Key   Change Key   Update Element
D_PROD PROD_ID X
D_PROD PROD_CD X
D_PROD PROD_DES X
Figure 2 – SCD Requirements
The actual choice of keys for SCD should be left to business users; the DW designer
should provide input in terms of best practices.
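The three categories translate directly into load logic for a dimension. The sketch below shows one common way to apply them, keeping history by closing the current row and opening a new one; this is an assumed implementation, not the one used in the case study. The assignment of the D_PROD columns to categories, the `start_date` / `end_date` columns and the function name `apply_scd` are all illustrative assumptions.

```python
from datetime import date

# Assumed categorisation of D_PROD columns, for illustration only.
BUSINESS_KEY = ["PROD_ID"]       # identifies the dimension record
CHANGE_KEY = ["PROD_CD"]         # a change here must be kept as history
UPDATE_ELEMENTS = ["PROD_DES"]   # only the latest value is needed


def apply_scd(dimension, incoming, load_date):
    """Apply one incoming source record to a list of dimension rows."""
    key = tuple(incoming[c] for c in BUSINESS_KEY)
    current = next((r for r in dimension
                    if r["end_date"] is None
                    and tuple(r[c] for c in BUSINESS_KEY) == key), None)
    if current is None:
        # New business key: insert the first version.
        dimension.append({**incoming, "start_date": load_date, "end_date": None})
    elif any(current[c] != incoming[c] for c in CHANGE_KEY):
        # Change key moved: close the old version and open a new one.
        current["end_date"] = load_date
        dimension.append({**incoming, "start_date": load_date, "end_date": None})
    else:
        # Only non critical elements may differ: overwrite in place.
        for c in UPDATE_ELEMENTS:
            current[c] = incoming[c]
    return dimension


dim = []
apply_scd(dim, {"PROD_ID": 1, "PROD_CD": "RL", "PROD_DES": "Retail lease"}, date(2017, 1, 1))
apply_scd(dim, {"PROD_ID": 1, "PROD_CD": "RL", "PROD_DES": "Retail leasing"}, date(2017, 2, 1))
apply_scd(dim, {"PROD_ID": 1, "PROD_CD": "CL", "PROD_DES": "Corporate lease"}, date(2017, 3, 1))
```

The second load only overwrites the description, while the third creates a new version of the product row. Moving a column from one list to another changes this behaviour, which is why the categorisation needs to come from business users.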
Document Business Rules
The format in which business requires data to be reported is different from the format in which data is
collected and stored in source systems. Also, data capture and storage rules in multiple source
systems may differ. These 2 aspects of data storage in sources require a comprehensive set of
business rules to be developed, so that there is a single view of business across multiple business
systems, in a format that is easier for the business to interpret and act upon. During business
rule documentation, each owner of a business term should be asked to document, in business language,
the rules that will transform the data into an actionable report. It is preferred that a team consisting of a
business user and a data analyst is assigned the task of deriving business rules for the business terms within
their area of business. We strongly advise against assigning this responsibility to source system owners, or
splitting it among multiple business / data analysts for different sources of the same business
term.
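As an illustration of a business rule that produces a single view across sources, the sketch below conforms source-specific status codes to one reporting value. The business term "Contract status", the system names, the codes and the function name `conform_status` are all hypothetical and are not taken from the case study.

```python
# Hypothetical rule set for one business term, "Contract status",
# owned by a single business user / data analyst team.
STATUS_RULES = {
    "leasing": {"A": "Active", "C": "Closed", "D": "Defaulted"},
    "lending": {"1": "Active", "9": "Closed", "5": "Defaulted"},
}


def conform_status(source_system, raw_code):
    """Translate a source-specific code into the single reporting value."""
    try:
        return STATUS_RULES[source_system][str(raw_code).strip()]
    except KeyError:
        # Unknown systems or codes are surfaced rather than silently guessed.
        return "Unmapped"


statuses = [
    conform_status("leasing", "A"),
    conform_status("lending", 9),
    conform_status("lending", "X"),
]
```

Keeping all source mappings for one term in one place reflects the advice above that a single team should own the rules for a business term.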
2nd Walkthrough of dictionary
The second / final walkthrough of the model is a comprehensive review of the entire data dictionary and
data model. In this step of requirement gathering and design, all inputs that go into the
data dictionary and the deliverables are reviewed as a complete set. As in the first walkthrough /
review of the dictionary, each participant in the review process should be assigned responsibility
for reviewing a particular aspect of the data dictionary and data model. In addition to the aspects reviewed
in the 1st walkthrough, the following additional checks should be applied:
o Verify that all comments that came out of the 1st review of the data dictionary and data model are
addressed – to be verified by all participants
o Verify that data required for critical business elements is available in and can be mapped to all
major source systems – to be verified by data analyst and business analyst
o Verify that business rules specified for various source systems by different business / data
analysts result in a single view of business across the various source systems
o Verify that common business queries do not require complex joins for extracting the data from
the datawarehouse – to be verified by data analysts
The 2nd walkthrough of the data dictionary and data model concludes the requirement analysis
process.
Figure 2 – MDA Viewpoints and Layers
[7] CONCLUSION AND NEED FOR FUTURE WORK
In this paper a requirement gathering methodology for model driven design of a datawarehouse was
proposed with the help of a case study that was implemented for a midsized leasing organization.
This methodology proposes building the Enterprise Datawarehouse on the existing
analytical processes of an organization. Existing analytical systems and personal data marts are reused for
developing the datawarehouse. This methodology is heavily dependent on involvement of end users
in the process of requirement gathering and design of a datawarehouse. We have outlined criteria for
selecting end users for requirement gathering.
This methodology ensures continuous end user training on usage of the datawarehouse. This results in
sustained usage of the datawarehouse by business users.
A comprehensive Data dictionary is a foundation of this approach. The data dictionary accelerates and
enhances entire lifecycle of DW development by automation of activities such as Data model design,
Metadata loading and generation of DW test cases.
Commercial tools such as Kalido have proposed DW design based on model driven approach.
However, their approach is tightly integrated with usage of specific tools. Our proposed approach is
independent of technology and can be implemented using any combination of design, ETL and
database tools.
The case study described in this paper did not use any standard formats such as CWM. Use of such industry
standard formats would facilitate exchange of metadata between the data dictionary and various ETL tools
as well as automated generation of code. This needs to be tested experimentally, and the approach needs to be
integrated with a standard framework.
An empirical study of impact of model driven approach in reducing development cycle of
datawarehouse needs to be carried out.
No single requirement gathering approach is suitable for every scenario. Constraints such as
complex source systems or the absence of data savvy business users may reduce the applicability of this
approach. This needs to be studied by applying this approach to various business scenarios.
REFERENCES
[1] Michael Bergman, “The deep Web: surfacing hidden value”. In the Journal Of Electronic
Publishing 7(1) (2001).
[2] R. Kimball and J. Caserta. The Data Warehouse Lifecycle Toolkit. John Wiley & Sons, 2004.
[3] Matteo Golfarelli and Stefano Rizzi. A Comprehensive Approach to Datawarehouse Testing.
DOLAP’09, November 6, 2009
[4] Golfarelli Matteo. From User Requirements to Conceptual Design in Data Warehouse Design – a
Survey. Data Warehousing Design and Advanced Engineering Applications: Methods for Complex
Construction. L. Bellatreche (Ed.), IGI Global, 2009.
[5] Robert Winter and Bernhard Strauch , Demand-driven Information Requirements Analysis in Data
Warehousing, International Conference on Systems Sciences, IEEE, 2003
[6] Giorgini et al, Goal-Oriented Requirement Analysis for Data Warehouse Design, DOLAP’05,
November 4–5, 2005, Bremen, Germany.
[7] Pardillo et al, USING ONTOLOGIES FOR THE DESIGN OF DATA WAREHOUSES,
International Journal of Database Management Systems ( IJDMS ), Vol.3, No.2, May 2011
[8] Yuhong Guo et al, Triple-Driven Data Modeling Methodology in Data Warehousing: A
Case Study, DOLAP’06, November 10, 2006
[9] William Inmon, Building the Datawarehouse, Wiley publishing
[10] Winter, R. and Strauch, B. Demand-driven information requirements analysis in data warehousing.
Journal of Data Warehousing (2003) 38-47
[11] Model-Driven Architecture (MDA) and Data Warehouse Design, Scholl et al
[12] Golfarelli et al, WAND A CASE Tool for Data Warehouse Design
[13] Essaidi, Osmani, A Unified Method for Developing Data Warehouses,