Naeem Ahmed

Department of Software Engineering Mehran University of Engineering and Technology Jamshoro

Email: [email protected]

Data Warehouse and Data Mining Lecture No. 10

Decision Support System and Life cycle

Decision Support System
•  Decision Support Systems (DSS) can be defined in two ways:
–  By describing the software tools or technologies used to make business decisions
–  By describing the function or concept of Decision Support Systems in a tool-neutral way

Decision Support System
•  Erik Thomsen's definition:
–  The activity of using logic-based fact-processing rules in combination with goal-oriented management rules, with or without human intervention, to translate larger sets of lower-level facts and fact relationships into smaller sets of higher-level facts and fact relationships (and vice versa).

Decision Support System
•  A DSS (Decision Support System) is a computer-based information system that supports business or organizational decision-making activities
•  A DSS is an information technology that helps knowledge workers (executives, managers, analysts) make faster and better decisions, for example:
–  Which orders should we fill to maximize revenues?
–  Will a 10% discount increase sales volume sufficiently?
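The discount question can be framed as a break-even calculation. The sketch below assumes that "sufficiently" means revenue does not fall, and it ignores unit costs and margins. The function name `breakeven_volume_increase` is hypothetical and is not part of the lecture.

```python
def breakeven_volume_increase(discount: float) -> float:
    """Fractional rise in units sold needed to keep revenue unchanged
    after cutting the price by `discount` (e.g. 0.10 for 10%).

    Assumption: revenue = price * volume; costs are ignored.
    """
    if not 0 <= discount < 1:
        raise ValueError("discount must be in [0, 1)")
    # (1 - discount) * price * (1 + g) * volume = price * volume
    # => g = 1 / (1 - discount) - 1
    return 1 / (1 - discount) - 1


required = breakeven_volume_increase(0.10)
print(f"Volume must rise by at least {required:.1%}")  # prints 11.1%
```

Under these assumptions a 10% discount needs roughly an 11.1% rise in volume just to hold revenue level. A real DSS would compare this threshold with the volume change observed in historical data.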

Decision Support System
•  DSS tools treat facts and rules differently. This treatment groups them into the following categories:
–  Data Mining Tools - discover rules that explain relationships between facts
–  OLAP Tools - organize facts according to multiple dimensions and use powerful rules for combining those facts to form aggregate facts
–  Business Modeling Tools - organize the management and the expert tools
–  Data Visualization Tools - graphically illustrate relationships between facts
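The OLAP idea of combining facts along dimensions into aggregate facts can be illustrated with a small roll-up. This is a minimal sketch using only the Python standard library. The fact rows, the dimension names, and the `roll_up` helper are made up for illustration.

```python
from collections import defaultdict

# Hypothetical fact rows: (region, product, month, sales_amount)
facts = [
    ("North", "Widget", "2024-01", 120.0),
    ("North", "Gadget", "2024-01", 80.0),
    ("South", "Widget", "2024-01", 50.0),
    ("South", "Widget", "2024-02", 70.0),
]
DIMENSIONS = ("region", "product", "month")


def roll_up(rows, group_by):
    """Sum the measure over every dimension not listed in `group_by`."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[i] for i in idx)
        totals[key] += row[-1]  # the last column is the measure
    return dict(totals)


by_region = roll_up(facts, ["region"])
by_region_product = roll_up(facts, ["region", "product"])
```

Here `by_region` is the coarser aggregate and `by_region_product` is one drill-down level below it. An OLAP server precomputes or indexes such aggregates rather than scanning the facts each time.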

3 Tier Decision Support System
[Figure: three-tier DSS architecture. Arrows labeled "serve" connect the components.]
•  Information Sources: Operational DBs, Semistructured Sources (extract, transform, load, refresh, etc.)
•  Data Warehouse Server (Tier 1): Data Warehouse, Data Marts
•  OLAP Servers (Tier 2): e.g., MOLAP, e.g., ROLAP
•  Clients (Tier 3): Analysis, Query/Reporting, Data Mining

Decision Support Life cycle
①  Planning
②  Gathering data requirements & modeling
③  Physical database design & development
④  Data sourcing, integration, and mapping
⑤  Populating the data warehouse
⑥  Automating the Data management process
⑦  Creating the starter set of reports
⑧  Data validation & Testing
⑨  Training
⑩  Rollout

1. Planning
•  Planning encompasses many of the same tasks as any other type of system development project

1. Planning
•  Improper planning and inadequate project management tend to result in failures, which is a major issue
•  The following aspects need to be considered:
–  Determine whether the company really needs a data warehouse. Is it really ready for one?
–  Decide on the type of data warehouse to be built and where to keep it
–  Who will be using the data warehouse, and at what times?

Technical considerations
•  Capacity Planning
•  Data Integration Strategies
•  Archiving Strategies
•  Procedures for end-user access to archived Data
•  Data refresh/update strategies
•  Operations and job scheduling strategies
•  Metadata management strategies
•  LAN/WAN technology, DBMS Connectivity, DBMS Load Utilities, Selection of Platform
•  Front-end data access tools

2. Gathering Requirements and Modeling

•  This phase of the life cycle is concerned with understanding the business needs and data requirements of the users of the system

Gathering Data Requirements
•  Gathering data requirements includes understanding:
–  How the user does business
–  What the business drivers are
–  What attributes the user needs
–  Which attributes are absolutely required and which are a “wish list”
–  What the business hierarchies are
–  What data users use now and what they would like to have
–  What levels of detail or summary the user needs
–  What type of front-end data access tool will be used
–  How the user expects to see the results of their queries

Requirement Analysis
•  There are three principal reasons for performing a requirements analysis:
–  Getting the system right the first time
–  Supporting easy user access to the system
–  Producing reasonable and reliable estimates of costs
•  Without a requirements analysis, one cannot have any grasp of how much an IT project will cost.

Interviewing for Requirements
•  Interviewing enterprise management and end users provides insights into the actual use of the data that cannot be obtained in any other way
•  Among the factors that must be determined by the process are the following:
–  How best to input data
–  How best to report information from the database
–  How management interprets and analyzes the reports it receives from the database

Determine who should be Interviewed

•  What sorts of users should be included in the interview pool?
•  The following list is a bare minimum:
–  Speak with people involved in the business process.
–  Talk to vendors if the plan is to use Supply Chain Management and Customer Relationship Management applications.
–  Talk to enterprise management about their information needs, including what sorts of reports, queries, and data mining capabilities they require. The objective is not to determine how to get those answers, but to determine what sorts of answers they would like to get, whether or not that is possible.
–  Talk to end users about what they need and what problems they have with legacy systems, and obtain their wish lists of features, interfaces, and processes

Interviewing Techniques and Personal Agendas to Consider

•  Always keep the objective of the interview in mind, which is to learn about the business
•  Conduct user and management interviews separately
–  This should be done not only because of the unavoidable intimidation factor, but also because the two groups have different perspectives on the issues
•  When possible, arrange for 1:1 interviews. If not, keep interview groups as small as possible, preferably no more than five people
–  Keep the number of interviewers small: generally no more than two per session
•  Approach the design as if no knowledge of the enterprise or its data model is available
•  Listen actively

Interviewing Techniques and Personal Agendas to Consider

•  Before beginning, explain what the interview process is.
–  Tell subjects what to expect and what is expected from them. Thank them for their time and make certain they understand that their observations will be taken into consideration
•  Have a prepared list of questions. Stay on track, but do not hesitate to ask aggressive follow-up questions where an answer raises an important issue. When finished, follow up where necessary for clarification and expansion.
•  Ask open-ended questions. Not: “Is the daily sales system a) not adequate b) barely adequate c) adequate d) excellent?” but: “How do you feel about the daily sales system?”

•  Take thorough notes or tape record the session

Interviewing Techniques and Personal Agendas to Consider

•  If facilities are available and the procedure would not be obtrusive, consider videotaping sessions

•  Review and summarize notes or taped transcription as soon as possible after the completion of each interview

•  Take control of the interview and maintain it, arbitrating any disagreements among participants that may arise, and keeping things moving at all times

•  Stay on track
•  When the interview is finished, thank the participants for their assistance in the process and acknowledge the importance of their input

Analyzing a Legacy Database
•  The goal is to analyze the existing database for its strengths, weaknesses, and uses, and to develop a list of what needs to be retained, what needs to be changed, and what needs to be eliminated

•  Analyze the legacy system both from an end user perspective and from an IT perspective

General Procedure
•  The following procedure provides a general means for conducting an analysis of a legacy database:
–  Capture all data collection screens (or any other means by which data is inserted into the legacy database).
–  Capture all reports (including terminal-only reports).
–  With the captured information presentation media in hand, conduct interviews again, this time capturing how users use these media. Construct a list of both open and closed (specific) questions.

–  Drill down by following up with detail-oriented questions aimed at capturing the details of all I/O transactions. Begin generally and drill down successively to more detailed questions about user tasks.

–  Interview for types of data collected and input and how that data is used.

–  Interview with data collection screens and reports in hand

Identify New Data and Reports Required by users

•  How to perform a follow-up interview with a user with the intent of identifying new system inputs and outputs:
–  While reviewing input screens and reports, ask the user whether any information they think should be there is missing.

•  Remember to ask if there is anything already included that should not be, either because it is not used or because nobody knows what it is for.

–  Ask the user to talk about the proposed new information: why they think it is necessary, what reports it applies to, and so on.

–  Analyze the comments for entities and attributes as before

Data Modeling
•  The central focus of this task in the life cycle is to provide:
–  A logical data model covering the scope of the development project, including relationships, cardinality, attributes, definitions, and candidate keys
–  A dimensional business model that diagrams the facts, dimensions, relationships, and candidate keys for the scope of the development project

3. Physical Database Design & Development

•  This phase of the decision support life cycle covers database design and de-normalization
•  The focus is on:
–  Designing the database, including fact tables, relationship tables
–  De-normalizing the data
–  Identifying keys
–  Creating indexing strategies
–  Creating appropriate database objects
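The design tasks listed above can be sketched as a tiny star schema. The example uses Python's built-in `sqlite3` module with an in-memory database. The table and column names (`fact_sales`, `dim_product`, `dim_date`, and so on) are invented for illustration, and a production warehouse would normally use a different DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT NOT NULL,
    category_name TEXT NOT NULL   -- de-normalized: no separate category table
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY, -- e.g. 20240115
    full_date TEXT NOT NULL,
    month     TEXT NOT NULL,
    year      INTEGER NOT NULL
);
CREATE TABLE fact_sales (
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    units_sold   INTEGER NOT NULL,
    sales_amount REAL NOT NULL,
    PRIMARY KEY (product_key, date_key)
);
-- Indexing strategy: support queries that filter the facts by date.
CREATE INDEX idx_fact_sales_date ON fact_sales(date_key);
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Storing `category_name` directly in `dim_product` trades some redundancy for fewer joins at query time, which is the usual motivation for de-normalizing a warehouse.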

3. Physical Database Design & Development

•  The concepts of hierarchies, facts, dimensions, decision support are also needed

4. Data Sourcing, Integration, & Mapping

•  This phase is done in conjunction with the database design phase, because the target data warehouse database design is needed for the source-to-target mapping

•  This is the most time-consuming phase. It encompasses locating the source of the data in the operational system, doing analysis to understand what type of data integration may be needed, writing integration specifications, and mapping the source data to the target data warehouse database design.

•  This investigation is crucial to determine what data can actually be captured

4. Data Sourcing, Integration, & Mapping

•  The following steps are required:
–  Defining possible source systems
–  Determining file layouts
–  Performing data analysis to determine the best (and cleanest if possible) source of data
–  Performing data analysis to integrate the data
–  Developing written data conversion specifications for each field and refining integration strategy
–  Mapping source to target data
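A per-field conversion specification and source-to-target mapping can be expressed as data. The sketch below is one possible shape, not a prescribed format. The source field names (`CUST_NM`, `ORD_DT`, `AMT`), the target column names, and the `map_record` helper are all hypothetical.

```python
from datetime import datetime

# Hypothetical mapping spec: target column -> (source field, conversion rule)
MAPPING = {
    "customer_name": ("CUST_NM", str.strip),
    "order_date": (
        "ORD_DT",
        lambda s: datetime.strptime(s, "%d/%m/%Y").date().isoformat(),
    ),
    "amount": ("AMT", float),
}


def map_record(source_row: dict) -> dict:
    """Apply the per-field conversion rules to one source record."""
    return {
        target: convert(source_row[source_field])
        for target, (source_field, convert) in MAPPING.items()
    }


row = map_record({"CUST_NM": "  Acme Ltd ", "ORD_DT": "15/01/2024", "AMT": "99.50"})
```

Keeping the mapping in one table-like structure makes it easy to review against the written specification and to extend one field at a time.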

5. Populating Data Warehouse
•  The focus is on:
–  Developing programs or using tools to extract and move data
–  Developing load strategies
–  Developing the procedures to load the data into the warehouse
–  Developing programs or using data conversion tools to integrate data
–  Developing update/refresh strategies
–  Testing extract, integration, and load programs and procedures
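One simple update/refresh strategy is to reload changed rows by key. The sketch below assumes a hypothetical `sales` table keyed by `order_id` and uses SQLite's `INSERT OR REPLACE`, which is SQLite-specific. Other DBMSs offer their own merge or upsert statements.

```python
import sqlite3


def load(conn, rows):
    """Insert new rows and overwrite rows whose key already exists."""
    with conn:  # commits on success, rolls back if an error is raised
        conn.executemany(
            "INSERT OR REPLACE INTO sales (order_id, amount) VALUES (?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
)
load(conn, [(1, 10.0), (2, 20.0)])  # initial load
load(conn, [(2, 25.0), (3, 30.0)])  # refresh: order 2 changed, order 3 is new
loaded = conn.execute(
    "SELECT order_id, amount FROM sales ORDER BY order_id"
).fetchall()
```

After the refresh, order 2 carries the updated amount and order 3 has been added, while order 1 is untouched. A full reload or an append-only load with history are alternative strategies with different trade-offs.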

6. Automating the Data Load Process

•  This phase is concerned with automating the extraction, integration, and loading of the data warehouse

•  This phase includes:
–  Automating and scheduling the data extraction process
–  Automating and scheduling the data conversion process
–  Automating and scheduling the data load process
–  Creating backup and recovery procedures
–  Conducting a full test of all the automated procedures
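The automated steps have to run in order, and a failed step should stop the ones that depend on it. The sketch below shows one way to chain them. The `run_pipeline` helper and the step names are hypothetical, and the scheduling itself (for example by an operating system job scheduler) is assumed to happen outside this code.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("dw_load")


def run_pipeline(steps):
    """Run (name, callable) steps in order; stop at the first failure.

    Returns the names of the completed steps and the failed step, if any.
    """
    completed = []
    for name, step in steps:
        try:
            step()
        except Exception:
            log.exception("step %r failed; later steps skipped", name)
            return completed, name
        log.info("step %r finished", name)
        completed.append(name)
    return completed, None


def failing_conversion():
    # Stand-in for a conversion job that hits a bad record.
    raise ValueError("bad source record")


done, failed = run_pipeline([
    ("extract", lambda: None),
    ("convert", failing_conversion),
    ("load", lambda: None),
])
```

In this run the load step is skipped because conversion failed, which keeps partially converted data out of the warehouse. The full test called for above would exercise both the success path and failure paths like this one.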

7. Creating the Starter Set of Reports

•  Development of a starter set of reports begins as soon as a test subset of data is loaded

•  Structured navigation paths to access predefined reports or data directly must be developed

•  This phase will also drive data validation and performance tuning

•  This phase includes:
–  Creating set of reports
–  Testing reports
–  Documenting applications

8. Data Validation & Testing
•  This phase includes standard data validation processes throughout the data extract, integration, and load development phases
•  In addition, once the data access front end has been put in place, extra validation can occur
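A common validation check is to reconcile what was extracted with what was loaded. The sketch below compares row counts and a control total. The `reconcile` function, the row layout, and the tolerance are assumptions made for illustration.

```python
def reconcile(source_rows, warehouse_rows, amount_index=1, tolerance=0.01):
    """Return a list of human-readable discrepancies (empty means OK)."""
    problems = []
    if len(source_rows) != len(warehouse_rows):
        problems.append(
            f"row count mismatch: source={len(source_rows)} "
            f"warehouse={len(warehouse_rows)}"
        )
    src_total = sum(r[amount_index] for r in source_rows)
    wh_total = sum(r[amount_index] for r in warehouse_rows)
    if abs(src_total - wh_total) > tolerance:
        problems.append(
            f"amount mismatch: source={src_total:.2f} warehouse={wh_total:.2f}"
        )
    return problems


source = [(1, 10.0), (2, 20.0), (3, 30.0)]
ok = reconcile(source, [(1, 10.0), (2, 20.0), (3, 30.0)])
bad = reconcile(source, [(1, 10.0), (2, 20.0)])
```

Count and total checks catch dropped or duplicated rows cheaply. They do not prove that individual values are correct, so field-level spot checks through the front-end tool remain useful.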

9. Training and User support
•  This phase focuses on creating training programs for the user community.
•  Users of all levels will need to be trained in:
–  The scope of the data in the warehouse
–  The front-end access tool and how it works
–  How to access and navigate metadata to get information on the data in the warehouse
–  The DSS application or starter set of reports: the capabilities and navigation paths
–  Ongoing training/user assistance as the system evolves

10. Rollout
•  This phase includes the necessary tasks for the deployment of the data warehouse to the user community

•  These may include:
