Naeem Ahmed
Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro
Email: [email protected]
Data Warehouse and Data Mining Lecture No. 10
Decision Support System and Life cycle
Decision Support System
• Decision Support Systems (DSS) can be defined in two ways:
– By describing the software tools or technologies used to make business decisions
– By describing the function or concept of Decision Support Systems in a tool-neutral way
Decision Support System
• Erik Thomsen's definition:
– The activity of using logic-based fact-processing rules in combination with goal-oriented management rules, with or without human intervention, to translate larger sets of lower-level facts and fact relationships into smaller sets of higher-level facts and fact relationships (and vice versa).
Decision Support System
• A DSS (Decision Support System) is a computer-based information system that supports business or organizational decision-making activities
• A DSS is an information technology that helps the knowledge worker (executive, manager, analyst) make faster and better decisions, answering questions such as:
– Which orders should we fill to maximize revenues?
– Will a 10% discount increase sales volume sufficiently?
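The discount question can be made concrete with a little arithmetic. The following is a minimal sketch, assuming a constant unit cost and assuming that "sufficiently" means total gross profit does not fall. The function name and the sample figures are illustrative and do not come from the lecture.

```python
def required_volume_increase(price: float, unit_cost: float, discount: float) -> float:
    """Fractional increase in units sold needed to keep gross profit unchanged.

    Assumes a constant unit cost and no other effects (a hypothetical model).
    """
    old_margin = price - unit_cost
    new_margin = price * (1.0 - discount) - unit_cost
    if new_margin <= 0:
        raise ValueError("discounted price does not cover unit cost")
    # Profit is margin * volume, so volume must scale by old_margin / new_margin.
    return old_margin / new_margin - 1.0


# Illustrative figures: price 100, unit cost 60, 10% discount.
increase = required_volume_increase(100.0, 60.0, 0.10)
print(f"Volume must rise by {increase:.1%} to keep profit unchanged")
```

A DSS would answer the same question from historical sales facts rather than from assumed figures; the sketch only shows the kind of rule being evaluated.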
Decision Support System
• DSS tools treat facts and rules differently. This treatment groups DSS tools into the following categories:
– Data Mining Tools - discover rules that explain relationships between facts
– OLAP Tools - organize facts according to multiple dimensions and use powerful rules for combining those facts to form aggregate facts
– Business Modeling Tools - organize the management and the expert tools
– Data Visualization Tools - graphically illustrate relationships between facts
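How an OLAP tool combines facts along dimensions to form aggregate facts can be illustrated with a small roll-up. This is a sketch in plain Python, not the API of any OLAP product; the fact rows, the dimension names, and the `roll_up` helper are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical facts: each row has dimension values and one measure.
FACTS = [
    {"region": "North", "product": "Tea", "month": "Jan", "sales": 120},
    {"region": "North", "product": "Coffee", "month": "Jan", "sales": 80},
    {"region": "South", "product": "Tea", "month": "Jan", "sales": 50},
    {"region": "South", "product": "Tea", "month": "Feb", "sales": 70},
]


def roll_up(facts, dimensions, measure="sales"):
    """Sum the measure over every combination of the chosen dimensions."""
    totals = defaultdict(int)
    for fact in facts:
        key = tuple(fact[d] for d in dimensions)
        totals[key] += fact[measure]
    return dict(totals)


by_region = roll_up(FACTS, ["region"])
by_region_product = roll_up(FACTS, ["region", "product"])
print(by_region)
print(by_region_product)
```

Choosing fewer dimensions gives a coarser aggregate (roll-up); choosing more gives a finer one (drill-down).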
3 Tier Decision Support System
[Figure: three-tier architecture. Information sources (operational DBs, semistructured sources) are extracted, transformed, loaded, and refreshed into the data warehouse server (Tier 1: data warehouse and data marts), which serves the OLAP servers (Tier 2: e.g., MOLAP, ROLAP), which in turn serve the clients (Tier 3: analysis, query/reporting, data mining).]
Decision Support Life cycle
① Planning
② Gathering data requirements & modeling
③ Physical database design & development
④ Data sourcing, integration, and mapping
⑤ Populating the data warehouse
⑥ Automating the data load process
⑦ Creating the starter set of reports
⑧ Data validation & testing
⑨ Training
⑩ Rollout
1. Planning
• Planning encompasses many of the same tasks as any other type of system development project
1. Planning
• Improper planning and inadequate project management are a major cause of failure
• The following aspects need to be considered:
– Determine whether the company really needs a data warehouse. Is it really ready for one?
– Decide on the type of data warehouse to be built and where to keep it
– Determine who will be using the data warehouse and at what times
Technical considerations
• Capacity planning
• Data integration strategies
• Archiving strategies
• Procedures for end-user access to archived data
• Data refresh/update strategies
• Operations and job scheduling strategies
• Metadata management strategies
• LAN/WAN technology, DBMS connectivity, DBMS load utilities, selection of platform
• Front-end data access tools
2. Gathering Requirements and Modeling
• This phase of the life cycle is concerned with understanding the business needs and data requirements of the users of the system
Gathering Data Requirements
• Gathering data requirements includes understanding:
– How the user does business
– What the business drivers are
– What attributes the user needs
– Which attributes are absolutely required and which are a “wish list”
– What the business hierarchies are
– What data users use now and what they would like to have
– What level of detail or summary the user needs
– What type of front-end data access tool will be used
– How the user expects to see the results of their queries
Requirement Analysis
• There are three principal reasons for performing a requirements analysis:
– Getting the system right the first time
– Supporting easy user access to the system
– Producing reasonable and reliable estimates of costs
• Without a requirements analysis, one cannot have any grasp of how much an IT project will cost.
Interviewing for Requirements
• Interviewing enterprise management and end users provides insights into the actual use of the data that cannot be obtained in any other way
• Among the factors that must be determined by the process are the following:
– How best to input data
– How best to report information from the database
– How management interprets and analyzes the reports it receives from the database
Determine who should be Interviewed
• What sorts of users should be included in the interview pool?
• The following list is a bare minimum:
– Speak with people involved in the process of the business.
– Talk to vendors if the plan is to use Supply Chain Management and Customer Relationship Management applications.
– Talk to enterprise management about their information needs, including what sorts of reports, queries, and data mining capabilities they require. The objective is not to determine how to get those answers, but to determine what sorts of answers they think they would like to get, whether it is possible or not.
– Talk to end users about what they need and what problems they have with legacy systems, and obtain their wish lists of features, interfaces, and processes.
Interviewing Techniques and Personal Agendas to Consider
• Always keep the objective of the interview in mind, which is to learn about the business
• Conduct user and management interviews separately
– This should be done not only because of the unavoidable intimidation factor, but also because the two groups have different perspectives on the issues
• When possible, arrange for 1:1 interviews. If not, keep interview groups as small as possible, preferably no more than five people
– Keep the number of interviewers small: generally no more than two per session
• Approach the design as if no knowledge of the enterprise or its data model is available
• Listen actively
Interviewing Techniques and Personal Agendas to Consider
• Before beginning, explain what the interview process is.
– Tell subjects what to expect and what is expected from them. Thank them for their time and make certain they understand that their observations will be taken into consideration
• Have a prepared list of questions. Stay on track, but do not hesitate to ask aggressive follow-up questions where an answer raises an important issue. When finished, follow up where necessary for clarification and expansion.
• Ask open-ended questions. Not like: “Is the daily sales system a) not adequate b) barely adequate c) adequate d) excellent?” but like: “How do you feel about the daily sales system?”
• Take thorough notes or tape record the session
Interviewing Techniques and Personal Agendas to Consider
• If facilities are available and the procedure would not be obtrusive, consider videotaping sessions
• Review and summarize notes or taped transcription as soon as possible after the completion of each interview
• Take control of the interview and maintain it, arbitrating any disagreements among participants that may arise, and keeping things moving at all times
• Stay on track
• When the interview is finished, thank the participants for their assistance in the process and acknowledge the importance of their input
Analyzing a Legacy Database
• Analyze the existing database for its strengths, weaknesses, and uses, and develop a list of what needs to be retained, what needs to be changed, and what needs to be eliminated
• Analyze the legacy system both from an end user perspective and from an IT perspective
General Procedure
• The following procedure provides a general means for conducting an analysis of a legacy database:
– Capture all data collection screens (or any other means by which data is inserted into the legacy database).
– Capture all reports (including terminal-only reports).
– With the captured information presentation media in hand, conduct interviews again, this time capturing how users use these media. Construct a list of both open and closed (specific) questions.
– Drill down by following up with detail-oriented questions aimed at capturing the details of all I/O transactions. Begin generally and drill down successively to more detailed questions about user tasks.
– Interview for types of data collected and input and how that data is used.
– Interview with data collection screens and reports in hand.
Identify New Data and Reports Required by users
• To perform a follow-up interview with a user with the intent of identifying new system inputs and outputs:
– While reviewing input screens and reports, ask the user whether any information that they think should be there is missing.
• Remember to ask if there is anything already included that should not be, either because it is not used or because nobody knows what it is for.
– Ask the user to talk about the proposed new information: why they think it is necessary, what reports it applies to, and so on.
– Analyze the comments for entities and attributes as before.
Data Modeling
• The central focus of this task in the life cycle is to provide:
– A logical data model covering the scope of the development project, including relationships, cardinality, attributes, definitions, and candidate keys
– A dimensional business model that diagrams the facts, dimensions, relationships, and candidate keys for the scope of the development project
3. Physical Database Design & Development
• This phase of the decision support life cycle covers database design and de-normalization
• The focus is on:
– Designing the database, including fact tables and relationship tables
– De-normalizing the data
– Identifying keys
– Creating indexing strategies
– Creating appropriate database objects
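The design tasks listed above can be sketched with a tiny star schema. The example below uses Python's built-in `sqlite3` module purely for illustration; the table names, column names, and sample rows are hypothetical, and a real warehouse would use its own DBMS and naming standards.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension table, de-normalized: category is stored with the product
    CREATE TABLE dim_product (
        product_key INTEGER PRIMARY KEY,
        product_name TEXT NOT NULL,
        category TEXT NOT NULL
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        year INTEGER NOT NULL,
        month INTEGER NOT NULL
    );
    -- Fact table identified by its dimension keys
    CREATE TABLE fact_sales (
        product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
        date_key INTEGER NOT NULL REFERENCES dim_date(date_key),
        units INTEGER NOT NULL,
        revenue REAL NOT NULL,
        PRIMARY KEY (product_key, date_key)
    );
    -- Index to support queries that filter or group by date
    CREATE INDEX idx_fact_sales_date ON fact_sales(date_key);
    """
)
conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?)",
                 [(1, "Green Tea", "Beverages"), (2, "Notebook", "Stationery")])
conn.executemany("INSERT INTO dim_date VALUES (?, ?, ?)",
                 [(20240115, 2024, 1), (20240210, 2024, 2)])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 20240115, 10, 50.0), (1, 20240210, 4, 20.0),
                  (2, 20240115, 3, 9.0)])

# Aggregate the fact table along the product dimension.
revenue_by_category = conn.execute(
    """
    SELECT p.category, SUM(f.revenue)
    FROM fact_sales f JOIN dim_product p USING (product_key)
    GROUP BY p.category ORDER BY p.category
    """
).fetchall()
print(revenue_by_category)
```

Storing `category` directly on the product dimension is the de-normalization step: it trades some redundancy for queries that need only one join.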
3. Physical Database Design & Development
• An understanding of hierarchies, facts, dimensions, and decision support concepts is also needed
4. Data Sourcing, Integration, & Mapping
• This phase is done in conjunction with the database design phase, because the target data warehouse database design is needed for the source-to-target mapping
• This is the most time-consuming phase. It encompasses locating the source of the data in the operational system, doing analysis to understand what type of data integration may be needed, writing integration specifications, and mapping the source data to the target data warehouse database design.
• This investigation is crucial to determine what data can actually be captured
4. Data Sourcing, Integration, & Mapping
• The following steps are required:
– Defining possible source systems
– Determining file layouts
– Performing data analysis to determine the best (and cleanest if possible) source of data
– Performing data analysis to integrate the data
– Developing written data conversion specifications for each field and refining the integration strategy
– Mapping source to target data
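A per-field conversion specification and the source-to-target mapping can be expressed as data. The sketch below is one possible shape, not a prescribed format; the source field names (`CUST_NO`, `CUST_NM`, `ORD_DT`, `AMT`), the target names, and the conversion rules are all hypothetical.

```python
from datetime import datetime

# Hypothetical conversion specification: target field -> (source field, converter)
MAPPING = {
    "customer_id": ("CUST_NO", int),
    "customer_name": ("CUST_NM", lambda s: s.strip().title()),
    "order_date": ("ORD_DT", lambda s: datetime.strptime(s, "%d/%m/%Y").date().isoformat()),
    "amount": ("AMT", lambda s: round(float(s), 2)),
}


def map_record(source_row, mapping=MAPPING):
    """Apply the per-field conversion rules to one source record."""
    target = {}
    for target_field, (source_field, convert) in mapping.items():
        target[target_field] = convert(source_row[source_field])
    return target


source_row = {"CUST_NO": "0042", "CUST_NM": "  jane DOE ", "ORD_DT": "05/03/2024", "AMT": "199.5"}
target_row = map_record(source_row)
print(target_row)
```

Keeping the specification in one table-like structure makes it easy to review with the people who wrote the conversion rules.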
5. Populating Data Warehouse
• The focus is on:
– Developing programs or using tools to extract and move data
– Developing load strategies
– Developing the procedures to load the data into the warehouse
– Developing programs or using data conversion tools to integrate data
– Developing update/refresh strategies
– Testing extract, integration, and load programs and procedures
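The extract, integration, and load steps listed above can be sketched end to end. This is a minimal illustration using Python's `csv` and `sqlite3` modules; the inline CSV extract, the `orders` table, and the choice of `INSERT OR REPLACE` as the refresh strategy are assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical extract from an operational system, shown inline as CSV text.
EXTRACT = """order_id,region,amount
1,north,100.50
2,south,75.00
"""


def extract(text):
    """Read source rows from the CSV extract."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Integrate: cast types and standardize the region code."""
    return [(int(r["order_id"]), r["region"].upper(), float(r["amount"])) for r in rows]


def load(conn, rows):
    """Load rows; INSERT OR REPLACE refreshes orders that already exist."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT OR REPLACE INTO orders VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(EXTRACT)))
# A later refresh: a corrected amount for order 2 and a new order 3.
load(conn, [(2, "SOUTH", 80.0), (3, "NORTH", 20.0)])
rows = conn.execute("SELECT * FROM orders ORDER BY order_id").fetchall()
print(rows)
```

Replacing rows by key is only one refresh strategy; append-only loads or full reloads are alternatives, and the right choice depends on the source system.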
6. Automating the Data Load Process
• This phase is concerned with automating the extraction, integration, and loading of the data warehouse
• This phase includes:
– Automating and scheduling the data extraction process
– Automating and scheduling the data conversion process
– Automating and scheduling the data load process
– Creating backup and recovery procedures
– Conducting a full test of all the automated procedures
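One way to automate the sequence is a small runner that executes the steps in order and stops at the first failure, so that a failed conversion never feeds the load. The sketch below is an assumption about how such a runner might look; the step functions are placeholders, and the actual scheduling (for example by the operating system's job scheduler) is outside the example.

```python
import logging

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
log = logging.getLogger("dw_load")


def run_pipeline(steps):
    """Run (name, callable) steps in order; stop at the first failure.

    Returns a list of (name, status) pairs so an operator can see
    how far the run got.
    """
    results = []
    for name, step in steps:
        try:
            step()
        except Exception:
            log.exception("step %s failed; later steps skipped", name)
            results.append((name, "failed"))
            break
        log.info("step %s finished", name)
        results.append((name, "ok"))
    return results


def no_op():
    """Placeholder standing in for a real extract, convert, or backup job."""


def failing_load():
    """Placeholder load job that simulates a failure."""
    raise RuntimeError("target table is locked")


status = run_pipeline([("extract", no_op), ("convert", no_op),
                       ("load", failing_load), ("backup", no_op)])
print(status)
```

Running the pipeline with a deliberately failing step, as here, is a simple form of the full test of the automated procedures.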
7. Creating the Starter Set of Reports
• Development of a starter set of reports begins as soon as a test subset of data is loaded
• Structured navigation paths to access predefined reports or data directly must be developed
• This phase will also drive data validation and performance tuning
• This phase includes:
– Creating the set of reports
– Testing reports
– Documenting applications
8. Data Validation & Testing
• This phase includes standard data validation processes throughout the data extract, integration, and load development phases
• In addition, once the data access front end has been put in place, extra validation can occur
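A common validation check is to reconcile what was loaded against the source. The sketch below compares row counts, keys, and a measure total; the `reconcile` helper, the field names, and the tolerance are illustrative assumptions rather than a standard procedure from the lecture.

```python
def reconcile(source_rows, warehouse_rows, key, measure):
    """Compare source and warehouse rows; return a list of discrepancies."""
    problems = []
    if len(source_rows) != len(warehouse_rows):
        problems.append(
            f"row count: source={len(source_rows)} warehouse={len(warehouse_rows)}"
        )
    source_keys = {r[key] for r in source_rows}
    warehouse_keys = {r[key] for r in warehouse_rows}
    missing = sorted(source_keys - warehouse_keys)
    if missing:
        problems.append(f"missing keys in warehouse: {missing}")
    source_total = sum(r[measure] for r in source_rows)
    warehouse_total = sum(r[measure] for r in warehouse_rows)
    # Small tolerance for floating-point rounding (an assumed threshold).
    if abs(source_total - warehouse_total) > 0.005:
        problems.append(
            f"{measure} total: source={source_total:.2f} warehouse={warehouse_total:.2f}"
        )
    return problems


source = [{"order_id": 1, "amount": 100.5}, {"order_id": 2, "amount": 75.0},
          {"order_id": 3, "amount": 20.0}]
warehouse = [{"order_id": 1, "amount": 100.5}, {"order_id": 2, "amount": 75.0}]
problems = reconcile(source, warehouse, key="order_id", measure="amount")
for problem in problems:
    print(problem)
```

An empty result means the three checks passed; it does not prove that every field was converted correctly.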
9. Training and User support
• This phase focuses on creating training programs for the user community.
• Users of all levels will need to be trained in:
– The scope of the data in the warehouse
– The front-end access tool and how it works
– How to access and navigate metadata to get information on the data in the warehouse
– The DSS application or starter set of reports: the capabilities and navigation paths
– Ongoing training/user assistance as the system evolves
10. Rollout
• This phase includes the necessary tasks for the deployment of the data warehouse to the user community
• These may include: