Data Component February 2013 Decision Support Systems Course .. Dr. Aref Rashad 1 Decision Support System Course Dr. Aref Rashad Part:3

Data Component

February 2013

Decision Support System Course

Dr. Aref Rashad

Part:3

Values Matrix

Helps DSS designers know what information to include.

Characteristics of Useful Information

• Timeliness
• Sufficiency
• Level of Detail and Aggregation
• Redundancy
• Understandability
• Freedom from Bias
• Reliability
• Decision Relevance
• Cost Efficiency
• Comparability
• Quantifiability
• Appropriateness of Format

Timeliness of Data

Timeliness addresses whether the information is available to the decision maker soon enough for it to be meaningful.

Sufficiency

Sufficiency addresses whether the data are adequate to support the decision under consideration.

Level of Detail

The aggregation level of the data is also an important factor for determining the usefulness of information in a DSS

Understandability

The key is to simplify the representation in the database without losing the meaning of the data.

Freedom from Bias

It is not appropriate for the designer to bias the analyses if it can be avoided. Bias can be caused by a wide variety of problems in the data, such as nonrepresentativeness with regard to time horizon, variables, comparability, or sampling procedures.

Decision Relevance

Perhaps the most obvious issue to consider when building a database is the relevance of the information to the choices under consideration.

Comparability

When deciding whether data are valuable, we need to assess whether they can be compared to other relevant data. Comparable means that, in important ways, measurement conditions have been held constant.

Reliability

Decision makers will assume that the data are correct if they are included in the database; designers therefore need to ensure that they are accurate. They should verify the input of data and the integrity of the database.

Redundancy

In a perfect world, the less information is repeated, the less storage is used. This goal is laudable as long as it does not limit the user's ability to link data from multiple sources.

Cost Efficiency

The benefit of improved decision-making capability must outweigh the cost of providing it, or there is no advantage in the improvement. Said differently, data are only cost efficient in a database if there is positive value in the changed decision behavior associated with acting on the data in question after the cost of obtaining those data is subtracted.

Quantifiability

Quantifiability does not assume that all valuable measures are quantified. Rather, it means the data are quantified at the appropriate level and that only appropriate operations can be performed on them. The level of quantification, referred to as the scale, dictates the types of meaningful mathematical operations that can be performed with the data.
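As a rough illustration of how the scale limits the meaningful operations, the sketch below pairs each of the four commonly cited measurement scales with one summary statistic. The scale names, the `summarize` helper, and the sample values are assumptions made for illustration; they do not come from the course material.

```python
from statistics import mean, median, mode

# Hypothetical mapping from measurement scale to a summary that is
# meaningful on that scale (an illustrative assumption, not from the slides).
MEANINGFUL_SUMMARY = {
    "nominal": mode,     # categories: only counting and matching make sense
    "ordinal": median,   # ranked values: order is meaningful, distances are not
    "interval": mean,    # equal distances: differences and averages make sense
    "ratio": mean,       # true zero: ratios also make sense
}


def summarize(values, scale):
    """Return a summary statistic that is meaningful for the given scale."""
    if scale not in MEANINGFUL_SUMMARY:
        raise ValueError(f"unknown scale: {scale}")
    return MEANINGFUL_SUMMARY[scale](values)


region_codes = [3, 1, 3, 2, 3]       # nominal: codes are labels, not amounts
satisfaction = [1, 2, 2, 4, 5]       # ordinal: 1 = worst, 5 = best
unit_sales = [10, 34, 56, 32, 12]    # ratio: counts with a true zero

print(summarize(region_codes, "nominal"))   # most frequent region code
print(summarize(satisfaction, "ordinal"))   # middle rank
print(summarize(unit_sales, "ratio"))       # average units sold
```

Averaging the region codes would run without error, which is exactly the risk: the database can store the scale alongside the values so that a DSS offers only the operations that make sense.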

Appropriateness of Format

The final determinant of the value of information is whether it is displayed in an appropriate fashion. This refers to the medium used for presentation, the order in which data are presented to the decision maker, and the amount of graphics used.

© 2005 Prentice Hall, Decision Support Systems and Intelligent Systems, 7th Edition, Turban, Aronson, and Liang

Data, Information, Knowledge

• Data: Items that are the most elementary descriptions of things, events, activities, and transactions. May be internal or external.

• Information: Organized data that has meaning and value.

• Knowledge: Processed data or information that conveys understanding or learning applicable to a problem or activity.

Data

• Raw data collected manually or by instruments
• Quality is critical
  – Quality determines usefulness
    • Contextual data quality
    • Intrinsic data quality
    • Accessibility data quality
    • Representation data quality
  – Often neglected or casually handled
  – Problems exposed when data is summarized

Data Sources

• Access needed to multiple sources
  – Often enterprise-wide
  – Disparate and heterogeneous databases
  – XML becoming language standard
• Web
  – Intelligent agents
  – Document management systems
  – Content management systems
• Commercial databases
  – Sell access to specialized databases

Databases

Databases are collections of interrelated data. The goal behind the database concept is to store related data together in a format independent of the DSS.

These data are linked together so that information from different physical locations on the storage medium can be joined together for transmission to the users' screens with a minimum amount of trouble.

Evolution of Users’ Needs and DSS Capabilities

Database Management Systems

• Software program
• Supplements operating system
• Manages data
• Queries data and generates reports
• Data security
• Combines with modeling language for construction of DSS

The DBMS serves as a buffer between the needs of the applications and the physical storage of the data. It captures and extracts data from the appropriate physical location and feeds it to the application program in the manner requested.
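The buffering role can be sketched with Python's built-in `sqlite3` module. The application below states what it wants in a query and never touches the physical layout of the data. The table names, column names, and sample rows are hypothetical.

```python
import sqlite3

# In-memory database; the application never deals with physical storage.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER REFERENCES customer(customer_id),
                         amount REAL);
    INSERT INTO customer VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")


def total_by_customer(connection):
    """Ask the DBMS for a report; it locates and joins the related rows."""
    query = """
        SELECT c.name, SUM(o.amount)
        FROM customer AS c
        JOIN orders AS o ON o.customer_id = c.customer_id
        GROUP BY c.name
        ORDER BY c.name
    """
    return connection.execute(query).fetchall()


report = total_by_customer(conn)
print(report)
```

If the storage engine or file layout changed, only the connection setup would need to change; the query and the code that consumes `report` would stay the same.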

Database Models

• Hierarchical
  – Top down, like inverted tree
  – Fields have only one “parent”, each “parent” can have multiple “children”
  – Fast
• Network
  – Relationships created through linked lists, using pointers
  – “Children” can have multiple “parents”
  – Greater flexibility, substantial overhead
• Relational
  – Flat, two-dimensional tables with multiple access queries
  – Examines relations between multiple tables
  – Flexible, quick, and extendable with data independence
• Object oriented
  – Data analyzed at conceptual level
  – Inheritance, abstraction, encapsulation

Enterprise Data Model

Data Warehouse

The goal of the data warehouse is to bring together data from a variety of sources and merge them in a way that makes them useful for decision makers.

A data warehouse is a database management system that:

• Exists separately from the operational systems.
• Is subject oriented, time variant, and integrated, as are the operational data.
• Is nonvolatile and hence able to support a variety of analyses consistently.

The difficult steps in building the data warehouse are deciding:

• What data are relevant to particular decisions
• How the data should be represented and blended
• How to ensure they are meaningful, consistent, and accurate

Data Warehouse

• Subject oriented
• Scrubbed so that data from heterogeneous sources are standardized
• Time series; no current status
• Nonvolatile
  – Read only
• Summarized
• Not normalized; may be redundant
• Data from both internal and external sources is present
• Metadata included
  – Data about data
    • Business metadata
    • Semantic metadata

Process of Building a Data Warehouse

Migrating Data

• Business rules
  – Stored in metadata repository
  – Applied to data warehouse centrally
• Data extracted from all relevant sources
  – Loaded through data-transformation tools or programs
  – Separate operation and decision support environments
• Correct problems in quality before data stored
  – Cleanse and organize in consistent manner

Data Scrubbing

The first step in building the data warehouse is to load data from the disparate data sources. The next step is to scrub or clean the data:

• Eliminate problems of misspelling, transposition of letters, variations in spelling, and typographical errors.

• Identify records not using corporate standards for coding

• Identify poorly documented data.

• Remove duplicate records

• Remove obsolete data

Data Scrubbing

• Remove spurious and invalid records.

• Validate data (especially with external databases).

• Merge third-party information.

• Enrich data with attributes.

• Identify missing or inconsistent data.

• Identify and tag similar records suspected to be duplicates.
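A few of the scrubbing steps listed above (variations in spelling, duplicate removal, missing data, and tagging suspected duplicates) can be sketched with the standard library alone. The record layout, the `scrub` helper, and the 0.9 similarity threshold are assumptions chosen for illustration, not a prescribed procedure.

```python
from difflib import SequenceMatcher

# Hypothetical raw customer records: (name, city)
raw_records = [
    ("Acme Corp", "Cairo"),
    ("acme corp ", "cairo"),      # same record with case/spacing differences
    ("Acme Corpp", "Cairo"),      # likely typo of the first record
    ("Globex", ""),               # missing city
]


def normalize(record):
    """Trim whitespace and unify case so equal records compare equal."""
    return tuple(field.strip().title() for field in record)


def scrub(records, threshold=0.9):
    """Return (clean, suspected_duplicates, incomplete) lists."""
    clean, suspected, incomplete = [], [], []
    for record in map(normalize, records):
        if record in clean:
            continue                       # exact duplicate: drop it
        if any(field == "" for field in record):
            incomplete.append(record)      # missing data: flag for follow-up
            continue
        # Tag near-matches instead of deleting them; a person should decide.
        similar = [kept for kept in clean
                   if SequenceMatcher(None, kept[0], record[0]).ratio() >= threshold]
        if similar:
            suspected.append((record, similar[0]))
        else:
            clean.append(record)
    return clean, suspected, incomplete


clean, suspected, incomplete = scrub(raw_records)
print(clean)
print(suspected)
print(incomplete)
```

Suspected duplicates are tagged rather than removed, matching the last bullet: automatic deletion on a similarity score alone could discard legitimately distinct records.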

Data Adjustment

The goal of the data warehouse is to give users a nonvolatile view of the organization. This means that we need to know not only the data at any given point in time but also the relative data at any given point in time.

Currency is one of the factors that needs to be consistent in the data warehouse.

Time is another important factor that needs to be included in the data warehouse.

Adjustment also includes provision of additional dimensions to the data that might make analyses richer.

The goal across all of these adjustments is to provide the best picture of the organization; its customers, suppliers, and competitors; and as many other outside influences as possible so that the analyses are as reliable as possible.
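As a small sketch of two such adjustments, the code below converts amounts to one reporting currency and derives a quarter from each date so that records can be compared across time. The exchange rates are made-up fixed values, and the field names are hypothetical; a real warehouse would use rates valid on each transaction date.

```python
from datetime import date

# Hypothetical exchange rates to a single reporting currency (USD).
# Real rates vary over time; fixed values are used only for illustration.
RATES_TO_USD = {"USD": 1.0, "EUR": 1.25, "EGP": 0.125}

sales = [
    {"day": date(2013, 2, 4), "amount": 100.0, "currency": "EUR"},
    {"day": date(2013, 2, 5), "amount": 800.0, "currency": "EGP"},
    {"day": date(2013, 5, 6), "amount": 50.0, "currency": "USD"},
]


def adjust(record):
    """Convert to the reporting currency and add a time dimension."""
    adjusted = dict(record)
    adjusted["amount_usd"] = record["amount"] * RATES_TO_USD[record["currency"]]
    adjusted["quarter"] = (record["day"].month - 1) // 3 + 1
    return adjusted


adjusted_sales = [adjust(r) for r in sales]
for row in adjusted_sales:
    print(row["day"], row["quarter"], row["amount_usd"])
```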

Data Warehouse Tasks

Architecture

• May have one or more tiers
• Determined by warehouse, data acquisition (back end), and client (front end)
• One tier, where all run on same platform, is rare
• Two tier usually combines DSS engine (client) with warehouse
  – More economical
• Three tier separates these functional parts

Online Analytical Processing (OLAP)

Interactive analysis of data, allowing data to be summarized and viewed in different ways online.

Data that can be modeled as dimension attributes and measure attributes are called multidimensional data.

– Measure attributes
  • measure some value
  • can be aggregated upon
  • e.g. the attribute number of the sales relation
– Dimension attributes
  • define the dimensions on which measure attributes (or aggregates thereof) are viewed
  • e.g. the attributes item_name, color, and size of the sales relation
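To make the distinction concrete, the sketch below treats `item_name`, `color`, and `size` as dimension attributes and `number` as the measure attribute of a sales relation, then aggregates the measure over two of the dimensions. The sample rows and the `cross_tab` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of a sales relation: dimension attributes
# (item_name, color, size) plus one measure attribute (number).
sales = [
    ("skirt", "dark", "small", 2),
    ("skirt", "dark", "medium", 5),
    ("skirt", "white", "small", 1),
    ("dress", "dark", "small", 4),
    ("dress", "white", "medium", 3),
]


def cross_tab(rows):
    """Sum the measure `number` for each (item_name, color) pair."""
    totals = defaultdict(int)
    for item_name, color, _size, number in rows:
        totals[(item_name, color)] += number
    return dict(totals)


table = cross_tab(sales)
print(table)
```

The `size` dimension is summed away here; choosing which dimensions to keep and which to aggregate over is what the OLAP operations control.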

Dimensions: Time, Product, Store

Attributes: Product (upc, price, …), Store (…), …

Hierarchies:
• Product → Brand → …
• Day → Week → Quarter
• Store → Region → Country

Online Analytical Processing

• Pivoting: changing the dimensions used in a cross-tabulation

• Dicing: selecting a sub-cube by fixing values (or ranges of values) for several dimensions

• Slicing: creating a cross-tab for fixed values only

• Rollup: moving from finer-granularity data to a coarser granularity

• Drill down: the opposite operation, moving from coarser-granularity data to finer-granularity data
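The operations above can be sketched on a tiny in-memory fact list. Everything here (the rows, the `aggregate` and `slice_` helpers, the column layout) is a hypothetical illustration rather than the interface of any OLAP product.

```python
from collections import defaultdict

# Hypothetical fact rows: (day, week, product, store, units)
facts = [
    ("W1-Mon", "W1", "Bread", "LA", 56),
    ("W1-Tue", "W1", "Bread", "LA", 10),
    ("W1-Mon", "W1", "Milk", "NY", 34),
    ("W2-Mon", "W2", "Milk", "LA", 12),
]
COLUMNS = {"day": 0, "week": 1, "product": 2, "store": 3}


def aggregate(rows, dims):
    """Total units grouped by the named dimensions (a cross-tabulation)."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[COLUMNS[d]] for d in dims)
        totals[key] += row[4]
    return dict(totals)


def slice_(rows, **fixed):
    """Keep only rows whose dimensions equal the fixed values."""
    return [r for r in rows
            if all(r[COLUMNS[d]] == v for d, v in fixed.items())]


by_day = aggregate(facts, ["day", "product"])      # finest granularity
by_week = aggregate(facts, ["week", "product"])    # rollup: day -> week
pivoted = aggregate(facts, ["store", "product"])   # pivot: other dimensions
la_only = aggregate(slice_(facts, store="LA"), ["product"])  # slice on store

print(by_week)
print(la_only)
```

Going from `by_week` back to `by_day` is the drill-down direction; it is only possible because the finer-grained facts were kept.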

OLAP Implementation

• OLAP implementations using only relational database features are called relational OLAP (ROLAP) systems

• OLAP systems that use multidimensional arrays in memory to store data cubes are referred to as multidimensional OLAP (MOLAP) systems.

• Hybrid systems, which store some summaries in memory and store the base data and other summaries in a relational database, are called hybrid OLAP (HOLAP) systems.

Star Schema (in RDBMS)

Star Schema Example

Star Schema with Sample Data

Points to be noticed about ROLAP

• Defines complex, multi-dimensional data with simple model
• Reduces the number of joins a query has to process
• Allows the data warehouse to evolve with relatively low maintenance
• Can contain both detailed and summarized data.
• ROLAP is based on familiar, proven, and already selected technologies.
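A minimal star schema can be sketched in a relational engine with Python's built-in `sqlite3` module: one fact table in the middle, with each dimension table a single join away. The table names, columns, and rows are hypothetical and are not the example shown on the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables (hypothetical names and columns)
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
    CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT, region TEXT);
    -- Fact table: measures plus foreign keys to the dimensions
    CREATE TABLE fact_sales (product_id INTEGER, store_id INTEGER, units INTEGER);

    INSERT INTO dim_product VALUES (1, 'Bread', 'BrandA'), (2, 'Milk', 'BrandB');
    INSERT INTO dim_store VALUES (1, 'LA', 'West'), (2, 'SF', 'West'), (3, 'NY', 'East');
    INSERT INTO fact_sales VALUES (1, 1, 56), (1, 2, 10), (2, 3, 34), (2, 1, 12);
""")

# Roll up units to region level: each dimension is one join away from the facts.
rows = conn.execute("""
    SELECT s.region, p.name, SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_store AS s ON s.store_id = f.store_id
    JOIN dim_product AS p ON p.product_id = f.product_id
    GROUP BY s.region, p.name
    ORDER BY s.region, p.name
""").fetchall()
print(rows)
```

Grouping by `s.city` instead of `s.region` would drill down without any schema change, which is one reason the model is considered simple to maintain.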

MOLAP: Dimensional Modeling Using the Multi Dimensional Model

• MDDB: a special-purpose data model
• Facts stored in multi-dimensional arrays
• Dimensions used to index array
• Sometimes on top of relational DB
• Products
  – Pilot, Arbor Essbase, Gentia
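The idea of facts stored in an array and indexed by dimension members can be sketched with nested Python lists. The dimension members, the values, and the helper names below are hypothetical; real MDDB products use their own storage formats.

```python
# Dimension members give each cell a position in the array (hypothetical data).
PRODUCTS = ["Juice", "Milk", "Bread"]
STORES = ["NY", "SF", "LA"]
DAYS = ["M", "T", "W"]

# cube[p][s][d] holds units sold; start with an all-zero dense array.
cube = [[[0 for _ in DAYS] for _ in STORES] for _ in PRODUCTS]


def set_cell(product, store, day, units):
    """Store a fact at the position given by its dimension members."""
    cube[PRODUCTS.index(product)][STORES.index(store)][DAYS.index(day)] = units


def get_cell(product, store, day):
    """Look a fact up directly by its dimension members, with no join."""
    return cube[PRODUCTS.index(product)][STORES.index(store)][DAYS.index(day)]


def total_for_product(product):
    """Aggregate over the store and time dimensions for one product."""
    plane = cube[PRODUCTS.index(product)]
    return sum(sum(per_day) for per_day in plane)


set_cell("Bread", "LA", "M", 56)
set_cell("Bread", "SF", "T", 10)
set_cell("Milk", "NY", "T", 34)

print(get_cell("Bread", "LA", "M"))
print(total_for_product("Bread"))
```

Note that the array reserves a cell for every combination of members, including the many that stay at zero; that density is where much of the storage cost of this model comes from.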

MOLAP

Data Cube

[Figure: three-dimensional data cube with axes Store (NY, SF, LA), Product (Juice, Milk, Coke, Cream, Soap, Bread), and Time (M, T, W, Th, F, S, S). Example cell: 56 units of bread sold in LA on M.]

Dimensions: Time, Product, Store

Attributes: Product (upc, price, …), Store (…), …

Hierarchies:
• Product → Brand → … (roll-up to brand)
• Day → Week → Quarter (roll-up to week)
• Store → Region → Country (roll-up to region)

Can have n dimensions; tables can be used as views on a data cube.

Dicing & slicing

Points to be noticed about MOLAP

• Pre-calculating or pre-consolidating transactional data improves speed. BUT when incoming data are fully pre-consolidated, MDDs require an enormous amount of overhead both in processing time and in storage. An input file of 200MB can easily expand to 5GB.

• MDDs are great candidates for the <50GB department data marts.

• Rolling up and drilling down through aggregate data.
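One simplified way to see where the overhead comes from is to count cells. The sketch below assumes a dense cube and adds a single "all" member per dimension to hold totals; it ignores deeper hierarchy levels and sparsity, both of which matter in practice. The dimension sizes are hypothetical, and 5GB is taken as 5000MB.

```python
from math import prod


def base_cells(cardinalities):
    """Cells in the base cube: one per combination of dimension members."""
    return prod(cardinalities)


def full_cube_cells(cardinalities):
    """Cells when every dimension also gets an 'all' member for totals."""
    return prod(c + 1 for c in cardinalities)


def expansion_factor(input_mb, stored_mb):
    """How many times larger the stored cube is than the input file."""
    return stored_mb / input_mb


# Hypothetical cube: 7 days x 6 products x 3 stores
dims = [7, 6, 3]
print(base_cells(dims))              # 7 * 6 * 3 = 126
print(full_cube_cells(dims))         # 8 * 7 * 4 = 224
print(expansion_factor(200, 5000))   # 25.0
```

Even with one extra member per dimension the cell count nearly doubles here, and the input file usually lists only the non-empty combinations, so the 25-fold growth quoted above is plausible for a fully pre-consolidated cube.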

HOLAP : Hybrid OLAP

• HOLAP = Hybrid OLAP:

– Best of both worlds

– Storing detailed data in RDBMS

– Storing aggregated data in MDBMS

– User access via MOLAP tools

Data Flow in HOLAP

[Figure: data flow in HOLAP. Components: a client with a multidimensional viewer and a relational viewer; an MDBMS server holding multidimensional data; an RDBMS server holding user data, metadata, and derived data. Flow labels: multidimensional access, SQL-Read, SQL-Reach Through.]

When deciding which technology to go for, consider:

1) Performance:
• How fast will the system appear to the end-user?
• MDD server vendors believe this is a key point in their favor.

2) Data volume and scalability:
• While MDD servers can handle up to 50GB of storage, RDBMS servers can handle hundreds of gigabytes and terabytes.

IF
A. You require write access (what-if analysis)
B. Your data is under 50 GB
C. Your timetable to implement is 60-90 days
D. Lowest level already aggregated
E. Data access on aggregated level
F. You’re developing a general-purpose application for inventory movement or assets management
THEN consider an MDD/MOLAP solution for your data mart.

IF
A. Your data is over 100 GB
B. You have a "read-only" requirement
C. Historical data at the lowest level of granularity
D. Detailed access, long-running queries
E. Data assigned to lowest level elements
THEN consider an RDBMS/ROLAP solution for your data mart.

IF
A. OLAP on aggregated and detailed data
B. Different user groups
C. Ease of use and detailed data
THEN consider a HOLAP solution for your data mart.
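These rules of thumb can be condensed into a small decision function. The sketch below keeps only four of the listed criteria and is a deliberate simplification: the `suggest_olap` name, its parameters, and the order in which the rules are checked are assumptions, and the function returns `None` where the rules above do not settle the case.

```python
def suggest_olap(data_gb, needs_write_access, needs_detail, needs_aggregates):
    """Return 'MOLAP', 'ROLAP', 'HOLAP', or None using simplified rules."""
    if needs_detail and needs_aggregates:
        return "HOLAP"    # OLAP on both aggregated and detailed data
    if data_gb > 100 and not needs_write_access:
        return "ROLAP"    # large, read-only, lowest-granularity history
    if data_gb < 50 and not needs_detail:
        return "MOLAP"    # small, already aggregated, possibly writable
    return None           # the rules of thumb do not settle this case


# Hypothetical data marts
print(suggest_olap(20, True, False, True))     # budgeting with what-if analysis
print(suggest_olap(500, False, True, False))   # call data records
print(suggest_olap(80, False, True, True))     # mixed user groups
print(suggest_olap(75, False, False, True))    # between the two size limits
```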

Examples

• ROLAP
  – Telecommunication startup: call data records (CDRs)
  – E-Commerce Site
  – Credit Card Company
• MOLAP
  – Analysis and budgeting in a financial department
  – Sales analysis
• HOLAP
  – Sales department of a multi-national company
  – Banks and Financial Service Providers

Tools available

• ROLAP:
  – ORACLE 8i
  – ORACLE Reports; ORACLE Discoverer
  – ORACLE Warehouse Builder
  – Arbor Software’s Essbase
• MOLAP:
  – ORACLE Express Server
  – ORACLE Express Clients (C/S and Web)
  – MicroStrategy’s DSS server
  – Platinum Technologies’ Platinum InfoBeacon
• HOLAP:
  – ORACLE 8i
  – ORACLE Express Server
  – ORACLE Relational Access Manager
  – ORACLE Express Clients (C/S and Web)

Conclusion

• ROLAP: RDBMS -> star/snowflake schema

• MOLAP: MDD -> Cube structures

• ROLAP or MOLAP: the data models used play a major role in performance differences

• MOLAP: for summarized and relatively smaller volumes of data (10-50GB)

• ROLAP: for detailed and larger volumes of data

• Both storage methods have strengths and weaknesses

• The choice is requirement specific, though currently data warehouses are predominantly built using RDBMSs/ROLAP.

Data Mining vs OLAP

Data Mining

• Data mining is the process of semi-automatically analyzing large databases to find useful patterns
• Prediction based on past history
  – Predict if a credit card applicant poses a good credit risk, based on some attributes (income, job type, age, ..) and past history
  – Predict if a pattern of phone calling card usage is likely to be fraudulent
• Some examples of prediction mechanisms:
  – Classification
    • Given a new item whose class is unknown, predict to which class it belongs
  – Regression formulae
    • Given a set of mappings for an unknown function, predict the function result for a new parameter value
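Both prediction mechanisms can be sketched in a few lines of plain Python. The classifier below is a one-nearest-neighbour rule and the regression is an ordinary least-squares line; these are two simple stand-ins chosen for illustration, and the income figures and class labels are invented.

```python
def classify(training, item):
    """1-nearest-neighbour: copy the class of the closest known item."""
    nearest = min(training, key=lambda pair: abs(pair[0] - item))
    return nearest[1]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, x):
    """Evaluate the fitted line at a new parameter value."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical past history: income (thousands) -> credit risk class
history = [(20, "bad"), (30, "bad"), (45, "good"), (60, "good")]
print(classify(history, 50))

# Hypothetical mappings: income (thousands) -> amount repaid (thousands)
incomes = [20, 30, 40, 50]
repaid = [5, 7, 9, 11]
model = fit_line(incomes, repaid)
print(predict(model, 60))
```

Classification returns a label from a fixed set, while regression returns a number on a continuous scale; that difference usually decides which mechanism fits a given question.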

Data Mining

– Associations
  • Find books that are often bought by “similar” customers. If a new such customer buys one such book, suggest the others too.
  • Associations may be used as a first step in detecting causation, e.g. association between exposure to chemical X and cancer
– Clusters
  • e.g. typhoid cases were clustered in an area surrounding a contaminated well
  • Detection of clusters remains important in detecting epidemics
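The book example can be sketched by counting how often two titles appear in the same basket and suggesting partners that co-occur often enough. The baskets, the `min_support` threshold of 2, and the helper names are hypothetical; production association mining uses more refined measures such as confidence.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets (one set of book titles per customer).
baskets = [
    {"SQL Basics", "Data Warehousing", "OLAP Guide"},
    {"SQL Basics", "Data Warehousing"},
    {"SQL Basics", "Cooking"},
    {"Data Warehousing", "OLAP Guide"},
]


def pair_counts(all_baskets):
    """Count how many baskets contain each pair of items."""
    counts = Counter()
    for basket in all_baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


def suggest(item, all_baskets, min_support=2):
    """Items bought together with `item` in at least `min_support` baskets."""
    counts = pair_counts(all_baskets)
    return sorted(other
                  for pair, n in counts.items()
                  if item in pair and n >= min_support
                  for other in pair if other != item)


print(suggest("Data Warehousing", baskets))
print(suggest("Cooking", baskets))
```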

Other Types of Mining

• Text mining: application of data mining to textual documents
  – cluster Web pages to find related pages
  – cluster pages a user has visited to organize their visit history
  – classify Web pages automatically into a Web directory
• Data visualization systems help users examine large volumes of data and detect patterns visually
  – Can visually encode large amounts of information on a single screen
  – Humans are very good at detecting visual patterns

Data Mining Applications

Data mining application classes of problems:

– Classification
– Clustering
– Association
– Sequencing
– Regression
– Forecasting
– Hypothesis or discovery driven
– ……..

Tools and Techniques

• Data mining
  – Statistical methods
  – Decision trees
  – Case based reasoning
  – Neural computing
  – Intelligent agents
  – Genetic algorithms
• Text Mining
  – Hidden content
  – Group by themes
  – Determine relationships