View
225
Download
0
Category
Tags:
Preview:
Citation preview
Decision Support Technology
DSS Reference Architecture
LanguageSystem Problem Processing
SystemKnowledgeSystem
PresentationSystem
Outline
• Knowledge system technology– data management– data warehousing– Data marts– online analytic processing (OLAP)
• ROLAP, MOLAP, WOLAP
Evolution of DSS• Transaction Processing Systems (TPS)
– Operational data stores and OLTP– Batch reports, hard to find and analyze
information, inflexible and expensive, reprogram every new request (circa 60’s)
• MIS– Management reporting from transactions in TPS– Still inflexible, not integrated with desktop tools
(circa 70’s)• DSS
– Combine data with analytic models or expert rules– Integration with desktop tools (80’s)
Evolution of DSS• Data Warehousing
– Data integrated after (cleaning and scrubbing) from multiple sources (both internal and external to the organization)
– OLAP is the technology used to study the data in terms of operations on a multi-dimensional data set
– Data warehousing also supports processing of data by analytic methods and permits data mining (90’s)
Applications
Retail - inventory management, promotions Manufacturing - order shipment Insurance – policy and claims tracking Telecommunications - call analysis Financial – account tracking CRM/eCRM – customer profiling, clickstream
analysis Healthcare – disease management, patient and
physician profiling
Disease Management Programs
Address health quality of population by identifying at-risk members and applying prevention programs
Challenge: identifying at-risk members Data warehouses can help with this effort Aetna U.S. Healthcare
Members with certain ailments are flagged using an algorithm that examines member’s diagnoses, procedures, laboratories, and pharmaceuticals
Data gathered from medical and pharmaceutical claims, member and provider profiles (nearly 3 terabytes)
Results: reduction in frequency of acute asthmatic episodes, improvements in vaccination rates
Data Model• A representation scheme with which to describe data: data
relationships, data semantics, and consistency constraints.
• Examples– ER (entity relationship model)
– Relational model
– Object-oriented model
• References:– http://www.smartdraw.com/resources/centers/software/erd.htm (ER
model)– http://www.fgcu.edu/support/office2000/index.html (MS
Office product tutorials)– http://mis.bus.sfu.ca/tutorials/MSAccess/tutorials.html (MS
Access tutorial)
E-R schema: A simple example
physician patientevaluatesm n
-Modeling the structure of data, not the processing of it
ER Schema: a detailed example
Relational model
• Primary model used today for data-processing applications
• Database systems like Oracle, Sybase, Informix, MS SQL server etc. support this model
• Based on a well understood
theoretical model
Example of a relational schema:
Physician(doc_id, d_name, specialty)
Patient(p_id, p_name)
Evaluates(doc_id, p_id, date, diagnosis)
An Example
Essential features of the
relational model
• A relational model schema consists of relations or tables
• Each table has a set of fields (columns) that are related to one another
• One or more fields whose values determine the value of other fields are called keys
• Tables are normalized in order to remove redundancies
New Data Types
• Text, images, video, audio, time series, spatial, ….
• Other, more exotic data types like fingerprints (an IBM Extender) and face recognition (an Informix DataBlade).
• Offered by IBM, Informix, Oracle
Object-oriented Databases
• Objects belong to object classes determined by the structure (variables) and behavior (methods) of the object - not easy to represent in a relational database
• Methods that impact the state of the variables of the object are encapsulated within the object and facilitate interaction between objects
• eg. Object “bank account” modifies “balance” by “amount”
• Potentially reduced development and maintenance time
From Databases to Transaction Processing
Change
Reality Abstraction
Transaction
Que
ry
AnswerDB'
DB
The real state is represented by an abstraction, called the database, and the transformation of the real state is mirrored by the execution of a program, called a transaction, that transforms the database.
Source: Jim Gray,MSFT
Examples
• Point of sale systems– credit card transactions
• ATM machines– all withdrawals and deposits
• E-commerce web sites
• Health Care– medical records, billing
Databases for Decision Support
• Transaction Processing systems are optimized for performance
• Data they capture are too detailed to be of use for decision support purposes
• Online Analytical Processing (OLAP) imposes very different demands on databases than does Online Transaction Processing (OLTP)
Data Analysis: An Example
Reimbursements
Net Profits
1997 1998
45.4 Mill 52.4 Mill
3.2 Mill 5.2 Mill
% Inc.
15.41
62.50
Reimbursements
Net Profits
1997 1998
45.4 Mill 52.4 Mill
3.2 Mill 5.2 Mill
% Inc.
15.41
62.50
North
South
East
West
Reimb. by region
7.2 Mill
13.4 Mill
18.4 Mill
6.4 Mill
7.5 Mill
18.4 Mill
17.4 Mill
9.1 Mill
4.17
37.31
-5.43
42.18
Data Analysis: A Hospital Example
Specialty A
Specialty B
Specialty C
Specialty D
Reimb. - South by specialty
2.65 Mill
6.45 Mill
3.1 Mill
1.2 Mill
2.70 Mill
7.10 Mill
4.0 Mill
4.60 Mill
1.90
10.08
29.03
283.33
North
South
East
West
Reimb. By region
7.2 Mill
13.4 Mill
18.4 Mill
6.4 Mill
7.5 Mill
18.4 Mill
17.4 Mill
9.1 Mill
4.17
37.31
-5.43
42.18
Data Analysis: An Example
Reimbursements
Net Profits
1997 1998
45.4 Mill 52.4 Mill
3.2 Mill 5.2 Mill
% Inc.
15.41
62.50
North
South
East
West
Profits by region
0.88 Mill
1.12 Mill
1.1 Mill
0.1 Mill
0.50 Mill
2.60 Mill
0.13 Mill
1.97 Mill
-43.18
132.1
-88.18
1870
Data Analysis: An Example
Reimbursements
Net Profits
1997 1998
45.4 Mill 52.4 Mill
3.2 Mill 5.2 Mill
% Inc.
15.41
62.50
Specialty A
Specialty B
Specialty C
Specialty D
Profits-South by specialty
0.20 Mill
0.60 Mill
0.22 Mill
0.10 Mill
0.10 Mill
1.20 Mill
0.40 Mill
0.90 Mill
-50.00
100.0
81.82
800.0
Data Analysis: An Example
Integration System
• Collects and combines information from disparate sources
• Provides integrated view, and a uniform user interface
• Supports sharing of data between entities
WorldWideWeb
Digital Libraries Scientific Databases
PersonalDatabases
Heterogeneous Database Integration
Why look at data in this way?
– What would be the demand for services (forecasting)?– Who are our key customers/patients, and
– What are the margins/outcomes? (profitable customers/satisfied patients)
– How do we market to them/treat them?– What pricing/treatment strategy is desirable?– What are their preferences?– What type of customer/patient services are required?– What services when packaged result in higher/better
sales/revenues/margins/outcomes, efficient workflow?– Which promotion/patient education/counseling works or does
not work and why?– What is the inventory/patient turnover?– Which channel/technology is more effective/profitable?– Why do margins/outcomes differ from one place to another or
one patient to another?
Customer Relationship Management
ANALYZE
TAKE ACTION
DISCOVER
Source: Pilot Software
Customer baseProfitabilityBuying patternSupport patternProductivity
Policies & ProceduresMarketing policiesSupport procedures
Trends in marketSelling opportunitiesOpp. For improvements
Business IntelligenceSoftware applications, technologies, and
analytical methodologies that perform data analysis
Often used as a broad term that includes OLAP, data mining, query, reporting tools and technologies
Business Intelligence Loop
Business Strategist
OLAP Data Mining Reports
Data Storage
Extraction, Transformation, & Cleansing
CRM Clinical IS Pharmacy Lab
DataWarehouse
DecisionSupport
Data Warehouse• A decision support database that is maintained
separately from the organization’s operational databases.
• A data warehouse is a
– subject-oriented,
– integrated,
– time-varying,
– non-volatile collection of data that is used primarily in
organizational decision making (W.H. Inmon)
Why Separate Data Warehouse?• Performance
Operational databases are designed & tuned for known transactions & workloads.
Complex OLAP queries would degrade performance for transactions.
Special data organization, access & implementation methods needed for multidimensional views & queries
Function– decision support requires historical data (up to 5 to 10
years of data)– consolidation of data from many operational systems and
external sources– data quality considerations (semantic and measurement
issues are resolved)
Local vs Central Data Repository
Local Central
2 3
1 4
Data WarehouseRepository
KnowledgeGeneration
Local Rules Base
Local Data
KnowledgeGenerationHIS
Data
OtherData
Sources
Central Data
FeedbackOption
Forward Data
Warehouse Database Schema
• ER design techniques not appropriate
• Design should reflect multidimensional view– Star Schema– Snowflake Schema– Fact Constellation Schema
Multidimensional Data Model
• Database is a set of facts (points) in a multidimensional space• A fact has a measure dimension
– quantity that is analyzed, e.g., sale, budget• A set of dimensions on which data is analyzed
– e.g. , store, product, date associated with a sale amount• Dimensions form a sparsely populated coordinate system• Each dimension has a set of attributes
– e.g., owner, city and county of store• Attributes of a dimension may be related by partial order
– Hierarchy: e.g., street > county >city– Lattice: e.g., date> month>year, date>week>year
Example: Patient profiling
A healthcare organization needed a longitudinal view of patients, including trends of services to patients
Model Facts include Healthcare (e.g., diagnosis, procedure),
Financial (e.g., amount billed, number of claims), Resources (e.g., number of bed-days, inpatient and outpatient visits)
Dimensions include Time, Provider, Claim Type, Demographics, Encounter Type, Diagnosis and Procedure, Person, Organization
Questions answered by system: Which individuals are eligible for services but not obtaining
them? Which individuals are registered for services, but not receiving
preventive healthcare?
Example of a Star Schema
Order NoOrder No
Order DateOrder Date
Customer NoCustomer No
Customer Customer NameName
Customer Customer AddressAddress
CityCity
SalespersonIDSalespersonID
SalespersonNaSalespersonNameme
CityCity
QuotaQuota
OrderNOOrderNO
SalespersonIDSalespersonID
CustomerNOCustomerNO
ProdNoProdNo
DateKeyDateKey
CityNameCityName
QuantityQuantity
Total Price
ProductNOProductNO
ProdNameProdName
ProdDescrProdDescr
CategoryCategory
CategoryDescriptionCategoryDescription
UnitPriceUnitPrice
DateKeyDateKey
DateDate
CityNameCityName
StateState
CountryCountry
OrderOrder
CustomerCustomer
SalespersoSalespersonn
CityCity
DateDate
ProductProduct
Fact TableFact Table
Star Schema
• A single fact table and a single table for each dimension
• Every fact points to one tuple in each of the dimensions and has additional attributes
• Does not capture hierarchies directly• Straightforward means of capturing a multiple
dimension data model using relations
Example of a Snowflake Schema
Order NoOrder No
Order DateOrder Date
Customer NoCustomer No
Customer Customer NameName
Customer Customer AddressAddress
CityCity
SalespersonIDSalespersonID
SalespersonNaSalespersonNameme
CityCity
QuotaQuota
OrderNOOrderNO
SalespersonIDSalespersonID
CustomerNOCustomerNO
ProdNoProdNo
DateKeyDateKey
CityNameCityName
QuantityQuantity
Total Price
ProductNOProductNO
ProdNameProdName
ProdDescrProdDescr
CategoryCategory
CategoryCategory
UnitPriceUnitPrice
DateKeyDateKey
DateDate
MonthMonth
CityNameCityName
StateState
CountryCountry
OrderOrder
CustomerCustomer
SalespersoSalespersonn
CityCity
DateDate
ProductProduct
Fact TableFact Table
CategoryNaCategoryNameme
CategoryDeCategoryDescrscr
MontMonthh
YearYear YearYear
StateNameStateName
CountryCountry
CategoryCategory
StateState
MonthMonth
YearYear
Snowflake Schema
• Represent dimensional hierarchy directly by normalizing the dimension tables
• Easy to maintain
• Saves storage, but may reduce effectiveness of browsing (Kimball)
Fact Constellation
Store Key
Product Key
Period Key
Units
Price
Store Dimension
Product Dimension
SalesFact Table
Store Key
Store Name
City
State
Region
Product Key
Product Desc
Shipper Key
Store Key
Product Key
Period Key
Units
Price
ShippingFact Table
Fact Constellation
• Multiple fact tables share dimension tables.
• This schema is viewed as collection of stars hence called galaxy schema or fact constellation.
• Sophisticated applications require such schema.
Data Warehouse vs. Data Marts• Enterprise warehouse: collects all
information about subjects (customers, products, sales, assets, personnel) that span the entire organization.– Requires extensive business modeling– May take years to design and build
• Data Marts: departmental subsets that focus on selected subjects: Marketing data mart: customer, products, sales.– Faster roll out, but complex integration in
the long run.
Virtual Warehouse
• Views over operational databases– Materialize some summary views for efficient
query processing, easier to build, requisite excess capacity on operational database servers
Data Warehouse Architecture
ClinicalSystemClinicalSystem
PayrollSystemPayrollSystem
Billing SystemBilling System
ExternalData
ExternalData
Other Internal
Data
Other Internal
Data
(e.g.,AS400)
OracleFinancialson HP 9000
Access,Files (Industry Reports)
TransformationIntegration
DataWarehouse
DataWarehouse
Meta-Data
Excel Web Other
DataMining Tools
OLAP OLAP serversservers
Extraction, Transformation, & Load (ETL)
ETL is a set of tools and techniques used to populate a data warehouse
Extraction Extract data from sources (e.g., operational
DBMSs, file systems, Web pages) Transformation
Clean data Convert from legacy/host format to
warehouse format (e.g., convert “surname” to “last name”)
Extraction, Transformation, & Load (ETL)
Load Sort, summarize, consolidate, compute views, check
integrity, build indexes, partition Huge volumes of data to be loaded, yet small time window
(usually at night) when the warehouse can be taken off-line Techniques: batch, sequential load often too slow;
incremental, parallel loading techniques may be used
Refresh Propagate updates from sources to the warehouse When to refresh - on every update, periodically (e.g., every
24 hours), or after “significant” events How to refresh – full extract from base tables vs. incremental
techniques
Metadata
Types of Metadata Administrative metadata
source databases and their contents gateway descriptions warehouse schema, view & derived data definitions dimensions, hierarchies pre-defined queries and reports data mart locations and contents data partitions data extraction, cleansing, transformation rules, defaults data refresh and purging rules user profiles, user groups security: user authorization, access control
Metadata …
Types (continued) Business data
business terms and definitions ownership of data charging policies
Operational metadata data lineage: history of migrated data and sequence of
transformations applied currency of data: active, archived, purged monitoring information: warehouse usage statistics, error
reports, audit trails
Tools: Platinum Repository (Computer Associates) Meta Directory (Information Builders)…
The Complete Decision Support System (Source: Franconi)
Information Sources Data Warehouse Server(Tier 1)
OLAP Servers(Tier 2)
Clients(Tier 3)
OperationalDB’s
SemistructuredSources
extracttransformloadrefreshetc.
Data Marts
DataWarehouse
e.g., MOLAP
e.g., ROLAP
serve
Analysis
Query/Reporting
Data Mining
serve
serve
Three-Tier Architecture• Warehouse database server
– Almost always a relational DBMS; rarely flat files
• OLAP servers– Relational OLAP (ROLAP): extended relational DBMS that
maps operations on multidimensional data to standard relational operations.
– Multidimensional OLAP (MOLAP): special purpose server that directly implements multidimensional data and operations.
• Clients– Query and reporting tools.– Analysis tools (excel)– Data mining tools (e.g., trend analysis, prediction)
Recommended