job-evans
Decision Support Systems for E-commerce
Working Definition of DSS
A DSS is an integrated, interactive computer system, consisting of analytical tools and information management capabilities, designed to aid decision makers in solving relatively large, unstructured problems.
Decision Making Samples
• What were the sales volumes by region and product category for the last year?
• How did the share price of computer manufacturers correlate with quarterly profits over the past 10 years?
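The first sample question maps naturally onto a grouped aggregate query. The following is a minimal sketch using Python's built-in sqlite3 module; the table name `sales`, its columns, and the sample rows are assumptions made for illustration and do not come from the slides.

```python
import sqlite3

def sales_by_region_and_category(rows, year):
    """Total sales volume per (region, category) for one year.

    `rows` holds (region, category, year, volume) tuples; this schema is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (region TEXT, category TEXT, year INTEGER, volume INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(
        "SELECT region, category, SUM(volume) FROM sales "
        "WHERE year = ? GROUP BY region, category ORDER BY region, category",
        (year,),
    ).fetchall()
    conn.close()
    return result

# Hypothetical sample data.
SAMPLE_ROWS = [
    ("East", "PC", 2023, 10),
    ("East", "PC", 2023, 5),
    ("East", "TV", 2023, 7),
    ("West", "PC", 2023, 3),
    ("West", "PC", 2022, 99),
]
```

Calling `sales_by_region_and_category(SAMPLE_ROWS, 2023)` returns one row per region and category, with the 2022 row excluded by the year filter.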
Central Issue in DSS
Support and improvement of decision making
Management Decision Making
Strategic
• CEO, board of directors, top executives
• Develop the overall strategies of the organization
Tactical
• Regional managers, plant managers, division supervisors
• Carry out the strategic managers' plans
Operational
• Direct managers, team leaders
• Carry out the tactical managers' plans
Different technologies are invented to meet different decision-making goals!
The Big Picture: DBs, Data Warehouse, OLAP & Data Mining
[Figure: Operational DBs and other sources -> Extract, Transform, Load, Refresh -> Data Warehouse (Data Storage) -> Serve -> OLAP Engine (OLAP Server) -> Analysis, Query, Reports, Data mining (Front-End Tools)]
Evolutionary Step | Technologies | Providers
Data Collection (1960s) | Computers, tapes, disks | IBM, CDC
Data Access (1980s) | Relational databases, SQL, ODBC | Oracle, Sybase, Informix, IBM, Microsoft
Data Warehousing & Decision Support Systems (1990s) | On-line Analytical Processing (OLAP), multidimensional databases (cubes) | Cognos, Arbor, Pilot, MicroStrategy, Oracle, IBM
Data Mining (Present) | Statistics, Machine Learning, AI | SAS, SPSS, IBM, Oracle, Cognos, Microsoft
Why Build a Data Warehouse?
• Separate transactional and analysis systems, to make tactical or even strategic decisions for regional managers or CEOs
• Easy formulation of complex queries
• Access to historical data (not in operational systems)
• Improved data quality (fewer errors and missing values)
• Access to data from multiple sources, giving a comprehensive data collection
Potential Applications of Data Warehousing and Mining in EC
• Analysis of user access patterns and buying patterns
• Customer segmentation and target marketing
• Cross selling and improved Web advertisement
• Personalization
• Association (link) analysis
• Customer classification and prediction
• Time-series analysis
• Typical event sequence and user behavior pattern analysis
• Transition and trend analysis
Data Warehousing
• The phrase "data warehouse" was coined by William Inmon in 1990.
• A data warehouse is a decision support database that is maintained separately from the organization’s operational database.
• Definition: A DW is a repository of integrated information from distributed, autonomous, and possibly heterogeneous information sources for query, analysis, decision support, and data mining purposes.
Characteristics (cont’d)
Integrated
• Application-oriented data from different legacy systems and heterogeneous data sources has no consistency in encoding, naming conventions, …
• When data is moved to the warehouse, it is consolidated, converted, and encoded
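A minimal sketch of what "consolidated, converted, and encoded" can look like in practice. The two source systems, their field names, and their gender codes are hypothetical and chosen only to show inconsistent naming and encoding being mapped to one warehouse convention.

```python
# Hypothetical source schemas: each source names and encodes the same attribute differently.
SOURCE_SCHEMAS = {
    "crm": {"id_field": "cust_id", "gender_field": "sex", "codes": {"m": "M", "f": "F"}},
    "billing": {"id_field": "customer_no", "gender_field": "gender", "codes": {"1": "M", "0": "F"}},
}

def integrate(source, record):
    """Convert one source record to the warehouse's field names and encoding."""
    schema = SOURCE_SCHEMAS[source]
    raw = str(record[schema["gender_field"]]).strip().lower()
    return {
        "customer_id": int(record[schema["id_field"]]),
        # Codes not covered by the mapping are flagged rather than guessed.
        "gender": schema["codes"].get(raw, "UNKNOWN"),
    }
```

Both `integrate("crm", {"cust_id": "17", "sex": "F"})` and `integrate("billing", {"customer_no": 17, "gender": 0})` yield the same warehouse record.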
Characteristics (cont’d)
Non-volatile
• New data is always appended to the database, rather than replacing existing data
• The database continually absorbs new data, integrating it with the previous data
• In contrast, operational data is regularly accessed and manipulated a record at a time, and updates are made to the data in the operational environment
Characteristics (cont’d)
Time-variant
• Operational databases contain current-value data. Operational data is valid only at the moment of access, capturing a moment in time.
• The time horizon for the data warehouse is significantly longer than that of operational systems.
• Data warehouse data is nothing more than a sophisticated series of snapshots, each taken as of some moment in time.
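The non-volatile and time-variant properties can be illustrated together with an append-only store of dated snapshots. This is a sketch under assumed names (`SnapshotStore`, `load`, `value_as_of`), not a description of any particular warehouse product.

```python
from datetime import date

class SnapshotStore:
    """Append-only store: loads never overwrite earlier snapshots."""

    def __init__(self):
        self._rows = []  # (as_of, key, value) tuples, only ever appended

    def load(self, as_of, key, value):
        """Append a new snapshot; existing rows are left untouched."""
        self._rows.append((as_of, key, value))

    def value_as_of(self, key, when):
        """Return the most recent value for `key` recorded on or before `when`."""
        candidates = [(d, v) for d, k, v in self._rows if k == key and d <= when]
        if not candidates:
            return None
        return max(candidates, key=lambda pair: pair[0])[1]

    def history(self, key):
        """Return all snapshots for `key` in load order."""
        return [(d, v) for d, k, v in self._rows if k == key]
```

An operational system would keep only the latest value; here a query such as `value_as_of("price", date(2020, 6, 1))` can still recover what was true at an earlier moment.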
System Architecture
[Figure: data sources (Legacy, Flat-file, RDBMS, OODBMS, ...), each with a Detector, feeding the data warehouse; the End User performs Analysis, Query Reports, and Data Mining]
Data Warehouse Back-End Tools and Utilities
• Data extraction: Extract data from multiple, heterogeneous, and external sources
• Data cleaning (scrubbing): Detect errors in the data and rectify them when possible
• Data converting: Convert data from legacy or host format to warehouse format
• Transforming: Sort, summarize, compute views, check integrity, and build indices
• Refresh: Propagate the updates from the data sources to the warehouse
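The five back-end steps above can be sketched as a small pipeline. Everything here is illustrative: the record layout (`region`, `amount`), the cleaning rule, and the additive refresh policy are assumptions, and real tools do considerably more (integrity checks, index building, and so on).

```python
def extract(sources):
    """Extraction: gather raw records from several sources into one list."""
    return [record for source in sources for record in source]

def clean(records):
    """Cleaning: drop records whose amount is missing or not numeric."""
    cleaned = []
    for record in records:
        try:
            amount = float(record.get("amount"))
        except (TypeError, ValueError):
            continue
        cleaned.append({**record, "amount": amount})
    return cleaned

def convert(records):
    """Converting: normalize region names to the (assumed) warehouse format."""
    return [{"region": r["region"].strip().upper(), "amount": r["amount"]} for r in records]

def transform(records):
    """Transforming: summarize amounts per region, sorted by region."""
    totals = {}
    for r in records:
        totals[r["region"]] = totals.get(r["region"], 0.0) + r["amount"]
    return dict(sorted(totals.items()))

def refresh(warehouse, sources):
    """Refresh: propagate a new batch of source records into the warehouse totals."""
    summary = transform(convert(clean(extract(sources))))
    for region, total in summary.items():
        warehouse[region] = warehouse.get(region, 0.0) + total
    return warehouse
```

Each stage is a plain function so the order of the steps stays visible; `refresh` simply chains them and merges the result into an existing dictionary standing in for the warehouse.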
On-Line Analytical Processing (OLAP)
• Front end to the data warehouse, allowing easy data manipulation
• Allows conducting inquiries over the data at various levels of abstraction
• Fast and easy because some aggregations are computed in advance
• No need to formulate the entire query
OLAP: Data Cube
OLAP uses data in multidimensional format (e.g., data cubes) to simplify queries and improve response time.
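[Figure: data cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr), Product (TV, VCR, PC), and Country (U.S.A, Canada, Mexico), plus a "sum" entry along each dimension. Highlighted cell: overall sales of TVs in the U.S. in the 3rd quarter.]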
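One way to picture the cube is as a dictionary of precomputed totals, where any coordinate may be replaced by a "sum" marker. The sketch below assumes a tiny fact list with made-up sales figures; only the dimension values come from the figure.

```python
from itertools import product

ALL = "sum"  # marker for an aggregated dimension, as labelled in the cube figure

def build_cube(facts):
    """Precompute totals for every (product, country, quarter) combination,
    where each coordinate is either a concrete value or ALL."""
    cube = {}
    for prod, country, quarter, sales in facts:
        # Each fact contributes to 2 * 2 * 2 = 8 cells of the cube.
        for key in product((prod, ALL), (country, ALL), (quarter, ALL)):
            cube[key] = cube.get(key, 0) + sales
    return cube

# Hypothetical sales figures.
FACTS = [
    ("TV", "U.S.A", "3Qtr", 100),
    ("TV", "U.S.A", "4Qtr", 50),
    ("PC", "Canada", "3Qtr", 30),
    ("TV", "Mexico", "3Qtr", 20),
]
```

After `cube = build_cube(FACTS)`, the cell `cube[("TV", "U.S.A", "3Qtr")]` answers the question highlighted in the figure with a single lookup, and `cube[("TV", ALL, "3Qtr")]` gives the same quarter summed over all countries. Precomputing every combination trades storage for query speed, which is why real systems typically materialize only some of these aggregates.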
OLAP: Data Cube Operations
Slicing: Selecting the dimensions of the cube to be viewed.
• Example: View “Sales volume” as a function of “Product” by “Country” by “Quarter”
Dicing: Specifying the values along one or more dimensions.
• Example: View “Sales volume” for “Product=PC” by “Country” by “Quarter”
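Following the definitions on this slide, both operations can be sketched as one grouping function: the `dimensions` argument chooses which dimensions are viewed, and the `where` argument fixes values along one or more dimensions. The function name, row layout, and sample figures are assumptions for illustration.

```python
def view(facts, dimensions, where=None):
    """Sum 'sales' grouped by `dimensions`, keeping only rows that match `where`."""
    where = where or {}
    totals = {}
    for row in facts:
        # Skip rows that do not match every fixed dimension value.
        if any(row[dim] != value for dim, value in where.items()):
            continue
        key = tuple(row[dim] for dim in dimensions)
        totals[key] = totals.get(key, 0) + row["sales"]
    return totals

# Hypothetical fact rows.
SALES = [
    {"product": "PC", "country": "U.S.A", "quarter": "1Qtr", "sales": 5},
    {"product": "TV", "country": "U.S.A", "quarter": "1Qtr", "sales": 7},
    {"product": "PC", "country": "Canada", "quarter": "1Qtr", "sales": 2},
    {"product": "PC", "country": "U.S.A", "quarter": "1Qtr", "sales": 1},
]
```

The second example on the slide corresponds to `view(SALES, ["country", "quarter"], {"product": "PC"})`.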
OLAP: Data Cube Operations
Drilling down: Moving from higher-level aggregation to lower-level aggregation or detailed data (e.g., viewing by “state” after viewing by “region”)
Rolling up: Summarizing data by climbing up a hierarchy or by dimension reduction (e.g., viewing by “region” instead of by “state”)
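A minimal sketch of the state/region example, assuming a hypothetical state-to-region hierarchy and made-up sales figures.

```python
# Hypothetical hierarchy: each state belongs to exactly one region.
REGION_OF = {"CA": "West", "OR": "West", "NY": "East"}

def by_state(facts):
    """Detailed view: total sales per state from (state, sales) pairs."""
    totals = {}
    for state, sales in facts:
        totals[state] = totals.get(state, 0) + sales
    return totals

def roll_up(state_totals):
    """Climb the hierarchy: summarize state totals into region totals."""
    totals = {}
    for state, sales in state_totals.items():
        region = REGION_OF[state]
        totals[region] = totals.get(region, 0) + sales
    return totals

def drill_down(state_totals, region):
    """Descend the hierarchy: show the state-level detail behind one region."""
    return {s: v for s, v in state_totals.items() if REGION_OF[s] == region}
```

Rolling up loses detail, so drilling down needs the finer-grained totals to still be available; that is why the sketch keeps `state_totals` around rather than only the region summary.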
Cube Operations Illustrated
[Figure: rolling up and drilling down]
Actual Application
Query: “overall & detail production performance”
• manufacturer: Com1
• products: all products
• date interval: 01-Jan-94 until 01-Jan-1999
• source: USDA
[Figure: query results for Com.1, listing Lot#1, Lot#2, Lot#3 and Contract Number 1, Contract Number 2, Contract Number 3]
Data Mining
“Data Mining is the exploration and analysis by automatic or semi-automatic means, of large or small quantities of data in order to discover meaningful patterns, trends and rules.”
[Figure: Data Mining draws on Data Analysis (Statistics, AI & ML) and Database (Data Warehouse, OLAP)]
Data Analysis
• Classification
• Regression
• Clustering
• Association
• Sequence Analysis
Data Analysis (cont.)
[Figure: a model f maps inputs X1, X2, X3 to outputs Y1, Y2, Y3]
• Inputs: input variables, or independent variables, or attributes or descriptors
• Outputs: output variables, or dependent variables, or classes or targets
• Value types for both inputs and outputs: Numeric (3, 4.5, 102, …), Categorical (hot, cold, high, low, …), Crisp (0, 1, yes, no, …)
• Tasks: Regression (numeric outputs), Classification (categorical or crisp outputs)
• Modeling: linear models, or non-linear models, or a set of rules
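As a concrete instance of the simplest case above (one numeric input, one numeric output, a linear model), here is a sketch of ordinary least-squares line fitting in plain Python. The function names and the sample points are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the fitted model f to a new input value."""
    slope, intercept = model
    return slope * x + intercept
```

For points lying exactly on y = 2x + 1, such as `fit_line([1, 2, 3, 4], [3, 5, 7, 9])`, the fit recovers a slope of 2 and an intercept of 1. The function assumes at least two distinct x values; otherwise `sxx` is zero and the division fails.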
Data Analysis (cont.)
• Clustering [Figure: points grouped in a plot of Age against Income]
• Association: given transactions such as (1: chips, coke, chocolate), (2: gum, chips), (3: chips, coke), (4: …), what is Probability(chips, coke)? Probability(chips, gum)?
• Sequence Analysis: e.g., …ATCTTTAAGGGACTAAAATGCCATAAAAATCCATGGGAGAGACCCAAAAAA… [Figure: a transition T from X(t-1) to X(t)]
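The two probability questions in the association example can be read as the support of an itemset: the fraction of transactions that contain all of its items. The sketch below uses only the three baskets visible on the slide; the fourth is elided there and is not guessed here, so the resulting numbers describe these three baskets only.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    hits = sum(1 for basket in transactions if wanted <= set(basket))
    return hits / len(transactions)

# The three baskets shown on the slide (the fourth is elided in the source).
BASKETS = [
    ["chips", "coke", "chocolate"],
    ["gum", "chips"],
    ["chips", "coke"],
]
```

With these baskets, `support(BASKETS, ["chips", "coke"])` is 2/3 and `support(BASKETS, ["chips", "gum"])` is 1/3.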
Data Analysis (cont.)
Classification
• Linear Discriminant Analysis
• Naïve Bayes / Bayesian Network
• OneR
• Neural Networks
• Decision Tree (ID3, C4.5, …)
• K-Nearest Neighbors (IB)
• Support Vector Machines (SVM)
• …
Regression
• Multiple Linear Regression
• Principal Components Regression
• Partial Least Squares
• Neural Networks
• Regression Tree (CART, MARS, …)
• K-Nearest Neighbors (LWR)
• Support Vector Machines (SVR)
• …
Clustering
• K-Means Clustering
• Self Organizing Map
• Bayesian Clustering
• COBWEB
• …
Association & Sequence Analysis
• Apriori
• Markov Chain
• Hidden Markov Models
• …
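To make one of the listed sequence-analysis techniques concrete, here is a sketch of estimating first-order Markov chain transition probabilities from an observed sequence by counting adjacent pairs. The function name is an assumption, and the approach shown is the plain maximum-likelihood count with no smoothing.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequence):
    """Estimate P(next symbol | current symbol) from adjacent pairs in `sequence`."""
    counts = defaultdict(Counter)
    for current, following in zip(sequence, sequence[1:]):
        counts[current][following] += 1
    # Normalize each row of counts so its probabilities sum to 1.
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }
```

For a sequence of nucleotide letters, `transition_probabilities(fragment)["A"]` would give the estimated probability of each letter following an "A". Symbols that never appear before another symbol, such as the final one if it occurs only once, get no row in the result.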
Challenges
Faster, more accurate and more scalable techniques
Incremental, on-line and real-time learning algorithms
Parallel and distributed data processing techniques
Data mining is an exciting and challenging field with the ability to solve many complex scientific and business problems.
Opportunities
Data mining is a ‘top ten’ emerging technology
Data mining is finding increasing acceptance in science and business areas which need to analyze large amounts of data to discover trends and patterns which they could not otherwise find.