MPIII
Database Technologies
Relational Concepts
Data Warehouses & Marts
Queries, OLAP, Data Mining
Terms/Examples
• Database
  – a collection of related data, usually organized according to topics (e.g. customer info, products, transactions)
• Database Management System (DBMS)
  – a program for creating & managing databases (e.g. Oracle, MS-Access, Sybase)
DBMS - the program. Manages interaction with databases.
database - the collection of data. Created and defined to meet the needs of the organization.
Client - makes requests of the DBMS server (request)
Server - responds to client requests (response)
A Simple Database
• File/Table
  – Customers
• Field/Column
  – 5 shown: CUSTID, FIRST, LAST, CITY, STATE
• Record/Row
  – 5 shown: one for each customer
CUSTID  FIRST    LAST        CITY           STATE  …
2001    John     Gallaugher  Newton         MA     …
2002    Abby     Johnson     Boston         MA     …
2003    Warren   Buffet      Omaha          NE     …
2004    Peter    Lynch       Rockport       MA     …
2005    Charles  Schwab      San Francisco  CA     …
A More Complex Example
FIRST    LAST        CITY           STATE  BUY/SELL  STOCK  SHARES  PRICE    DATE      TIME
John     Gallaugher  Newton         MA     Buy       MSFT   1000    90 1/4   12/24/96  12:01 PM
John     Gallaugher  Newton         MA     Buy       INTC   2400    80 1/8   7/3/97    10:51 AM
John     Gallaugher  Newton         MA     Sell      IBM    3000    114 3/8  7/1/97    9:03 AM
Abby     Johnson     Boston         MA     Sell      IBM    3000    110 1/8  6/30/97   4:53 PM
Abby     Johnson     Boston         MA     Sell      INTC   2000    94 7/8   8/30/97   3:15 PM
Warren   Buffet      Omaha          NE     Buy       INTC   1500    90 3/8   7/2/97    11:27 AM
Warren   Buffet      Omaha          NE     Buy       IBM    1700    101 7/8  1/4/97    2:02 PM
Warren   Buffet      Omaha          NE     Sell      AAPL   1900    18 1/2   2/14/97   5:00 PM
Peter    Lynch       Rockport       MA     Buy       AAPL   2000    19       2/14/97   5:30 PM
Peter    Lynch       Rockport       MA     Sell      AAPL   10000   21 7/8   3/15/97   11:44 AM
Charles  Schwab      San Francisco  CA     Buy       MSFT   4500    101 1/8  1/15/97   12:38 AM
Charles  Schwab      San Francisco  CA     Buy       INTC   17000   80 1/8   7/2/97    4:53 PM
• Entry & Maintenance is complicated
  – redundant data exists, increases chance of error, complicates updates/changes, takes up space
Normalize Data - Remove Redundancy
Customer Table (One)
CUSTID  FIRST    LAST        CITY           STATE
2001    John     Gallaugher  Newton         MA
2002    Abby     Johnson     Boston         MA
2003    Warren   Buffet      Omaha          NE
2004    Peter    Lynch       Rockport       MA
2005    Charles  Schwab      San Francisco  CA
Transaction Table (Many)
CUSTID  BUY/SELL  STOCK  SHARES  PRICE    DATE      TIME
2001    Buy       MSFT   1000    90 1/4   12/24/96  12:01 PM
2001    Buy       INTC   2400    80 1/8   7/3/97    10:51 AM
2001    Sell      IBM    3000    114 3/8  7/1/97    9:03 AM
2002    Sell      IBM    3000    110 1/8  6/30/97   4:53 PM
2002    Sell      INTC   2000    94 7/8   8/30/97   3:15 PM
2003    Buy       INTC   1500    90 3/8   7/2/97    11:27 AM
2003    Buy       IBM    1700    101 7/8  1/4/97    2:02 PM
2003    Sell      AAPL   1900    18 1/2   2/14/97   5:00 PM
2004    Buy       AAPL   2000    19       2/14/97   5:30 PM
2004    Sell      AAPL   10000   21 7/8   3/15/97   11:44 AM
2005    Buy       MSFT   4500    101 1/8  1/15/97   12:38 AM
2005    Buy       INTC   17000   80 1/8   7/2/97    4:53 PM
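The normalization step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `normalize`, the shortened row layout, and the starting ID of 2001 are hypothetical choices made to mirror the tables, not something the slides prescribe.

```python
# Illustrative sketch: split a flat trade list into a customer table and a
# transaction table linked by CUSTID. All names here are hypothetical.

flat_rows = [
    # (FIRST, LAST, CITY, STATE, BUY/SELL, STOCK, SHARES)
    ("John", "Gallaugher", "Newton", "MA", "Buy", "MSFT", 1000),
    ("John", "Gallaugher", "Newton", "MA", "Buy", "INTC", 2400),
    ("Abby", "Johnson", "Boston", "MA", "Sell", "IBM", 3000),
]


def normalize(rows, first_id=2001):
    """Return (customers, transactions) with each customer stored only once."""
    ids = {}            # customer details -> CUSTID
    transactions = []   # (CUSTID, BUY/SELL, STOCK, SHARES)
    for first, last, city, state, side, stock, shares in rows:
        key = (first, last, city, state)
        if key not in ids:
            ids[key] = first_id + len(ids)   # assign the next unused ID
        transactions.append((ids[key], side, stock, shares))
    customers = [(custid, *key) for key, custid in ids.items()]
    return customers, transactions


customers, transactions = normalize(flat_rows)
```

Each customer's name and address now appear in one row only, so a change of city is a single update rather than one per trade.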
Key Terms
• Relational DBMS
  – manages databases as a collection of files/tables in which all data relationships are represented by common values in related tables (referred to as keys).
  – a relational system has the flexibility to take multiple files and generate a new file from the records that meet the matching criteria (join).
• SQL - Structured Query Language
  – Most popular relational database standard. Includes a language for creating & manipulating data.
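A join can be tried out with Python's built-in `sqlite3` module. The sketch below is a minimal example, assuming an in-memory database and simplified table and column names (`customers`, `transactions`, `custid`) chosen for illustration; SQLite merely stands in for any relational DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (custid TEXT PRIMARY KEY, first TEXT, last TEXT);
    CREATE TABLE transactions (custid TEXT, side TEXT, stock TEXT, shares INTEGER);
    INSERT INTO customers VALUES
        ('2001', 'John', 'Gallaugher'),
        ('2002', 'Abby', 'Johnson');
    INSERT INTO transactions VALUES
        ('2001', 'Buy', 'MSFT', 1000),
        ('2001', 'Sell', 'IBM', 3000),
        ('2002', 'Sell', 'INTC', 2000);
""")

# The join pairs rows whose key value (custid) is the same in both tables.
joined = conn.execute("""
    SELECT c.last, t.side, t.stock, t.shares
    FROM customers AS c
    JOIN transactions AS t ON t.custid = c.custid
    WHERE c.custid = '2001'
    ORDER BY t.stock
""").fetchall()
conn.close()
```

Here `joined` holds one combined row per matching transaction, which is the "new file" generated from the two related tables.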
Now With More Data
Customer Table (One)
CUSTID  FIRST    LAST        CITY           STATE
2001    John     Gallaugher  Newton         MA
2002    Abby     Johnson     Boston         MA
2003    Warren   Buffet      Omaha          NE
2004    Peter    Lynch       Rockport       MA
2005    Charles  Schwab      San Francisco  CA
Broker Table (One)
BROKID  FIRST    LAST     …
B001    Ivan     Boesky   …
B002    Dennis   Levine   …
B003    Michael  Milken   …
Transaction Table (Many)
CUSTID  BROKID  BUY/SELL  STOCK  SHARES  PRICE    DATE      TIME
2001    B003    Buy       MSFT   1000    90 1/4   12/24/96  12:01 PM
2001    B001    Buy       INTC   2400    80 1/8   7/3/97    10:51 AM
2001    B003    Sell      IBM    3000    114 3/8  7/1/97    9:03 AM
2002    B001    Sell      IBM    3000    110 1/8  6/30/97   4:53 PM
2002    B003    Sell      INTC   2000    94 7/8   8/30/97   3:15 PM
…       …       …         …      …       …        …         …
Meta-Data
• Data that describes the characteristics of stored data
• Enterprise Data Model
  – consistent, cross-functional, shareable meta-data model
  – standardization increases flexibility & use (data to info)
  – facilitates the creation of data warehouses
Customer Table (1 customer to m transactions)
Col. Name  Length  Type   …
CUSTID     4       Char   …
FIRST      10      Char   …
LAST       15      Char   …
CITY       15      Char   …
STATE      2       Char   …
…          …       …      …
Transaction Table
Col. Name  Length  Type   …
CUSTID     4       Char   …
BROKID     4       Char
BUY/SELL   1       Bool   …
STOCK      4       Char   …
SHARES     8       Num    …
PRICE      6.2     Money  …
…          …       …      …
Broker Table (1 broker to m transactions)
Col. Name  Length  Type   …
BROKID     4       Char
FIRST      10      Char   …
LAST       15      Char   …
…          …       …      …
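Most DBMSs can report this kind of meta-data themselves. As a hedged sketch, the Python below uses the standard `sqlite3` module and SQLite's `PRAGMA table_info` to list the columns of a table; the table name `customers` and the lower-case column names are assumptions for illustration, and SQLite records the declared lengths without enforcing them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers ("
    "custid CHAR(4), first CHAR(10), last CHAR(15), city CHAR(15), state CHAR(2))"
)

# PRAGMA table_info yields one row per column:
# (cid, name, declared type, notnull, default value, pk)
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customers)")]
conn.close()
```

The resulting `columns` list is data about the data: column names and declared types, not customer records.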
Management Levels of IS
DSS - Strategic Planning
MIS - Management Control
TPS - Operational Control
Warehouses & Marts
• Data Warehouse
  – a database designed to support decision-making in an organization. It is batch-updated and structured for fast online queries and exploration. Data warehouses may aggregate enormous amounts of data from many different operational systems.
• Data Mart
  – a database focused on addressing the concerns of a specific problem or business unit (e.g. Marketing, Engineering). Size doesn’t define data marts, but they tend to be smaller than data warehouses.
Data Warehouses & Data Marts
[Figure: TPS & other operational systems; 3rd party data; Data Warehouse; Data Mart (Marketing); Data Mart (Engineering). Legend: query, OLAP, mining, etc. clients; operational clients.]
Differing System Demands
[Figure: two plots of network traffic & processor demands over time, one for Managerial Systems and one for Operational Systems.]
Transform Data from TPS to Warehouse
• Consolidate data
  – e.g. from multiple TPS around the country/world
• “Scrub” the data
  – keep definitions consistent (e.g. translate part numbers/product names if they differ per country)
• Calculate fields (decrease processor load)
• Summarize fields (decrease processor load)
• De-normalize data (ease of use)
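The transformation steps above can be sketched as a small Python pipeline. Everything in it is hypothetical: the two regional extracts, the product names, and the name-mapping table are made-up illustrations of consolidating, scrubbing, calculating, and summarizing, not a real warehouse load.

```python
from collections import defaultdict

# Hypothetical extracts from two regional TPS: (product, units, unit price).
# The same product carries a different name in each country.
us_sales = [("Widget", 10, 2.50), ("Gadget", 4, 10.00)]
uk_sales = [("Widgette", 6, 2.50)]

# Scrub: map local names onto one consistent definition (assumed mapping).
CANONICAL_NAME = {"Widgette": "Widget"}


def build_warehouse_rows(*sources):
    """Consolidate sources, scrub names, and add a calculated total per row."""
    rows = []
    for source in sources:
        for product, units, unit_price in source:
            product = CANONICAL_NAME.get(product, product)
            rows.append((product, units, unit_price, units * unit_price))
    return rows


def summarize(rows):
    """Summarize: total revenue per product, computed once at load time."""
    totals = defaultdict(float)
    for product, _units, _price, total in rows:
        totals[product] += total
    return dict(totals)


rows = build_warehouse_rows(us_sales, uk_sales)
summary = summarize(rows)
```

Because the totals and summaries are stored at load time, later queries read them directly instead of recomputing them for every request.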
Calculated Fields
Customer    Date     Stock  Shares  Price   Total
Gallaugher  3/25/98  INTC   1000    76 1/2  $76,500
Johnson     3/26/98  AAPL   2500    23 1/4  $58,125
Buffet      3/27/98  MSFT   3000    84      $252,000
Customer Service Application:
• Customer support person
• TPS - focuses on customer info
• Total is calculated on the fly
Database Query Application:
• Marketing manager
• Aggregate reporting of business intelligence
• Total calculated in advance
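The difference between the two approaches can be shown with the three trades from the table. The Python below is a sketch only; the helper names `parse_price` and `total_on_the_fly` are hypothetical, and exact fractions are used so that prices quoted in eighths stay precise.

```python
from fractions import Fraction


def parse_price(text):
    """Parse a price quoted in fractions, e.g. '76 1/2' -> Fraction(153, 2)."""
    return sum(Fraction(part) for part in text.split())


trades = [
    # (Customer, Stock, Shares, Price)
    ("Gallaugher", "INTC", 1000, "76 1/2"),
    ("Johnson", "AAPL", 2500, "23 1/4"),
    ("Buffet", "MSFT", 3000, "84"),
]


def total_on_the_fly(trade):
    """TPS style: compute Shares x Price only when a record is viewed."""
    _customer, _stock, shares, price = trade
    return shares * parse_price(price)


# Warehouse style: compute the total once at load time and store it in the row.
precomputed = [trade + (total_on_the_fly(trade),) for trade in trades]
```

A support screen showing one customer can afford the on-the-fly calculation; an aggregate report over many rows benefits from reading the stored totals in `precomputed`.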
Query Tools & OLAP
• Query Tools
  – user-led discovery. Can return individual records or summaries. Requests are formulated in advance (e.g. “show me all delinquent accounts in the northeast region during Q1”).
• OLAP - Online Analytical Processing
  – user-led discovery. Data is explored via “drill down” into the data by selecting variables to summarize on. Results are usually reported in a cross-tab report or graph (e.g. “show me a tabular breakdown of sales by business unit, product type, and year”).
OLAP
• Online Analytical Processing. (example of cross-tab results presented below)
1. business unit
2. product type
3. year
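A cross-tab and a drill-down can be sketched with plain Python dictionaries. The sales figures, unit names, and the function `cross_tab` below are hypothetical, chosen only to illustrate summarizing on the three dimensions listed above.

```python
from collections import defaultdict

# Hypothetical sales facts: (business unit, product type, year, sales)
facts = [
    ("East", "Hardware", 1997, 100),
    ("East", "Software", 1997, 50),
    ("East", "Hardware", 1998, 120),
    ("West", "Hardware", 1997, 80),
    ("West", "Software", 1998, 70),
]


def cross_tab(rows, row_dim, col_dim):
    """Sum sales into a {row value: {column value: total}} table."""
    table = defaultdict(lambda: defaultdict(int))
    for row in rows:
        table[row[row_dim]][row[col_dim]] += row[3]
    return {r: dict(cols) for r, cols in table.items()}


# Top level: business unit by year.
by_unit_and_year = cross_tab(facts, row_dim=0, col_dim=2)

# Drill down: keep one business unit, then break it out by product type.
east_only = [row for row in facts if row[0] == "East"]
east_by_product_and_year = cross_tab(east_only, row_dim=1, col_dim=2)
```

Choosing different values for `row_dim` and `col_dim` is the programmatic counterpart of a user picking which variables to summarize on.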
Data Mining
• automated information discovery process that uncovers important patterns in existing data
  – can use neural networks or other approaches
  – requires ‘clean’, reliable, consistent data
  – historical data must reflect the current environment
• e.g. “What are the characteristics that identify when we are likely to lose a customer?”
Data Mining Uses
• Market Segmentation - e.g. Dayton Hudson
• Direct Marketing - e.g. Chase
• Market basket analysis - e.g. Wal-Mart
• Customer Churn - e.g. Fleet Bank
• Fraud Detection - e.g. Bank of America
• Cost Reduction Prospecting - e.g. Merck-Medco
Stupid Data-Miner Tricks
• Ad-Hoc Theories
  – when an oddity jumps out of the data, it’s tempting to develop a theory for it. Sometimes findings are just statistical flukes.
• Using Too Many Variables
  – the more factors considered, the more likely a relationship will be found - valid or not.
• Not Taking No for an Answer
  – it’s OK to stop looking if you can’t find anything. There are no silver bullets.
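The "too many variables" trap can be put into numbers. The sketch below assumes each factor is tested independently at a 5% significance level and that none of them is truly related to the outcome; both the assumption and the function name `chance_of_spurious_finding` are illustrative and not taken from the slides.

```python
def chance_of_spurious_finding(num_variables, alpha=0.05):
    """Probability that at least one of `num_variables` unrelated factors
    looks significant at level `alpha`, assuming independent tests."""
    return 1 - (1 - alpha) ** num_variables


# Print the chance of a fluke for a growing number of candidate factors.
for k in (1, 10, 50):
    print(k, round(chance_of_spurious_finding(k), 2))
```

Under these assumptions, ten unrelated factors already give roughly a 40% chance that something looks like a finding, which is why an oddity in the data deserves a second test before a theory.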
MPIII
Internal & External Integration
Enterprise Resource Planning (ERP)
Challenges Facing IS Depts.
• Y2K & Legacy Systems
• Globalization (euro, currency issues)
• Rapid Technology Advancement
  – e.g. Client/Server & Internet
• IS Staffing & Retention
• Changing Organizational Structures
  – e.g. Owens Corning
• Tighter Integration with Buyers & Suppliers
Legacy Systems
Many firms have limited to no integration across
• geographic areas
• functional areas (v-chain)
• products, plants, & business units
[Figure: value chain between Suppliers and Buyers]
Inbound logistics | Operations | Outbound logistics | Marketing & Sales | Service
Infrastructure: general mgmt, planning, finance, IS
HRM: recruiting, hiring, training, and development
Tech. Development: R&D
Procurement
External Integration
• EDI - Electronic Data Interchange
  – uses standard formats to pass data between disparate systems
  – US format - ANSI X12, European format - UN/EDIFACT
• Cost Savings
  – paper order = $50 - $70
  – EDI order = $2.50 (VANs / private networks)
  – I-EDI order = less than $1 (Internet)
• XML - eXtensible Markup Language
  – tagging language for the web
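To show what "tagging" data looks like, here is a small Python sketch using the standard `xml.etree.ElementTree` module. The purchase order, its element names, and its values are invented for illustration and do not follow X12, UN/EDIFACT, or any other real order format.

```python
import xml.etree.ElementTree as ET

# Hypothetical purchase order; element and attribute names are illustrative.
order = ET.Element("order", attrib={"id": "PO-1001"})
ET.SubElement(order, "buyer").text = "Example Buyer Inc."
item = ET.SubElement(order, "item", attrib={"sku": "A-42"})
ET.SubElement(item, "quantity").text = "12"

# Sender: serialize the tagged data to text that can travel over the Internet.
xml_text = ET.tostring(order, encoding="unicode")

# Receiver: parse the tags back into data, whatever system produced them.
parsed = ET.fromstring(xml_text)
quantity = int(parsed.find("item/quantity").text)
```

Because every value is labeled by its tag, the receiving system can pick out the quantity without knowing how the sender stores orders internally.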
What is ERP?
• ERP - Enterprise Resource Planning Software
  – sometimes called Enterprise Applications, Enterprise Packages, Enterprise Suites, or Enterprise Systems
  – connects all of the information that flows through a company to a single integrated set of systems
  – implemented in modules that can be integrated (all at once or at a later date), e.g. Financials, Logistics, HR
  – may work with a wide variety of databases, hardware, and operating systems
• Leading Vendors
  – SAP, Oracle, JD Edwards, Baan, PeopleSoft
ERP in Action
[Figure: ERP linking Sales, Inventory, Production, Staffing, Purchasing, Order Tracking, and Planning.]
Source: BusinessWeek Int’l, 1997
The Benefits
• Internal & external integration
  – squeeze out waste & enable strategies
• Standard software enables
  – inter-organizational systems (easier if buyers & suppliers use the same system, e.g. petrochem. ind.)
  – broad selection of add-on packages (e.g. data warehouses, etc.)
• Package upgrading and new technology development is handled by vendor
• Speed of deployment
The Risks
• Staff retention (e.g. Grace case)
• Tied to a single vendor
• Flexibility limited by options offered by the vendor
  – may inappropriately force generic processes
  – may inappropriately force structure
• Complexity - particularly regarding mapping and standardizing processes across the organization.
Make vs. Buy
Adapted from Applegate et al., p. 61.
Factor           Question                                                Make  Buy
Comp. Adv.       Will the proposed system offer proprietary comp. adv.?  Yes   No
Security         Is the process or data highly confidential?             Yes   No
IT Competency    Is IT a core competency?                                Yes   No
Tech. Skill      Does the firm have sufficient expertise with tech.?     Yes   No
Suitability/Fit  Is a suitable partner/package available?                No    Yes
Cost/Benefit     Is the package cheaper than in-house dev.?              No    Yes
Time             Is there sufficient time to develop the system?         Yes   No
Successful Deployment of ERP
• Business Case
  – benchmark, cost justify (e.g. unplug mainframes)
• Leadership
  – from the highest levels (e.g. success at Owens Corning, failure at Westinghouse)
• Staffing
  – largely from business, not IT (users know the process)
  – ‘compensation handcuffs’ (e.g. end-of-deployment bonuses, training payback agreements)
  – experienced consultants - check refs., clients
• Execute with proven methodologies