Upload
alisha-garrett
View
217
Download
0
Tags:
Embed Size (px)
Citation preview
Information Granularity
Refers to the level of detail of information Detailed (POS transaction) Course (Global sales totals)
Transactional vs. Analytical Information
Transactional information comes from a business process A bank deposit A credit card charge
Analytical information uses transactional data for the purposes of decision making Account balance trends Using credit card history to detect fraud
Information Dimensions
Information timeliness Obsolete information is useless Today’s information needs to be
provided in real time or near real time Information quality
Wrong information is useless Redundant information can be the
cause of errors Information must be complete
Data inconsistency and data integrity
Database Management
Characteristics Complex Databases often spread across multiple
servers Databases often spread across multiple
physical disks Fault tolerance is critical
Databases may be distributed
Database Vendors
The industry has consolidated IBM
DB2 Universal Oracle Microsoft
SQL Server Access
Sun (MySQL) Is now Oracle
Database Performance
Transaction Processing Performance Council provides standard benchmarks
TPC-C – Online transaction processing
TPC-E – Online brokerage transactions
TPC-H – Ad-hoc decision support TPC-W – Web / E-commerce
Database Performance (TPC-C)
Multiple transaction types Independent of software and
hardware Scalable Basis is online transaction
processing (OLTP)
1960s Data Management
These are legacy systems Batch processing
Characterized by traditional file processing
Data processing was sequential Not possible to directly locate a particular file
record Data dependent on the programs that
used the data Program data dependence
1970s Data Management
Batch processing gives way to on line transaction processing Files stored on disk rather than tape Any record can be located in the same amount
of time Technologies
Indexed Sequential Access Method (ISAM) Virtual Sequential Access Method (VSAM)
Direct Access files Use a hashing function to derive record keys
1980s Data Management
Databases are becoming commonplace
Personal computer databases are evolving DBase R-Base
1990s Data Management
Huge data stores and transaction processing capabilities
Distributed databases Object-oriented databases 6 Million+ transactions per second
Realities of a DBMS
Data centric rather than application centric Can be a repository for all an organization’s
data Databases tend to be centralized Queries get data from a DBMS
SQL is the standard query language Report generators create printed and Web-
based reports Applications interface with DBMS
Types of Databases
Database models include: Hierarchical database model – A tree-
based structure Network database model –
Mathematically, a directed graph Relational database model – stores
information in the form of logically related two-dimensional tables
Object-oriented databases
Elements of a Database
Logical view and physical view Users see and work with the logical
view Physical view is controlled by the
database management system itself
Entities and Attributes
Relational databases store information in tables (entities) Customer / order / product
Tables contain fields (attributes) Customer name, address
Keys
Each table has a primary key that uniquely identifies each record Natural keys have some meaning (stock
symbol) Artificial keys have no intrinsic meaning
(your R number) Foreign keys are used to link tables
in one-to-many relationships
Advantages of an RDMS (Scalability)
Database can scale to the terabyte or petabyte range NSA maintains 1.9 trillion telephone call
records Large databases can span several
servers and storage devices
Advantages of an RDBMS (Redundancy)
Databases can be configured to write duplicate (redundant) information Citibank
Journaling and checkpointing are supported
Advantages of an RDBMS (Integrity)
Relational integrity constraints are rules that apply to the relationships between tables
Business integrity constraints enforce business rules Not really a part of the DBMS itself
Advantages of an RDBMS (Information Security)
A DBMS supports advanced access rights By table and fields By time of day By location By row information
Data-driven Web Sites
Nearly all transactional Web sites rely on a database Amazon Your bank Any shopping cart application Ebay or Craig’s List Facebook and You Tube
Database Integration
Databases often need to be integrated Because of mergers and acquisitions Because of organizational changes
We are referring to connections to multiple databases
Data Warehouses (Introduction)
Central source for clean data May contain internal or external data
Use to spot hidden patterns in data May be integrated with operational
database Parts of a data warehouse are called data
marts Data warehouses contain an analytical
component
Cleansing Data
Data is often obtained from a myriad of sources External lists Internal databases Other databases
This data must be cleansed and sanitized to remove Redundancy / errors / etc…
Multidimensional Analysis
Data are often analyzed as 3-dimensional cubes
Cubes are then ‘sliced and diced’ to look at various layers
Database Design (Introduction)
In the systems process, we design before we implement Requirements specification Conceptual design Logical design Physical Design
Database Design Tools
Unified Modeling Language (UML) Visio Rational Rose
Entity relationship diagrams describes relationships between data
Normalization eliminates redundant data
Database Management HR
Database administrators Data managers Programmers and systems analysts Data security
Business Intelligence (Introduction)
Simply put, it’s internal and external data used to support better decision making
It’s challenging to sift through the mountains of data
It requires cross-functional collaboration between systems
More in the next chapter but we use ERP systems to improve business intelligence
Business Intelligence (Industries)
BI applies to all industries Retail and sales
Understanding procurement and distribution (SCM) / customers (CRM)
Banking Understand credit worthiness / fraud
behavior Insurance
Forecast claim risk and understand at – risk customers
Business Intelligence (Industries)
Airlines Routing planes / minimize turnaround
time (Southwest) Marketing
Demographics Sell based on known customer behavior
(Harrah’s) Amazon
Business Intelligence (Levels)
Operational Day-to-day operations (building a Dell)
Tactical Short term (Dell ordering supplies)
Strategic Long term organizational goals
The systems that provide BI typically do so at all levels
BI and Latency
From the time of acquisition, how long does it take to analyze (analysis latency)
Time to make a decision based on the analysis E-transactions significantly reduce
latency
Data Mining (Introduction)
Data gets mined (analyzed) from data contained in a data warehouse or data mart
Specialized tools are used to analyze data for ‘interesting nuggets’
Ways to mine Drill down (general to specific) Drill up (specific to general)
Data Mining (Clustering)
Cluster analysis groups data by trait or traits
Examples Don’t drink the water in Fallon Segment customers by zip codes
Data Mining (Association)
Answers the question “What traits are associated with other traits” When I stay at Harrah’s,
I gamble I eat at the Sage room
When I stay in Vegas, I gamble more
Data Mining (Statistical Analysis)
It’s basic statistics Analysis of variance Correlation coefficients Etc…
BI Benefits
We can understand what’s happening inside and outside a department Sales knows about product inventory
levels and production schedules Production knows about sales and sales
forecasts Finance knows about the sales
forecasts too This information is provided in near
real time