Upload
melvin-evans
View
219
Download
5
Embed Size (px)
Citation preview
2
Learning Objectives
Explain the importance of implementing data resource management processes and technologies in an organization.
Understand the advantages of a database management approach to managing the data resources of a business.
3
Learning Objectives (continued)
Explain how database management software helps business professionals and supports the operations and management of a business.
Illustrate each of the following concepts:Major types of databasesData warehouses and data miningLogical data elementsFundamental database structuresDatabase access methodsDatabase development
5
Data Resource Management
A managerial activityApplies information systems technology to
managing data resources to meet needs of business stakeholders.
6
Foundation Data Concepts
Levels of dataCharacter
Single alphabetical, numeric, or other symbol
FieldGroupings of charactersRepresents an attribute of some entity
7
Foundation Data Concepts (continued)
RecordsRelated fields of dataCollection of attributes that describe an
entityFixed-length or variable-length
8
Foundation Data Concepts (continued)
Files (table)A group of related recordsClassified by
Primary useType of datapermanence
9
Foundation Data Concepts (continued)
DatabaseIntegrated collection of logically related
data elementsConsolidates records into a common pool
of data elementsData is independent of the application
program using them and type of storage device
11
Types of Databases
OperationalSupports business processes and operationsAlso called subject-area databases,
transaction databases, and production databases
12
Types of Databases (continued)
DistributedReplicated and distributed copies or parts of
databases on network servers at a variety of sites.
Done to improve database performance and security
13
Types of Databases (continued)
ExternalAvailable for a fee from commercial sources
or with or without charge on the Internet or World Wide Web
HypermediaHyperlinked pages of multimedia
14
Data Warehouses and Data Mining
Data warehouseStores data extracted from operational,
external, or other databases of an organization
Central source of “structured” dataMay be subdivided into data marts
15
Data Warehouses and Data Mining (continued)
Data miningA major use of data warehouse databasesData is analyzed to reveal hidden
correlations, patterns, and trends
16
Database Management Approach
Consolidates data records and objects into databases that can be accessed by many different application programs
17
Database Management Approach (continued)
Database Management SystemSoftware interface between users and
databasesControls creation, maintenance, and use of
the database
19
Database Management Approach (continued)
Database InterrogationQuery
Supports ad hoc requestsTells the software how you want to
organize the dataSQL queriesGraphical (GUI) & natural queries
20
Database Management Approach (continued)
Report GeneratorTurns results of query into a useable
report
Database MaintenanceUpdating and correcting data
21
Database Management Approach (continued)
Application DevelopmentData manipulation languageData entry screens, forms, reports, or web
pages
22
Implementing Data Resource Management
Database AdministrationDevelop and maintain the data dictionaryDesign and monitor performance of
databasesEnforce database use and security standards
23
Implementing Data Resource Management (continued)
Data PlanningCorporate planning and analysis functionDeveloping the overall data architecture
24
Implementing Data Resource Management (continued)
Data AdministrationStandardize collection, storage, and
dissemination of data to end usersFocused on supporting business processes
and strategic business objectivesMay include developing policy and setting
standards
25
Implementing Data Resource Management (continued)
ChallengesTechnologically complexVast amounts of dataVulnerability to fraud, errors, and failures
27
Database Structures
HierarchicalTreelikeOne-to-many relationshipUsed for structured, routine types of
transaction processing
28
Database Structures (continued)
NetworkMore complexMany-to-many relationshipMore flexible but doesn’t support ad hoc
requests well
29
Database Structures (continued)
RelationalData elements stored in simple tablesCan link data elements from various tablesVery supportive of ad hoc requests but
slower at processing large amounts of data than hierarchical or network models
30
Database Structures (continued)
Multi-DimensionalA variation of the relational modelCubes of data and cubes within cubesPopular for online analytical processing
(OLAP) applications
32
Database Structures (continued)
Object-orientedKey technology of multimedia web-based
applicationsGood for complex, high-volume applications
34
Accessing Databases
Key fields (primary key)A field unique to each record so it can be
distinguished from all other records in a table
35
Accessing Databases (continued)
Sequential accessData is stored and accessed in a sequence
according to a key fieldGood for periodic processing of a large
volume of data, but updating with new transactions can be troublesome
36
Accessing Databases (continued)
Direct accessMethods
Key transformationIndexIndexed sequential access
37
Database Development
Data dictionaryDirectory containing metadata (data about
data)StructureData elementsInterrelationshipsInformation regarding access and useMaintenance & security issues
38
Database Development (continued)
Data Planning & Database DesignPlanning & Design Process
Enterprise modelEntity relationship diagrams (ERDs)Data modeling
Develop logical framework for the physical design
39
Discussion Questions
How should an e-business enterprise store, access, and distribute data & information about their internal operations & external environment?
What roles do database management, data administration, and data planning play in managing data as a business resource?
40
Discussion Questions (continued)
What are the advantages of a database management approach to organizing, accessing, and managing an organization’s data resources?
What is the role of a database management system in an e-business information system?
41
Discussion Questions (continued)
Databases of information about a firm’s internal operations were formerly the only databases that were considered to be important to a business. What other kinds of databases are important for a business today?
What are the benefits and limitations of the relational database model for business applications?
42
Discussion Questions (continued)
Why is the object-oriented database model gaining acceptance for developing applications and managing the hypermedia databases at business websites?
How have the Internet, intranets, extranets, and the World Wide Web affected the types and uses of data resources available to business end users?
43
Real World Case 1 – IBM versus Oracle
What key business strategies did Janet Perna implement to help IBM catch up to Oracle in the database management software market?
What is the business case for both IBM’s and Oracle’s product strategy for their database software?
44
Real World Case 1 (continued)
Which approach would you recommend to a company seeking a database system today?
What do you see as the key factor to IBM’s success?
45
Real World Case 1 (continued)
The case states that “database software has become more of a commodity.” Do you agree?
46
Real World Case 2 – Experian Automotive
How do the database software tools discussed in this case help companies exploit their data resources?
What is the business value of the automotive database created by Experian?
47
Real World Case 2 (continued)
What other business opportunities could you recommend to Experian that would capitalize on their automotive database?
The case states that Experian’s automotive database “has raised the hackles of privacy advocates.” What legitimate privacy concerns and safeguard suggestions might be raised about this database and its use?
48
Real World Case 3 – Shell Exploration
Why do companies still have problems with the quality of the data resources stored in their business information systems?
What is a “data silo?”
49
Real World Case 3 (continued)
How do data warehouse approaches help companies like Shell and OshKosh meet their data resource management challenges?
What business benefits can companies derive from a data warehouse approach?
50
Real World Case 4 – BlueCross BlueShield & Warner Bros.
What is a storage area network? Why are so many companies installing SANs?
What are the reasons for the quick payback on SAN investments?
51
Real World Case 4 (continued)
What are the challenges and alternatives to SANs as a data storage technology?
What are some advantages of SANs?
52
Real World Case 5 – Sherwin-Williams & Krispy Kreme
Tips for Managing External DataPurchase external data from a reliable
source that will do most of the refining for you and will work with you on contingency plans.
Run a test load first. A load of test data can pave the way for accurate production loads.
53
Real World Case 5 (continued)
Managing external data (continued)Don’t collect data until business and IT staff
have agreed on the amount, frequency, format, and content of the data you need.
Don’t acquire more data or use more data sources than you really need.
54
Real World Case 5 (continued)
Managing external data (continued)Don’t mingle external and homegrown data
without adding unique identifiers to each record, in case you need to pull it out.
Don’t overestimate the data’s integrity. Nothing beats direct customer contact and tactical details behind the data.
55
Real World Case 5 (continued)
What challenges in acquiring and using data from external sources are identified in this case?
Do you prefer the Sherwin-Williams or Krispy Kreme approach to acquiring external data?
56
Real World Case 5 (continued)
What other sources of external data might a business use to gain valuable marketing and competitive intelligence?
58
What Is a Database System?
Database: a very large, integrated collection of data.
Models a real-world enterprise Entities (e.g., teams, games) Relationships (e.g., The
Forty-Niners are playing in The Superbowl) More recently, also includes active components , often called
“business logic”. (e.g., the BCS ranking system)
A Database Management System (DBMS) is a software system designed to store, manage, and facilitate access to databases.
61
Other Ways Databases Make Life Better?“Players could finally
sign up for the Star Wars Galaxies game last week as Sony opened up registration to the public.”
“Once players got in to the game they found that the game servers were offline because of database problems.”
“Some players spent hours tuning their in-game characters only to find that crashes deleted all their hard work.”
Source: BBC News Online, July 1, 2003.
63
Is the WWW a DBMS? Fairly sophisticated search available
crawler indexes pages on the web Keyword-based search for pages
But, currently data is mostly unstructured and untyped search only:
can’t modify the data can’t get summaries, complex combinations of data
few guarantees provided for freshness of data, consistency across data items, fault tolerance, …
Web sites typically have a DBMS in the background to provide these functions.
The picture is changing New standards e.g., XML, Semantic Web can help data modeling Research groups (e.g., at Berkeley) are working on providing some
of this functionality across multiple web sites.
=
64
“Search” vs. Query
What if you wanted to find out which actors donated to John Kerry’s presidential campaign?
Try “actors donated to john kerry” in your favorite search engine.
66
Why Study Databases??
Shift from computation to information always true for corporate computing Web made this point for personal computing more and more true for scientific computing
Need for DBMS has exploded in the last years Corporate: retail swipe/clickstreams, “customer relationship
mgmt”, “supply chain mgmt”, “data warehouses”, etc. Scientific: digital libraries, Human Genome project, NASA
Mission to Planet Earth, physical sensors, grid physics network DBMS encompasses much of CS in a practical discipline
OS, languages, theory, AI, multimedia, logic Yet traditional focus on real-world apps
?
67
What’s the intellectual content?
representing informationdata modeling
languages and systems for querying datacomplex queries with real semantics*over massive data sets
concurrency control for data manipulationcontrolling concurrent access ensuring transactional semantics
reliable data storagemaintain data semantics even if you pull the plug
* semantics: the meaning or relationship of meanings of a sign or set of signs
68
Describing Data: Data ModelsA data model is a collection of concepts for
describing data.
A schema is a description of a particular collection of data, using a given data model.
The relational model of data is the most widely used model today.Main concept: relation, basically a table with
rows and columns.Every relation has a schema, which describes
the columns, or fields.
69
Levels of Abstraction
Views describe how users see the data.
Conceptual schema defines
logical structure
Physical schema describes the files and indexes used.
(sometimes called the ANSI/SPARC model)
Physical Schema
Conceptual Schema
View 1 View 2 View 3
DB
Users
70
Example: University Database
Conceptual schema: Students(sid: string, name: string,
login: string, age: integer, gpa:real) Courses(cid: string, cname:string,
credits:integer) Enrolled(sid:string, cid:string,
grade:string)External Schema (View):
Course_info(cid:string,enrollment:integer)Physical schema:
Relations stored as unordered files. Index on first column of Students.
Physical Schema
Conceptual Schema
View 1 View 2 View 3
DB
71
Data IndependenceApplications insulated from
how data is structured and stored.
Logical data independence: Protection from changes in logical structure of data.
Physical data independence: Protection from changes in physical structure of data.
Q: Why are these particularly important for DBMS?
Physical Schema
Conceptual Schema
View 1 View 2 View 3
DB
72
Queries, Query Plans, and Operators
System handles query plan generation & optimization; ensures correct execution.
SELECT eid, ename, titleFROM Emp EWHERE E.sal > $50K
SELECT E.loc, AVG(E.sal)FROM Emp EGROUP BY E.locHAVING Count(*) > 5
SELECT COUNT DISTINCT (E.eid)FROM Emp E, Proj P, Asgn AWHERE E.eid = A.eid
AND P.pid = A.pidAND E.loc <> P.loc
• Issues: view reconciliation, operator ordering, physical operator choice, memory management, access path (index) use, …
EmployeesEmployeesProjectsProjects
AssignmentsAssignments
EmpEmp
SelectSelect
EmpEmp
Group(agg)Group(agg)
HavingHaving
EmpEmp
Count distinctCount distinct
AsgnAsgn
JoinJoin
JoinJoin
ProjProj
73
Concurrency Control
Concurrent execution of user programs: key to good DBMS performance. Disk accesses frequent, pretty slow Keep the CPU working on several programs concurrently.
Interleaving actions of different programs: trouble! e.g., account-transfer & print statement at same time
DBMS ensures such problems don’t arise. Users/programmers can pretend they are using a single-user system.
(called “Isolation”) Thank goodness! Don’t have to program “very, very carefully”.
74
Transactions: ACID Properties Key concept is a transaction: a sequence of database actions
(reads/writes).
DBMS ensures atomicity (all-or-nothing property) even if system crashes in the middle of a Xact.
Each transaction, executed completely, must take the DB between consistent states or must not run at all.
DBMS ensures that concurrent transactions appear to run in isolation. DBMS ensures durability of committed Xacts even if system crashes. Note: can specify simple integrity constraints on the data. The DBMS
enforces these. Beyond this, the DBMS does not understand the semantics of the
data. Ensuring that a single transaction (run alone) preserves consistency is
largely the user’s responsibility!
77
Structure of a DBMS
A typical DBMS has a layered architecture.
The figure does not show the concurrency control and recovery components.
Each database system has its own variations.
Query Optimizationand Execution
Relational Operators
Files and Access Methods
Buffer Management
Disk Space Management
DB
These layersmust considerconcurrencycontrol andrecovery
78
Advantages of a DBMS
Data independence Efficient data access Data integrity & security Data administration Concurrent access, crash recovery Reduced application development time So why not use them always?
Expensive/complicated to set up & maintain This cost & complexity must be offset by need General-purpose, not suited for special-purpose tasks (e.g. text search!)
79
…must understand how a DBMS works
Databases make these folks happy ... DBMS vendors, programmers
Oracle, IBM, MS, Sybase, … End users in many fields
Business, education, science, … DB application programmers
Build enterprise applications on top of DBMSs Build web services that run off DBMSs
Database administrators (DBAs) Design logical/physical schemas Handle security and authorization Data availability, crash recovery Database tuning as needs evolve
80
Summary (part 1)
DBMS used to maintain, query large datasets. can manipulate data and exploit semantics
Other benefits include: recovery from system crashes, concurrent access, quick application development, data integrity and security.
Levels of abstraction provide data independence Key when dapp/dt << dplatform/dt