Module 3

The concept of data processing

Major issues in database management

Learning Objectives

Explain the importance of implementing data resource management processes and technologies in an organization.

Understand the advantages of a database management approach to managing the data resources of a business.

Learning Objectives (continued)

Explain how database management software helps business professionals and supports the operations and management of a business.

Illustrate each of the following concepts:
• Major types of databases
• Data warehouses and data mining
• Logical data elements
• Fundamental database structures
• Database access methods
• Database development

Section I

Managing Data Resources

Data Resource Management

A managerial activity that applies information systems technology to managing data resources to meet the needs of business stakeholders.

Foundation Data Concepts

Levels of data:
• Character: a single alphabetic, numeric, or other symbol
• Field: a grouping of characters that represents an attribute of some entity

Foundation Data Concepts (continued)

• Record: a group of related fields of data, that is, a collection of attributes that describe an entity. Records can be fixed-length or variable-length.
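A fixed-length record can be sketched with Python's `struct` module. Each field occupies a fixed number of bytes, so every record has the same size and record n starts at byte n × size. The record layout (field names and widths) is hypothetical.

```python
import struct

# Hypothetical layout: 6-byte employee ID, 20-byte name, 4-byte integer age.
# "<" means little-endian with no padding; "Ns" is a fixed-width byte string.
RECORD_FORMAT = "<6s20si"
RECORD_SIZE = struct.calcsize(RECORD_FORMAT)  # the same size for every record

def pack_record(emp_id: str, name: str, age: int) -> bytes:
    """Encode one record. struct pads short strings with NUL bytes and truncates long ones."""
    return struct.pack(RECORD_FORMAT, emp_id.encode("ascii"), name.encode("ascii"), age)

def unpack_record(data: bytes):
    """Decode one record and strip the NUL padding from the text fields."""
    emp_id, name, age = struct.unpack(RECORD_FORMAT, data)
    return emp_id.rstrip(b"\0").decode("ascii"), name.rstrip(b"\0").decode("ascii"), age

def read_nth_record(file_bytes: bytes, n: int):
    """Because every record has the same length, record n can be located by arithmetic alone."""
    start = n * RECORD_SIZE
    return unpack_record(file_bytes[start:start + RECORD_SIZE])

# A "file" here is just the records stored back to back.
file_bytes = pack_record("E001", "Ada Lovelace", 36) + pack_record("E002", "Alan Turing", 41)
print(RECORD_SIZE)                    # → 30
print(read_nth_record(file_bytes, 1))  # → ('E002', 'Alan Turing', 41)
```

A variable-length record would need a length prefix or delimiter for each field, and you could no longer compute a record's position directly from its number.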

Foundation Data Concepts (continued)

• File (table): a group of related records. Files are classified by primary use, type of data, and permanence.

Foundation Data Concepts (continued)

• Database: an integrated collection of logically related data elements. A database consolidates records into a common pool of data elements, and the data is independent of the application programs that use it and of the type of storage device.


Foundation Data Concepts (continued)

Logical Data Elements

Types of Databases

Operational
• Supports business processes and operations
• Also called subject-area databases, transaction databases, and production databases

Types of Databases (continued)

Distributed
• Replicated and distributed copies or parts of databases on network servers at a variety of sites
• Done to improve database performance and security

Types of Databases (continued)

External
• Available for a fee from commercial sources, or with or without charge on the Internet or World Wide Web

Hypermedia
• Hyperlinked pages of multimedia

Data Warehouses and Data Mining

Data warehouse
• Stores data extracted from the operational, external, or other databases of an organization
• Central source of “structured” data
• May be subdivided into data marts

Data Warehouses and Data Mining (continued)

Data mining
• A major use of data warehouse databases
• Data is analyzed to reveal hidden correlations, patterns, and trends
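One simple way to find patterns is to count which items are bought together. This is the first step of market-basket (association) analysis. The sketch below assumes a hypothetical list of sales transactions that has already been extracted from a warehouse. It reports the item pairs that appear in at least a minimum number of baskets. Production data mining tools use more scalable algorithms, but they rely on the same kind of counting.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(baskets, min_support):
    """Count item pairs that co-occur in at least `min_support` baskets."""
    counts = Counter()
    for basket in baskets:
        # Sorting gives every pair one canonical order, so (a, b) and (b, a) share a key.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

# Hypothetical transactions: one set of items per sale
baskets = [
    {"paint", "brush", "tape"},
    {"paint", "brush"},
    {"donut", "coffee"},
    {"paint", "tape"},
    {"donut", "coffee", "paint"},
]
print(frequent_pairs(baskets, min_support=2))
```

With `min_support=2`, the result contains `("brush", "paint")`, `("paint", "tape")`, and `("coffee", "donut")`, each with a count of 2.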


Database Management Approach

Consolidates data records and objects into databases that can be accessed by many different application programs

Database Management Approach (continued)

Database Management System
• Software interface between users and databases
• Controls the creation, maintenance, and use of the database


Database Management Approach (continued)

Database Interrogation: query languages
• Support ad hoc requests
• Tell the software how you want to organize the data
• SQL queries
• Graphical (GUI) and natural language queries

Database Management Approach (continued)

Report Generator
• Turns the results of a query into a usable report

Database Maintenance
• Updating and correcting data

Database Management Approach (continued)

Application Development
• Data manipulation language
• Data entry screens, forms, reports, or web pages

Implementing Data Resource Management

Database Administration
• Develop and maintain the data dictionary
• Design and monitor the performance of databases
• Enforce database use and security standards

Implementing Data Resource Management (continued)

Data Planning
• A corporate planning and analysis function
• Developing the overall data architecture

Implementing Data Resource Management (continued)

Data Administration
• Standardizes the collection, storage, and dissemination of data to end users
• Focused on supporting business processes and strategic business objectives
• May include developing policy and setting standards

Implementing Data Resource Management (continued)

Challenges
• Technologically complex
• Vast amounts of data
• Vulnerability to fraud, errors, and failures

Section II

Technical Foundations of Database Management

Database Structures

Hierarchical
• Treelike
• One-to-many relationships
• Used for structured, routine types of transaction processing
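A hierarchical database can be pictured as a tree in which every child record has exactly one parent, giving one-to-many relationships. The sketch below uses hypothetical company/department/employee records. Retrieval along the planned path from the root is direct. A question that cuts across branches, such as "which department is Carol in?", requires a walk through the whole tree. The `Node`, `find_path`, and `find_parent_of` names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    """One record in a hierarchical database: one parent, any number of children."""
    name: str
    children: List["Node"] = field(default_factory=list)

    def add(self, child: "Node") -> "Node":
        """Attach a child record under this one (a one-to-many link) and return it."""
        self.children.append(child)
        return child

def find_path(root: Node, path: List[str]) -> Optional[Node]:
    """Navigate from the root down a path of names, as hierarchical access requires."""
    node = root
    for name in path:
        node = next((c for c in node.children if c.name == name), None)
        if node is None:
            return None
    return node

def find_parent_of(root: Node, name: str) -> Optional[Node]:
    """A cross-branch question needs a full tree walk instead of a single path."""
    for child in root.children:
        if child.name == name:
            return root
        found = find_parent_of(child, name)
        if found is not None:
            return found
    return None

# Hypothetical hierarchy: company -> departments -> employees
company = Node("Company")
sales = company.add(Node("Sales"))
sales.add(Node("Alice"))
sales.add(Node("Bob"))
company.add(Node("IT")).add(Node("Carol"))

print([e.name for e in find_path(company, ["Sales"]).children])  # → ['Alice', 'Bob']
print(find_parent_of(company, "Carol").name)                      # → IT
```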

Database Structures (continued)

Network
• More complex
• Many-to-many relationships
• More flexible, but does not support ad hoc requests well

Database Structures (continued)

Relational
• Data elements stored in simple tables
• Can link data elements from various tables
• Very supportive of ad hoc requests, but slower at processing large amounts of data than the hierarchical or network models

Database Structures (continued)

Multi-Dimensional
• A variation of the relational model
• Cubes of data and cubes within cubes
• Popular for online analytical processing (OLAP) applications
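A data cube summarizes one measure along several dimensions at once. The sketch below assumes hypothetical sales facts with three dimensions: region, product, and quarter. It precomputes every cube cell, including roll-ups in which a dimension is replaced by "ALL". OLAP tools let users slice and drill into aggregates of this kind. This is a toy in-memory version, not how a multidimensional DBMS stores its cubes.

```python
from collections import defaultdict
from itertools import product

# Hypothetical fact rows: (region, product, quarter, sales amount)
facts = [
    ("East", "Paint", "Q1", 100),
    ("East", "Brush", "Q1", 40),
    ("West", "Paint", "Q1", 70),
    ("West", "Paint", "Q2", 90),
]

def build_cube(rows):
    """Aggregate the sales amount for every combination of dimension values and 'ALL' roll-ups."""
    cube = defaultdict(int)
    for region, prod, quarter, amount in rows:
        # Each fact feeds 2**3 = 8 cells: each dimension is either its own value or "ALL".
        for cell in product((region, "ALL"), (prod, "ALL"), (quarter, "ALL")):
            cube[cell] += amount
    return cube

cube = build_cube(facts)
print(cube[("West", "Paint", "ALL")])  # West paint sales, all quarters → 160
print(cube[("ALL", "ALL", "Q1")])      # all Q1 sales → 210
print(cube[("ALL", "ALL", "ALL")])     # grand total → 300
```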


Database Structures (continued)

Object-oriented
• Key technology of multimedia web-based applications
• Good for complex, high-volume applications


Accessing Databases

Key field (primary key)
• A field unique to each record, so that the record can be distinguished from all other records in a table
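In SQL, a primary key is declared on the table, and the DBMS rejects any record whose key duplicates an existing one. Below is a minimal sketch using Python's built-in `sqlite3` module and a hypothetical `Customers` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (cust_id TEXT PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("INSERT INTO Customers VALUES ('C100', 'Acme Corp')")

try:
    # A second record with the same key would make the two records indistinguishable.
    conn.execute("INSERT INTO Customers VALUES ('C100', 'Another Co')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key value identifies exactly one record.
row = conn.execute("SELECT name FROM Customers WHERE cust_id = 'C100'").fetchone()
print(duplicate_rejected, row)  # → True ('Acme Corp',)
```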

Accessing Databases (continued)

Sequential access
• Data is stored and accessed in a sequence according to a key field
• Good for periodic processing of a large volume of data, but updating with new transactions can be troublesome
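Classic sequential processing updates a master file by reading it in key order alongside a batch of transactions sorted on the same key, and then writing out a new master file. The sketch below is simplified. It assumes hypothetical `(key, balance)` master records with unique keys, and `(key, amount)` transactions that either adjust a balance or create a new key. It shows why even one new transaction means rewriting the whole file.

```python
def sequential_update(master, transactions):
    """Merge a key-sorted master file with key-sorted (key, amount) transactions.

    Returns a new master list. The old master is never modified in place, which
    mirrors old-master/new-master batch processing.
    """
    new_master = []
    i = j = 0
    while i < len(master) or j < len(transactions):
        # The next key to process is the smaller of the two current keys.
        candidates = []
        if i < len(master):
            candidates.append(master[i][0])
        if j < len(transactions):
            candidates.append(transactions[j][0])
        key = min(candidates)

        if i < len(master) and master[i][0] == key:
            balance = master[i][1]  # existing record
            i += 1
        else:
            balance = 0             # no master record yet, so create one

        # Apply every transaction for this key (there may be several).
        while j < len(transactions) and transactions[j][0] == key:
            balance += transactions[j][1]
            j += 1
        new_master.append((key, balance))  # every record is rewritten, changed or not
    return new_master

# Hypothetical data, both already sorted by key
master = [("A01", 100), ("A03", 50), ("A05", 20)]
transactions = [("A03", -10), ("A03", 5), ("A04", 30)]
print(sequential_update(master, transactions))
# → [('A01', 100), ('A03', 45), ('A04', 30), ('A05', 20)]
```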

Accessing Databases (continued)

Direct access methods
• Key transformation
• Index
• Indexed sequential access
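Two of these direct-access methods can be sketched in a few lines.

- **Key transformation (hashing)** computes a storage location from the key itself.
- **An index** keeps keys in sorted order with pointers to record positions, so a lookup becomes a binary search instead of a scan.

The bucket count, sample records, and helper names below are hypothetical. CRC-32 is used only because, unlike Python's `hash()` for strings, it gives the same result on every run. Real DBMSs use more elaborate structures, such as B+-tree indexes.

```python
import bisect
import zlib

NUM_BUCKETS = 8  # hypothetical number of storage areas

def bucket_for(key: str) -> int:
    """Key transformation: turn the key into a bucket number with a deterministic hash."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

records = [("C300", "Lee"), ("C100", "Acme"), ("C200", "Baker")]  # stored in arrival order

# Hash-organized storage: each bucket lists (key, record position) pairs.
buckets = [[] for _ in range(NUM_BUCKETS)]
for position, (key, _) in enumerate(records):
    buckets[bucket_for(key)].append((key, position))

def hash_lookup(key: str):
    """Go straight to one bucket and check only its few entries."""
    for k, position in buckets[bucket_for(key)]:
        if k == key:
            return records[position]
    return None

# Index: (key, position) pairs kept in key order and searched by bisection.
index = sorted((key, pos) for pos, (key, _) in enumerate(records))
index_keys = [k for k, _ in index]

def index_lookup(key: str):
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return records[index[i][1]]
    return None

print(hash_lookup("C200"), index_lookup("C200"))  # → ('C200', 'Baker') ('C200', 'Baker')
```

Walking through `index` in order also gives sequential access by key, which is the idea behind indexed sequential access: the same index supports both direct and sequential retrieval.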

Database Development

Data dictionary
• A directory containing metadata (data about data), including:
  • Structure
  • Data elements
  • Interrelationships
  • Information regarding access and use
  • Maintenance and security issues
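Many DBMSs expose their data dictionary as metadata that you can query. In SQLite, for example, the `sqlite_master` table lists each schema object, and `PRAGMA table_info` describes a table's columns. Below is a minimal sketch with a hypothetical `Orders` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  TEXT NOT NULL,
        total    REAL
    )
""")

# Structure: which schema objects exist
objects = conn.execute("SELECT type, name FROM sqlite_master").fetchall()
print(objects)  # → [('table', 'Orders')]

# Data elements: one row per column, as (cid, name, type, notnull, default value, pk)
columns = conn.execute("PRAGMA table_info(Orders)").fetchall()
column_names = [c[1] for c in columns]
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "", "PRIMARY KEY" if pk else "")
```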

Database Development (continued)

Data Planning & Database Design
• Planning and design process:
  • Enterprise model
  • Entity relationship diagrams (ERDs)
  • Data modeling
• Develop a logical framework for the physical design


Discussion Questions

How should an e-business enterprise store, access, and distribute data & information about their internal operations & external environment?

What roles do database management, data administration, and data planning play in managing data as a business resource?


Discussion Questions (continued)

What are the advantages of a database management approach to organizing, accessing, and managing an organization’s data resources?

What is the role of a database management system in an e-business information system?


Discussion Questions (continued)

Databases of information about a firm’s internal operations were formerly the only databases that were considered to be important to a business. What other kinds of databases are important for a business today?

What are the benefits and limitations of the relational database model for business applications?


Discussion Questions (continued)

Why is the object-oriented database model gaining acceptance for developing applications and managing the hypermedia databases at business websites?

How have the Internet, intranets, extranets, and the World Wide Web affected the types and uses of data resources available to business end users?


Real World Case 1 – IBM versus Oracle

What key business strategies did Janet Perna implement to help IBM catch up to Oracle in the database management software market?

What is the business case for both IBM’s and Oracle’s product strategy for their database software?


Real World Case 1 (continued)

Which approach would you recommend to a company seeking a database system today?

What do you see as the key factor to IBM’s success?


Real World Case 1 (continued)

The case states that “database software has become more of a commodity.” Do you agree?


Real World Case 2 – Experian Automotive

How do the database software tools discussed in this case help companies exploit their data resources?

What is the business value of the automotive database created by Experian?


Real World Case 2 (continued)

What other business opportunities could you recommend to Experian that would capitalize on their automotive database?

The case states that Experian’s automotive database “has raised the hackles of privacy advocates.” What legitimate privacy concerns and safeguard suggestions might be raised about this database and its use?


Real World Case 3 – Shell Exploration

Why do companies still have problems with the quality of the data resources stored in their business information systems?

What is a “data silo”?


Real World Case 3 (continued)

How do data warehouse approaches help companies like Shell and OshKosh meet their data resource management challenges?

What business benefits can companies derive from a data warehouse approach?


Real World Case 4 – BlueCross BlueShield & Warner Bros.

What is a storage area network? Why are so many companies installing SANs?

What are the reasons for the quick payback on SAN investments?


Real World Case 4 (continued)

What are the challenges and alternatives to SANs as a data storage technology?

What are some advantages of SANs?

Real World Case 5 – Sherwin-Williams & Krispy Kreme

Tips for Managing External Data
• Purchase external data from a reliable source that will do most of the refining for you and will work with you on contingency plans.
• Run a test load first. A load of test data can pave the way for accurate production loads.

Real World Case 5 (continued)

Managing external data (continued)
• Don’t collect data until business and IT staff have agreed on the amount, frequency, format, and content of the data you need.
• Don’t acquire more data or use more data sources than you really need.

Real World Case 5 (continued)

Managing external data (continued)
• Don’t mingle external and homegrown data without adding unique identifiers to each record, in case you need to pull it out.
• Don’t overestimate the data’s integrity. Nothing beats direct customer contact and the tactical details behind the data.


Real World Case 5 (continued)

What challenges in acquiring and using data from external sources are identified in this case?

Do you prefer the Sherwin-Williams or Krispy Kreme approach to acquiring external data?


Real World Case 5 (continued)

What other sources of external data might a business use to gain valuable marketing and competitive intelligence?


CS 317 - Data Management and Information Processing

What Is a Database System?

• Database: a very large, integrated collection of data.
• Models a real-world enterprise:
  • Entities (e.g., teams, games)
  • Relationships (e.g., the Forty-Niners are playing in the Super Bowl)
  • More recently, also includes active components, often called “business logic” (e.g., the BCS ranking system)
• A Database Management System (DBMS) is a software system designed to store, manage, and facilitate access to databases.


Database Systems: Then


Database Systems: Today

From Friendster.com on-line tour

Other Ways Databases Make Life Better?

“Players could finally sign up for the Star Wars Galaxies game last week as Sony opened up registration to the public.”

“Once players got in to the game they found that the game servers were offline because of database problems.”

“Some players spent hours tuning their in-game characters only to find that crashes deleted all their hard work.”

Source: BBC News Online, July 1, 2003.


Other databases you may use

Is the WWW a DBMS?

• Fairly sophisticated search is available:
  • A crawler indexes pages on the web
  • Keyword-based search for pages
• But currently, data is mostly unstructured and untyped, and access is search only:
  • Can’t modify the data
  • Can’t get summaries or complex combinations of data
• Few guarantees are provided for freshness of data, consistency across data items, fault tolerance, …
• Web sites typically have a DBMS in the background to provide these functions.
• The picture is changing:
  • New standards (e.g., XML, Semantic Web) can help data modeling
  • Research groups (e.g., at Berkeley) are working on providing some of this functionality across multiple web sites.


“Search” vs. Query

What if you wanted to find out which actors donated to John Kerry’s presidential campaign?

Try “actors donated to john kerry” in your favorite search engine.


A “Database Query” Approach

Why Study Databases?

• Shift from computation to information:
  • Always true for corporate computing
  • The Web made this point for personal computing
  • More and more true for scientific computing
• The need for DBMSs has exploded in recent years:
  • Corporate: retail swipe/clickstreams, “customer relationship mgmt”, “supply chain mgmt”, “data warehouses”, etc.
  • Scientific: digital libraries, Human Genome Project, NASA Mission to Planet Earth, physical sensors, grid physics network
• DBMS encompasses much of CS in a practical discipline:
  • OS, languages, theory, AI, multimedia, logic
  • Yet with a traditional focus on real-world apps

What’s the intellectual content?

• Representing information: data modeling
• Languages and systems for querying data: complex queries with real semantics* over massive data sets
• Concurrency control for data manipulation: controlling concurrent access and ensuring transactional semantics
• Reliable data storage: maintaining data semantics even if you pull the plug

* Semantics: the meaning, or relationship of meanings, of a sign or set of signs

Describing Data: Data Models

• A data model is a collection of concepts for describing data.
• A schema is a description of a particular collection of data, using a given data model.
• The relational model of data is the most widely used model today.
  • Main concept: relation, basically a table with rows and columns.
  • Every relation has a schema, which describes the columns, or fields.

Levels of Abstraction

• Views describe how users see the data.
• The conceptual schema defines the logical structure.
• The physical schema describes the files and indexes used.

(sometimes called the ANSI/SPARC model)

[Figure: users reach the database (DB) through View 1, View 2, and View 3, which are defined over the conceptual schema, which in turn is mapped onto the physical schema.]

Example: University Database

• Conceptual schema:
  • Students(sid: string, name: string, login: string, age: integer, gpa: real)
  • Courses(cid: string, cname: string, credits: integer)
  • Enrolled(sid: string, cid: string, grade: string)
• External schema (view):
  • Course_info(cid: string, enrollment: integer)
• Physical schema:
  • Relations stored as unordered files.
  • Index on first column of Students.
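The conceptual, external, and physical schemas above can be written directly in SQL. The sketch below uses Python's `sqlite3` module. The sample rows and the index name are hypothetical. SQLite treats declared column types as affinities rather than strict types, so this is only an approximation of the slide's typed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual schema
    CREATE TABLE Students (sid TEXT, name TEXT, login TEXT, age INTEGER, gpa REAL);
    CREATE TABLE Courses  (cid TEXT, cname TEXT, credits INTEGER);
    CREATE TABLE Enrolled (sid TEXT, cid TEXT, grade TEXT);

    -- Physical schema choice: an index on the first column of Students
    CREATE INDEX students_sid_idx ON Students (sid);

    -- External schema (view): enrollment count per course
    CREATE VIEW Course_info (cid, enrollment) AS
        SELECT cid, COUNT(*) FROM Enrolled GROUP BY cid;
""")

# Hypothetical enrollment rows
conn.executemany("INSERT INTO Enrolled VALUES (?, ?, ?)", [
    ("53666", "CS317", "A"),
    ("53688", "CS317", "B"),
    ("53688", "MATH101", "A"),
])

course_info = conn.execute("SELECT * FROM Course_info ORDER BY cid").fetchall()
print(course_info)  # → [('CS317', 2), ('MATH101', 1)]
```

Users of `Course_info` see only course IDs and counts. The underlying `Enrolled` rows and the index stay hidden behind the view.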


Data Independence

• Applications are insulated from how data is structured and stored.
• Logical data independence: protection from changes in the logical structure of data.
• Physical data independence: protection from changes in the physical structure of data.
• Q: Why are these particularly important for a DBMS?
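Physical data independence can be seen in miniature. Adding an index changes how data is stored and may change how the DBMS accesses it. The application's query text and its result stay the same. Below is a sketch with `sqlite3` and a hypothetical `Emp` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, ename TEXT, sal INTEGER)")
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?)",
                 [(i, f"emp{i}", 40000 + 1000 * i) for i in range(20)])  # hypothetical rows

# The application's query, which never mentions storage details
APP_QUERY = "SELECT ename FROM Emp WHERE sal > 55000 ORDER BY eid"

before = conn.execute(APP_QUERY).fetchall()
# Physical schema change: add a new access path. The optimizer may now use it.
conn.execute("CREATE INDEX emp_sal_idx ON Emp (sal)")
after = conn.execute(APP_QUERY).fetchall()

print(before == after)  # → True: the application code did not have to change
```

Logical data independence works in a similar way one level up. If a table is restructured, a view that presents the old shape can keep existing queries working.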


Queries, Query Plans, and Operators

• The system handles query plan generation and optimization, and ensures correct execution.

SELECT E.eid, E.ename, E.title
FROM Emp E
WHERE E.sal > 50000

SELECT E.loc, AVG(E.sal)
FROM Emp E
GROUP BY E.loc
HAVING COUNT(*) > 5

SELECT COUNT(DISTINCT E.eid)
FROM Emp E, Proj P, Asgn A
WHERE E.eid = A.eid
  AND P.pid = A.pid
  AND E.loc <> P.loc

• Issues: view reconciliation, operator ordering, physical operator choice, memory management, access path (index) use, …

[Figure: query plans over the Employees, Projects, and Assignments tables: a Select over Emp; a Group (agg) followed by Having over Emp; and two Joins combining Emp, Asgn, and Proj, followed by Count distinct.]
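The three queries on this slide can be run against a small in-memory database. This sketch uses `sqlite3`. The table contents are hypothetical, and the literal 50000 stands in for the slide's $50K. SQLite chooses its own plan for each query. Prefixing a query with `EXPLAIN QUERY PLAN` shows that plan, but its output format varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Emp  (eid INTEGER PRIMARY KEY, ename TEXT, title TEXT, sal INTEGER, loc TEXT);
    CREATE TABLE Proj (pid INTEGER PRIMARY KEY, loc TEXT);
    CREATE TABLE Asgn (eid INTEGER, pid INTEGER);
""")

# Hypothetical data: employees 0-5 work in Berkeley, 6-7 in Austin.
emps = [(i, f"e{i}", "engineer", 45000 + 2000 * i, "Berkeley" if i < 6 else "Austin")
        for i in range(8)]
conn.executemany("INSERT INTO Emp VALUES (?, ?, ?, ?, ?)", emps)
conn.executemany("INSERT INTO Proj VALUES (?, ?)", [(1, "Berkeley"), (2, "Austin")])
conn.executemany("INSERT INTO Asgn VALUES (?, ?)", [(0, 1), (1, 2), (6, 1), (6, 2)])

# Selection: employees earning more than 50000
q1 = conn.execute(
    "SELECT E.eid, E.ename, E.title FROM Emp E WHERE E.sal > 50000").fetchall()

# Grouping and aggregation: average salary at locations with more than 5 employees
q2 = conn.execute(
    "SELECT E.loc, AVG(E.sal) FROM Emp E GROUP BY E.loc HAVING COUNT(*) > 5").fetchall()

# Joins: employees assigned to a project located somewhere other than their own location
q3 = conn.execute(
    "SELECT COUNT(DISTINCT E.eid) FROM Emp E, Proj P, Asgn A "
    "WHERE E.eid = A.eid AND P.pid = A.pid AND E.loc <> P.loc").fetchone()

print(len(q1), q2, q3)  # → 5 [('Berkeley', 50000.0)] (2,)
```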

Concurrency Control

• Concurrent execution of user programs is key to good DBMS performance.
  • Disk accesses are frequent and pretty slow.
  • Keep the CPU working on several programs concurrently.
• Interleaving actions of different programs can cause trouble, e.g., an account transfer and a statement printout running at the same time.
• The DBMS ensures such problems don’t arise: users and programmers can pretend they are using a single-user system (called “Isolation”).
  • Thank goodness! You don’t have to program “very, very carefully”.
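The trouble caused by interleaving can be shown without real threads by replaying a schedule one step at a time. The sketch assumes two hypothetical transactions that each read a shared balance, add a deposit, and write the result back. This is a variant of the slide's account example. A serial schedule keeps both deposits. One interleaving loses a deposit (a "lost update"), which is exactly what the DBMS's isolation guarantee prevents.

```python
def run_schedule(schedule, initial_balance=100):
    """Replay (transaction, operation) steps; each transaction keeps a private copy of what it read."""
    db = {"balance": initial_balance}
    local = {}                        # per-transaction workspace
    deposits = {"T1": 50, "T2": 30}  # hypothetical amount each transaction adds
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["balance"]
        elif op == "write":
            db["balance"] = local[txn] + deposits[txn]
    return db["balance"]

serial      = [("T1", "read"), ("T1", "write"), ("T2", "read"), ("T2", "write")]
interleaved = [("T1", "read"), ("T2", "read"), ("T1", "write"), ("T2", "write")]

print(run_schedule(serial))       # → 180: both deposits applied
print(run_schedule(interleaved))  # → 130: T2 overwrites T1's deposit
```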

Transactions: ACID Properties

• The key concept is a transaction: a sequence of database actions (reads/writes).
• The DBMS ensures atomicity (the all-or-nothing property) even if the system crashes in the middle of a transaction (Xact).
• Each transaction, executed completely, must take the DB between consistent states or must not run at all.
• The DBMS ensures that concurrent transactions appear to run in isolation.
• The DBMS ensures durability of committed Xacts even if the system crashes.
• Note: you can specify simple integrity constraints on the data, and the DBMS enforces them.
  • Beyond this, the DBMS does not understand the semantics of the data.
  • Ensuring that a single transaction (run alone) preserves consistency is largely the user’s responsibility!
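Atomicity can be observed with Python's `sqlite3` module. When a connection is used as a context manager, it commits the transaction if the block succeeds and rolls it back if an exception escapes. The account data and the simulated failure below are hypothetical. Recovery after a real crash relies on the DBMS's own journal or log, which this sketch does not exercise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("checking", 500), ("savings", 100)])
conn.commit()

def transfer(amount, fail_midway=False):
    """Move money between accounts as one transaction: both updates happen, or neither does."""
    with conn:  # commit on success, roll back if an exception is raised
        conn.execute("UPDATE Accounts SET balance = balance - ? WHERE name = 'checking'", (amount,))
        if fail_midway:
            raise RuntimeError("simulated failure between the two writes")
        conn.execute("UPDATE Accounts SET balance = balance + ? WHERE name = 'savings'", (amount,))

def balances():
    return conn.execute("SELECT name, balance FROM Accounts ORDER BY name").fetchall()

try:
    transfer(200, fail_midway=True)
except RuntimeError:
    pass
print(balances())  # → [('checking', 500), ('savings', 100)]: the debit was rolled back

transfer(200)
print(balances())  # → [('checking', 300), ('savings', 300)]
```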

Structure of a DBMS

• A typical DBMS has a layered architecture.
• The figure does not show the concurrency control and recovery components.
• Each database system has its own variations.

Layers (top to bottom):
• Query Optimization and Execution
• Relational Operators
• Files and Access Methods
• Buffer Management
• Disk Space Management
• DB

These layers must consider concurrency control and recovery.
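The buffer management layer keeps recently used disk pages in memory, so the layers above it avoid repeated disk reads. The sketch below is a hypothetical, simplified buffer pool that uses least-recently-used (LRU) replacement. The disk-read callback is a stand-in for the disk space manager. Real buffer managers also pin pages, write dirty pages back in coordination with recovery, and often use other replacement policies.

```python
from collections import OrderedDict

class BufferPool:
    """Fixed-capacity page cache with LRU eviction (simplified sketch: no pinning, no dirty pages)."""

    def __init__(self, capacity, read_page_from_disk):
        self.capacity = capacity
        self.read_page_from_disk = read_page_from_disk  # hypothetical disk-layer callback
        self.frames = OrderedDict()  # page_id -> page contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)  # hit: mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)   # miss with a full pool: evict the LRU page
        page = self.read_page_from_disk(page_id)
        self.disk_reads += 1
        self.frames[page_id] = page
        return page

pool = BufferPool(capacity=2, read_page_from_disk=lambda pid: f"<page {pid}>")
for pid in [1, 2, 1, 3, 1, 2]:
    pool.get_page(pid)
print(pool.disk_reads)  # → 4 disk reads for 6 page requests
```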

Advantages of a DBMS

• Data independence
• Efficient data access
• Data integrity and security
• Data administration
• Concurrent access, crash recovery
• Reduced application development time
• So why not use them always?
  • Expensive/complicated to set up and maintain
  • This cost and complexity must be offset by need
  • General-purpose, not suited for special-purpose tasks (e.g., text search!)

…must understand how a DBMS works

Databases make these folks happy ...
• DBMS vendors, programmers
  • Oracle, IBM, MS, Sybase, …
• End users in many fields
  • Business, education, science, …
• DB application programmers
  • Build enterprise applications on top of DBMSs
  • Build web services that run off DBMSs
• Database administrators (DBAs)
  • Design logical/physical schemas
  • Handle security and authorization
  • Data availability, crash recovery
  • Database tuning as needs evolve

Summary (part 1)

• A DBMS is used to maintain and query large datasets.
  • It can manipulate data and exploit semantics.
• Other benefits include recovery from system crashes, concurrent access, quick application development, and data integrity and security.
• Levels of abstraction provide data independence.
  • Key when $\frac{d(\text{app})}{dt} \ll \frac{d(\text{platform})}{dt}$, i.e., when applications change much more slowly than the platform beneath them.

Summary, cont.

• DBAs and DB developers are the bedrock of the information economy.
• DBMS R&D represents a broad, fundamental branch of the science of computation.