PPT Slides by Dr. Craig Tyran & Kraig Pencil Data Management & Data Warehouses MIS 320 Kraig Pencil Summer 2014 1

Data Management & Data Warehouses



Page 1: Data Management  & Data Warehouses

Data Management & Data Warehouses

MIS 320
Kraig Pencil

Summer 2014

Page 2: Data Management  & Data Warehouses

Game Plan

• Introduction
• Why use a relational database?
• Database management systems
• Data warehouses
• Data mining
• Data marts

Page 3: Data Management  & Data Warehouses

A. Why use a relational database?

1. A database sounds great, but why don’t we just store all our data in one big table in an Excel spreadsheet?
   – Example: Can you foresee any hassles or potential difficulties associated with entering/storing order information in the following Excel table?

Page 4: Data Management  & Data Warehouses

A. Why use a relational database?

1. Why don’t we just store our data in one spreadsheet table? (cont.)
   – Potential problems
     • May have “redundant” data entry
     • Potential for data entry errors (different/wrong phone numbers)
     • Updates can be a hassle/inefficient (e.g., change phone no.)
   – Solution
     • “Normalize” the data: break up the table into a set of linked tables in a database (instead of having one spreadsheet)
       – See example
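The normalization idea can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table names, column names, and sample rows are hypothetical and are not taken from the Excel table on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# One table per "thing": customer details are stored exactly once.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        phone       TEXT NOT NULL
    )""")
# Orders link back to a customer through a foreign key.
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    )""")

conn.execute("INSERT INTO customers VALUES (1, 'Pat Lee', '555-0100')")
conn.executemany(
    "INSERT INTO orders (customer_id, item) VALUES (?, ?)",
    [(1, "Widget"), (1, "Gadget"), (1, "Gizmo")],
)

# A phone-number change touches one row, however many orders exist.
cur = conn.execute("UPDATE customers SET phone = '555-0199' WHERE customer_id = 1")
rows_updated = cur.rowcount

# Joining the linked tables rebuilds the "one big table" view on demand.
order_report = conn.execute("""
    SELECT o.order_id, c.name, c.phone, o.item
    FROM orders AS o JOIN customers AS c USING (customer_id)
    ORDER BY o.order_id
""").fetchall()
```

In the single-spreadsheet layout the same phone change would have to be repeated on every order row, which is where the redundancy and data entry errors come from.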

Page 5: Data Management  & Data Warehouses

Example: Normalized Tables (and the advantages of a database)

Questions:

a) Any unneeded redundancy?

b) Is it now efficient to update customer info?

c) Where is the foreign key?

Page 6: Data Management  & Data Warehouses


Example: Non-Normalized Data Table for an Auto Shop (Rainer & Turban, Fig 4.6)

Examples of redundancy

Page 7: Data Management  & Data Warehouses

B. Database Management Systems

1. What is a “database management system” (DBMS)?
   – Software that allows one to create, store, organize, manage, and use data
     • Example of a DBMS?
2. Key components
   – Data Definition subsystem
   – Data Manipulation subsystem
   – Application Generation subsystem
   – Data Administration subsystem
   – DBMS engine

Page 8: Data Management  & Data Warehouses

DBMS Components

Lab Tutorials 1,2

Lab Tutorials 3,5

Lab Tutorials 4,6

Page 9: Data Management  & Data Warehouses

B. Database Management Systems

3. Examples of DBMS components in Access

   Data Definition subsystem
   – Data dictionary (“Design view” for a table)

   Data Manipulation subsystem: Move, change, and “ask questions”
   – View of a table (“Datasheet view” for a table)
   – Query-by-example (QBE) tool
   – Structured query language (SQL)

   Application Generation subsystem: the “front end”
   – Design of forms and reports

   Data Administration subsystem
   – Optimize query performance
   – Security settings with password

Page 10: Data Management  & Data Warehouses

B. Database Management Systems

4. What aspects of data need to be specified?
   – Lots of aspects!
     • Recall table creation in MS Access (Tutorials 1 & 2)
   – Common data properties
     • Data “type” (number, text, date, etc.)
     • Description
     • Field size
     • Required/not required
     • Etc.
   – An important reference for a database system: Data dictionary
     • Stores information about the data in a database
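As a rough analogue of the Access “Design view”, many SQL systems let you query the data dictionary directly. The sketch below uses Python's built-in sqlite3 module and SQLite's `PRAGMA table_info` in place of Access's own tooling; the `employees` table and its fields are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        last_name   VARCHAR(40) NOT NULL,
        gender      CHAR(1),
        hire_date   DATE
    )""")

# PRAGMA table_info returns one row per field:
# (cid, name, declared type, notnull flag, default value, pk flag)
data_dictionary = {
    name: {"type": col_type, "required": bool(notnull)}
    for _cid, name, col_type, notnull, _default, _pk
    in conn.execute("PRAGMA table_info(employees)")
}

# Print a small "design view": field name, data type, required or not.
for field, props in data_dictionary.items():
    print(f"{field:12} {props['type']:12} required={props['required']}")
```

The declared type carries both the data type and the field size (for example `VARCHAR(40)`), and the not-null flag corresponds to the required/not required property.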

Page 11: Data Management  & Data Warehouses


Access Example:

Information about the “Gender” field is specified in “Field Properties” section

Data “type” (number, text, date, etc.)

Description

Field size

Required/not required

Page 12: Data Management  & Data Warehouses


Access Example: Data Manipulation Subsystem (Low Stock Products query)

QBE or SQL may be used to prepare a query. Which approach would be easier for most people?
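For comparison with the QBE grid, here is roughly what a low-stock query could look like when written in SQL. The query shown on the slide is not reproduced in this transcript, so the `products` table, its columns, and the sample rows are assumptions made for illustration; the query is run through Python's built-in sqlite3 module rather than Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE products (
        product_name   TEXT PRIMARY KEY,
        units_in_stock INTEGER NOT NULL,
        reorder_level  INTEGER NOT NULL
    )""")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("Coffee beans", 39, 10), ("Olive oil", 4, 10), ("Dried pears", 0, 5)],
)

# "Ask a question" of the data: which products have fallen below
# their reorder level?
low_stock = conn.execute("""
    SELECT product_name, units_in_stock
    FROM products
    WHERE units_in_stock < reorder_level
    ORDER BY units_in_stock
""").fetchall()
```

A QBE tool builds an equivalent query from a grid of fields and criteria, so the user never has to type the `SELECT ... WHERE` text.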

Page 13: Data Management  & Data Warehouses


Access Example: Application Generation Subsystem (Employer Information Form)

Page 14: Data Management  & Data Warehouses


Access Example: Data Administration (Performance Analysis for a Database)

Page 15: Data Management  & Data Warehouses

B. Database Management Systems (cont.)

5. DBMS: Example products
   – You are very likely to work with, and possibly help develop, a database using one or more of the following:
     • Small-Midsize DBMS: Microsoft Access, dBase, Paradox
     • Mid-to-Large DBMS: Microsoft SQL Server, Oracle, DB2, Informix, IMS

Page 16: Data Management  & Data Warehouses

C. Data Warehouses

1. Business problem:
   • Difficult for larger organizations to analyze organizational data from multiple sources
   • Solution: Data warehouse
2. Gather/integrate information from existing operational databases into a “warehouse”
   • Create “Business Intelligence” system
   • See next figure

Page 17: Data Management  & Data Warehouses

Create a Data Warehouse from Operational Databases

From Haag, et al., MIS for the Information Age, 2004

Page 18: Data Management  & Data Warehouses

C. Data Warehouses (cont.)

3. Data warehouse features
   • Designed to support business decision making
     • Not transactions!
   • Supports OLAP
     – On-line Analytical Processing
   • Crosses functional boundaries of an organization
   • Can be very large
   • Note: Warehouse is “read only”
     • Why?
   • Can be a significant strategic resource for a company
     • Can yield a high ROI
4. Examples
   • ???

Page 19: Data Management  & Data Warehouses

C. Data Warehouses (cont.)

5. Implementation issues
   • People may be reluctant to share information
   • “ETL” process is not easy
     • Extraction, transformation, load
   • Expensive
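To see why ETL takes effort, consider a toy version of the three steps. Everything in this sketch is hypothetical (the two source systems, their field names, and the `sales_fact` table); it only shows that sources recording the same kind of fact in different formats must be reconciled before loading.

```python
import sqlite3
from datetime import datetime

# Extract: rows as they might come from two operational systems
# that record the same kind of fact in different formats.
web_orders = [{"cust": "ACME", "total_usd": "120.50", "date": "2014-07-01"}]
store_orders = [{"customer": "acme ", "cents": 9900, "date": "07/02/2014"}]

def transform_web(row):
    """Normalize a web order: clean the name, parse the dollar amount."""
    return (row["cust"].strip().upper(), float(row["total_usd"]), row["date"])

def transform_store(row):
    """Normalize a store order: clean the name, cents to dollars, ISO date."""
    iso_date = datetime.strptime(row["date"], "%m/%d/%Y").date().isoformat()
    return (row["customer"].strip().upper(), row["cents"] / 100, iso_date)

# Transform: one customer spelling, one currency unit, one date format.
clean_rows = [transform_web(r) for r in web_orders]
clean_rows += [transform_store(r) for r in store_orders]

# Load: append the cleaned rows to a warehouse table.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE sales_fact (customer TEXT, amount_usd REAL, sale_date TEXT)"
)
warehouse.executemany("INSERT INTO sales_fact VALUES (?, ?, ?)", clean_rows)

# Only after the transform step do the two sources add up per customer.
total_by_customer = warehouse.execute(
    "SELECT customer, SUM(amount_usd) FROM sales_fact GROUP BY customer"
).fetchall()
```

Without the transform step, "ACME" and "acme " would be counted as two customers and dollars would be mixed with cents.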

Page 20: Data Management  & Data Warehouses

D. Data Mining

1. Provides a means to extract patterns and relationships from a large amount of data (e.g., a data warehouse)
2. Mining analogy
   – Sift through raw dirt/rock to find something of value
   – Large volumes of data are sifted in an attempt to find something worthwhile
3. Example: market basket analysis
   – Identify products that may be attractive to a customer
     • See next slide: Amazon.com buyer suggestions

Page 21: Data Management  & Data Warehouses


Data Mining: Example of pattern discovered via mining

Page 22: Data Management  & Data Warehouses

D. Data Mining (cont.)

4. Identify previously unknown patterns
   – e.g., What are characteristics of customers likely to default on a bank loan?
   – “Target knows before it shows”
     • How Target Figured Out A Teen Girl Was Pregnant Before Her Father Did
     • How Companies Learn Your Secrets: NYTimes
   – e.g., Suppose you discovered that beer and diapers* were often found in the same purchase?
     • “Market basket analysis”
     • What could you do with that information to improve sales of one, the other, or both?

*This is a common example, not an actual case.
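A minimal sketch of how a market basket analysis might count co-purchases, using made-up baskets. The two measures computed here, support and confidence, are standard in association-rule mining, but the data and function names are purely illustrative.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase data: each set is one customer's basket.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]

# How many baskets contain each item, and each pair of items.
item_counts = Counter(item for basket in baskets for item in basket)
pair_counts = Counter(
    pair for basket in baskets for pair in combinations(sorted(basket), 2)
)

def support(pair):
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted(pair))] / len(baskets)

def confidence(antecedent, consequent):
    """Of the baskets with `antecedent`, the share that also hold `consequent`."""
    both = pair_counts[tuple(sorted((antecedent, consequent)))]
    return both / item_counts[antecedent]

beer_diaper_support = support(("beer", "diapers"))   # 2 of 5 baskets
diapers_given_beer = confidence("beer", "diapers")   # 2 of 3 beer baskets
```

A high confidence for a pair suggests actions such as placing the two products near each other or promoting one to buyers of the other.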

Page 23: Data Management  & Data Warehouses

E. Data Marts

5. Data marts
   • Warehouses can be overwhelming/difficult to implement …
   • Some organizations create “data marts”
     • A subset of a data warehouse
     • Simpler, scaled-down version
     • Focuses on/integrates a specific area (e.g., Sales department)
     • Provides useful decision-making tools

Haggen photo from: www.callhugh.com/ ferndale.php
MiniMart photo from: http://www.ae.gatech.edu/research/controls/pictures/f020801_gtar/Mini%20Mart.JPG
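The “subset of a warehouse” idea can be sketched as a filtered view over a warehouse table. This is a simplification: in practice a data mart is often a separate database loaded from the warehouse, and the `facts` table, `sales_mart` view, and sample rows below are all hypothetical.

```python
import sqlite3

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE facts (department TEXT, metric TEXT, value REAL)")
warehouse.executemany(
    "INSERT INTO facts VALUES (?, ?, ?)",
    [
        ("Sales", "revenue", 1200.0),
        ("Sales", "returns", 75.0),
        ("HR", "headcount", 42.0),
        ("Finance", "budget", 9000.0),
    ],
)

# The "mart" exposes only the rows one department cares about.
warehouse.execute("""
    CREATE VIEW sales_mart AS
    SELECT metric, value FROM facts WHERE department = 'Sales'
""")

sales_rows = warehouse.execute(
    "SELECT metric, value FROM sales_mart ORDER BY metric"
).fetchall()
```

Users of the mart see a smaller, simpler data set than the full warehouse, which is the point of the scaled-down version.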

Page 24: Data Management  & Data Warehouses

Data Marts: Subsets of Data Warehouse

Page 25: Data Management  & Data Warehouses

Data Mining – Business Intelligence

• A few videos to watch and think about …
  • http://www.youtube.com/user/SASsoftware?v=C14GVhNt7Do&feature=pyv&ad=4782573666&kw=CRM
  • http://www.youtube.com/user/ibm?#p/c/13/fFdITHMuy2w
  • http://www.youtube.com/user/SASsoftware?v=2677nWVNg9M&feature=pyv&ad=4782551166&kw=business%20analytics
  • http://www.youtube.com/watch?v=El_lSd6G5WU
  • http://www.youtube.com/watch?v=uP89kaDU40c
  • http://www.youtube.com/user/SASsoftware?v=C14GVhNt7Do&feature=pyv&ad=4782573666&kw=CRM#p/u/35/ecqk0JUKvAI

Page 26: Data Management  & Data Warehouses

Big Data

• Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications. (Wikipedia)
• (Image)

Page 27: Data Management  & Data Warehouses

Global Big Data: + 2.5 exabytes/day

• The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s.
• As of 2012, every day 2.5 quintillion (2.5×10^18) bytes of data were created.
• (Wikipedia)
• (Image)
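The two figures quoted above can be checked with a little arithmetic. The sketch assumes the SI definition of an exabyte (10^18 bytes); the variable names are illustrative.

```python
BYTES_PER_DAY = 2.5e18      # data created per day, as quoted for 2012
BYTES_PER_EXABYTE = 1e18    # SI definition: 1 EB = 10**18 bytes
DOUBLING_MONTHS = 40        # doubling period quoted for storage capacity

# 2.5 quintillion bytes expressed in exabytes.
exabytes_per_day = BYTES_PER_DAY / BYTES_PER_EXABYTE

# Doubling every 40 months implies this growth factor per year.
annual_growth = 2 ** (12 / DOUBLING_MONTHS)

# Growth factor over a decade (120 months = three doubling periods).
decade_growth = 2 ** (120 / DOUBLING_MONTHS)

print(
    f"{exabytes_per_day} EB/day, "
    f"x{annual_growth:.2f} per year, x{decade_growth:.0f} per decade"
)
```

So 2.5 quintillion bytes is 2.5 exabytes, and a 40-month doubling period works out to roughly 23% growth per year, or an eightfold increase per decade.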

Page 28: Data Management  & Data Warehouses

Big Data

• The next frontier in data?
  • http://www.eweek.com/c/a/Data-Storage/Big-Data-Analytics-Is-Just-Starting-to-Reach-Its-Potential-10-Reasons-Why-457684/?kc=EWKNLEAU07102012STR1
• Some terms:
  – Hadoop (distributed file organization)
  – Distributed databases and server clusters
  – Cassandra (“Not only SQL” DBMS)
  – MapReduce (breaking computation into smaller pieces, then combining the results of each computation)

(Chart: Total World Data Storage Capacity, in CDs @ 730MB/CD; x-axis 1993, 2000, 2007, 2014; y-axis 0 to 3,000,000,000,000)
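The MapReduce idea (split the work into pieces, then combine the partial results) can be illustrated with a single-process word count. This is a sketch of the pattern only; it does not use Hadoop or any real MapReduce framework, and the function names and sample documents are made up.

```python
from collections import defaultdict
from functools import reduce

documents = ["big data big insight", "data warehouse", "big warehouse"]

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word, 1) for word in document.split()]

def shuffle(mapped_chunks):
    """Group the emitted values by key so each key can be reduced separately."""
    grouped = defaultdict(list)
    for chunk in mapped_chunks:
        for key, value in chunk:
            grouped[key].append(value)
    return grouped

def reduce_phase(values):
    """Reduce: combine all values for one key into a single result."""
    return reduce(lambda a, b: a + b, values)

# Each map_phase call is independent, so in a real cluster the calls
# could run on different machines.
mapped = [map_phase(doc) for doc in documents]
word_counts = {key: reduce_phase(vals) for key, vals in shuffle(mapped).items()}
```

Because the map calls do not depend on each other, adding machines lets the same computation handle more data, which is what makes the pattern attractive for big data.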