Midterm 2: April 28th

Material:

Query processing and Optimization, Chapters 12 and 13 (ignore 12.5.5, 12.7, 13.4.4 and 13.5)
Transactions, Chapter 14
Concurrency Control, Chapter 15 (ignore 15.7 to 15.10)
Recovery System, Chapter 16 (ignore 16.8 and 16.9)
Google File System
LRU-K, article by O’Neils and Weikum
Continuous Media, article by Ghandeharizadeh & Muntz (first 11 pages)
COSAR-CQN



Enterprise Data Management

Shahram Ghandeharizadeh

Computer Science Department

University of Southern California

Challenge: Managing Data is Expensive

Cost of managing data is $100K/TB/year:

Downtime is estimated at thousands of dollars per minute.

Loss of data results in lost productivity: 20 Megabytes of accounting data requires 21 days and costs $19K to reproduce.

50% of companies that lose their data due to a disaster never re-open; 90% go out of business within 2 years!

Centralize Management of Storage

Before: Data stored locally.

After: Data stored across the network at a central location.

Advantages:

Data sharing: Many clients share storage and data.

High availability: Data remains available when a client fails. Redundancy is implemented in one place, protecting all clients from disk failure.

Data backup: Backup is centralized. The administrator does not need to know or care how many clients on the network share the storage.

Network Failures

What about network failures?

Two host bus adapters per server,
Each server connected to a different switch.
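The redundancy above can be sketched as path selection with failover. This is a minimal illustration, not a real multipathing driver: the names `PATHS`, `usable_paths`, and `choose_path`, and the two-switch topology, are assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical topology: each host bus adapter (HBA) is cabled to its own switch.
PATHS: List[Tuple[str, str]] = [("hba0", "switch_a"), ("hba1", "switch_b")]


def usable_paths(switch_up: Dict[str, bool]) -> List[Tuple[str, str]]:
    """Return the (HBA, switch) paths whose switch is still up."""
    return [(hba, switch) for hba, switch in PATHS if switch_up.get(switch, False)]


def choose_path(switch_up: Dict[str, bool]) -> Tuple[str, str]:
    """Pick the first usable path; fail only when every switch is down."""
    paths = usable_paths(switch_up)
    if not paths:
        raise ConnectionError("no path to storage")
    return paths[0]


# switch_a fails: I/O continues over the second adapter and switch.
after_failure = choose_path({"switch_a": False, "switch_b": True})
```

With a single adapter and switch, the same failure would cut the server off from its storage.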

Centralize Management of Storage

Storage Area Network (SAN):
Block level access,
Write to storage is immediate,
Specialized hardware including switches, host bus adapters, disk chassis, battery backed caches, etc.,
Expensive,
Supports transaction processing systems.

Network Attached Storage (NAS):
File level access,
Write to storage might be delayed,
Generic hardware,
Inexpensive,
Not appropriate for transaction processing systems.

Storage Area Network

Centralize management of storage: Storage Area Networks (SANs),
Redundancy in data to tolerate disk failures,
Regular backup,
Disaster recovery.

Concepts and Terminology

Virtualization:
Available storage is represented as one HUGE disk drive, e.g., a SAN with a thousand 1.5 TB disks (1.5 Petabytes raw) provides on the order of 1 Petabyte of storage,
Available storage is partitioned into Logical Unit Numbers (LUNs),
A LUN is presented to one or more servers,
A LUN appears as a disk drive to a server.

The SAN places blocks across physical disks intelligently to balance load.
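A minimal sketch of how a LUN-relative block might be mapped onto physical disks. The `Lun` class, the `locate` function, and round-robin striping are all assumptions made for illustration; a real SAN uses its own, typically more sophisticated, placement policy.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Lun:
    """A hypothetical LUN: a contiguous range of blocks in the pooled address space."""
    name: str
    first_block: int  # offset of the LUN within the pooled address space
    num_blocks: int


def locate(lun: Lun, logical_block: int, num_disks: int) -> Tuple[int, int]:
    """Map a LUN-relative block to (disk index, block offset on that disk).

    Round-robin striping is an assumed policy, chosen only because it
    spreads consecutive blocks evenly across the disks.
    """
    if not 0 <= logical_block < lun.num_blocks:
        raise IndexError("block outside the LUN")
    pooled = lun.first_block + logical_block
    return pooled % num_disks, pooled // num_disks


lun_a = Lun("lun_a", first_block=0, num_blocks=1000)
lun_b = Lun("lun_b", first_block=1000, num_blocks=500)

# Consecutive blocks of one LUN land on different disks.
placements = [locate(lun_a, block, num_disks=4) for block in range(8)]
```

The server only ever sees block numbers within its LUN; which disk holds a given block is hidden behind the mapping.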

Question

Is it possible to present the same LUN to two different servers simultaneously? YES!

Can two different servers read and write the files stored on the presented LUN? Yes! What are the consequences?
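One possible consequence, offered here as an illustration and not as the slide's own answer, is a lost update when both servers read, modify, and write the same block without coordinating. The `SharedLun` class and the interleaving below are hypothetical.

```python
class SharedLun:
    """A hypothetical LUN modeled as a list of blocks, with no locking."""

    def __init__(self, num_blocks: int) -> None:
        self.blocks = [0] * num_blocks

    def read(self, block: int) -> int:
        return self.blocks[block]

    def write(self, block: int, value: int) -> None:
        self.blocks[block] = value


lun = SharedLun(num_blocks=8)

# Both servers read block 3 before either one writes it back.
seen_by_server_a = lun.read(3)
seen_by_server_b = lun.read(3)

# Each server adds its own increment to the value it read.
lun.write(3, seen_by_server_a + 10)
lun.write(3, seen_by_server_b + 5)  # overwrites server A's update

final_value = lun.read(3)  # 5, not the 15 a serial execution would give
```

The storage hardware accepts both writes; preventing the conflict is left to software above it.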

Concepts: Backup

Snapshot: State of a LUN at one instant in time.

Copy-on-write:
A snapshot consists of the original blocks of a LUN,
Every time an application writes a block, the SAN generates a new copy for the current LUN (the snapshot maintains the original),
Advantage: copies of blocks in support of backup are generated on demand.

Copy-on-Write

Original LUN and Snapshot taken midnight Sunday morning.

[Figure: blocks 1 2 3 4 5 6 7, shared by the current LUN and the snapshot]

Write block 5 changes the current LUN to:

[Figure: the current LUN holds blocks 1 2 3 4 5 6 7 with a new block 5; the snapshot keeps Old 5]

As blocks are written, the physical blocks of the snapshot materialize.
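The write to block 5 can be sketched as follows. This follows the slides' description, in which the new copy goes to the current LUN and the snapshot keeps the original block; some systems instead copy the old block aside and overwrite in place. The `SnapshotLun` class and its fields are hypothetical.

```python
from typing import Dict


class SnapshotLun:
    """A hypothetical LUN whose snapshots share blocks until they are written."""

    def __init__(self, num_blocks: int) -> None:
        # Physical store: physical id -> block contents.
        self.physical: Dict[int, str] = {
            i: f"original-{i}" for i in range(1, num_blocks + 1)
        }
        # Block map of the current LUN: logical block -> physical id.
        self.current: Dict[int, int] = {i: i for i in range(1, num_blocks + 1)}
        self.next_physical = num_blocks + 1

    def snapshot(self) -> Dict[int, int]:
        """Copy only the block map; no data blocks are copied yet."""
        return dict(self.current)

    def write(self, block: int, data: str) -> None:
        """Place the new data in a fresh physical block for the current LUN."""
        self.physical[self.next_physical] = data
        self.current[block] = self.next_physical
        self.next_physical += 1

    def read(self, block_map: Dict[int, int], block: int) -> str:
        return self.physical[block_map[block]]


lun = SnapshotLun(num_blocks=7)
sunday = lun.snapshot()           # snapshot taken at midnight
lun.write(5, "new-5")             # application writes block 5

current_5 = lun.read(lun.current, 5)   # "new-5"
snapshot_5 = lun.read(sunday, 5)       # "original-5"
```

Only one extra physical block exists after the write; the other six are still shared by the current LUN and the snapshot.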

Hot Standby

An inexpensive server that is maintained on the side to assume responsibility for a failed server.

Goal: Minimize downtime.

Summary

SAN and NAS are shared-disk architectures,

SAN is appropriate for transaction processing systems,

Hardware alone is not a substitute for a parallel, high performance transaction processing system, e.g., Teradata, Oracle RAC, etc.