Midterm 2: April 28th
Material:
Query Processing and Optimization, Chapters 12 and 13 (ignore 12.5.5, 12.7, 13.4.4 and 13.5)
Transactions, Chapter 14
Concurrency Control, Chapter 15 (ignore 15.7 to 15.10)
Recovery System, Chapter 16 (ignore 16.8 and 16.9)
Google File System
LRU-K, article by O’Neils and Weikum
Continuous Media, article by Ghandeharizadeh & Muntz (1st 11 pages)
COSAR-CQN
Enterprise Data Management
Shahram Ghandeharizadeh
Computer Science Department
University of Southern California
Challenge: Managing Data is Expensive
Cost of managing data is $100K/TB/year:
Downtime is estimated at thousands of dollars per minute.
Loss of data results in lost productivity: 20 Megabytes of accounting data requires 21 days and costs $19K to reproduce.
50% of companies that lose their data due to a disaster never re-open; 90% go out of business in 2 years!
Centralize Management of Storage
Before: Data stored locally.
After: Data stored across the network at a central location.
Centralize Management of Storage
Advantages:
Many clients share storage and data: data remains available when a client fails.
Centralize Management of Storage
Advantages:
Many clients share storage and data.
Redundancy is implemented in one place, protecting all clients from disk failure.
Centralized backup: The administrator does not need to know how many clients on the network share the storage.
Benefits: high availability, data backup, data sharing.
Network Failures
What about network failures?
Two host bus adapters per server, each connected to a different switch.
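The redundant-path idea can be sketched in a few lines of Python. This is only an illustrative model, not how any particular SAN or multipath driver works: the names `Path`, `fail_switch`, and `pick_path`, and the labels `hba0`, `switchA`, and so on, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One route from a server to storage: an adapter plugged into a switch."""
    hba: str
    switch: str
    up: bool = True


def fail_switch(paths, switch):
    """Mark every path that runs through the given switch as down."""
    for p in paths:
        if p.switch == switch:
            p.up = False


def pick_path(paths):
    """Return the first healthy path, or None when storage is unreachable."""
    for p in paths:
        if p.up:
            return p
    return None


# A server with two host bus adapters, each cabled to a different switch.
paths = [Path("hba0", "switchA"), Path("hba1", "switchB")]

fail_switch(paths, "switchA")
survivor = pick_path(paths)      # traffic moves to hba1 via switchB

fail_switch(paths, "switchB")
after_both = pick_path(paths)    # no path left once both switches are down
```

With a single adapter and switch, the first failure would already cut the server off from its data; with two independent paths, only a double failure does.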
Centralize Management of Storage
Storage Area Network (SAN):
Block level access,
Write to storage is immediate,
Specialized hardware including switches, host bus adapters, disk chassis, battery backed caches, etc.,
Expensive,
Supports transaction processing systems.
Network Attached Storage (NAS):
File level access,
Write to storage might be delayed,
Generic hardware,
Inexpensive,
Not appropriate for transaction processing systems.
Storage Area Network
Centralize management of storage:
Storage Area Networks (SANs),
Redundancy in data to tolerate disk failures,
Regular backup,
Disaster recovery.
Concepts and Terminology
Virtualization:
Available storage is represented as one HUGE disk drive, e.g., a SAN with a thousand 1.5 TB disks provides 1.5 Petabytes of raw storage,
Available storage is partitioned into Logical Unit Numbers (LUNs),
A LUN is presented to one or more servers,
A LUN appears as a disk drive to a server.
SAN places blocks across physical disks intelligently to balance load.
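A minimal sketch of virtualization, assuming the simplest possible placement policy: the pool is treated as one large block address space, LUNs are carved out of it, and consecutive blocks are striped round-robin across the disks. Real SANs use more elaborate placement; the class `StoragePool` and its methods are hypothetical names chosen for illustration.

```python
class StoragePool:
    """Many physical disks presented as one large block address space."""

    def __init__(self, num_disks, blocks_per_disk):
        self.num_disks = num_disks
        self.blocks_per_disk = blocks_per_disk
        self.next_free = 0   # next unallocated block in the pool-wide space
        self.luns = {}       # LUN name -> (first pool block, size in blocks)

    def capacity(self):
        """Total number of blocks the pool exposes."""
        return self.num_disks * self.blocks_per_disk

    def create_lun(self, name, size_blocks):
        """Carve a LUN of size_blocks out of the unallocated space."""
        if self.next_free + size_blocks > self.capacity():
            raise ValueError("pool exhausted")
        self.luns[name] = (self.next_free, size_blocks)
        self.next_free += size_blocks

    def locate(self, name, logical_block):
        """Map a LUN-relative block number to (disk, block on that disk).

        Assumed policy: round-robin striping, so consecutive blocks of a
        LUN land on different disks and the load is spread evenly.
        """
        start, size = self.luns[name]
        if not 0 <= logical_block < size:
            raise IndexError("block outside LUN")
        pool_block = start + logical_block
        return pool_block % self.num_disks, pool_block // self.num_disks


pool = StoragePool(num_disks=4, blocks_per_disk=100)
pool.create_lun("lun0", 8)
pool.create_lun("lun1", 4)
placement = [pool.locate("lun0", b) for b in range(8)]
```

The server sees `lun0` as a disk with blocks 0 through 7; it never learns that those blocks sit on four different physical drives.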
Question
Is it possible to present the same LUN to two different servers simultaneously? YES!
Can two different servers read and write the files stored on the presented LUN? Yes! What are the consequences?
Concepts: Backup
Snapshot: State of a LUN at one instant in time.
Copy-on-write:
A snapshot consists of the original blocks of a LUN,
Every time an application writes a block, the SAN generates a new copy for the current LUN (the snapshot maintains the original),
Advantage: copies of blocks in support of backup are generated on demand.
Copy-on-Write
Original LUN and snapshot taken at midnight, Sunday morning.
Writing block 5 changes the current LUN to:
[Figure: blocks 1 2 3 4 5 6 7 of the current LUN, with a new block 5; the snapshot keeps the old block 5.]
As blocks are written, the physical blocks of the snapshot materialize.
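The block 5 example can be traced with a small Python model. It is a sketch under stated assumptions, not a description of any vendor's implementation: a LUN is modeled as a map from logical block numbers to physical blocks, taking a snapshot copies only that map, and a write to a block still shared with a snapshot gets a fresh physical block. The class `Lun` and its methods are hypothetical.

```python
class Lun:
    """A LUN whose snapshots share physical blocks until a write occurs."""

    def __init__(self, contents):
        self.store = {}       # physical block id -> data
        self.current = {}     # logical block number -> physical block id
        self.snapshots = {}   # snapshot name -> frozen logical-to-physical map
        self._next_id = 0
        for logical, data in contents.items():
            self.current[logical] = self._allocate(data)

    def _allocate(self, data):
        """Store data in a new physical block and return its id."""
        pid = self._next_id
        self._next_id += 1
        self.store[pid] = data
        return pid

    def take_snapshot(self, name):
        """Freeze the current mapping; no data blocks are copied."""
        self.snapshots[name] = dict(self.current)

    def write(self, logical, data):
        """Write a block, preserving the old version if a snapshot uses it."""
        old = self.current[logical]
        shared = any(s.get(logical) == old for s in self.snapshots.values())
        if shared:
            # The current LUN gets a new physical block; the snapshot
            # keeps pointing at the original one.
            self.current[logical] = self._allocate(data)
        else:
            self.store[old] = data   # nobody else needs the old version

    def read(self, logical, snapshot=None):
        """Read from the current LUN, or from a named snapshot."""
        mapping = self.current if snapshot is None else self.snapshots[snapshot]
        return self.store[mapping[logical]]

    def materialized(self, name):
        """Logical blocks the snapshot no longer shares with the current LUN."""
        snap = self.snapshots[name]
        return sorted(b for b, pid in snap.items() if self.current.get(b) != pid)


lun = Lun({b: f"v{b}" for b in range(1, 8)})   # blocks 1 through 7
lun.take_snapshot("sunday")
blocks_after_snapshot = len(lun.store)          # still 7 physical blocks
lun.write(5, "new5")
blocks_after_write = len(lun.store)             # 8: one extra for block 5
```

After the write, `lun.read(5)` returns the new data while `lun.read(5, snapshot="sunday")` still returns the old block, and only block 5 has materialized for the snapshot. A backup job can read the snapshot at leisure while applications keep writing to the current LUN.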
Hot Standby
An inexpensive server that is maintained on the side to assume responsibility for a failed server.
Goal: Minimize downtime.