Data Storage CPTE 433 John Beckett. The Paradox “If I can go to a computer store and buy 1000...

Data Storage

CPTE 433 John Beckett

The Paradox

• “If I can go to a computer store and buy 1000 gigabytes for $50, why does it cost more in your server farm?”

• It isn’t about having storage

• It’s about managing data through its life-cycle

• The new measurement is price per gigabyte-month
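The gigabyte-month measure can be illustrated with a little arithmetic. This is a minimal sketch: the function name, the cost categories, and every dollar figure other than the $50 drive are assumptions made up for illustration, not numbers from the course.

```python
def price_per_gb_month(total_cost, usable_gb, months):
    """Spread a total cost over usable capacity and service life."""
    if usable_gb <= 0 or months <= 0:
        raise ValueError("usable_gb and months must be positive")
    return total_cost / (usable_gb * months)


# The store-bought drive: $50 for 1000 GB, assumed to last 36 months.
bare_drive = price_per_gb_month(total_cost=50.0, usable_gb=1000, months=36)

# Hypothetical managed storage: a mirrored pair of drives, plus assumed
# costs for backup media and for operations over the same 36 months.
managed_cost = 50.0 * 2 + 150.0 + 400.0
managed = price_per_gb_month(total_cost=managed_cost, usable_gb=1000, months=36)

print(f"bare drive: ${bare_drive:.4f} per GB-month")
print(f"managed:    ${managed:.4f} per GB-month")
```

The point of the comparison is that the drive itself is only one term in the numerator; everything needed to manage the data through its life-cycle is added on top.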

Definitions

• Spindle, platters, heads
  – Physical arrangement of disk
  – Of little interest to us, except to help us understand how new technologies will impact us

• Drive controller
  – On the hard drive itself
  – Connected to…

• Host Bus Adapter

RAID Level | Methods | Characteristics
-----------|---------|----------------
0  | Stripe data across multiple drives | Faster reads and writes; poor reliability
1  | Mirrors copy of data across two drives | Faster reads; good reliability; failures tend to be catastrophic (JB & SA)
5  | Distributed parity; any single disk may fail without loss | Faster reads; slower writes; more economical
10 | Mirrored stripes; RAID 0 group mirrored onto another group | Faster reads; best reliability; most expensive

Table 25.1

Dilemma: Can you add hardware without subtracting from reliability? (Only by using very high-quality hardware)
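The RAID 5 claim that any single disk may fail without loss rests on XOR parity. The sketch below is a simplified illustration, not a real controller: it handles one stripe of three data blocks plus a parity block, and it ignores the way RAID 5 rotates parity across drives. The function name `xor_blocks` and the sample data are assumptions.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks of one stripe, each living on a different drive.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)  # stored on a fourth drive

# Simulate losing the drive that held data[1].
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

XOR-ing the surviving blocks with the parity cancels everything except the missing block. The same arithmetic explains the slower writes: every data write also requires a parity update.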

Where Is the Data?

• DAS – Directly Attached Storage (IBM: DASD), connected directly to the server
  – May be a RAID array

• NAS – Network-Attached Storage
  – Uses a protocol to transfer data

• SAN – Storage-Area Network
  – Separate network segment for storage, connecting servers and drives

A SAN is usually made out of NAS devices

Structure of a SAN

[Diagram: two SAN controllers (servers) and two NAS devices, all connected by the SAN backbone]

Managing Storage

• Think of storage as a community resource
  – If it’s personal, does it have any business on company equipment?

• Determine storage needs of the group

• Identify an architecture that will satisfy that need

• Plan an upgrade path for growth in the future

• Implement inventory and spares policy

Standardization

• Disk drives are as important to standardize as any other component
  – Spares issue
  – Warranty service procedure
  – Ability to use obsoleted drives

• Drive lifetime issue:
  – A drive motor may become unreliable after so many revolutions

The Storage SLA

• Availability

• Response time

• Reliability is increased by RAID > 0
  – …only if monitored and maintained
  – …only if RAID method is preserved

• Network is a part of the reliability picture

Backup and RAID

• RAID is not a backup strategy

• If >n drives fail, you lose data

• Controller failure can cause data loss

• One possibility: RAID mirror as a backup
  – Requires disconnecting other drive on failure

• How about: Spare drive, auto backup each night
  – Maybe including incremental backups

Using a RAID mirror to speed up backup

• Break the RAID pair

• Back up

• Re-connect the RAID pair

Monitoring

• How full
  – Rate of change

• Broken drives

• How busy (especially network on NAS)

• Unused
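Tracking "how full" together with "rate of change" lets you estimate when a volume will run out of space. The sketch below assumes simple linear growth between the oldest and newest readings; the function name and the sample readings are hypothetical.

```python
def days_until_full(capacity_gb, samples):
    """Estimate days until a volume fills, from (day, used_gb) samples.

    Uses the average growth rate between the first and last sample.
    Returns None when usage is flat or shrinking.
    """
    (first_day, first_used), (last_day, last_used) = samples[0], samples[-1]
    if last_day <= first_day:
        raise ValueError("samples must be oldest first and span some time")
    rate = (last_used - first_used) / (last_day - first_day)  # GB per day
    if rate <= 0:
        return None
    return (capacity_gb - last_used) / rate


# Hypothetical readings: (day number, gigabytes used).
readings = [(0, 600.0), (10, 650.0), (20, 700.0)]
print(days_until_full(1000.0, readings))
```

A straight-line estimate is crude, but it is often enough to decide whether an upgrade belongs in this quarter's plan or next year's.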

SAN Caveats

• Benchmarks are problematical

• Useful versus physical storage size

• Product life-cycle issues

Pipeline Optimization

• Read – buffered and available immediately

• Write – buffered and done at leisure
  – Dangerous if drive fails before update is posted

• Early versions of an OS usually don’t sync properly if shut down during “quiet” time
  – Novell – unscheduled shutdown could be catastrophic
  – Windows – learned some lessons from others

• Is it safe to turn off power during operation?
  – A mainframe will be able to handle this
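The danger of buffered writes can be shown with a toy model. This is a minimal sketch, not how any particular drive or operating system works: the class `WriteBackDisk` and its methods are hypothetical names, and the "platter" and "buffer" are plain dictionaries.

```python
class WriteBackDisk:
    """Toy model of a drive with a volatile write buffer."""

    def __init__(self):
        self.platter = {}   # data that is safely on disk
        self.buffer = {}    # acknowledged writes not yet posted

    def write(self, block, data):
        # The caller is told "done" as soon as the data is buffered.
        self.buffer[block] = data

    def read(self, block):
        # Reads are served from the buffer first, then from the platter.
        if block in self.buffer:
            return self.buffer[block]
        return self.platter.get(block)

    def sync(self):
        # Post every buffered write to the platter.
        self.platter.update(self.buffer)
        self.buffer.clear()

    def power_loss(self):
        # Buffer contents vanish; only the platter survives.
        self.buffer.clear()


disk = WriteBackDisk()
disk.write(1, "payroll v1")
disk.sync()
disk.write(1, "payroll v2")   # acknowledged, but still only in the buffer
disk.power_loss()
print(disk.read(1))
```

After the simulated power loss the read returns the older version: the application believed "payroll v2" was written, yet it never reached the platter. A proper shutdown is, in effect, a guaranteed `sync()` before the power goes away.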

Performance

• Locate simultaneously-used data on different spindles to minimize head thrashing
  – The more complex your data, the harder this is to do
  – Restrict this technique to very heavily-used data

• Beware of compression
  – Assumes your data is organized a certain way
  – Assumes your CPU has spare time to spend

Disk Access Density

• I/O Operations per second per gigabyte of capacity

• How fast can you move the entire drive of data?
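Both questions reduce to simple ratios. The sketch below uses invented drive figures (150 IOPS, 150 MB/s sustained, capacities of 300 GB and 4000 GB) purely as assumptions for illustration; the function names are hypothetical as well.

```python
def access_density(iops, capacity_gb):
    """I/O operations per second per gigabyte of capacity."""
    return iops / capacity_gb


def hours_to_read_entire_drive(capacity_gb, throughput_mb_per_s):
    """Time to stream the whole drive once, using 1 GB = 1000 MB."""
    seconds = capacity_gb * 1000 / throughput_mb_per_s
    return seconds / 3600


# Two hypothetical drives with the same mechanics but different capacity.
small = access_density(iops=150, capacity_gb=300)
large = access_density(iops=150, capacity_gb=4000)

print(f"small drive: {small:.4f} IOPS/GB")
print(f"large drive: {large:.4f} IOPS/GB")
print(f"full read of large drive: {hours_to_read_entire_drive(4000, 150):.1f} hours")
```

When capacity grows faster than the number of operations a drive can perform, access density falls, and so does the speed at which the whole drive can be backed up or rebuilt.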

Fragmentation

• Don’t fill up your drives!

• That makes defragging slow

• Also slows online attempts at limiting fragments

Continuous Data Protection

• Send a log of all changes somewhere other than your disk drive– Tape– Over the network to another location– Another disk drive

• Back-out and forward recovery
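Back-out and forward recovery both fall out of keeping a log of every change. The following is a minimal in-memory sketch, assuming the "disk" is a dictionary and the log lives somewhere safer; `ChangeLog` and its method names are hypothetical, not a real product's API.

```python
_MISSING = object()  # marks "key did not exist before this change"


class ChangeLog:
    """Append-only log of (key, old_value, new_value) records."""

    def __init__(self):
        self.records = []

    def apply(self, store, key, new_value):
        # Record the change before applying it to the store.
        self.records.append((key, store.get(key, _MISSING), new_value))
        store[key] = new_value

    def roll_forward(self, snapshot):
        """Forward recovery: replay every logged change onto an old copy."""
        store = dict(snapshot)
        for key, _old, new in self.records:
            store[key] = new
        return store

    def back_out(self, store, count):
        """Back-out: undo the most recent `count` changes, newest first."""
        store = dict(store)
        start = max(0, len(self.records) - count)
        for key, old, _new in reversed(self.records[start:]):
            if old is _MISSING:
                store.pop(key, None)
            else:
                store[key] = old
        return store


snapshot = {"balance": 100}            # last night's backup
live = dict(snapshot)
log = ChangeLog()
log.apply(live, "balance", 120)
log.apply(live, "owner", "pat")
log.apply(live, "balance", 90)         # suppose this change was a mistake

recovered = log.roll_forward(snapshot)   # rebuild after losing `live`
corrected = log.back_out(recovered, 1)   # then undo the last change
print(recovered, corrected)
```

Forward recovery starts from an old copy and replays the log; back-out starts from the current state and reverses it. Either one only works if the log is stored somewhere other than the drive that failed.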
