Information Management DIG 3563 – Lecture 17: File Structures and Cloud Computing. J. Michael Moshell, University of Central Florida. Original image* by Moshell et al. Imagery is from Wikimedia except where marked with *.


1

Information Management

DIG 3563 – Lecture 17

File Structures and Cloud Computing

J. Michael Moshell

University of Central Florida

Original image* by Moshell et al.

Imagery is from Wikimedia except where marked with *.


-2 -

File System Organization

recovermyfiles.com

* Disks have sectors; each sector has an address (integer)

* A file is a collection of sectors. They can be contiguous or fragmented.

* To find the sectors comprising a file, we need a directory.

* The directory system records which sectors belong to each file.

* The Operating System has software to manage directories & files.

planetoftunes.com


-3 -

Formatting a Disk

* factory (low level) format:

- timing tracks, etc.

"marks in the parking lot"

- usually not re-doable

* local reformatting:

* Check for read/write errors

* Mark good sectors and bad ones

* Create a list of available sectors

* Set up file structure:

- directory

- boot sector (for bootable drives)

stripespls.com
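The local reformatting steps above can be sketched on a simulated disk. This is only an illustration: `is_sector_ok` stands in for a real read/write test, and placing the boot sector and directory in the first two good sectors is a hypothetical layout, not that of any real file system.

```python
def local_format(num_sectors, is_sector_ok):
    """Simulate a local reformat: test sectors, then build the bookkeeping lists."""
    good, bad = [], []
    for addr in range(num_sectors):
        # Check for read/write errors and mark the sector good or bad.
        (good if is_sector_ok(addr) else bad).append(addr)

    # Hypothetical layout: the first good sector holds the boot sector,
    # the second holds the (empty) directory.
    return {
        "boot": good[0],
        "directory": good[1],
        "available": good[2:],  # list of available sectors for file data
        "bad": bad,
    }


# Pretend sectors 3 and 5 fail the read/write test.
layout = local_format(8, lambda addr: addr not in {3, 5})
print(layout)
```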


-4 -

File System Organization

recovermyfiles.com

* A simple (conceptual) architecture:

Directory:

Dirnum  Filename       Filesize  Headsector
1       addresses.doc  144300    22010
2       employees.doc  99800     33500
3       payroll.xls    17100     33482
4       etc

* at sectors 22010, 22021 we have records:

block   dirnum  nextblock  data ...
22010   1       22021      Adams, John \t 222 West ...
22021           22040      Wilson, Steve \t 333 East ...

So the file is a linked list (like a treasure hunt) through the disk's sectors.

(Not all disks are organized this way.)

planetoftunes.com
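The treasure hunt can be written out as a short sketch. The sector numbers for addresses.doc come from the slide; the contents of sector 22040 and the use of `None` as an end-of-file marker are assumptions made only so the example terminates.

```python
END = None  # assumed end-of-chain marker; the slide does not say how a file ends

directory = {
    "addresses.doc": {"dirnum": 1, "filesize": 144300, "headsector": 22010},
}

sectors = {
    22010: {"dirnum": 1, "next": 22021, "data": "Adams, John\t222 West ..."},
    22021: {"dirnum": 1, "next": 22040, "data": "Wilson, Steve\t333 East ..."},
    22040: {"dirnum": 1, "next": END, "data": "(hypothetical last record)"},
}


def file_sectors(name):
    """Follow the links from the head sector and return the sector addresses in order."""
    addr = directory[name]["headsector"]
    chain = []
    while addr is not END:
        if addr in chain:
            raise ValueError("corrupt chain: sector %d visited twice" % addr)
        chain.append(addr)
        addr = sectors[addr]["next"]
    return chain


print(file_sectors("addresses.doc"))
```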


-5 -

File System Errors

* Disk drive hardware checks parity when reading sectors

* If a parity error occurs, data may have been lost

* Usually this just reports a failure to the OS and you're stuck.

However – the actual disk drive hardware can probably still read the data; it just doesn't LIKE it.

So, specialized software can sometimes get this "bad checksum" data and display it ... we discuss this shortly.
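A toy illustration of the idea, assuming a single even-parity bit per sector. Real drives use stronger error-correcting codes, so this is a simplification; `read_sector` and its `override` flag are hypothetical names.

```python
def parity_bit(data: bytes) -> int:
    """Even-parity bit: 1 if the data holds an odd number of 1 bits, else 0."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 2


def read_sector(data: bytes, stored_parity: int, override: bool = False) -> bytes:
    """Return the sector data, raising on a parity mismatch unless overridden."""
    if parity_bit(data) != stored_parity and not override:
        raise IOError("parity error: sector may be damaged")
    return data


good = b"Adams, John"
stored = parity_bit(good)                     # written alongside the sector
damaged = bytes([good[0] ^ 0x01]) + good[1:]  # one bit flipped on the disk

try:
    read_sector(damaged, stored)
except IOError as err:
    print("normal read:", err)

# Recovery software ignores the complaint and shows the data anyway.
print("forced read:", read_sector(damaged, stored, override=True))
```

Note that a single parity bit detects any odd number of flipped bits but misses an even number, which is one reason real hardware uses richer checks.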


-6 -

File System Organization

recovermyfiles.com

* Deleting a file:

The OS keeps an available sector list of sectors that can be reused.

To DELETE a file, the system just changes its first and last links. (Think of out-of-service boxcars.)

The data is not gone, it's just unlinked.

It will be overwritten when (and if) the OS needs more space.

tdc.ca
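A minimal sketch of deletion in the conceptual linked-sector design, assuming the available sector list is itself a chain of sectors. All names and the sector 40000 are hypothetical.

```python
END = None  # hypothetical end-of-chain marker


def delete_file(directory, sectors, free_head, name):
    """Unlink a file: drop its directory entry and splice its chain onto the free chain."""
    head = directory.pop(name)         # forget the first link
    last = head
    while sectors[last]["next"] is not END:
        last = sectors[last]["next"]
    sectors[last]["next"] = free_head  # change the last link
    return head                        # the file's head is the new free-chain head


directory = {"addresses.doc": 22010}
sectors = {
    22010: {"next": 22021, "data": "Adams, John"},
    22021: {"next": END, "data": "Wilson, Steve"},
    40000: {"next": END, "data": ""},  # the only free sector before the delete
}

free_head = delete_file(directory, sectors, 40000, "addresses.doc")
print(free_head, repr(sectors[22010]["data"]))
```

Only two links change; the data fields are never touched, which is why the old contents remain readable until the sectors are reused.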


-7 -

Losing and Recovering Data

recovermyfiles.com

Now what if the directory or a sector gets screwed up?

a) software error: the pointer or link to a file gets erased.

or

b) hardware error: part of the directory or a sector gets corrupted.

The data is still out there, but the OS can't find it.

If you can directly READ THE SECTORS, you will find broken strands of spaghetti ... with clues in 'em.

restaurantwidow.com


-8 -

Recovering Data

What clues exist?

* Links (obviously) if it's a linked system

  - Try to reconstruct the files, or fragments of them

* Directory item numbers, if these exist

  - Try to "work backwards" and reconstruct the directory

* The data itself (e.g. search for "Adams")

  - Use syntactic knowledge to match up partial sentences in blocks. Which block might match that one?

    .. and we re

    nguins live in Antarc...

    spect the opinions of...

    492.7 \t 333.9e14 ...
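Two of these clues (links and the data itself) can be sketched as follows, assuming the sectors have already been read raw into a dictionary. The function names, the "Baker" record, and the missing sectors are made up for illustration.

```python
def rebuild_chains(sectors):
    """Group surviving sectors into fragments by following their links."""
    pointed_to = {s["next"] for s in sectors.values() if s["next"] in sectors}
    heads = sorted(addr for addr in sectors if addr not in pointed_to)
    chains = []
    for head in heads:
        chain, addr = [], head
        # Stop at a missing (unreadable) sector or if a link loops back.
        while addr in sectors and addr not in chain:
            chain.append(addr)
            addr = sectors[addr]["next"]
        chains.append(chain)
    return chains


def search(sectors, text):
    """Return the addresses of sectors whose data contains the given text."""
    return sorted(addr for addr, s in sectors.items() if text in s["data"])


# Sectors 22040 and 33511 are unreadable, so two fragments survive.
surviving = {
    22010: {"next": 22021, "data": "Adams, John\t222 West ..."},
    22021: {"next": 22040, "data": "Wilson, Steve\t333 East ..."},
    33500: {"next": 33511, "data": "Baker, Ann\t..."},
}

print(rebuild_chains(surviving))
print(search(surviving, "Adams"))
```

Matching partial sentences across fragments is the part that still needs a human, as the slide suggests.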


-9 -

Recovering Data

If you have 'bad sectors' (i.e. bad checksums):

Read the data and override the parity error messages.

Humans are normally required to look at the data and piece it back together.

Success is not guaranteed.

A full (not quick) format on some systems writes 0 in all the sectors. SOME claim they can recover what was there before (maybe NSA can?)

But it is not a high-percentage bet.


-10 -

Forensics: Finding Hidden Stuff

* simplest cases: just "erased" your files?

- straightforward disk recovery may work.

* the famous photocopier story.

- copiers have hard drives and remember what was copied.

http://www.cbsnews.com/stories/2010/04/19/eveningnews/main6412439.shtml

* Memory sticks (flash drives) are just like hard drives; "delete" does not empty them.

(Nonvolatile RAM versus volatile RAM. Why isn't it ALL nonvolatile?)

macforensiclabs.com


-11 -

Forensics: Finding Hidden Stuff

* virtual memory: copies part of your RAM onto the computer's hard drive.

* those images may include print queues and other information that can be recovered.

* backup systems may not have been reformatted even if the main hard drive was reformatted.

* offsite backup probably was NOT reformatted; old sectors may have copies of data you wanted to make disappear.

macforensiclabs.com


-12 -

File Structures: Summary

* vocabulary terms throughout lecture

* backup/archive/redundant storage

* criteria for choice of offsite backup

* understand and explain disk organization

* understand how disk errors occur

* analyze what data could be recovered from a particular accident

* discuss forensic issues concerning disk data erasure and recovery

motifake.com


-13 -

Cloud Computing and

Digital Asset Management

• First let's look at the Cloud

- Where did it come from?

- What is it?

- How can it help me?

- What new skills will I need to use it?

- What effect does Cloud have on DAM?


-14 -

As of the Year 2000 ...

• Most Internet Service Providers sold ( ... rented ...)

• dedicated hosting

  One website: delivered by 1 computer

  mystore.com

• shared virtual hosting

  N websites each got 1/Nth computer

  yourstore.com

  histore.com

  herstore.com




-16 -

And a few giants (Yahoo, Google, Amazon) built giant 'ad hoc' systems with thousands of CPUs and petabytes of storage.

Amazon noticed ... less than 10% of their capacity was being used most of the time.

phaseoneenterprises.com


-17 -

en.wikimedia.org

... and in 2006 launched Amazon Web Services

The 'utility model': power plants have capacity to meet AVERAGE demand and so can deliver UNLIMITED* power to some customers when needed.

(*"Unlimited" as long as << total capacity)


-18 -

The Shared Telescope Model

as.utexas.edu

Astronomers worldwide now schedule time on big telescopes through the Internet and don't have to go to a cold mountaintop and stay up all night to capture imagery.


-19 -

The Shared Computing Model

NASA released NEBULA in 2008, to share research computers instead of building additional data centers.

NEBULA is an open source cloud management system.




-21 -

... resembles the old Mainframe Timeshare model

as.utexas.edu

Before PCs, we programmed on punch-cards and thought it was a great INNOVATION when time-sharing became possible.


-22 -

But with one fundamental difference:

In 1965 this was SCARCE

and we were NUMEROUS (relatively)

(Skilled specialists who wanted to use computers)

as.utexas.edu

redlinecs.com.au


-23 -

But with one fundamental difference:

In 2012 this is ABUNDANT

and we are EVERYONE

allthingsdistributed.com

reuters.com


-24 -

... may reduce your company's IT costs

* software is expensive – so RENT it

* hardware is expensive to update – so RENT it

* buildings are expensive – so share them

* land is expensive – build in rural areas

... relies on fast, reliable networks


-25 -

Key Cloud Concepts:

1. Agility through dynamic provisioning

- Order up "supercomputer for an hour"

2. API Accessibility

- Your program can specify the needed QOS*

QOS: Quality of Service:

- Maximum guaranteed latency (e.g. <1ms)

- Minimum guaranteed CPU (e.g. >1 petaflop)


-26 -

What's a flop?

"Floating Point Operations" like x=239.44*456.3733

per second

Math models (physics, stock market, statistics)

may need teraflops (tera = million*million flops)

giga = 10^9

tera = 10^12

peta = 10^15

exa = 10^18
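A quick arithmetic check of the prefixes, as a sketch. The workload of 10^15 operations is an arbitrary example, not a figure from the lecture.

```python
PREFIXES = {"giga": 10**9, "tera": 10**12, "peta": 10**15, "exa": 10**18}


def seconds_needed(operations, rate_prefix):
    """Time to run a number of floating point operations at 1 <prefix>flop per second."""
    return operations / PREFIXES[rate_prefix]


workload = 10**15  # hypothetical model needing a million billion operations
for prefix in ("giga", "tera", "peta"):
    print(prefix, seconds_needed(workload, prefix), "seconds")
```

At one gigaflop the job takes 10^6 seconds (more than eleven days); at one petaflop it takes one second.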


-27 -

Key Cloud Concepts:

1. Agility through dynamic provisioning

- Order up "supercomputer for an hour"

2. API Accessibility

- Your program can specify the needed QOS*

3. Virtualization

- You "THINK" you have your own machine

- Protection models don't need to be reinvented

http://www.vmware.com/virtualization/


-28 -

One Key Cloud Concern:

SECURITY.

(I know this guy)

http://www.acsac.org/2012/workshops/ccw/

One solution (for larger firms): Build your own Cloud.

http://www.enterprisenetworkingplanet.com/ebooks/50950510/95900/4190310/


-29 -

Quickly, web-hosts realized that they could virtualize their service

bigbird.com

cookie.com

elmo.com

kermit.com

piggie.com
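One common way a single machine serves several sites is to pick the site from the hostname in each request. The sketch below assumes that approach; the directory paths and the function name are hypothetical, and the slide does not say which technique the hosts actually used.

```python
# Hypothetical mapping from hostname to the folder holding that site's files.
SITES = {
    "bigbird.com": "/srv/bigbird",
    "cookie.com": "/srv/cookie",
    "elmo.com": "/srv/elmo",
    "kermit.com": "/srv/kermit",
    "piggie.com": "/srv/piggie",
}


def document_root(host_header):
    """Pick the site folder from the Host value of a request, or None if unknown."""
    hostname = host_header.split(":")[0].strip().lower()  # drop any ":port" suffix
    return SITES.get(hostname)


print(document_root("elmo.com"))
print(document_root("Kermit.com:8080"))
print(document_root("unknown.com"))
```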


-30 -

Software as a Service (SaaS)

The 800 pound anthropoid:

Salesforce.com

http://www.salesforce.com

sales cloud (CRM systems)

force.com – build your own

pin.primate.wisc.edu


-31 -

Digital Asset Management

in the Cloud

pin.primate.wisc.edu

1. Simple: Dropbox

2. Specialized for software: Github

3. Rich metadata -> DAM (e. g. AlienBrain)

Media Valet - http://www.mediavalet.co/home.aspx

Widen

Fordela


-32 -

Digital Asset Management

in the Cloud

pin.primate.wisc.edu

1. Simple: Dropbox

2. Specialized for software: Github

3. Rich metadata -> DAM (e. g. AlienBrain)

Media Valet - http://www.mediavalet.co/home.aspx

"CMIS compliant?"


-33 -

Content Management

Interoperability Services (CMIS)

http://en.wikipedia.org/wiki/Content_Management_Interoperability_Services

CMIS is an open standard that defines how DAM systems can manage metadata ("generic properties") for files and folders.

Adobe, HP, IBM, Microsoft, Oracle + + +


-34 -

Digital Asset Management

in the Cloud

pin.primate.wisc.edu

1. Simple: Dropbox

2. Specialized for software: Github

3. Rich metadata -> DAM (e. g. AlienBrain)

Media Valet

Widen - http://www.widen.com/

Fordela


-35 -

Digital Asset Management

in the Cloud

pin.primate.wisc.edu

1. Simple: Dropbox

2. Specialized for software: Github

3. Rich metadata -> DAM (e. g. AlienBrain)

Media Valet

Widen

Fordela http://www.fordela.com/ - VIDEO focus

(started by LucasArts veterans)




-37 -

Choosing a DAM System

pin.primate.wisc.edu

Here's a logically organized Buyer's Guide

http://www.datamation.com/storage/digital-asset-management-buying-guide-1.html

End of lecture ... End of lectureS.

When we return ... Project Show-and-tell!