Penn ESE250 S'12 -- Kod & DeHon 1 ESE250: Digital Audio Basics Week 10: March 22, 2012 File System


1Penn ESE250 S'12 -- Kod & DeHon

ESE250: Digital Audio Basics

Week 10: March 22, 2012

File System

2Penn ESE250 S'12 -- Kod & DeHon

Review

• Everything reduces to bits
– Songs digitized and encoded
– Machine code: bit encoding for machine instructions
– Ebook text, homework PDF, movies, …
• Memories store bits
– Non-volatile memories store them persistently (when the power goes off)

3Penn ESE250 S'12 -- Kod & DeHon

Persistent Storage Questions

• How do we save data across boots?
– When the computer is off
• How do we save data to move between machines?
• How do we organize our data so we can find it again?
– Tell others to find it?

4Penn ESE250 S'12 -- Kod & DeHon

Strawman #1

• Guess a random address and write data there
• Write down addresses on paper
• To play back a song
– Run program located at address 42000
– On data located at address 81736
• How do I know if I can use the block at address 90,000?
• Why might this be problematic?

5

Course Map

Numbers correspond to course weeks
[Figure: course map labeled with week numbers 2, 5, 6, 11, 12, 13]
Today: file system

Penn ESE250 S'12 -- Kod & DeHon

6Penn ESE250 S'12 -- Kod & DeHon

Outline

• What does technology give us?
• Requirements?
• Interlude
• File System Sketch

7

Technology

Penn ESE250 S'12 -- Kod & DeHon

8Penn ESE250 S'12 -- Kod & DeHon

Hard Disk

• Disc with magnetic material on its surface

9

Hard Disk

• Disc with magnetic material on its surface

• Divided into tracks (circles)

• Modern disks
– 300,000 TPI
– TPI = Tracks Per Inch

Penn ESE250 S'12 -- Kod & DeHon

10

Hard Disk

• Disc with magnetic material on its surface

• Divided into bit regions

• Modern disks
– 1.5M BPI
– BPI = bits per inch

Penn ESE250 S'12 -- Kod & DeHon

11

Hard Disk

• Each bit located at a position (R, θ)
• R = select track
• θ = select bit from track
• Disc spins
– Traces through θ

Penn ESE250 S'12 -- Kod & DeHon

12Penn ESE250 S'12 -- Kod & DeHon

Hard Disk

• Each bit located at a position (R, θ)
• Add arm to move head

13Penn ESE250 S'12 -- Kod & DeHon

Hard Disk
• Each bit located at a position (R, θ)
• Head arm moves
– Varies R

14

Disk Bandwidth

• Typical disk speed?
– 15,000 RPM
– One rotation every 60 s / 15,000 = 4 ms
• At R = 1 inch and 1.5M BPI, how many bits/second?
– 2π × 1 in × 1.5 Mb/in / 4 ms
– ≈ 9 Mbits / 4 ms = 2.25 Gb/s
– ≈ 280 MB/s
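The bandwidth estimate can be reproduced with a few lines of Python. This is a minimal sketch using only the slide's numbers (15,000 RPM, R = 1 inch, 1.5M BPI); the function name `track_bandwidth` is made up for illustration.

```python
import math

def track_bandwidth(rpm, radius_in, bits_per_inch):
    """Return (rotation_time_s, bits_per_track, bytes_per_second)."""
    rotation_time_s = 60.0 / rpm                          # one full revolution
    bits_per_track = 2 * math.pi * radius_in * bits_per_inch
    bytes_per_second = bits_per_track / rotation_time_s / 8
    return rotation_time_s, bits_per_track, bytes_per_second

rotation, bits, rate = track_bandwidth(rpm=15_000, radius_in=1.0, bits_per_inch=1.5e6)
print(f"rotation: {rotation * 1e3:.1f} ms")
print(f"bits per track: {bits / 1e6:.2f} Mbit")
print(f"bandwidth: {rate / 1e6:.0f} MB/s")
```

Without rounding 2π × 1.5M down to 9 Mbits, the result comes out slightly higher (roughly 295 MB/s) than the slide's 280 MB/s; the order of magnitude is what matters here.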

Penn ESE250 S'12 -- Kod & DeHon

15Penn ESE250 S'12 -- Kod & DeHon

Disk Speed

• Move head in R?
– Also a few milliseconds
• Typical data access: ~10 ms
– E.g. 4 ms rotate + 6 ms seek

16Penn ESE250 S'12 -- Kod & DeHon

Throughput and Implications

• Disk throughput faster than access time
– 10 ms latency
– 280 MB/s throughput (~1 B / 4 ns)
• What does this drive us to?
– 10 ms seek → random byte access ≈ 100 B/s
– Sequential access 280 MB/s
– Want to exploit sequential access!
• Read blocks of data

17Penn ESE250 S'12 -- Kod & DeHon

Read Data Blocks

• How many sequential bytes can we read in 1 ms?
– 280 MB/s × 0.001 s = 280 KB
• Can read 280 KB in nearly the same time as 1 byte
– 6 ms seek, 4 ms rotation, 1 ms data read
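To see how block size amortizes the positioning cost, here is a small sketch that uses the slide's 6 ms seek, 4 ms rotation, and 280 MB/s streaming rate. The constant and function names are illustrative assumptions, and the model ignores caching and track-to-track effects.

```python
SEEK_S = 0.006          # head seek time from the slide
ROTATE_S = 0.004        # rotational delay from the slide
STREAM_BPS = 280e6      # sequential transfer rate in bytes/second

def access_time(block_bytes):
    """Time to position the head and then stream one block."""
    return SEEK_S + ROTATE_S + block_bytes / STREAM_BPS

def effective_throughput(block_bytes):
    """Bytes per second delivered when every block needs a fresh seek."""
    return block_bytes / access_time(block_bytes)

for size in (1, 4096, 280_000, 2_800_000):
    print(f"{size:>9} B block: {access_time(size) * 1e3:6.2f} ms, "
          f"{effective_throughput(size) / 1e6:8.3f} MB/s")
```

A 1-byte read delivers about 100 B/s, while a 280 KB block takes only about 11 ms and delivers tens of MB/s, which is the argument for reading whole blocks.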

18Penn ESE250 S'12 -- Kod & DeHon

Seagate 2.5” Disk Drive

http://www.seagate.com/docs/pdf/datasheet/disc/ds_momentus_5400_psd.pdf

5400 RPM

19

FLASH Memory
• Exploit tunneling
• Use high voltage to reduce barrier
– Tunnel charge onto floating node
– Charge trapped on node
• Use field from floating node to modulate conduction

http://commons.wikimedia.org/wiki/File:Flash-Programming.png

Week 8

Penn ESE250 S'12 -- Kod & DeHon

20Penn ESE250 S'12 -- Kod & DeHon

Flash

• NOR – Read like other memories
• NAND – Sequential read within “page”
– Denser than NOR
• Can only “erase” in blocks
– 4KB, 64KB to 256KB
• Once erased can write byte (page) at a time
– Write time variable
– Typically need feedback to sense when written

[Figure: flash array schematics labeled with bitline, bselect, gndselect, and word lines w0 to w3]

21Penn ESE250 S'12 -- Kod & DeHon

Samsung 256Mx8 NAND Flash

http://www.datasheetcatalog.com/datasheets_pdf/K/9/E/2/K9E2G08U0M.shtml

22Penn ESE250 S'12 -- Kod & DeHon

Intel Solid-State Drive (SSD)

http://download.intel.com/design/flash/nand/extreme/extreme-sata-ssd-datasheet.pdf

35,000/s x 4KB = 140MB/s

23Penn ESE250 S'12 -- Kod & DeHon

Requirements

24Penn ESE250 S'12 -- Kod & DeHon

File

• File – sequence of bits that go together
– MP3 encoding
– Executable for mp3player
– Picture in JPEG
– PDF for your lab writeup
• How big is a file?

25Penn ESE250 S'12 -- Kod & DeHon

File
• File – sequence of bits that go together
– Like an object
• A base address
• Length or extent
• Generally has a type…but only used weakly
– Mp3, WAV, x86 executable, …
– On unix/linux
• an array of unsigned char
• with a length
• Magic number tries to convey type
– Inband, in the file (first word?)
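As an illustration of in-band magic numbers, the sketch below guesses a type from the first bytes of a file. The signature table is a small hand-picked sample (PDF, JPEG, ELF, ID3-tagged MP3, WAV), not a complete registry, and `guess_type` is a hypothetical helper name.

```python
# A few well-known leading byte signatures ("magic numbers").
MAGIC = [
    (b"%PDF-", "PDF document"),
    (b"\xff\xd8\xff", "JPEG image"),
    (b"\x7fELF", "ELF executable"),
    (b"ID3", "MP3 audio with ID3v2 tag"),
]

def guess_type(data: bytes) -> str:
    """Guess a file type from its first bytes; 'unknown' if nothing matches."""
    # WAV is a RIFF container: 'RIFF', a 4-byte size, then 'WAVE'.
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return "WAV audio"
    for prefix, name in MAGIC:
        if data.startswith(prefix):
            return name
    return "unknown"

print(guess_type(b"%PDF-1.4\n..."))
print(guess_type(b"RIFF\x24\x00\x00\x00WAVEfmt "))
print(guess_type(b"hello world"))
```

Because the type lives inside the byte array itself, nothing stops a file from carrying a misleading signature, which is one sense in which the type is "only used weakly".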

26Penn ESE250 S'12 -- Kod & DeHon

Create a File?

• What do I need to do to create/store a new file?
– Allocate/reserve space for it
– Give it a name?
– Make a record somewhere of mapping between name and location

27Penn ESE250 S'12 -- Kod & DeHon


Strawman #2

• Keep track of next free address
– free_address
– initialized to 0
• When creating a file, give it space at free_address
– Increment free_address by length
• Store name & address in table
– Maybe put table at high addresses
• When free_address+len>=table_base
– Device is full

[Figure: disk with free_address pointer and name table]
File1: 0
File2: 250
File3: 350
File4: 900
File5: 1100
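Strawman #2 can be sketched in a few lines. This is an illustrative model only: the class name `BumpDisk`, the `table_base` value of 2000, and the file lengths are assumptions, with the lengths chosen so the start addresses match the table on the slide.

```python
class BumpDisk:
    """Strawman #2: hand out space by advancing a single free_address."""

    def __init__(self, table_base):
        self.free_address = 0          # next unused address
        self.table_base = table_base   # name table lives at and above this address
        self.table = {}                # name -> (start address, length)

    def create(self, name, length):
        if name in self.table:
            raise ValueError(f"name already in use: {name}")
        if self.free_address + length >= self.table_base:
            raise MemoryError("device is full")
        start = self.free_address
        self.table[name] = (start, length)
        self.free_address += length
        return start

disk = BumpDisk(table_base=2000)
for name, length in [("File1", 250), ("File2", 100), ("File3", 550),
                     ("File4", 200), ("File5", 300)]:
    print(name, disk.create(name, length))
```

Note that there is no `delete` method: with a single pointer there is nowhere to record a hole, which is the weakness the following slides examine.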

33Penn ESE250 S'12 -- Kod & DeHon

Evaluating Strawman #2

• Good
+ Accommodates variable length files
+ Allows contiguous access
• Bad
– What happens when we delete a file?
• How to reuse space?
– Add data to files?
– Table gets big
– All filenames have to be unique?
• Demands coordination between
– users?
– Programs?
– Programs from different vendors?

34Penn ESE250 S'12 -- Kod & DeHon

Files Grow and Shrink

• Essay/homework gets longer as you write it
– Don’t know how long it will be when you start
• Database of checks written grows
• TODO before end-of-term list shrinks?
• Where to put additional space?
– Allocate new space at end of disk?

35Penn ESE250 S'12 -- Kod & DeHon

Delete Files

• Don’t need lame .o files once build executable
• Replace false start
• That was a bad picture of me
– Now have better
• Don’t want anyone to see my secret plans to take over the world
• …want the space back because drive filling up

36Penn ESE250 S'12 -- Kod & DeHon

Repurposing Space

• How to reclaim space?
• With single free_address pointer
– can’t keep track of all the places where there is space
• What can we do?
– Keep a list of free regions
– Try to find a region where the file will fit

37Penn ESE250 S'12 -- Kod & DeHon

Finding Contiguous Space

• What if our disk looks like this
[Figure: disk layout with free space scattered in non-contiguous regions]
• …and we want to allocate a large file?
• Disk has capacity
– But cannot allocate because not contiguous
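The fragmentation problem is easy to demonstrate with a list of free regions and a first-fit search. The region list below is a made-up example, and first fit is just one possible policy, not necessarily the one the lecture has in mind.

```python
def first_fit(free_regions, length):
    """Return the start of the first free region that can hold `length`, else None.

    free_regions is a list of (start, length) pairs.
    """
    for start, size in free_regions:
        if size >= length:
            return start
    return None

# Hypothetical fragmented disk: four free holes, 400 units free in total.
free_regions = [(100, 50), (300, 150), (700, 80), (900, 120)]
total_free = sum(size for _, size in free_regions)

print(total_free)                      # total capacity still available
print(first_fit(free_regions, 120))    # fits in the hole starting at 300
print(first_fit(free_regions, 200))    # None: no single hole is big enough
```

The last call fails even though 400 units are free in total, which is why the later sketch lets a file be built from non-contiguous blocks.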

38Penn ESE250 S'12 -- Kod & DeHon

Bad Sectors

• Portions of a disk may be bad
– At manufacture time
– Go bad during use
• Portions of Flash RAM may be bad
– Manufacturing defects
– Limited number of write cycles
• Also inhibits contiguous allocation

39Penn ESE250 S'12 -- Kod & DeHon

Naming Conflicts

• How to solve naming conflicts?
– Provide separate contexts
• E.g. separate space of names
– for each user
– for each program
– Typically with a directory structure

/home
  /andre
    /lab9
      /solutions.pdf
    /lab10
      /solutions.pdf
  /bgojman
    /lab9
      /solutions.pdf
    /lab10
      /solutions.pdf

40Penn ESE250 S'12 -- Kod & DeHon

Directory

• Special file that contains name to location mappings
• Once a directory is a file, we can easily allow hierarchy
– Directories can contain directories

File1: 0
File2: 250
File3: 350
File4: 900
File5: 1100

41Penn ESE250 S'12 -- Kod & DeHon

Requirement Roundup
• Find things easily and quickly
– Minimize what we need to look at to find data
• Portable (is the sole state holder)
– Self describing
• Fast read
– Attempt to lay out contiguous files
• Fast write
– Not take too long to find space for file
• Support deletion (repurpose capacity)
• Use (most) of capacity
– Allow files to be non-contiguous
• Tolerate errors in media
– Don’t depend on contiguous blocks to be good
• Isolate/differentiate who can access what

Challenge: both asymptotics and constants (e.g. 280MB/s vs. 10ms random access) matter.

42Penn ESE250 S'12 -- Kod & DeHon

Interlude

43Penn ESE250 S'12 -- Kod & DeHon

• Jurassic Park Unix-navigation-sequence
• From: “you can’t hold it by yourself”
– (around 1:52 into movie)
• To: “…security systems; you name it we got it.”
– (around 1:54) – less than 2 minutes total
– YouTube clip that is a bit shorter: http://www.youtube.com/watch?v=dFUlAQZB9Ng
– One that is a bit longer (superset of intent): http://www.dailymotion.com/video/x4tbis_jurassic-park-unix-system-scene_tech

44Penn ESE250 S'12 -- Kod & DeHon

Disk Data Security

• How is security enforced?
– OS demands credentials for login
– User doesn’t get direct access to hardware
– OS intermediates

45Penn ESE250 S'12 -- Kod & DeHon

Physical Disk Access

• What happens if the disk is removed from the physical machine?
– Plugged into another machine that
• Someone else has administrator access on?
• Doesn’t respect the users/isolation?

46Penn ESE250 S'12 -- Kod & DeHon

Common News Item

Computer hard drive sold on eBay 'had details of top secret U.S. missile defence system'

• By Daily Mail Reporter
• Last updated at 11:08 AM on 07th May 2009

Highly sensitive details of a US military missile air defence system were found on a second-hand hard drive bought on eBay.

• The test launch procedures were found on a hard disk for the THAAD (Terminal High Altitude Area Defence) ground to air missile defence system, used to shoot down Scud missiles in Iraq.

• The disk also contained security policies, blueprints of facilities and personal information on employees including social security numbers, belonging to technology company Lockheed Martin - who designed and built the system.

• Read more: http://www.dailymail.co.uk/news/article-1178239/Computer-hard-drive-sold-eBay-details-secret-U-S-missile-defence-system.html#ixzz0Wxa60PT9

47Penn ESE250 S'12 -- Kod & DeHon

…all too common

VA Update on Missing Hard Drive in Birmingham, Ala.
• February 10, 2007
• Investigation Yielding Additional Information
• WASHINGTON -- The Department of Veterans Affairs (VA) today issued an update on the information potentially contained on a missing government-owned, portable hard drive used by a VA employee at a Department facility in Birmingham, Ala.

• “Our investigation into this incident continues, but I believe it is important to provide the public additional details as quickly as we can,” said Jim Nicholson, Secretary of Veterans Affairs.  “I am concerned and will remain so until we have notified those potentially affected and get to the bottom of what happened.

• “VA will continue working around the clock to determine every possible detail we can,” Nicholson said.  

• VA and VA’s Office of Inspector General have learned that data files the employee was working with may have included sensitive VA-related information on approximately 535,000 individuals.  The investigation has also determined that information on approximately 1.3 million non-VA physicians – both living and deceased – could have been stored on the missing hard drive.  It is believed though, that most of the physician information is readily available to the public.  Some of the files, however, may contain sensitive information.

48Penn ESE250 S'12 -- Kod & DeHon

Still Happening

Probe Targets Archives’ Handling of Data on 70 Million Vets
• By Ryan Singel, October 1, 2009
• The inspector general of the National Archives and Records Administration is investigating a potential data breach affecting tens of millions of records about U.S. military veterans, Wired.com has learned. The issue involves a defective hard drive the agency sent back to its vendor for repair and recycling without first destroying the data. ....

• The incident was reported to NARA’s inspector general by Hank Bellomy, a NARA IT manager, who charges that the move put 70 million veterans at risk of identity theft, and that NARA’s practice of returning hard drives unsanitized was symptomatic of an irresponsible security mindset unbecoming to America’s record-keeping agency.

• “This is the single largest release of personally identifiable information by the government ever,” Bellomy told Wired.com. “When the USDA did the same thing, they provided credit monitoring for all their employees. We leaked 70 million records, and no one has heard a word of it.”

http://www.wired.com/threatlevel/2009/10/probe-targets-archives-handling-of-data-on-70-million-vets/

49Penn ESE250 S'12 -- Kod & DeHon

Caveats
• On standard unix/windows setups
– Without the OS providing protection, all the data is accessible
• Sometimes good for recovery
• On standard unix/windows setups
– rm/del doesn’t make the data go away
• Also sometimes useful for recovery
– Even a format does not guarantee the data is overwritten
• See: Remembrance of Data Passed: A Study of Disk Sanitization Practices
– IEEE Security and Privacy, v1n1p17-27 (linked from today’s reading)

50Penn ESE250 S'12 -- Kod & DeHon

File System Sketch

51Penn ESE250 S'12 -- Kod & DeHon

Sketch

• Manage the disk at the level of blocks of fixed size (bnodes)
• Format disk for bnodes
• File is a collection of bnodes
• Directory is a kind of file
• Root of system bnode in known location

52Penn ESE250 S'12 -- Kod & DeHon

bnode

• Fixed-size block of data
• Minimum unit of storage allocation
• bnodes map to physical addresses
– E.g. bnode 76 → address 76 × 4096 = 311296
• Or bnode 76 → R=1.012in, theta=32.07 degrees
• Address physical resources through bnodes
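The simplest bnode-to-address mapping is a multiplication by the block size. The sketch below assumes 4096-byte bnodes laid out back to back starting at address 0, which matches the slide's example but ignores skipped defective blocks; the function names are illustrative.

```python
BNODE_SIZE = 4096  # bytes per bnode, as in the slide's example

def bnode_to_address(bnode):
    """Byte address where a bnode starts, assuming a dense linear layout."""
    return bnode * BNODE_SIZE

def address_to_bnode(address):
    """Inverse mapping: (bnode number, byte offset within that bnode)."""
    return divmod(address, BNODE_SIZE)

print(bnode_to_address(76))            # 311296, as on the slide
print(address_to_bnode(311296 + 10))   # (76, 10)
```

Software above this layer only ever names bnodes, so the mapping could be swapped for one that targets (R, theta) positions without changing anything else.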

53Penn ESE250 S'12 -- Kod & DeHon

bnode Size
• How big should a bnode be?
– Needs to be bigger than the block address
– Problems with small blocks?
• Longer addresses
• Can address smaller file size w/ fixed # address bits
– Problems with large blocks?
• Minimum allocation increment
• Internal fragmentation
– Typical values 4KB, 1KB, 256B
• Trending toward larger these days
– Intel 4KB SSD, 64KB for some flash

54Penn ESE250 S'12 -- Kod & DeHon

Files from bnodes

• Use bnodes as file handle
– How we address the file
• bnode contains metadata
– Block type, File type, length
• Small file: all data in single bnode

[Figure: 4096-byte bnode 76 (obj, 3172): 24 bytes metadata, 3172 bytes file contents, 900 bytes unused]
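The numbers in this example show internal fragmentation directly. The sketch below reproduces the 900 unused bytes and then, as a rough comparison, shows how the rounding waste changes with bnode size. The 24-byte metadata figure is from the slide; the second function deliberately ignores metadata and index bnodes, which is a simplifying assumption.

```python
BNODE_SIZE = 4096
METADATA_BYTES = 24   # per-bnode metadata, from the slide's example

def single_bnode_unused(file_bytes, bnode_size=BNODE_SIZE, metadata=METADATA_BYTES):
    """Unused bytes when a small file lives in one bnode; None if it does not fit."""
    unused = bnode_size - metadata - file_bytes
    return unused if unused >= 0 else None

def rounding_waste(file_bytes, bnode_size):
    """Bytes lost to rounding a file up to whole bnodes (data only)."""
    bnodes = -(-file_bytes // bnode_size)   # ceiling division
    return bnodes * bnode_size - file_bytes

print(single_bnode_unused(3172))            # 900, as on the slide
for size in (256, 1024, 4096, 65536):
    print(size, rounding_waste(3172, size))
```

Larger bnodes waste more space per small file, while smaller bnodes need more addresses and more index structure; that is the trade-off the bnode-size slide describes.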

55Penn ESE250 S'12 -- Kod & DeHon

Files from bnodes

• Large file: tree of bnodes
[Figure: bnode 76 (obj, 15,791) pointing to bnodes 77, 78, 79, 80]

56Penn ESE250 S'12 -- Kod & DeHon

Files from bnodes

• Large file: tree of bnodes
– Multi-level if necessary
[Figure: bnode 76 (obj, 15,791) pointing to bnodes 77, 78, 79, 80; a second, multi-level tree labeled mp3, 12MB, with bnode 102437253]

57Penn ESE250 S'12 -- Kod & DeHon

Files from bnodes

• Large file: tree of bnodes
– Multi-level if necessary
• Overhead for tree structure?
– 4KB pages
• About 1000-way tree
• 4000KB file (1000 pages) needs 1001 pages
• 8KB file needs 3 pages (50% overhead)
– In practice inodes avoid this worst-case
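The overhead figures can be checked with a short calculation. This sketch assumes 4KB pages, a fanout of exactly 1000 pointers per index page, and that every file has at least a root index page (so it does not model the small-file case where data sits in the root bnode); `tree_pages` is a hypothetical helper name.

```python
def tree_pages(data_bytes, page_size=4096, fanout=1000):
    """Return (data_pages, index_pages) for a file stored as a tree of pages.

    Assumes every file has at least a root index page and that each index
    page holds `fanout` pointers.
    """
    data_pages = max(1, -(-data_bytes // page_size))   # ceiling division
    index_pages = 0
    level = data_pages
    while True:
        level = -(-level // fanout)    # index pages needed to point at this level
        index_pages += level
        if level == 1:                 # reached the single root page
            break
    return data_pages, index_pages

for size in (8 * 1024, 1000 * 4096, 12 * 2**20):
    data, index = tree_pages(size)
    print(f"{size:>9} B: {data} data + {index} index pages "
          f"({100 * index / data:.1f}% overhead)")
```

Under these assumptions an 8KB file needs 2 + 1 pages (50% overhead), a 1000-page file needs 1000 + 1, and a 12MB file needs a second index level.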


58Penn ESE250 S'12 -- Kod & DeHon

EXT2 inode

[Source: http://www.tldp.org/LDP/tlk/fs/filesystem.html]

[Figure: EXT2 inode with direct block pointers (12 of these) followed by indirect blocks]

59Penn ESE250 S'12 -- Kod & DeHon

File Expansion with bnodes

• Expand file
– Add bnodes to file
[Figure: before, bnode 76 (obj, 15,791) points to bnodes 77, 78, 79, 80; after, bnode 76 (obj, 18,357) points to bnodes 77, 78, 79, 80, and 12374]

60Penn ESE250 S'12 -- Kod & DeHon

Directory
• File
– With type directory
– contains name/bnode pairs
• Small
– Fits in one bnode
• Large
– Tree of bnodes
• Just like file

Directory, 234
Lab1, 76
Lab2, 98
Lab3, 1034
Lab4, 267
Lab5, 2053
….
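Since a directory is just a file of name/bnode pairs, resolving a path is a loop over directory lookups. The in-memory "disk" below is entirely hypothetical (the bnode numbers only loosely echo the slide, and the root bnode number 2 is an assumption), and `resolve` is an illustrative name.

```python
# Hypothetical in-memory "disk": bnode number -> (type, payload).
# Directory payloads map names to bnode numbers; file payloads are bytes.
DISK = {
    2:    ("directory", {"home": 234}),               # assumed root bnode
    234:  ("directory", {"Lab1": 76, "Lab2": 98}),
    76:   ("obj", b"lab 1 contents"),
    98:   ("directory", {"solutions.pdf": 1034}),
    1034: ("pdf", b"%PDF-..."),
}
ROOT_BNODE = 2

def resolve(path, disk=DISK, root=ROOT_BNODE):
    """Walk a /-separated path from the root directory and return the final bnode."""
    bnode = root
    for name in [part for part in path.split("/") if part]:
        kind, entries = disk[bnode]
        if kind != "directory":
            raise NotADirectoryError(name)
        if name not in entries:
            raise FileNotFoundError(path)
        bnode = entries[name]
    return bnode

print(resolve("/home/Lab1"))                 # 76
print(resolve("/home/Lab2/solutions.pdf"))   # 1034
```

Each path component costs one directory read, so lookups touch only the directories along the path rather than one global table.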

61Penn ESE250 S'12 -- Kod & DeHon

Free bnodes
• Keep track of free and usable bnodes
• Grouped by contiguous set of free blocks
• Allocation – try to find contiguous set of bnodes to satisfy file need
– …and try not to break up large contiguous block unnecessarily
• Deletion – try to reassemble free blocks
– E.g. delete 13 to make 10-14 a length 5 block

[Figure: bnodes 0 to 23 with free runs marked]
Length 1: 1, 14
Length 2: 3-4, 7-8
Length 3: 10-12, 16-18
Length 4: 20-23
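The free-bnode bookkeeping on this slide can be modeled with a set of free bnode numbers that is regrouped into runs on demand. This is a sketch for clarity rather than speed (a real file system would maintain the runs incrementally), and the best-fit policy in `allocate` is one assumed way to avoid breaking up large runs.

```python
def free_runs(free_set):
    """Group a set of free bnode numbers into sorted (start, length) runs."""
    runs = []
    for b in sorted(free_set):
        if runs and runs[-1][0] + runs[-1][1] == b:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((b, 1))                          # start a new run
    return runs

def by_length(runs):
    """Index run start positions by run length, like the slide's 'Length n:' lists."""
    table = {}
    for start, length in runs:
        table.setdefault(length, []).append(start)
    return table

def allocate(free_set, count):
    """Best fit: take `count` bnodes from the smallest run that is big enough."""
    candidates = [r for r in free_runs(free_set) if r[1] >= count]
    if not candidates:
        return None
    start, _ = min(candidates, key=lambda r: (r[1], r[0]))
    taken = list(range(start, start + count))
    free_set.difference_update(taken)
    return taken

free = {1, 3, 4, 7, 8, 10, 11, 12, 14, 16, 17, 18, 20, 21, 22, 23}
print(by_length(free_runs(free)))
free.add(13)                         # deleting the file that held bnode 13
print(by_length(free_runs(free)))    # 10-12, 13, and 14 merge into one run
print(allocate(free, 2))             # taken from a length-2 run, not a larger one
```

The first printout matches the slide's table, and freeing bnode 13 produces the length-5 run starting at 10.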

62Penn ESE250 S'12 -- Kod & DeHon

Superblock

• For bootstrapping and file system management
– Each file system has a master block in a canonical location (first block on device)
– Describes file-system type
– Root bnode
– Keeps track of free lists …at least the head pointers to (bnodes, blocks)
• Corruption on superblock makes file system unreadable
– Store backup copies on disk

63Penn ESE250 S'12 -- Kod & DeHon

Format disk

• Identify all non-defective bnodes
– Defective blocks skipped
– those addresses not assigned to bnodes
• Create free bnode data structure
• Create superblock
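The formatting steps can be sketched as follows. Everything here is illustrative: the superblock fields, the file-system name "sketchfs", the choice of physical block 0 for the superblock, and the `is_defective` callback are assumptions, not details from the lecture.

```python
def format_disk(num_blocks, is_defective):
    """Return (superblock, bnode_map, free_list) for a freshly formatted device.

    Physical block 0 is reserved for the superblock (assumed good here).
    Defective blocks are skipped, so they never receive a bnode number.
    """
    usable = [blk for blk in range(1, num_blocks) if not is_defective(blk)]
    if not usable:
        raise ValueError("no usable blocks")
    bnode_map = dict(enumerate(usable))     # bnode number -> physical block
    root = 0                                # first bnode holds the root directory
    free_list = [b for b in bnode_map if b != root]
    superblock = {
        "fs_type": "sketchfs",              # hypothetical file-system name
        "root_bnode": root,
        "free_head": free_list[0] if free_list else None,
        "num_bnodes": len(bnode_map),
    }
    return superblock, bnode_map, free_list

sb, bnode_map, free_list = format_disk(8, lambda blk: blk in {3, 5})
print(sb)
print(bnode_map)    # physical blocks 3 and 5 are never assigned
print(free_list)
```

Because bad blocks are left out of the map, everything above this layer can treat bnodes as uniformly good storage.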

64Penn ESE250 S'12 -- Kod & DeHon

Review Sketch

• Manage the disk at the level of bnodes
• Format disk for bnodes
• File is a collection of bnodes
• Directory is a kind of file
• Root of system bnode in known location

65Penn ESE250 S'12 -- Kod & DeHon

Requirement Review
• Find things easily and quickly
– Minimize what we need to look at to find data
• Directory structure
• Portable (is the sole state holder)
– Self describing → superblock, metadata
• Fast read
– Attempt to lay out contiguous files
• Fast write
– Not take too long to find space for file → efficient free structure
• Support deletion (repurpose capacity)
– Return bnodes to free list
• Use (most) of capacity
– Allow files to be non-contiguous → bnodes
• Tolerate errors in media
– Don’t depend on contiguous blocks to be good → bnodes
• Isolate/differentiate who can access what

66Penn ESE250 S'12 -- Kod & DeHon

Learn More

• Online reading/pointers
– Unix File System Tutorial
– Flash, SSD, Hard drive data sheets
– Data found on hard drive articles
• Courses
– CIS121 – efficient data structures
– CIS380 – operating systems

67Penn ESE250 S'12 -- Kod & DeHon

Big Ideas

• Persistence/Volatility
• Self-describing
– Every disk different (at least due to media defects)
– Only place to save data across boots
• Naming
– Must have canonical way of referencing data
• Indirection
– Build logically contiguous region from non-contiguous physical regions
– Deal with growing, variable size files and errors in media