Penn ESE250 S'12 -- Kod & DeHon
Review
• Everything reduces to bits
  – Songs digitized and encoded
  – Machine code: bit encoding for machine instructions
  – Ebook text, homework PDF, movies, …
• Memories store bits
  – Non-volatile memories store them persistently (when the power goes off)
Persistent Storage Questions
• How do we save data across boots?
  – When the computer is off
• How do we save data to move between machines?
• How do we organize our data so we can find it again?
  – Tell others to find it?
Strawman #1
• Guess a random address and write data there
• Write down addresses on paper
• To play back song
  – Run program located at address 42000
  – On data located at address 81736
• How do I know if I can use the block at address 90,000?
• Why might this be problematic?
Course Map
[Figure: course map; numbers correspond to course weeks. Today: file system]
Outline
• What does technology give us?
• Requirements?
• Interlude
• File System Sketch
Hard Disk
• Disc with magnetic material on its surface
• Divided into tracks (circles)
• Modern disks
  – 300,000 TPI
  – TPI = tracks per inch
Hard Disk
• Disc with magnetic material on its surface
• Divided into bit regions
• Modern disks
  – 1.5M BPI
  – BPI = bits per inch
Hard Disk
• Each bit located at a position (R, θ)
• R = select track
• θ = select bit from track
• Disc spins
  – Traces through θ
Hard Disk
• Each bit located at a position (R, θ)
• Add arm to move head
Hard Disk
• Each bit located at a position (R, θ)
• Head arm moves
  – Varies R
Disk Bandwidth
• Typical disk speed?
  – 15,000 RPM
  – One rotation every 60 s / 15,000 = 4 ms
• At R = 1 inch and 1.5M BPI, how many bits/second?
  – 2π × 1 in × 1.5 Mbit/in / 4 ms
  – ≈ 9 Mbits / 4 ms = 2.25 Gb/s
  – ≈ 280 MB/s
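The bandwidth estimate above can be reproduced with a few lines of Python. This is only a sketch of the slide's arithmetic; the function names are made up for illustration. Without rounding the track length to 9 Mbits, the same inputs give roughly 295 MB/s, so the slide's 280 MB/s is a rounded figure.

```python
import math

def track_bits(radius_in: float, bits_per_inch: float) -> float:
    """Bits stored on one circular track of the given radius."""
    return 2 * math.pi * radius_in * bits_per_inch

def rotation_time_s(rpm: float) -> float:
    """Seconds for one full rotation of the disc."""
    return 60.0 / rpm

def bandwidth_bytes_per_s(radius_in: float, bits_per_inch: float, rpm: float) -> float:
    """Read rate if one whole track passes under the head per rotation."""
    return track_bits(radius_in, bits_per_inch) / rotation_time_s(rpm) / 8

if __name__ == "__main__":
    # Slide values: R = 1 inch, 1.5M BPI, 15,000 RPM
    bw = bandwidth_bytes_per_s(radius_in=1.0, bits_per_inch=1.5e6, rpm=15_000)
    print(f"rotation: {rotation_time_s(15_000) * 1e3:.1f} ms")
    print(f"bandwidth: {bw / 1e6:.1f} MB/s")
```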
Disk Speed
• Move head in R?
  – Also a few milliseconds
• Typical data access: ~10 ms
  – E.g. 4 ms rotate + 6 ms seek
Throughput and Implications
• Disk throughput faster than access time
  – 10 ms latency
  – 280 MB/s throughput (~1 B / 4 ns)
• What does this drive us to?
  – 10 ms seek → random byte access 100 B/s
  – Sequential access 280 MB/s
  – Want to exploit sequential access!
• Read blocks of data
Read Data Blocks
• How many sequential bytes can we read in 1 ms?
  – 280 MB/s × 0.001 s = 280 KB
• Can read 280 KB in nearly the same time as 1 byte
  – 6 ms seek, 4 ms rotation, 1 ms data read
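To see why block reads pay off, here is a small sketch that computes the effective transfer rate for different block sizes. It uses the slide's numbers (6 ms seek, 4 ms rotation, 280 MB/s streaming); the function names and the simple additive cost model are assumptions for illustration.

```python
SEEK_S = 0.006       # 6 ms seek (slide value)
ROTATE_S = 0.004     # 4 ms rotation (slide value)
STREAM_BPS = 280e6   # 280 MB/s sequential rate (slide value)

def access_time_s(block_bytes: int) -> float:
    """Time to position the head and then stream one block."""
    return SEEK_S + ROTATE_S + block_bytes / STREAM_BPS

def effective_bps(block_bytes: int) -> float:
    """Bytes per second actually delivered when every block needs a fresh seek."""
    return block_bytes / access_time_s(block_bytes)

if __name__ == "__main__":
    for size in (1, 4096, 280_000, 2_800_000):
        print(f"{size:>9} B block: {access_time_s(size) * 1e3:6.2f} ms, "
              f"{effective_bps(size) / 1e6:9.4f} MB/s effective")
```

Reading one byte per access delivers about 100 B/s, while a 280 KB block costs only about 1 ms more per access.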
Seagate 2.5” Disk Drive
http://www.seagate.com/docs/pdf/datasheet/disc/ds_momentus_5400_psd.pdf
5400 RPM
FLASH Memory
• Exploit tunneling
• Use high voltage to reduce barrier
  – Tunnel charge onto floating node
  – Charge trapped on node
• Use field from floating node to modulate conduction
http://commons.wikimedia.org/wiki/File:Flash-Programming.png
Week 8
Flash
• NOR: read like other memories
• NAND: sequential read within “page”
  – Denser than NOR
• Can only “erase” in blocks
  – 4KB, 64KB, 256KB
• Once erased, can write a byte (page) at a time
  – Write time variable
  – Typically need feedback to sense when written
[Figure: flash cell array schematic; labels: bitline, bselect, gndselect, w0–w3]
Samsung 256Mx8 NAND Flash
http://www.datasheetcatalog.com/datasheets_pdf/K/9/E/2/K9E2G08U0M.shtml
Intel Solid-State Drive (SSD)
http://download.intel.com/design/flash/nand/extreme/extreme-sata-ssd-datasheet.pdf
35,000/s x 4KB = 140MB/s
File
• File – sequence of bits that go together
  – MP3 encoding
  – Executable for mp3player
  – Picture in JPEG
  – PDF for your lab writeup
• How big is a file?
File
• File – sequence of bits that go together
  – Like an object
    • A base address
    • Length or extent
    • Generally has a type…but only used weakly
      – Mp3, WAV, x86 executable, …
  – On unix/linux
    • an array of unsigned char
    • with a length
    • Magic number tries to convey type
      – Inband, in the file (first word?)
Create a File?
• What do I need to do to create/store a new file?
  – Allocate/reserve space for it
  – Give it a name?
  – Make a record somewhere of mapping between name and location
Strawman #2
• Keep track of next free address
  – free_address
  – initialized to 0
• When create file, give it space at free_address
  – Increment free_address by length
• Store name & address in table
  – Maybe put table at high addresses
• When free_address + len >= table_base
  – Device is full
[Figure: device layout with free_address pointer and name table]
File1: 0
File2: 250
File3: 350
File4: 900
File5: 1100
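A minimal sketch of Strawman #2, assuming an in-memory table and a fixed table_base. The class name StrawmanDisk, the DeviceFull exception, and the table_base value of 2000 are hypothetical. File lengths for File1 to File4 are inferred from the address differences on the slide; the length of File5 is not given and is made up here.

```python
class DeviceFull(Exception):
    """Raised when a new file would run into the name table."""

class StrawmanDisk:
    """Bump-pointer allocation plus a name -> address table (Strawman #2)."""

    def __init__(self, table_base: int):
        self.free_address = 0          # next free address, initialized to 0
        self.table_base = table_base   # assumed fixed start of the name table
        self.table = {}                # name -> start address

    def create(self, name: str, length: int) -> int:
        if name in self.table:
            raise ValueError(f"name already in use: {name}")
        if self.free_address + length >= self.table_base:
            raise DeviceFull(name)
        address = self.free_address
        self.table[name] = address
        self.free_address += length    # bump the pointer past the new file
        return address

if __name__ == "__main__":
    disk = StrawmanDisk(table_base=2000)   # table_base chosen for illustration
    # Lengths 250, 100, 550, 200 follow from the slide; 300 is an assumption.
    for name, length in [("File1", 250), ("File2", 100), ("File3", 550),
                         ("File4", 200), ("File5", 300)]:
        print(name, disk.create(name, length))
```

Note that nothing here can reuse space after a delete or grow a file in place, which is exactly what the evaluation of this strawman points out.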
Evaluating Strawman #2
• Good
  + Accommodates variable-length files
  + Allows contiguous access
• Bad
  – What happens when we delete a file?
    • How to reuse space?
  – Add data to files?
  – Table gets big
  – All filenames have to be unique?
    • Demands coordination between
      – users?
      – programs?
      – programs from different vendors?
Files Grow and Shrink
• Essay/homework gets longer as you write it
  – Don’t know how long it will be when you start
• Database of checks written grows
• TODO-before-end-of-term list shrinks?
• Where to put additional space?
  – Allocate new space at end of disk?
Delete Files
• Don’t need lame .o files once you build the executable
• Replace false start
• That was a bad picture of me
  – Now have better
• Don’t want anyone to see my secret plans to take over the world
• …want the space back because drive is filling up
Repurposing Space
• How to reclaim space?
• With single free_address pointer
  – can’t keep track of all the places where there is space
• What can we do?
  – Keep a list of free regions
  – Try to find a region where the file will fit
Finding Contiguous Space
• What if our disk looks like this
[Figure: disk with free space scattered in small non-contiguous regions]
• …and we want to allocate a large file?
• Disk has capacity
  – But cannot allocate because not contiguous
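The free-region idea and its fragmentation problem can be sketched with a first-fit search over a list of (start, length) regions. First-fit is one possible policy chosen here for illustration; the slides do not name one, and the region values below are made up.

```python
from typing import List, Optional, Tuple

Region = Tuple[int, int]  # (start address, length)

def total_free(free_regions: List[Region]) -> int:
    """Total free capacity, contiguous or not."""
    return sum(length for _, length in free_regions)

def first_fit(free_regions: List[Region], length: int) -> Optional[int]:
    """Carve `length` units out of the first region that is big enough.

    Returns the start address, or None if no single region fits.
    """
    for i, (start, size) in enumerate(free_regions):
        if size >= length:
            if size == length:
                del free_regions[i]                      # region fully used
            else:
                free_regions[i] = (start + length, size - length)
            return start
    return None

if __name__ == "__main__":
    free = [(100, 50), (300, 80), (700, 60)]   # hypothetical free regions
    print("free capacity:", total_free(free))
    print("allocate 120:", first_fit(free, 120))  # fails: no region is contiguous enough
    print("allocate 60:", first_fit(free, 60))
```

The disk has 190 units free in this example, yet a 120-unit contiguous request cannot be satisfied.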
Bad Sectors
• Portions of a disk may be bad
  – At manufacture time
  – Go bad during use
• Portions of Flash RAM may be bad
  – Manufacturing defects
  – Limited number of write cycles
• Also inhibits contiguous allocation
Naming Conflicts
• How to solve naming conflicts?
  – Provide separate contexts
    • E.g. separate space of names
      – for each user
      – for each program
  – Typically with a directory structure
/home
  /andre
    /lab9
      /solutions.pdf
    /lab10
      /solutions.pdf
  /bgojman
    /lab9
      /solutions.pdf
    /lab10
      /solutions.pdf
Directory
• Special file that contains name-to-location mappings
• Since a directory is itself a file, we can easily allow hierarchy
  – Directories can contain directories
File1: 0
File2: 250
File3: 350
File4: 900
File5: 1100
Requirement Roundup
• Find things easily and quickly
  – Minimize what we need to look at to find data
• Portable (is the sole state holder)
  – Self describing
• Fast read
  – Attempt to lay out files contiguously
• Fast write
  – Not take too long to find space for file
• Support deletion (repurpose capacity)
• Use (most) of capacity
  – Allow files to be non-contiguous
• Tolerate errors in media
  – Don’t depend on contiguous blocks to be good
• Isolate/differentiate who can access what
Challenge: both asymptotics and constants (e.g. 280MB/s vs. 10ms random access) matter.
• Jurassic Park Unix-navigation sequence
• From: “you can’t hold it by yourself”
  – (around 1:52 into movie)
• To: “…security systems; you name it we got it.”
  – (around 1:54); less than 2 minutes total
  – YouTube clip that is a bit shorter: http://www.youtube.com/watch?v=dFUlAQZB9Ng
  – One that is a bit longer (superset of intent): http://www.dailymotion.com/video/x4tbis_jurassic-park-unix-system-scene_tech
Disk Data Security
• How is security enforced?
  – OS demands credentials for login
  – User doesn’t get direct access to hardware
  – OS intermediates
Physical Disk Access
• What happens if the disk is removed from the physical machine?
  – Plugged into another machine that
    • Someone else has administrator access on?
    • Doesn’t respect the users/isolation?
Common News Item
Computer hard drive sold on eBay 'had details of top secret U.S. missile defence system'
• By Daily Mail Reporter, last updated at 11:08 AM on 07th May 2009
Highly sensitive details of a US military missile air defence system were found on a second-hand hard drive bought on eBay.
• The test launch procedures were found on a hard disk for the THAAD (Terminal High Altitude Area Defence) ground to air missile defence system, used to shoot down Scud missiles in Iraq.
• The disk also contained security policies, blueprints of facilities and personal information on employees including social security numbers, belonging to technology company Lockheed Martin - who designed and built the system.
•Read more: http://www.dailymail.co.uk/news/article-1178239/Computer-hard-drive-sold-eBay-details-secret-U-S-missile-defence-system.html#ixzz0Wxa60PT9
…all too common
VA Update on Missing Hard Drive in Birmingham, Ala.
• February 10, 2007
• Investigation Yielding Additional Information
• WASHINGTON -- The Department of Veterans Affairs (VA) today issued an update on the information potentially contained on a missing government-owned, portable hard drive used by a VA employee at a Department facility in Birmingham, Ala.
• “Our investigation into this incident continues, but I believe it is important to provide the public additional details as quickly as we can,” said Jim Nicholson, Secretary of Veterans Affairs. “I am concerned and will remain so until we have notified those potentially affected and get to the bottom of what happened.
• “VA will continue working around the clock to determine every possible detail we can,” Nicholson said.
• VA and VA’s Office of Inspector General have learned that data files the employee was working with may have included sensitive VA-related information on approximately 535,000 individuals. The investigation has also determined that information on approximately 1.3 million non-VA physicians – both living and deceased – could have been stored on the missing hard drive. It is believed though, that most of the physician information is readily available to the public. Some of the files, however, may contain sensitive information.
Still Happening
Probe Targets Archives’ Handling of Data on 70 Million Vets
• By Ryan Singel, October 1, 2009
• The inspector general of the National Archives and Records Administration is investigating a potential data breach affecting tens of millions of records about U.S. military veterans, Wired.com has learned. The issue involves a defective hard drive the agency sent back to its vendor for repair and recycling without first destroying the data. ....
• The incident was reported to NARA’s inspector general by Hank Bellomy, a NARA IT manager, who charges that the move put 70 million veterans at risk of identity theft, and that NARA’s practice of returning hard drives unsanitized was symptomatic of an irresponsible security mindset unbecoming to America’s record-keeping agency.
• “This is the single largest release of personally identifiable information by the government ever,” Bellomy told Wired.com. “When the USDA did the same thing, they provided credit monitoring for all their employees. We leaked 70 million records, and no one has heard a word of it.”
http://www.wired.com/threatlevel/2009/10/probe-targets-archives-handling-of-data-on-70-million-vets/
Caveats
• On standard unix/windows setups
  – Without the OS providing protection, all the data is accessible
    • Sometimes good for recovery
  – Rm/del doesn’t make the data go away
    • Also sometimes useful for recovery
  – Even a format does not guarantee the data is overwritten
• See: Remembrance of Data Passed: A Study of Disk Sanitization Practices
  – IEEE Security and Privacy, v1n1p17-27 (linked from today’s reading)
Sketch
• Manage the disk at the level of blocks of fixed size (bnodes)
• Format disk for bnodes
• File is a collection of bnodes
• Directory is a kind of file
• Root of system bnode in known location
bnode
• Fixed-size block of data
• Minimum unit of storage allocation
• bnodes map to physical addresses
  – E.g. bnode 76 → address 76 × 4096 = 311296
    • Or bnode 76 → R = 1.012 in, theta = 32.07 degrees
• Address physical resources through bnodes
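The bnode-to-address example can be written as a one-line mapping. This sketch assumes 4096-byte bnodes laid out back to back with no skipped (defective) blocks; a formatted disk that skips bad blocks would need a lookup table instead. The function names are illustrative.

```python
BNODE_SIZE = 4096  # bytes per bnode, as in the slide's example

def bnode_to_address(bnode: int, bnode_size: int = BNODE_SIZE) -> int:
    """Byte address of the first byte of a bnode (dense layout assumed)."""
    return bnode * bnode_size

def address_to_bnode(address: int, bnode_size: int = BNODE_SIZE) -> tuple:
    """Inverse mapping: (bnode number, byte offset inside that bnode)."""
    return divmod(address, bnode_size)

if __name__ == "__main__":
    print(bnode_to_address(76))        # slide example: 76 x 4096
    print(address_to_bnode(311296 + 10))
```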
bnode Size
• How big should a bnode be?
  – Needs to be bigger than the block address
  – Problems with small blocks?
    • Longer addresses
    • Can address smaller file size w/ fixed # address bits
  – Problems with large blocks?
    • Minimum allocation increment
    • Internal fragmentation
  – Typical values 4KB, 1KB, 256B
    • Trending toward larger these days
      – Intel 4KB SSD, 64KB for some flash
Files from bnodes
• Use bnodes as file handle
  – How we address the file
• bnode contains metadata
  – Block type, file type, length
• Small file: all data in single bnode
[Figure: 4096-byte bnode 76 (obj, 3172): 24 bytes metadata, 3172 bytes file contents, 900 bytes unused]
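The small-file layout is simple arithmetic: whatever the metadata and contents do not use is internal fragmentation. A sketch using the figure's numbers (4096-byte bnode, 24 bytes of metadata); the function names are hypothetical.

```python
BNODE_SIZE = 4096      # bytes per bnode (from the figure)
METADATA_BYTES = 24    # per-bnode metadata (from the figure)

def fits_in_one_bnode(file_bytes: int) -> bool:
    """True if metadata plus contents fit in a single bnode."""
    return file_bytes + METADATA_BYTES <= BNODE_SIZE

def unused_bytes(file_bytes: int) -> int:
    """Internal fragmentation for a file stored in a single bnode."""
    if not fits_in_one_bnode(file_bytes):
        raise ValueError("file needs more than one bnode")
    return BNODE_SIZE - METADATA_BYTES - file_bytes

if __name__ == "__main__":
    print(unused_bytes(3172))   # the 3172-byte obj file from the figure
```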
Files from bnodes
• Large file: tree of bnodes
  – Multi-level if necessary
[Figure: bnode 76 (obj, 15,791) with bnodes 77, 78, 79, 80; bnode 102437253 (mp3, 12MB)]
Files from bnodes
• Large file: tree of bnodes
  – Multi-level if necessary
• Overhead for tree structure?
  – 4KB pages
    • About 1000-way tree
    • 4000KB file (1000 data pages) needs 1001 pages
    • 8KB file needs 3 pages (50% overhead)
  – In practice inodes avoid this worst case
[Figure: bnode 76 (obj, 15,791) with bnodes 77, 78, 79, 80]
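The tree overhead can be counted directly. This sketch assumes 4KB pages and a fanout of 1000 pointers per index page (the slide's "about 1000-way"), and it assumes a file that fits in one page needs no index page. With 4KB pages, 1000 data pages hold about 4000KB and need one extra root page; two data pages (8KB) also need one root page, which is the 50% case.

```python
def pages_needed(file_bytes: int, page_bytes: int = 4096, fanout: int = 1000) -> int:
    """Data pages plus index pages for a tree with the given fanout."""
    data_pages = max(1, -(-file_bytes // page_bytes))   # ceiling division
    total = data_pages
    level = data_pages
    while level > 1:                  # add index levels until a single root remains
        level = -(-level // fanout)
        total += level
    return total

def overhead(file_bytes: int, page_bytes: int = 4096, fanout: int = 1000) -> float:
    """Index pages as a fraction of data pages."""
    data_pages = max(1, -(-file_bytes // page_bytes))
    return (pages_needed(file_bytes, page_bytes, fanout) - data_pages) / data_pages

if __name__ == "__main__":
    for size in (8 * 1024, 1000 * 4096, 1001 * 4096):
        print(size, pages_needed(size), f"{overhead(size):.3%}")
```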
EXT2 inode
[Source: http://www.tldp.org/LDP/tlk/fs/filesystem.html]
[Figure: EXT2 inode; labels: (12 of these), indirect blocks]
File Expansion with bnodes
• Expand file
  – Add bnodes to file
[Figure: before, bnode 76 (obj, 15,791) with bnodes 77, 78, 79, 80; after, bnode 76 (obj, 18,357) with bnodes 77, 78, 79, 80, 12374]
Directory
• File
  – With type directory
  – Contains name/bnode pairs
• Small
  – Fits in one bnode
• Large
  – Tree of bnodes
    • Just like file
Directory, 234
Lab1, 76
Lab2, 98
Lab3, 1034
Lab4, 267
Lab5, 2053
….
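Because a directory is just a file of name/bnode pairs, resolving a path is a walk from the root bnode, one lookup per path component. A rough sketch with an in-memory dictionary standing in for the disk: the Lab entries in bnode 234 come from the slide, while the root bnode number 2, the name "labs", and the function name are assumptions.

```python
def lookup(disk: dict, root_bnode: int, path: str) -> int:
    """Return the bnode number that `path` names, starting from the root directory."""
    current = root_bnode
    for name in path.strip("/").split("/"):
        if not name:                   # empty component, e.g. the path "/"
            continue
        kind, entries = disk[current]
        if kind != "directory":
            raise NotADirectoryError(name)
        if name not in entries:
            raise FileNotFoundError(path)
        current = entries[name]        # follow the name -> bnode mapping
    return current

# Hypothetical disk contents: bnode number -> (type, payload)
DISK = {
    2: ("directory", {"labs": 234}),                       # assumed root
    234: ("directory", {"Lab1": 76, "Lab2": 98, "Lab3": 1034,
                        "Lab4": 267, "Lab5": 2053}),       # from the slide
    76: ("obj", b"file contents"),
}

if __name__ == "__main__":
    print(lookup(DISK, 2, "/labs/Lab1"))
    print(lookup(DISK, 2, "/labs"))
```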
Free bnodes
• Keep track of free and usable bnodes
• Grouped by contiguous set of free blocks
• Allocation – try to find contiguous set of bnodes to satisfy file need
  – …and try not to break up large contiguous block unnecessarily
• Deletion – try to reassemble free blocks
  – E.g. delete 13 to make 10-14 a length-5 block
[Figure: disk blocks numbered 0–23]
Length 1: 1, 14
Length 2: 3-4, 7-8
Length 3: 10-12, 16-18
Length 4: 20-23
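One way to sketch this free structure is to keep a set of free block numbers and derive the contiguous runs from it, so neighbors reassemble automatically on deletion. This representation and the best-fit choice (smallest run that fits, to avoid breaking up larger runs) are illustrative assumptions, not necessarily how the course's design stores its free lists. The block numbers are the ones on the slide.

```python
from typing import List, Optional, Set, Tuple

def to_runs(free_blocks: Set[int]) -> List[Tuple[int, int]]:
    """Group free block numbers into (start, length) runs of contiguous blocks."""
    runs: List[Tuple[int, int]] = []
    for b in sorted(free_blocks):
        if runs and runs[-1][0] + runs[-1][1] == b:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((b, 1))                         # start a new run
    return runs

def allocate(free_blocks: Set[int], count: int) -> Optional[List[int]]:
    """Take `count` contiguous blocks from the smallest run that is big enough."""
    candidates = [r for r in to_runs(free_blocks) if r[1] >= count]
    if not candidates:
        return None
    start, _ = min(candidates, key=lambda r: (r[1], r[0]))
    blocks = list(range(start, start + count))
    free_blocks.difference_update(blocks)
    return blocks

def release(free_blocks: Set[int], block: int) -> None:
    """Return a block to the free set; adjacent runs merge when runs are recomputed."""
    free_blocks.add(block)

if __name__ == "__main__":
    free = {1, 3, 4, 7, 8, 10, 11, 12, 14, 16, 17, 18, 20, 21, 22, 23}
    print(to_runs(free))
    release(free, 13)          # slide example: deleting 13 joins 10-12 and 14
    print(to_runs(free))
    print(allocate(free, 3))
```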
Superblock
• For bootstrapping and file system management
  – Each file system has a master block in a canonical location (first block on device)
  – Describes file-system type
  – Root bnode
  – Keeps track of free lists …at least the head pointers to (bnodes, blocks)
• Corruption of superblock makes file system unreadable
  – Store backup copies on disk
Format disk
• Identify all non-defective bnodes
  – Defective blocks skipped (those addresses not assigned to bnodes)
• Create free bnode data structure
• Create superblock
Review Sketch
• Manage the disk at the level of bnodes
• Format disk for bnodes
• File is a collection of bnodes
• Directory is a kind of file
• Root of system bnode in known location
Requirement Review
• Find things easily and quickly
  – Minimize what we need to look at to find data
    • Directory structure
• Portable (is the sole state holder)
  – Self describing → superblock, metadata
• Fast read
  – Attempt to lay out files contiguously
• Fast write
  – Not take too long to find space for file → efficient free structure
• Support deletion (repurpose capacity)
  – Return bnodes to free list
• Use (most) of capacity
  – Allow files to be non-contiguous → bnodes
• Tolerate errors in media
  – Don’t depend on contiguous blocks to be good → bnodes
• Isolate/differentiate who can access what
Learn More
• Online reading/pointers
  – Unix File System Tutorial
  – Flash, SSD, hard drive data sheets
  – Data-found-on-hard-drive articles
• Courses
  – CIS121 – efficient data structures
  – CIS380 – operating systems
Big Ideas
• Persistence/Volatility
• Self-describing
  – Every disk different (at least due to media defects)
  – Only place to save data across boots
• Naming
  – Must have canonical way of referencing data
• Indirection
  – Build logically contiguous region from non-contiguous physical regions
  – Deal with growing, variable-size files and errors in media