Naming

Principles of Computer System (2012 Fall)

Bus & File System

NAMING MODEL

Module use of names

• Two ways modules use a named object:
  – By value: the module gets a copy of the named object
  – By reference: the module operates directly on the named object

• Purpose #1: Sharing and organization
  – Most communication happens using names

• Purpose #2: Delayed binding to an object
  – Supports replaceability and indirection

Modules and names

Naming schemes

• Three parts
  – Name space: symbols and syntax rules for generating names
  – Name-mapping algorithm: maps names to values
  – Universe of values: all possible values

• Terminology
  – Binding: a mapping from a name to a value
  – A name that has a mapping is bound
  – A name-mapping algorithm resolves a name

Naming model

Naming Context

• Name lookup is typically done in a context
  – Examples:
    • Mail to an e-mail address
    • Dial: 51355355
    • “Hey, you!”

• Name spaces with only one possible context are called universal name spaces
  – Example: US social security numbers

Determining Context - 1

• Hard-code it in the resolver
  – Example: many universal name spaces work this way

• Embedded in the name itself
  – cse@sjtu.edu.cn: name = “cse”, context = “sjtu.edu.cn”
  – /ipads.se.sjtu.edu.cn/courses/cse-g/2012f/README: name = “README”, context = “/ipads.se.sjtu.edu.cn/courses/cse-g/2012f”

Determining Context - 2

• Taken from the environment (dynamic)
  – Unix command “rm foo”: name = “foo”, context = the current working directory
  – Read 0x7c911109: name = “0x7c911109”, context = the thread’s address space

• Many errors in systems are due to using the wrong context

Interpreter naming API

• value ← RESOLVE(name, context)
  – Return the value that name is bound to in context

• status ← BIND(name, value, context)
  – Establish a name-to-value mapping in context

• status ← UNBIND(name, context)
  – Delete name from context

• list ← ENUMERATE(context)
  – Return a list of all bindings in context

• result ← COMPARE(name1, name2)
  – Check whether name1 and name2 are equal

Uniqueness

• Many naming systems are not unique
  – A name can map to 0, 1, or many values, so RESOLVE can return NULL or a list of values
  – A value may have 0, 1, or many names, so a reverse RESOLVE can return NULL or a list

• Unique identifier name space
  – Names are never reused; each binding is a stable binding
  – Examples: your SJTU student number; the “customer #” in many billing systems

Name mapping algorithms

• Table lookup
  – Find the name in a table
    • Examples: phone book, the old /etc/hosts
  – Context: specifies which table to use

• Recursive lookup

• Multiple lookup

Addresses

• Addresses are used as both names and locators
  – E.g., IP address 171.64.64.64, 1950s phone numbers, I/O device addresses
  – Highly useful, but fragile: the name breaks when the object moves

• Work-arounds when an object moves
  – Change all references (can be painful)
  – Make it work for both the new and the old address
  – Have the client search if RESOLVE fails

• Indirection is frequently the solution
  – Update the indirection map to handle moves
  – Examples: host names, cell phone numbers, etc.

BUS: A HARDWARE LAYER

Booting

• Three abstractions in a computer
  – Interpreter
  – Memory
  – Communication link

• Naming in booting
  – Linux booting sequence
  – Bus address
  – Memory load
  – Memory-mapped I/O & DMA

Keep two questions in mind

• What are the memory, the interpreter, and the communication link, respectively?

• What are the name, the context, and the name-mapping algorithm?

Linux booting: 5 stages

Stage                 Code and where it resides
System startup        BIOS on flash
Stage 1 bootloader    GRUB on the MBR (disk)
Stage 2 bootloader    GRUB/LILO on disk
Kernel                Linux on disk
Init                  User application on disk

1. BIOS

• BIOS’s job
  – First instruction at 0xFFFF0
  – POST (Power-On Self-Test)
  – Manage resources: the name space
  – Enumerate bus devices
  – Load the boot loader into memory & give control to it

• Three abstractions
  – Interpreter: CPU, BIOS controller, memory controller
  – Memory: flash memory & RAM
  – Communication link: system bus

2. Bootloader stage 1 (MBR)

• MBR (Master Boot Record)
  – The first 512 bytes on the disk (the first block)
  – Layout: bootloader code, partition table, magic number

• Bootloader stage 1’s job
  – Load stage 2 into memory & give control to it

• Three abstractions
  – I: CPU & DC (disk controller) & MC (memory controller)
  – M: disk & RAM
  – C: system bus

3. Bootloader stage 2

• Bootloader stage 2’s job
  – List the boot menu
  – Load the user-selected kernel into memory & give control to it

• Three abstractions
  – I: CPU & DC & MC
  – M: disk & RAM
  – C: system bus

4. Kernel

• Kernel’s job
  – Switch the CPU to protected mode
  – Initialize the system…
  – Load init into memory and run it

• Three abstractions
  – I: CPU & DC & MC
  – M: disk & RAM
  – C: system bus

5. init

• init process
  – The first user-space program, pid = 1
  – The root & parent of all other processes

• init’s job
  – Run /etc/rc.d/rc.sysinit
  – Start the system processes listed in /etc/inittab
  – Start multiple “getty” processes, which wait for console logins

• Three abstractions
  – I: CPU & DC & MC
  – M: disk & RAM
  – C: system bus

Question

• How does the CPU find the first instruction in the BIOS?
  – 0xFFFF0 is hard-wired into the PC after reset

• What happens during a memory load?

Booting sequence

• Three abstractions
  – Interpreter: CPU, memory controller, disk controller
  – Memory: BIOS’s flash memory, RAM, disk
  – Communication link: system bus

• Common patterns
  – The processor reads from memory (LOAD) and interprets
    • Memory cell naming: bus address
  – I/O devices transfer data to memory
    • Disk sector naming: block number
    • DMA & memory-mapped I/O

Specific Operations

Figure: processor, memory, and I/O device, connected by load/store (processor and memory), PIO/MMIO (processor and I/O device), and DMA (I/O device and memory).

• Memory load/store
  – Between the CPU and memory
  – Uses the physical memory address space

• I/O operations
  – MMIO: maps device memory and registers into the physical address space
    • E.g., the frame buffer
  – DMA
    • Also uses physical addresses

A Hardware Layer: the bus

Bus: Hardware Layer

• Bus features
  – A set of wires (address, data, and control lines) that connect to a bus interface on each module
  – Bus arbitration protocol: decides which module may send or receive a message at any particular time
    • Bus arbiter (optional): a circuit that chooses which module can use the bus
  – Broadcast link: every module hears every message
    • Bus address: identifies the intended recipient

Split-transaction

1. The source module acquires exclusive use of the bus
2. The source module places the bus address of the destination module on the bus
3. The source module signals the READY wire to alert the other module
4. The destination module signals the ACKNOWLEDGE wire after it has copied the data
   – If the bus is synchronous, READY & ACKNOWLEDGE are not needed; modules just check the address lines on each clock cycle
5. The source module releases the bus

Memory load example: LOAD 1742, R1

• Processor #2 => all bus modules: {1742, READ, 102}
• Memory1 recognizes that the address is within its range
  – By examining just a few high-order address bits
• Memory1 acknowledges, and Processor #2 releases the bus
• Memory1 performs the internal operation to get the value
  – value ← READ(1742)
• Memory1 => all bus modules: {102, value}
• Processor #2, which is waiting for this result, copies the data on the bus into its register R1
• Processor #2 acknowledges, and Memory1 releases the bus

Bus Address

• Bus address space (physical addresses)
  – Each module has its own bus address range
  – The BIOS is in charge of managing it at boot time
  – 1 MB in the past, 4 GB today, larger in the future
  – Basic unit: byte

• Each module examines the bus address field
  – For every message
  – Ignores those not intended for it
  – What about sniffing?

Simple I/O Devices Work in a Similar Way

• Example: keyboard
  – When the user presses a key, the keyboard SENDs a message containing the key value to the processor
  – Since the processor is not ready, its bus interface:
    • copies the data into a temporary register,
    • acknowledges the keyboard,
    • SENDs an interrupt signal to the processor
  – The processor handles the interrupt in the next cycle
    • It SENDs the value over the bus to the memory module
  – Suitable for slow devices, not for disks

DMA for Disk Device

• DMA (Direct Memory Access)
  – The processor SENDs a request to the disk controller to READ a block of data
  – The request includes the address of a buffer in memory

• The disk SENDs the data directly to memory
  – Incrementing the memory address appropriately

• Benefits of DMA
  – Relieves the CPU so that it can execute other programs
  – Saves one transfer: each word crosses the bus once instead of twice (disk → CPU → memory)
  – Takes better advantage of long messages if the bus supports them
  – Amortizes the overhead of the bus protocol

Memory-Mapped I/O

• Use LOAD and STORE instructions to address the registers and buffers of I/O modules
  – Just like accessing memory
  – The address is an overloaded name that also carries location information

• Provides a uniform interface to bus modules
  – The MMU translates virtual addresses to physical addresses
    • The physical address is the system bus address
  – I/O modules translate the bus address to a register address internally

Figure: processor → MMU (virtual address → physical address, i.e., system bus address) → memory, disk, keyboard; each device internally translates the bus address to a register address.

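Volatile Address

    #include <stdio.h>

    /* MSVC, 32-bit x86 only: inline __asm blocks are not supported on x64 or by GCC/Clang.
       Assumes i is stored at [ebp-4], as in an unoptimized build. */
    int main(void)
    {
        int i = 10;             /* declaring it "volatile int i" forces every read to go to memory */
        int a = i;
        printf("i= %d\n", a);

        // Change the value of i behind the compiler's back
        __asm { mov dword ptr [ebp-4], 20h }

        int b = i;              /* without volatile, an optimizer may reuse the cached value 10 */
        printf("i= %d\n", b);
        return 0;
    }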

Memory-Mapped I/O combined with DMA

DMA example

Figure: one bus shared by Processor 1 (bus address 101), Processor 2 (102), BIOS (256-511), Memory (3072-4095), and the disk controller (121-124).

• Processor #1 => all bus modules: {121, WRITE, 11742}
  – The disk acknowledges and writes the value 11742 to its control register
• Processor #1 => all bus modules: {122, WRITE, 3328}
• Processor #1 => all bus modules: {123, WRITE, 256}
• Processor #1 => all bus modules: {124, WRITE, 1}

• Disk => all bus modules: {3328, WRITE, data[11742]}
  – Memory acknowledges and saves data[11742]
• Disk => all bus modules: {3329, WRITE, data[11743]}
• ... (loop)
• Disk => all bus modules: {3583, WRITE, data[11997]}

• When the transfer is finished, the disk controller SENDs a message to the processor
  – Just as the keyboard controller does when a key is pressed
• The processor enters the interrupt handler in the next cycle
• Now the processor knows that the DMA is done

Questions

• Why not map the whole disk into memory, so that the CPU can access a byte on the disk directly over the system bus?
  1. Too large
  2. Too slow

The principle of least astonishment:

People are part of the system. The design should match the user’s experience, expectations, and mental models.

FILE SYSTEM: A SOFTWARE LAYER

Outline

• UNIX file system
  – 7 layers in the file system (3 + 1 + 3)

• FS API implementation
  – OPEN, READ, WRITE, CLOSE, FSYNC

• UNIX shell
  – Implied context, search path, name discovery

• Review of naming model

File

• A file is a high-level version of the memory abstraction
• A file has two key properties
  – It is durable & it has a name

• The system layer implements files using modules from the hardware layer
  – Divide-and-conquer strategy
  – Makes use of several hidden layers of machine-oriented names (addresses), one on top of another, to implement files
  – Maps user-friendly names to these files

• In UNIX, everything is a file (KISS)

API of the UNIX file system

• OPEN, READ, WRITE, SEEK, CLOSE
• FSYNC
• STAT, CHMOD, CHOWN
• RENAME, LINK, UNLINK, SYMLINK
• MKDIR, CHDIR, CHROOT
• MOUNT, UNMOUNT

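The naming layers of the UNIX file system (version 6)

Figure: seven layers, from bottom to top: block, file, inode number, file name, path name, absolute path name, symbolic link.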

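Disk structure

Figure: stacked platters read by heads 0, 1, and 2; each surface has concentric tracks (track 0, 1, 2) divided into sectors (sector 0, 1, ...); the tracks at the same position on all platters form a cylinder (cylinder 0, 1, ...).

• Platter
• Track
• Sector
• Head
• Cylinder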

Block layer

• Block size: a trade-off
  – Neither too small nor too big

• Name mapping: block number → block
• Context: the storage device (e.g., disk) itself
  – Binds block numbers to physical blocks

• Name-mapping algorithm

• Name discovery: super block
  – Keeps track of block usage, e.g., a free list or a bitmap

Super block

• One super block per file system
  – The kernel reads the super block when it mounts the file system

• The super block contains
  – Size of the blocks
  – Number of free blocks
  – A list of free blocks
  – Index to the next free block
  – Lock field for the free block and free inode lists
  – Flag indicating that the super block has been modified
  – Size of the inode list
  – Number of free inodes
  – A list of free inodes
  – Index to the next free inode

File layer

• File requirements
  – Store items that are larger than one block
  – May grow or shrink over time
  – A file is a linear array of bytes of arbitrary length
  – Record which blocks belong to each file

• Inode (index node)
  – A container for metadata about the file

• Name mapping: index number → block number
• Context: the inode itself
• Name-mapping algorithm

• The maximum length of an offset is 3 bytes in UNIX version 6
• What about large files?

Inode for larger files

Figure: the inode points to data blocks directly, through an indirect block, and through a double indirect block.

Inode number layer

• Name mapping: inode number → inode
• Context: the inode table
• Name-mapping algorithm: index into the inode table
  – The table is at a fixed location on storage

• Name discovery
  – Track which inode numbers are in use
  – E.g., a free list, or a field in the inode

Put the layers so far together

• We need more user-friendly names
  – Numbers are convenient names only for computers

• Numbers change from one storage device to another

File name layer

• File name
  – Hides the metadata of file management
  – Names both files and I/O devices

• Name-mapping algorithm
  – The mapping table is saved in a directory
  – Default context: the current working directory
  – The context reference is also an inode number

• The directory itself is a file
  – The maximum length of a name is 14 bytes in UNIX version 6

LOOKUP in a directory

• Name comparison method: STRING_MATCH
• LOOKUP(“program”, dir) returns 10

Path name layer

• Hierarchy of directories and files
  – Structured naming, e.g., “projects/paper”

• Name-mapping algorithm
  – PLAIN_NAME returns true if there is no ‘/’ in the path
• Context: the working directory

Links

• LINK: a shortcut for long names
  – LINK(“Mail/inbox/new-assignment”, “assignment”)
  – Turns a strict hierarchy into a directed graph

• Users cannot create links to directories → acyclic graph
  – Different names, same inode number

• UNLINK
  – Removes the binding of a file name to an inode number
  – If UNLINK removes the last binding, the inode and its blocks are put on the free list

• A reference counter is needed

• Reference count
  – An inode can be bound to multiple file names
  – +1 on LINK, -1 on UNLINK
  – A file is deleted when its reference count reaches 0
    • WARNING: this violates the principle of least astonishment
  – No cycles allowed
    • Except for ‘.’ and ‘..’, which name the current and parent directory without needing to know their names

No cycle for LINK

Figure: three snapshots of the graph rooted at ‘/’, showing directory a (inode 25) and its subdirectory b.

• Before: /a/b is a directory; the refcnt of a is 1; a’s inode number is 25

• LINK(“/a/b/c”, “a”)
  – Causes a cycle!
  – The refcnt of a is now 2

• UNLINK(“/a”)
  – The refcnt of a drops to 1, so inode 25 is not deleted
  – Now inode 25 is disconnected from the graph

Renaming - 1

• A text editor usually saves the file being edited in a temporary file, then renames it
• What if the computer fails between steps 1 & 2?
  – to_name will be lost, which surprises the user
  – Needs an atomic action (chapter 9)

• Weaker specification
  – If to_name already exists, it will still exist even if the machine fails between steps 1 & 2

Absolute path name layer

• HOME directory
  – Every user’s default working directory
  – Problem: no sharing of HOME files between users

• Context: the root directory
  – A universal context for all users
  – Well-known name: ‘/’
  – Both ‘/.’ and ‘/..’ are linked to ‘/’

An example: find the blocks of “/programs/pong.c”

• The root directory ‘/’ has inode number 1
• Find the first directory block of ‘/’ by its block number
• Find ‘/programs’ by comparing names
• Find the inode of ‘/programs’ by its inode number, 7
• Find the first directory block of ‘/programs/’
• Find ‘/programs/pong.c’ by comparing its name
• Find the inode of ‘/programs/pong.c’ by its inode number, 9
• Find the block numbers of ‘/programs/pong.c’
• Read the data in block 61 by its block number
  – And the data in blocks 44 & 15

Symbolic link layer

• MOUNT
  – Records the device and the root inode number of the file system in memory
  – Records, in the in-memory version of the inode for “/dev/fd1”, its parent’s inode
  – UNMOUNT undoes the mount

• Change to the file name layer
  – If LOOKUP runs into an inode on which a file system is mounted, it uses the root inode of that file system for the lookup

• Name files on other disks
  – Inode numbers are different on other disks
  – Supports attaching new disks to the name space

• Two options
  – Make inodes unique across all disks
  – Create synonyms for the files on the other disks

• Soft link (symbolic link)
  – SYMLINK
  – Adds another type of inode
  – Context: the directory hierarchy

Two types of links (synonyms)

• Add the link “assignment” to “Mail/new-assignment”

• Hard link
  – No new file is created; a binding is just added between a string and an existing inode
  – The target inode’s reference count is increased
  – If the target file is deleted, the link is still valid

• Soft link
  – A new file is created whose data is the string “Mail/new-assignment”
  – The target inode’s reference count is not increased
  – If the target file is deleted, the link is no longer valid

• A soft link can create a cycle: SYMLINK(“a”, “a”)

Symbolic link layer

• Another interesting behavior of soft links
  – The current directory is “/Scholarly/programs/www”
  – This working directory contains a soft link
    • “CSE2012-web” → “Scholarly/programs/www”
  – Run the following commands
    • CHDIR(“CSE2012-web”)
    • CHDIR(“..”)
  – What is the current directory?
    • “..” is resolved in the new default context

Decouple modules with indirection

Implementing the file system API

• Review
  – CHDIR, MKDIR
  – LINK, UNLINK, RENAME
  – SYMLINK
  – MOUNT, UNMOUNT

• Next
  – OPEN, READ, WRITE, CLOSE
  – FSYNC

File metadata

• Owner ID
  – User ID and group ID that own this inode

• Types of permission
  – Owner, group, other
  – Read, write, execute

• Timestamps
  – Last access (by OPEN)
  – Last modification (by WRITE)
  – Last change of the inode (by LINK)

OPEN file

• Check the user’s permission
• Update the last access time
• Return a short name for the file
  – fd: file descriptor
  – Used by READ, WRITE, CLOSE

File descriptor

• Each process starts with three open files
  – Standard input: fd = 0
  – Standard output: fd = 1
  – Standard error: fd = 2

• fds can also name opened devices
  – Keyboard, display, etc.
  – Lets a designer avoid worrying about which input/output devices are used
    • Just read from fd 0 and write to fd 1

• Each process has its own fd name space

File cursor

• File cursor
  – Keeps track of the position of operations within a file

• Sharing a cursor
  – A parent passes its fd to its child
    • In UNIX, a child inherits all open fds from its parent
  – Allows parent and child to share an output file

• Not sharing a cursor
  – Two processes open the same file independently

fd_table & file_table

• One file_table for the whole system
  – Records information about opened files
    • Inode number, file cursor, reference count of the processes that have it open
  – Children can share the cursor with their parent

• One fd_table for each process
  – Records the mapping from an fd to an index in the file_table

File cursor sharing

Process A fd_table:              fd 3 → file_table index 115
Process B fd_table:              fd 3 → file_table index 116
Process C (B’s child) fd_table:  fd 3 → file_table index 116

file_table:
index   inode num   file cursor   refcnt
115     23          128           1
116     23          240           2
...

• Processes A, B, and C each have just one open file, with inode number 23
• Processes A and B opened the same file separately, so they do not share a file cursor
• Processes B and C share the file cursor

WRITE & CLOSE

• WRITE is similar to READ
  – Allocate a new block if necessary
  – Update the inode’s size and mtime

• CLOSE
  – Free the entry in the fd_table
  – Decrease the reference counter in the file_table
  – Free the entry in the file_table if the counter is 0

• Failures in the middle may cause inconsistency
  – E.g., a block is allocated from the on-disk free list, but no inode records that block yet; then the block is lost

Question

• When writing, which order is preferred?
  – Allocate new blocks, write new data, update size
  – Allocate new blocks, update size, write new data
  – Update size, allocate new blocks, write new data

Delete after OPEN but before CLOSE

• One process has a file open
• Another process removes the last name pointing to that file
  – The reference counter is now 0

• The inode isn’t freed until the first process calls CLOSE

FSYNC

• Block cache
  – Cache of recently used disk blocks
  – Read from disk on a cache miss
  – Delay writes for batching
  – Improves performance
  – Problem: may cause inconsistency if the system fails before the write

• FSYNC
  – Ensures all changes to the file have been written to the storage device

Questions

• What about the virtual address space?
• Where is the cache?
• Who assigns the physical addresses?
• What is the address of the disk?
• How can DMA be made secure?

ABOUT THE LAB

Distributed File System

• Components
  – Extent server, lock server, client
  – Shift the complexity from the server to the client
    • Unlike NFS

FUSE

Lab-1: Lock Server

• RPC
  – How are arguments passed?
  – Read rpc/rpc.cc

• Lock server and client
  – Acquire/release a lock
  – Multi-threaded

• At-most-once
  – How can duplicate requests be identified?

Collaboration Policy

• You must write all the code you hand in
• You are not allowed to look at other students’ code
• You may discuss with other students

Program Environment

• We provide a VM image
  – Runs on VMware
  – Available on our web site
  – Username: cse
  – Password: cselab

Hand In

• Hand-in process
  – $ make handin
  – Rename the tgz file with your student ID
  – Email the tgz file to xiayubin at gmail.com

• Hard deadline
  – Handed in before the deadline: × 100%
  – Within 24 hours after the deadline: × 80%
  – Within 48 hours after the deadline: × 60%
  – Within 72 hours after the deadline: × 40%
  – Within 96 hours after the deadline: × 20%

BACKUP