COMP091 – Operating Systems 1 Linux Filesystems (EXTx) ISO9660 UDF


COMP091 – Operating Systems 1

Linux Filesystems (EXTx)

ISO9660

UDF

Linux Filesystems

Filesystems

• ReiserFS

– Hans Reiser led team at Namesys

– First journaling FS for Linux

– Out of favor since Hans Reiser was convicted of murdering his wife and Namesys folded

• JFS

– Journaling File System from IBM

• ext2, ext3, ext4

– Journaling introduced in version 3

– The “standard” FS for Linux

• btrfs is a sophisticated new FS for Linux

• FATxx, NTFS, XFS and others also supported

Ext file systems

• ext: Extended File System

– 1992

– Created for the Linux kernel

– Based on Unix File System (UFS)

– Maximum FS size 2 GB, file names up to 255 characters

• ext2

– 2TB

– POSIX ACLs

– Timestamps

Sidetrack: POSIX

• Portable Operating System Interface

• Family of IEEE standards

– Institute of Electrical and Electronics Engineers

• IEEE Std 1003.1-1988

• ISO/IEC 9945

• Richard Stallman suggested the name

• Designed to maintain compatibility between operating systems.

ext3

• Journaling

• Made JFS and ReiserFS largely unnecessary

– Although ext3 is not as fast

• Backward compatible with ext2

– Just add journal file

• Blocksize 1 – 8 KiB

• Filesize 16 GiB – 2 TiB

• FS size 2 TiB – 32 TiB

Journal

• After image log

• Options:

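– Journal mode: data and metadata are both written to the journal before being written to the file system

– Ordered mode: metadata only, data written before the journal is marked as committed

– Writeback mode: metadata only, data written in any order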
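The after-image idea can be sketched with a toy journal. This is only an illustration of the log, commit and replay sequence, not the on-disk format ext3 uses; the class `ToyJournal` and its method names are hypothetical.

```python
class ToyJournal:
    """Toy after-image journal: new block contents are logged, then committed."""

    def __init__(self):
        self.disk = {}        # block number -> bytes (the main file system)
        self.pending = []     # after-images logged but not yet committed
        self.committed = []   # after-images covered by a commit record

    def log(self, block_no, new_bytes):
        # Record the after-image first; the main disk is not touched yet.
        self.pending.append((block_no, new_bytes))

    def commit(self):
        # Writing the commit record makes the logged updates durable.
        self.committed.extend(self.pending)
        self.pending = []

    def crash(self):
        # Simulated crash: journal entries without a commit record are lost.
        self.pending = []

    def replay(self):
        # Recovery: copy every committed after-image to its home location.
        for block_no, new_bytes in self.committed:
            self.disk[block_no] = new_bytes
        self.committed = []


j = ToyJournal()
j.log(7, b"inode v2")
j.commit()
j.log(9, b"half-written")   # never committed
j.crash()
j.replay()
```

After the replay only block 7 has been updated; the uncommitted change to block 9 is discarded, so the file system never sees a half-finished transaction.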

ext4

• Stable in Linux 2.6.28 (2008)

• Basically a batch of simultaneous updates to ext3

• Motivated partly by problems of large file systems

• FS size 1 EiB (2^60 bytes)

• File size 16 TiB

https://en.wikipedia.org/wiki/Exbibyte

ext4

• Extents replace block mapping for allocation

– Extent = starting block and block count

• Available pre-allocation

• Delayed allocation

• 64000 subdirectories (up from 32000)

• Htree indexes

– Indexes on hash of filename

– Smaller and faster

• Nanosecond timestamps
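To see why extents are more compact than per-block mapping, the sketch below collapses a list of block numbers into (starting block, block count) pairs. The function name `blocks_to_extents` and the sample block numbers are made up for the illustration; real ext4 extents carry additional fields.

```python
def blocks_to_extents(blocks):
    """Collapse block numbers (in file order) into (start, count) extents."""
    extents = []
    for b in blocks:
        if extents and extents[-1][0] + extents[-1][1] == b:
            # Block continues the current run: grow the last extent.
            start, count = extents[-1]
            extents[-1] = (start, count + 1)
        else:
            # Block starts a new run.
            extents.append((b, 1))
    return extents


file_blocks = [100, 101, 102, 103, 500, 501, 900]
extents = blocks_to_extents(file_blocks)
```

Seven block pointers shrink to three extents here; a large contiguous file needs only one.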

Blocks

• Like clusters in FAT file system

• In ext2/3: 1024, 2048 or 4096 bytes

– Specify when fs created

• Block groups are large collections of contiguous blocks that partition the FS data structures

– Usually size determined by FS but can be specified

• Size limited by bitmaps so large FS may need multiple block groups

Block Group

Superblock

• 2 sectors (1024 bytes) that describe the file system

– Volume label

– Block size

– # blocks per group

– # reserved blocks before the 1st block group

– Count of free inodes & blocks (total all groups)

Superblock

• 1st superblock is 1024 bytes past the beginning of the file system

• Backup copies of the superblock are in the first block of each block group (or of only some groups when the sparse superblock feature is enabled)
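As a rough sketch of how the superblock fields listed above could be read, the following decodes a few of them from a raw image held in memory. The byte offsets follow the published ext2 layout (little-endian, magic number 0xEF53 at offset 56, volume label at offset 120); the function name `parse_superblock` and the dictionary keys are illustrative choices, not part of any library.

```python
import struct

SUPERBLOCK_OFFSET = 1024   # bytes from the start of the file system
EXT_MAGIC = 0xEF53         # value of the magic field in ext2/3/4


def parse_superblock(image):
    """Decode a few superblock fields from a bytes-like file system image."""
    sb = image[SUPERBLOCK_OFFSET:SUPERBLOCK_OFFSET + 1024]
    (inodes, blocks, reserved, free_blocks, free_inodes,
     first_data_block, log_block_size) = struct.unpack_from("<7I", sb, 0)
    (blocks_per_group,) = struct.unpack_from("<I", sb, 32)
    (inodes_per_group,) = struct.unpack_from("<I", sb, 40)
    (magic,) = struct.unpack_from("<H", sb, 56)
    if magic != EXT_MAGIC:
        raise ValueError("not an ext2/3/4 superblock")
    # The label is a NUL-padded 16-byte field.
    label = sb[120:136].split(b"\0", 1)[0].decode("latin-1")
    return {
        "label": label,
        "block_size": 1024 << log_block_size,   # stored as log2(size) - 10
        "blocks": blocks,
        "inodes": inodes,
        "blocks_per_group": blocks_per_group,
        "inodes_per_group": inodes_per_group,
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "first_data_block": first_data_block,
    }
```

Given the first 2048 bytes of a file system image read in binary mode, `parse_superblock` returns the label, block size and free counts, which should agree with what tools such as `dumpe2fs` report.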

Group Descriptor Table

• Stores

– The group descriptors

– One for each block group

– Starting block addresses of the block bitmap, inode bitmap and inode table

– Count of free inodes & blocks for the group

• Located in the block after the superblock

– Backup copies are in the same block groups as the superblock backups
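A minimal sketch of reading one group descriptor, assuming the classic 32-byte ext2 descriptor layout (three 32-bit block addresses followed by 16-bit counters). ext4 with 64-bit support uses larger descriptors, which this does not handle; the function names are illustrative.

```python
import struct


def gdt_block(block_size):
    """Block number of the group descriptor table (the block after the superblock)."""
    # With 1024-byte blocks the superblock occupies block 1; with larger
    # blocks it sits inside block 0.
    superblock_block = 1 if block_size == 1024 else 0
    return superblock_block + 1


def parse_group_descriptor(raw):
    """Decode the leading fields of a 32-byte ext2 group descriptor."""
    (block_bitmap, inode_bitmap, inode_table,
     free_blocks, free_inodes, used_dirs) = struct.unpack_from("<3I3H", raw, 0)
    return {
        "block_bitmap": block_bitmap,   # starting block of the block bitmap
        "inode_bitmap": inode_bitmap,   # starting block of the inode bitmap
        "inode_table": inode_table,     # starting block of the inode table
        "free_blocks": free_blocks,
        "free_inodes": free_inodes,
        "used_dirs": used_dirs,
    }
```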

Block Bitmap

• One bit per block in the group

– size = #blocks / 8

• Linux creates a block group to have as many blocks as there are bits in a block

• Thus, a block bitmap is always 1 block in size

• Tracks block allocation for the group
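The one-block bitmap rule fixes the size of a block group. The arithmetic can be sketched as follows; the helper name `group_geometry` is an illustrative choice.

```python
def group_geometry(block_size, total_blocks):
    """Return (blocks per group, bytes per group, number of groups)."""
    # One bit per block, and the bitmap fills exactly one block.
    blocks_per_group = block_size * 8
    group_bytes = blocks_per_group * block_size
    # Ceiling division: a partial last group still counts as a group.
    groups = -(-total_blocks // blocks_per_group)
    return blocks_per_group, group_bytes, groups


# 1 TiB file system with 4096-byte blocks -> 2**28 blocks in total
per_group, group_bytes, groups = group_geometry(4096, 2**28)
```

With 4096-byte blocks a group holds 32768 blocks (128 MiB), so a 1 TiB file system is divided into 8192 block groups.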

Block Bitmap

Inode Bitmap

• Tracks the allocation of inodes in the group

– Size = #inodes per group / 8

– Size defined at file system creation

• Typically fewer inodes than blocks in group
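Both bitmaps work the same way: a set bit means "in use". The sketch below finds and claims the first free entry, assuming the least significant bit of byte 0 stands for entry 0, which is the bit order ext2 uses. The function names are illustrative.

```python
def first_free(bitmap):
    """Return the index of the first clear bit, or None if every bit is set."""
    for byte_index, byte in enumerate(bitmap):
        if byte != 0xFF:                      # skip fully allocated bytes
            for bit in range(8):
                if not byte & (1 << bit):
                    return byte_index * 8 + bit
    return None


def allocate(bitmap):
    """Mark the first free entry as used and return its index (None if full)."""
    idx = first_free(bitmap)
    if idx is not None:
        bitmap[idx // 8] |= 1 << (idx % 8)
    return idx


bitmap = bytearray([0xFF, 0b00000111, 0x00])   # entries 0..10 already in use
first = allocate(bitmap)
second = allocate(bitmap)
```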

inodes

• Contained in inode table

• Like MFT records for files

• Typically 256 bytes per inode (128 bytes in older ext2/ext3 file systems)

• Number of inodes determines number of files

• Specified when FS created

• Contain file and directory metadata

• Directory has file or directory name and pointer to inode in the inode table

• Inode points to the file content blocks
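Because every group has the same number of inodes, an inode number maps directly to a block group and a byte offset inside that group's inode table. A small sketch, with `locate_inode` as a hypothetical helper and the per-group counts chosen only for the example:

```python
def locate_inode(inode_no, inodes_per_group, inode_size=256):
    """Return (block group, byte offset within that group's inode table)."""
    index = inode_no - 1                 # inode numbers start at 1
    group = index // inodes_per_group
    slot = index % inodes_per_group
    return group, slot * inode_size


root_dir = locate_inode(2, inodes_per_group=8192)       # inode 2 is the root directory
some_file = locate_inode(8193, inodes_per_group=8192)   # first inode of group 1
```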

inode contents

• Type of the file

– Plain file, directory, symbolic link, device file

• Access permissions

• Owner and group ID numbers

• Size in bytes

• Number of links (directory references)

• Times of last access and last modification to the file

• List of data blocks claimed by the file.

• Address of the file's blocks on the disk
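Most of these inode fields are visible from user space through the `stat` family of calls. The sketch below uses Python's standard `os.lstat` and `stat` modules on a temporary file; the helper name `describe` is made up for this example. The list of data blocks is not exposed this way.

```python
import os
import stat
import tempfile


def file_type(mode):
    """Translate the type bits of an inode mode into a readable name."""
    if stat.S_ISREG(mode):
        return "plain file"
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISLNK(mode):
        return "symbolic link"
    if stat.S_ISCHR(mode) or stat.S_ISBLK(mode):
        return "device file"
    return "other"


def describe(path):
    """Collect the inode metadata the operating system reports for a path."""
    st = os.lstat(path)
    return {
        "inode": st.st_ino,
        "type": file_type(st.st_mode),
        "permissions": stat.filemode(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "links": st.st_nlink,
        "atime": st.st_atime,
        "mtime": st.st_mtime,
    }


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "demo.txt")
    with open(path, "wb") as f:
        f.write(b"hello")
    info = describe(path)
    dir_info = describe(d)
```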

Directories

• Special files that associate names with the inode numbers used internally by the file system

• Each entry associates one file name with one inode number

• Consists of:

– inode number,

– length of the file name

– actual text of the file name.
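A sketch of walking the entries in one directory block, assuming the ext2 linear directory format: 32-bit inode number, 16-bit record length, 8-bit name length, 8-bit file type, then the name. The record length and file type fields come from the ext2 layout rather than from the list above, and `pack_entry` exists only to build test data.

```python
import struct

HEADER = "<IHBB"   # inode, record length, name length, file type


def pack_entry(inode, name):
    """Build one directory entry, padded to a multiple of 4 bytes."""
    raw_name = name.encode("utf-8")
    rec_len = (8 + len(raw_name) + 3) & ~3
    header = struct.pack(HEADER, inode, rec_len, len(raw_name), 0)
    return (header + raw_name).ljust(rec_len, b"\0")


def parse_directory(block):
    """Return (name, inode number) pairs found in a directory block."""
    entries = []
    pos = 0
    while pos + 8 <= len(block):
        inode, rec_len, name_len, _ftype = struct.unpack_from(HEADER, block, pos)
        if rec_len < 8:
            break                       # corrupt entry or end of data
        if inode != 0:                  # inode 0 marks an unused entry
            name = block[pos + 8:pos + 8 + name_len].decode("utf-8")
            entries.append((name, inode))
        pos += rec_len                  # record length links to the next entry
    return entries


block = pack_entry(2, ".") + pack_entry(2, "..") + pack_entry(12, "notes.txt")
listing = parse_directory(block)
```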

Directory Structure

• Hierarchical

– Like FAT and NTFS

• But all data stored in one tree

– No drive letters (C:, N:, etc.)

• Includes network shares, all physical devices, removable devices and some virtual file systems

Linux File Structure

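[Figure: a single directory tree rooted at / with subdirectories usr, mnt, var, root, bin and boot. The mount points /mnt/floppy, /mnt/cdrom and /mnt/USB each hold a file system that points to a device: /dev/fd0, /dev/hdc and /dev/sda respectively.]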

References

• https://en.wikipedia.org/wiki/Ext2

• https://en.wikipedia.org/wiki/Ext3

• https://en.wikipedia.org/wiki/Ext4

• https://ext4.wiki.kernel.org/index.php/Ext4_Disk_Layout – Technical

btrfs B-tree File System

btrfs

• B-tree file system

• Development started at Oracle in 2007

• Now considered stable

• Available in most Linux distros

• Default fs in new SuSE releases

• Completely new design

– Ext file systems design encumbered by backward compatibility

– btrfs borrows features from reiserfs

btrfs Features – B-tree

• B-tree

• Specialized search tree optimized for disk indexes

• Self balancing

• “Safe” balancing algorithm

btrfs B-trees

• Everything stored in B-trees of different types

– Accessed by one algorithm/code base

• All trees indexed by one root tree

• Sub-trees for various file system functions

• All use the same index format

– 64 bit object ID

– 8 bit type

– Remaining 64 bits are type-dependent
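The index format above amounts to a 136-bit key that sorts first by object ID, then by type, then by the type-dependent part. A sketch using `struct`, assuming little-endian packing with no padding; the item type numbers in the sample data are only illustrative.

```python
import struct

KEY_FORMAT = "<QBQ"   # object ID (64 bits), type (8 bits), type-dependent (64 bits)


def pack_key(objectid, item_type, offset):
    """Serialize a key into its 17-byte form."""
    return struct.pack(KEY_FORMAT, objectid, item_type, offset)


def unpack_key(raw):
    """Recover the (objectid, type, offset) tuple from 17 bytes."""
    return struct.unpack(KEY_FORMAT, raw)


# Tuples compare field by field, which gives the B-tree ordering directly.
keys = [(257, 108, 4096), (256, 1, 0), (257, 1, 0), (257, 108, 0)]
ordered = sorted(keys)
```

Sorting this way keeps all items belonging to one object next to each other in the tree, which is why a single lookup routine can serve every tree type.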

B-tree Types

• Root tree

• File system tree

– Visible files and directories

– Files stored in extents (or in-tree)

• Extent allocation tree

• Log tree

– Holds a journal

B-tree Types

• Chunk and Device trees

– Chunks are parts of a logical division of the fs space

– Mapped to physical chunks by Chunk tree

– Device tree contains inverse mapping

– Allows mapping to change, eg to add a new device without changing logical view of the fs

• To mount a fs, the chunk tree must be found first, but block addresses are normally resolved through the chunk tree itself

– Super blocks at fixed locations contain addresses of chunk and root tree locations
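The logical-to-physical mapping can be pictured as a sorted table of chunks. The following toy model is not the btrfs data structure, only the lookup idea; `ChunkMap`, its methods and the device names are hypothetical.

```python
import bisect


class ChunkMap:
    """Toy chunk table: logical address ranges mapped onto devices."""

    def __init__(self):
        self.starts = []   # sorted logical start addresses
        self.chunks = []   # (logical start, length, device, physical start)

    def add(self, logical_start, length, device, physical_start):
        i = bisect.bisect_left(self.starts, logical_start)
        self.starts.insert(i, logical_start)
        self.chunks.insert(i, (logical_start, length, device, physical_start))

    def resolve(self, logical):
        """Translate a logical address into (device, physical address)."""
        i = bisect.bisect_right(self.starts, logical) - 1
        if i >= 0:
            start, length, device, phys = self.chunks[i]
            if logical < start + length:
                return device, phys + (logical - start)
        raise KeyError(logical)


cmap = ChunkMap()
cmap.add(0, 1024, "devA", 4096)
cmap.add(1024, 1024, "devB", 0)    # a second device extends the logical space
```

Adding the second device only adds a row to the table; addresses already handed out in the first chunk keep resolving to the same place.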

btrfs Features – Copy on Write

• Copy on Write sometimes called COW

• General idea is that many processes can access the same resource

• When one wants to change the resource it makes a copy and changes it

• Other processes still see the unchanged resource

• Used in memory management for delayed allocation, demand paging etc

• Used in NTFS for Volume Shadow Copy

• Can be used for checkpointing

btrfs Copy on Write

• COW provides before images

– Like a journal

• Can be used to undo changes, making btrfs self-healing

• Used to implement file-cloning

– A snapshot of a file

– Clone created by COW

• Can snapshot an entire btrfs volume

– New version created on the fly by COW
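A toy model of snapshots by copy on write: the snapshot shares every block with the live volume until one side writes. This is a sketch of the principle only (btrfs shares tree nodes rather than copying a whole mapping), and the class `CowVolume` is hypothetical.

```python
class CowVolume:
    """Toy volume whose blocks are immutable and shared between snapshots."""

    def __init__(self, blocks=None):
        self.blocks = dict(blocks or {})   # block name -> immutable bytes

    def snapshot(self):
        # Only the mapping is duplicated; the block contents stay shared.
        return CowVolume(self.blocks)

    def write(self, name, data):
        # Old contents are never changed in place; the name is rebound to
        # new contents, so other holders still see the old version.
        self.blocks[name] = bytes(data)


vol = CowVolume()
vol.write("a", b"version 1")
snap = vol.snapshot()
shared_before_write = snap.blocks["a"] is vol.blocks["a"]
vol.write("a", b"version 2")
```

The snapshot still reads "version 1" after the write, which is the before image that makes undoing a change possible.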

btrfs Copy on Write

• Converting from ext in place

• Btrfs mostly doesn't care where its metadata are stored

• Can be put in empty space of an ext fs

• Block pointers point to data blocks created by the ext file system

• COW used to allow changes through btrfs while leaving original ext data intact

• Eventually the btrfs becomes the only copy

btrfs -- Other Features

• Sub volumes

– Part of a btrfs directory tree can be treated as a sub-volume

– Acts like a separately mountable partition contained in the btrfs file system

– Snapshots are implemented as sub-volumes

• Multi-devices

– fs can be created over a pool of multiple devices or partitions

– New devices can be added to expand the fs capacity

btrfs -- RAID

• Multi-device btrfs can use RAID to spread the data over the physical devices

• RAID 0, 1, 10, 5 and 6 are planned

• RAID5 and RAID6 not really ready yet

• More flexible about volumes used in mirror set

• RAID5 and 6 will use more parity devices to provide increased reliability
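How a parity device adds reliability can be shown with single XOR parity, the scheme behind RAID 5. This is a generic sketch rather than btrfs code; `xor_parity` and the sample stripes are made up, and all stripes are assumed to have equal length.

```python
def xor_parity(stripes):
    """XOR equally sized byte strings together."""
    parity = bytearray(len(stripes[0]))
    for stripe in stripes:
        for i, b in enumerate(stripe):
            parity[i] ^= b
    return bytes(parity)


data = [b"AAAA", b"BBBB", b"CCCC"]     # stripes on three data devices
parity = xor_parity(data)              # stored on a fourth device

# Device 1 fails: XOR of the survivors and the parity restores its stripe.
rebuilt = xor_parity([data[0], data[2], parity])
```

One parity stripe survives the loss of any single device; RAID 6 adds a second, differently computed parity so that two devices can fail.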

btrfs -- Send/Receive

• Send creates diff file between a sub-volume and some other volume

– Such as a volume and a snapshot

• Receiving the diff file makes one volume equal the other

• The diff file is essentially an incremental backup

• Can also be used to create and maintain a remote replica
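The send/receive idea can be modelled with two snapshots represented as dictionaries. This is a toy sketch: the real stream format is different, and the function names `send` and `receive` merely mirror the commands.

```python
def send(parent, current):
    """Build a diff that turns `parent` into `current` (dicts of path -> bytes)."""
    ops = []
    for path in sorted(parent.keys() - current.keys()):
        ops.append(("unlink", path, None))          # removed since the parent
    for path in sorted(current):
        if parent.get(path) != current[path]:
            ops.append(("write", path, current[path]))   # new or changed
    return ops


def receive(replica, ops):
    """Apply a diff to a replica that currently matches the parent snapshot."""
    for op, path, data in ops:
        if op == "unlink":
            replica.pop(path, None)
        else:
            replica[path] = data
    return replica


monday = {"a.txt": b"1", "b.txt": b"2", "old.log": b"x"}
tuesday = {"a.txt": b"1", "b.txt": b"3", "c.txt": b"4"}
stream = send(monday, tuesday)
replica = receive(dict(monday), stream)
```

Only the three changed paths travel in the stream; the unchanged file is never sent, which is what makes this an incremental backup.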

btrfs Reference

• https://wiki.archlinux.org/index.php/Btrfs

• Lots more at the end of the above article

ISO9660

ISO9660

• Filesystem spec for data on CDs (and DVDs)

• 24 byte frames

• 2352 byte sectors (98 frames)

• 2048 bytes user data; the remaining 304 bytes are sync and header (16) plus error detection and correction (288)

– A/V disks can use ECC area for data

• Assumed contiguous allocation allows simpler FS structures
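The sector arithmetic can be checked in a few lines. The field sizes below are those of a CD-ROM Mode 1 sector, which is an assumption added here rather than something stated on the slide; the 288-byte figure corresponds to error detection, reserved bytes and error correction together, with sync and header adding another 16 bytes.

```python
FRAME_BYTES = 24
FRAMES_PER_SECTOR = 98
SECTOR_BYTES = FRAME_BYTES * FRAMES_PER_SECTOR

# Mode 1 sector layout, sizes in bytes (assumed layout, see lead-in).
MODE1_LAYOUT = {
    "sync": 12,
    "header": 4,
    "user data": 2048,
    "error detection": 4,
    "reserved": 8,
    "error correction": 276,
}

overhead = SECTOR_BYTES - MODE1_LAYOUT["user data"]
efficiency = MODE1_LAYOUT["user data"] / SECTOR_BYTES
```

About 87% of each data sector carries user data. Audio and video discs accept occasional errors and reuse the error correction area for payload.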

ISO Limitations

• For cross platform compatibility:

• File names have upper case letters, digits, underscores and one “.”

• No spaces

• Eight level directories

• 4GiB file size limit

• Most OS ignore or circumvent these limits

• Path table (for efficiency) imposes a limit of 65,535 on the number of directories (not in Linux)
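A small checker for the file name rules listed above: upper case letters, digits and underscores, at most one ".", no spaces. It is a sketch of those rules only; the standard's length limits and version suffixes are not modelled, and `is_portable_name` is a hypothetical helper.

```python
import re

# Allowed characters on both sides of an optional single dot.
_PORTABLE = re.compile(r"[A-Z0-9_]*\.?[A-Z0-9_]*")


def is_portable_name(name):
    """True if `name` uses only the restricted character set with at most one dot."""
    if not name or name == ".":
        return False
    return _PORTABLE.fullmatch(name) is not None


checks = {n: is_portable_name(n)
          for n in ["README.TXT", "readme.txt", "MY FILE.TXT", "A.B.C", "NOTES_1"]}
```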

ISO Extensions

• Sessions

– ISO is read only, contiguous pre-allocating FS

– Doesn't expect data to be appended

• Sessions allow data to be added to a CD

• Each new session contains an updated copy of the entire disk's directories and other data structures

ISO Extensions

• Joliet

– Unicode names

– Avoids file name restrictions

• Rock Ridge supports POSIX file attributes (permissions, ownership, symbolic links) and longer names

• El Torito allows CDs to be bootable

• Apple's ISO 9660 extensions allow for Apple resource forks

UDF x.x

UDF

• Universal Disk Format

– Open specification designed for any media

– Mostly used for DVD and BD instead of ISO9660

– Official file system for DVD-Video and Audio per DVD Forum

• Design suitable for incremental updates

– As opposed to creating ISO then burning

• Specification maintained by Optical Storage Technology Association (OSTA)

– Most of the world's optical product manufacturers and resellers

UDF Revisions

• 1.02: DVD video

• 1.50: Introduced VAT for CD-R/DVD-R

• 2.00: File types for DVD recording

• 2.01: Fix bugs in 2.00

• 2.50: Adds metadata partition, Used on some Blu-ray disks

• 2.60: Adds Pseudo OverWrite. Used by some other Blu-ray disks

UDF “Builds”

• Plain (the original one)

– Must be built then burned to CD like ISO9660

• VAT

– Like ISO9660 with lots of little sessions

• Spared

– For -RW media

– Knows to move often changed metadata around to avoid wearing out sectors with re-writes

Compatibility

• Windows calls UDF “Live File System”

– Because nothing should be referred to by its real name

• Vista and later support all features

• Linux support is evolving

– Read for all versions

– Write is “safe” up to 2.01 for plain build

Terminology

Terminology

• Sector

– Hard drives

– At interface between disk and controller

– Header, data and ECC

– Header contains sync bytes, address identification, flaw flag and header parity bytes

– Usually 512 bytes for data

– Can be 2048 or 4096 (advanced format sectors)

– Sometimes interleaved

Terminology

• Track

– One ring of sectors around a disk

– Can be read by one read head without moving the head

– Invisible to FS and application

• Cylinder

– On multi-platter disk, one head per platter, all move in unison

– Cylinder = the tracks read by the heads without moving the heads

Terminology

• Block

– Sometimes called physical record

– Consists of multiple logical records

• On tape media where there is no sector

– Or multiple sectors

• On disk

– Transfer unit from device to file system

– Application can specify block size to fine tune performance

– Block size = track size can make sense

Terminology

• Record

– = logical record

– Meaningful to applications

– Internal structure of files

• File is usually a collection of logical records

– Not relevant to FS or OS

• Except in files OS cares about, like directories, executables, MFT

– Physical record usually means block

– Sometimes sector is called physical record

Terminology

• Stream

– The non-metadata part of a file

– Actual data is a string of bytes

• Byte 0 to byte filesize – 1

– This part of file sometimes called the data stream

• NTFS allows alternate data streams

– Some of the metadata are called streams

• In MFT name-value pairs, the value is called a stream

Terminology

• Cluster

– Allocation unit

– Sometimes called block

– In ext FS allocation units are called blocks

• Ext4 and btrfs allocation unit called extent

Terminology

• Partition

– A subdivision of the space on a disk

– Contiguous, may need to start on physical boundaries

– FSs are contained in partitions

– Often called volumes

– Partitions are the “mounting unit”

– Note C: is where a FS in a partition or volume is mounted

– But C:, D: etc are often called drives, or disks

Terminology

• Logical Volumes

– Linux

• Logical Disks, Storage Spaces

– Windows (name changed with Server 2012)

• Refer to partitions that span multiple physical devices

• Using LVs usually allows physical extents to be added or removed to resize a volume

• Also may support snapshots, RAID levels