Prof. Dr. Bernhard Neumair
Chair of Practical Computer Science - GWDG, Institute of Computer Science
Chapter 4: File-System Virtualization
Course: Planung und Betrieb von IT/TK-Infrastrukturen (Planning and Operation of IT/Telecom Infrastructures), SS 09
File Virtualization, File System Virtualization
Storage Virtualization
What is created:
- Block Virtualization
- Disk Virtualization
- Tape, Tape Drive, Tape Library Virtualization
- File System Virtualization
- File/Record Virtualization
Where it is done:
- Host-based, Server-based Virtualization
- Network-based Virtualization
- Storage device, Storage subsystem Virtualization
How it is implemented:
- In-band Virtualization
- Out-of-band Virtualization
Terminology
File / Record Virtualization
- Presents one or more underlying objects as a single composite object
  - Objects can be files or directories
- Can provide HSM-like (Hierarchical Storage Management) properties in a storage system
- Presents an integrated file interface
  - File data and metadata are managed separately in the storage system
File System Virtualization
- Aggregates multiple file systems into one large "virtual file system"
- Users access data through the virtual file system
- Underlying file systems are transparent to users
- Enables additional functionality, e.g. a different file access protocol, on top of one or more existing file systems
- Virtual file system can be implemented in addition to a standard file system
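The aggregation of several file systems into one virtual file system can be sketched as a mount table with longest-prefix path resolution. This is only an illustrative model: the class `VirtualFileSystem`, its methods, and the use of plain dictionaries as backing file systems are assumptions made for the example, not part of any product named in these slides.

```python
from typing import Dict, Tuple


class VirtualFileSystem:
    """Toy virtual file system that aggregates several underlying file systems."""

    def __init__(self) -> None:
        # Maps a mount point in the virtual namespace to a backing file system,
        # modelled here as a plain dict of path -> file contents (an assumption).
        self._mounts: Dict[str, Dict[str, bytes]] = {}

    def mount(self, mount_point: str, backing_fs: Dict[str, bytes]) -> None:
        self._mounts[mount_point.rstrip("/")] = backing_fs

    def resolve(self, virtual_path: str) -> Tuple[str, str]:
        """Return (mount point, path inside the backing file system)."""
        # Longest-prefix match so nested mount points take precedence.
        for mount_point in sorted(self._mounts, key=len, reverse=True):
            if virtual_path == mount_point or virtual_path.startswith(mount_point + "/"):
                return mount_point, virtual_path[len(mount_point):] or "/"
        raise FileNotFoundError(virtual_path)

    def read(self, virtual_path: str) -> bytes:
        mount_point, inner_path = self.resolve(virtual_path)
        return self._mounts[mount_point][inner_path]


vfs = VirtualFileSystem()
vfs.mount("/projects", {"/plan.txt": b"roadmap"})
vfs.mount("/projects/archive", {"/2008.txt": b"old plan"})
```

Users only ever see paths such as `/projects/plan.txt`; which underlying file system serves the request stays hidden behind `resolve`.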
File System Virtualization: Network Attached Storage
[Figure: NFS and CIFS clients reach file servers and NAS appliances over the host access LAN; the file servers and NAS appliances present their files as virtual file systems.]
Network Attached Storage NAS
[Figure: Clients and servers reach the NAS appliance (network attached storage) over an IP network using a file protocol (CIFS, NFS, HTTP, FTP, etc.).]
NAS Systems
[Figure: Layer diagram. Hosts running applications reach a LAN-attached NAS system over the LAN. The NAS system covers the file/record layer and may do SN/block aggregation, etc. inside the NAS system box, possibly over a private SN. Block layer: host, network, and device block-aggregation.]
NAS Head, NAS Gateway
[Figure: Clients use file I/O protocols (NFS or CIFS) over an IP network on Ethernet to reach the NAS gateway, which holds the file system and uses a block I/O protocol over the enterprise Fibre Channel SAN to reach shared storage.]
NAS Gateway
- Gives the combined benefits of NAS and SAN
  - NAS flexibility and ease of use
  - SAN scalability on the IP network
- Increases the reach of Fibre Channel storage devices
  - Extends beyond topology limitations of Fibre Channel
  - Allows Fibre Channel devices to be used on the IP network
  - Connectivity to switches, RAID controllers and disk arrays
- Leverages the value of Fibre Channel investment
  - Reduces access costs to Fibre Channel devices
  - Allows access to underutilized SAN storage
  - Enables heterogeneous file serving on SAN storage devices
NAS Head, NAS Gateway (2)
[Figure: Layer diagram. Hosts running applications reach the NAS head over the LAN. The NAS head covers the file/record layer; there is no storage in the file server controller box. The blocks come from a disk array reached over the storage network (SN). Block layer: host, network, and device block-aggregation.]
NAS Appliances and SAN
[Figure: NFS and CIFS clients reach the SAN file appliance over the host access LAN and see files in virtual file systems; the appliance reaches "blocks" (virtual disks) over the storage access SAN (Fibre Channel).]
SAN file appliance
- Virtualizes storage
- Presents file systems
- Heterogeneous hosts
- NAS and SAN advantages
- Scalability & performance
Storage Virtualization: Protocols and Layers
[Figure: I/O stacks of DAS, SAN, iSCSI, and NAS, listed here layer by layer from the application down to the storage]
- DAS (block I/O): application / OS file system, database system / raw partition, LVM / SCSI device driver / SCSI bus adapter
- SAN (block I/O): application / OS file system, database system / raw partition, LVM / SCSI device driver / FC host bus adapter / SAN
- iSCSI (block I/O): application / OS file system, database system / raw partition, LVM / iSCSI layer / TCP/IP stack / NIC / IP network / iSCSI appliance (NIC, TCP/IP stack, iSCSI layer, I/O bus adapter)
- NAS (file I/O): application / OS file system / I/O redirector / NFS or CIFS / TCP/IP stack / NIC / IP network / NAS appliance or NAS gateway (NIC, TCP/IP stack, file system + LVM, SCSI device driver); the NAS gateway continues with block I/O through an FC adapter into the SAN
Types of File Systems
Local Disk File System
Distributed File System
SAN File System (SAN File Sharing System)
Cluster File System
Physical and Virtual File Systems
Distributed file servers create a virtual file system
- Physical files & file systems are not visible to clients
- Clients access and recognize the virtual/logical file system
What is a (physical/local) file system?
- Common functionality available from almost all operating systems
- Physical (on-disk) file structure is both file system- and OS-specific
- Physical file system assembles disk blocks into files, controls access
- Usual structure: hierarchical tree of directories and files
Local Disk File System
Serial file system sharing on the same or dissimilar OS, by offering one volume manager and one file system available on multiple OSes
Benefits
- Simplification of sequential computing
- Data migration is easy and fast
If the OS is different
- Avoid the VTOC issue with one volume manager
- Need to "convert" metadata if the byte ordering of the processors is not the same (BE: SPARC, PA-RISC, POWER; LE: Intel)
Examples
- Homogeneous OS: most file systems (and volume managers) available on the market, with import/deport volumes/VGs and mount/umount file system command sequences (UFS, HFS, XFS, JFS, VxFS, ext2/3… SDS/SVM, LVM, XVM, VxVM…)
- Heterogeneous OS: VERITAS Storage Foundation with the Portable Data Container feature
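The metadata "conversion" between big-endian and little-endian hosts can be illustrated with Python's `struct` module. The record layout used here (a 32-bit block count followed by a 64-bit file size) is a made-up example, not the on-disk format of any file system listed above.

```python
import struct

# Hypothetical on-disk metadata record: block count (uint32) and file size (uint64).
BIG_ENDIAN_LAYOUT = ">IQ"     # as written by e.g. a SPARC, PA-RISC, or POWER host
LITTLE_ENDIAN_LAYOUT = "<IQ"  # as expected by an Intel host


def convert_be_to_le(record: bytes) -> bytes:
    """Re-encode a big-endian metadata record in little-endian byte order."""
    block_count, file_size = struct.unpack(BIG_ENDIAN_LAYOUT, record)
    return struct.pack(LITTLE_ENDIAN_LAYOUT, block_count, file_size)


be_record = struct.pack(BIG_ENDIAN_LAYOUT, 8, 4096)
le_record = convert_be_to_le(be_record)
```

The values are identical on both sides; only the byte order of each field changes, which is why a file system moved between such hosts needs its metadata rewritten or interpreted with an explicit byte order.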
Local Disk File System (2)
Example: DW (data warehouse) application
How?
- OS #0 server analyses data
- OS #1 server starts batches
- OS #2 server loads data into the DW
- OS #3 server backs up data
Benefits
- No data multiplication
- Cost effective for storage
- More effective use of servers
- No time wasted in copying data between servers
[Figure: Servers OS #0 to OS #3 reach the shared disks over the storage network. A server takes over the data with: import disk group, start volume, mount file system. It hands the data over with: unmount file system, stop volume, deport disk group.]
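The serial hand-over of one disk group between servers can be modelled as a simple ownership check. The names `SharedDiskGroup`, `acquire`, `release`, and `run_pipeline` are invented for this sketch; real volume managers use their own import/deport and mount/umount commands, which are not reproduced here.

```python
from typing import List, Optional


class SharedDiskGroup:
    """Toy model of a disk group that only one server may use at a time."""

    def __init__(self) -> None:
        self.owner: Optional[str] = None

    def acquire(self, server: str) -> None:
        # Stands in for: import disk group, start volume, mount file system.
        if self.owner is not None:
            raise RuntimeError(f"disk group still imported by {self.owner}")
        self.owner = server

    def release(self, server: str) -> None:
        # Stands in for: unmount file system, stop volume, deport disk group.
        if self.owner != server:
            raise RuntimeError(f"{server} does not hold the disk group")
        self.owner = None


def run_pipeline(group: SharedDiskGroup, servers: List[str]) -> List[str]:
    """Pass the same data from server to server without copying it."""
    log = []
    for server in servers:
        group.acquire(server)
        log.append(f"{server} works on shared data")
        group.release(server)
    return log


group = SharedDiskGroup()
steps = run_pipeline(group, ["OS #0", "OS #1", "OS #2", "OS #3"])
```

The point of the model is that the data never moves: each server in turn takes over the same disks, which is where the "no time wasted in copying" benefit comes from.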
Distributed File Systems
Distributed File System – General Characteristics
- Network transparency, user mobility, fault tolerance, scalability, file mobility
- Semantics preservation (some protocols)
- Replication may be integrated
- No network block aggregation by default
Distributed File Systems (2)
NFS and CIFS (SMB) (NAS protocols)
- Asymmetric (client/server) architecture
- Use TCP/IP (UDP used by default for NFSv3; NFSv4 uses TCP)
- Some development on NFS over RDMA (Remote DMA)
- De facto standards today
- Traditional file server generation
AFS, OpenAFS, DFS, Coda, InterMezzo …
- Asymmetric (client/server) architecture
- Work over TCP/UDP on IP
Distributed File Systems (3)
Example: File serving (local/LAN)
How?
- 1 big server with NFS/CIFS layer
- Hundreds of clients
Benefits
- Consolidate storage
- More effective use of resources
[Figure: NFS/CIFS clients send data and control access to the file server using NAS protocols; the file server reaches the shared disks over the storage network.]
Distributed File System Concerns
Locking – reads and writes aren't atomic
- Read/write: a read could return a combination of old & new data
- Write/write: the result could mix data from both writes
- Locking can protect clients from these (mis-)behaviors
Cache coherency – when do changes propagate?
- A read (on another node) after a write may return stale data ("weak coherency")
- Explicit flushing or strong coherency prevents this
Data stability – what happens when a crash follows a write?
- Stable data: write completion ensures persistence
- Unstable data: a crash may revert to old data
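The difference between stable and unstable writes can be shown on a local file with `os.fsync`. This is a local illustration only; on a network file system the actual guarantee also depends on the client and server implementation. The function names `stable_write` and `unstable_write` are chosen for this example.

```python
import os
import tempfile


def stable_write(path: str, data: bytes) -> None:
    """Write data and return only after it has been pushed to stable storage."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # move data from the Python buffer to the OS
        os.fsync(f.fileno())   # ask the OS to commit it to the storage device


def unstable_write(path: str, data: bytes) -> None:
    """Write data that may still sit in OS caches when the call returns."""
    with open(path, "wb") as f:
        f.write(data)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "record.dat")
    stable_write(target, b"committed")
    with open(target, "rb") as f:
        content = f.read()
```

After `unstable_write` returns, a crash may still lose the data; after `stable_write` returns, the write is meant to survive a crash, at the price of waiting for the device.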
Distributed File Systems: NFS vs. CIFS
NFS is used primarily with Unix, CIFS (historically SMB) with Windows
Locking scope
- CIFS: Mandatory, locking affects all file access.
- NFS: Advisory, locking only affects applications that use locking APIs.
Cache coherency (client)
- CIFS: Strong, locking and invalidation for both attributes and file data.
- NFS: Weak, attribute expiration times plus cached data revalidation on file open.
Application impacts of coherency
- CIFS: Usually transparent, modifications can help with conflicts (e.g., open a copy of a file that's in use).
- NFS: Stronger coherence requires application changes (e.g., close and reopen the file to get current data).
Data stability
- CIFS: Client: write-back. Server: write-back permitted, written data may not be stable.
- NFS: Client: write-back. Server: write-through, written data is always stable.
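The weak, expiration-based client cache coherency described for NFS can be sketched with a toy client that trusts its cached copy until an attribute timeout has passed. This is a deliberately simplified model: `Server`, `CachingClient`, `attr_timeout`, and the explicit `now` timestamps are assumptions for the example and do not reproduce the real NFS protocol.

```python
class Server:
    """Toy file server holding a version counter and content per file."""

    def __init__(self) -> None:
        self.files = {}

    def write(self, name: str, data: bytes) -> None:
        version = self.files.get(name, (0, b""))[0] + 1
        self.files[name] = (version, data)


class CachingClient:
    """Client whose cached attributes expire after `attr_timeout` seconds."""

    def __init__(self, server: Server, attr_timeout: float) -> None:
        self.server = server
        self.attr_timeout = attr_timeout
        self.cache = {}  # name -> (version, data, time the attributes were checked)

    def open_and_read(self, name: str, now: float) -> bytes:
        cached = self.cache.get(name)
        if cached is not None and now - cached[2] < self.attr_timeout:
            return cached[1]  # attributes still considered fresh: data may be stale
        version, data = self.server.files[name]  # revalidate against the server
        self.cache[name] = (version, data, now)
        return data


server = Server()
client = CachingClient(server, attr_timeout=3.0)
server.write("report", b"v1")
first = client.open_and_read("report", now=0.0)
server.write("report", b"v2")          # another client updates the file
stale = client.open_and_read("report", now=1.0)
fresh = client.open_and_read("report", now=5.0)
```

Within the timeout the client still sees the old content; once the cached attributes have expired, the next open picks up the new data. A strongly coherent scheme would instead have the server invalidate the client's copy at the time of the second write.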
Wide Area File Services (1)
Traditional method
[Figure: NFS/CIFS clients in the remote offices, each on a local data network (LAN), access the file server in the data center directly across the WAN using NAS protocols; the file server reaches the shared disks over the storage network.]
Wide Area File Services (2)
Recent method
[Figure: NFS/CIFS clients in each remote office use NAS protocols over the local data network (LAN) to reach a NAS extender (slave). The slaves communicate across the WAN with the NAS extender (master) in the data center, which uses NAS protocols to reach the file server; the file server reaches the shared disks over the storage network.]
NAS extenders across the WAN
- Private protocol
- Excellent in read caching mode
- Write-through is preferable
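Read caching with write-through, as recommended for the NAS extenders, can be sketched as follows. `OriginFileServer` and `CachingExtender` are hypothetical names for this example; the private protocol between real extenders is not modelled, and invalidating the cache when another office changes a file is left out.

```python
class OriginFileServer:
    """Stands in for the file server in the data center."""

    def __init__(self) -> None:
        self.files = {}
        self.reads = 0

    def read(self, name: str) -> bytes:
        self.reads += 1
        return self.files[name]

    def write(self, name: str, data: bytes) -> None:
        self.files[name] = data


class CachingExtender:
    """Remote-office cache: reads are served locally, writes go through."""

    def __init__(self, origin: OriginFileServer) -> None:
        self.origin = origin
        self.cache = {}

    def read(self, name: str) -> bytes:
        if name not in self.cache:           # cache miss: one trip over the WAN
            self.cache[name] = self.origin.read(name)
        return self.cache[name]

    def write(self, name: str, data: bytes) -> None:
        self.origin.write(name, data)        # write-through: origin is updated first
        self.cache[name] = data


origin = OriginFileServer()
origin.write("spec.doc", b"draft")
office = CachingExtender(origin)
office.read("spec.doc")
office.read("spec.doc")
office.write("spec.doc", b"final")
```

Repeated reads cost a single WAN round trip, while every write reaches the data center before it is acknowledged, so the central copy never lags behind a remote office.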
Global Name Space
Goal
- Provide a single view of the NAS file system to deliver more performance and scalability
- Simplify management with centralized administration (if possible)
[Figure: Global namespace]
SAN File Systems, SAN File Sharing Systems
1 node (master) or a set of masters
- Understands, manages and uses the metadata on disk
- The file system can be used even if portions of it are inaccessible
- Block addresses are distributed to nodes (clients) on request
Other nodes (clients)
- Connection to SAN storage
- Avoid the overhead of metadata management
- Access data directly using the block addresses sent by the master(s)
Designed to support hundreds or thousands of nodes
Mixed role between direct data access with host-based thin software and NAS access
SAN File Systems, SAN File Sharing Systems (2)
- Flexibility of a network FS at SAN speed
- Long-term industry goal for capacity and performance scaling: hundreds of petabytes of capacity and tens of gigabytes/sec
- More recent file server generation
- Also PVFS(v2) (Parallel Virtual File System)
Examples
- Apple Xsan
- ADIC StorNext FS
- DataPlow SAN FS & Nasan FS
- EMC HighRoad
- HP DirectNFS – xNFS (Cal. Soft.) – Transoft Fibrenet
- IBM TotalStorage SAN FS, SANergy
- IBRIX Fusion
- SANBOLIC Kayo Volume Sharing
- SGI CXFS
- SUN QFS
- TerraScale TerraGrid
SAN File Systems, SAN File Sharing Systems (3)
Example: Multimedia application
How?
- 1 big server with NFS/CIFS layer
- Server and client SAN FS layer
- Hundreds of clients
Benefits
- Increased throughput
- Consolidate storage, very scalable
- More effective use of resources
[Figure: Applications with client software send file requests over the data network (LAN) to the file server (metadata server, also NFS/CIFS server) and receive a block list; the clients then access the data on the shared disks directly over the storage network.]
SAN File Systems, SAN File Sharing Systems (4)
How does it work?
- Asymmetric or client/server model
- Server controls client access, resolves conflicts
- Thin client software layer handles SAN device and server interaction
Operational sequence
1. Client requests file access from the server (e.g., read, write)
2. Server returns the allowed access (e.g., read/write, read-only) and SAN device addresses
3. Client uses the addresses to access the file on the SAN device
4. Server calls back to the client to remove access when needed by other clients
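The four steps of the operational sequence can be traced in a small in-memory model. The classes `SanDevice`, `MetadataServer`, and `Client` and their methods are invented for this sketch and simplify heavily: only one writer per file is tracked, and a callback simply drops the client's grant.

```python
from typing import Dict, List, Tuple


class SanDevice:
    """Shared disk: a flat array of blocks addressable by every client."""

    def __init__(self, blocks: List[bytes]) -> None:
        self.blocks = blocks


class MetadataServer:
    """Knows which blocks make up each file and who may access it."""

    def __init__(self, block_map: Dict[str, List[int]]) -> None:
        self.block_map = block_map
        self.writers: Dict[str, "Client"] = {}

    def request_access(self, client: "Client", name: str, mode: str) -> Tuple[str, List[int]]:
        holder = self.writers.get(name)
        if holder is not None and holder is not client:
            holder.revoke(name)               # step 4: call back the current holder
            del self.writers[name]
        if mode == "write":
            self.writers[name] = client
        return mode, self.block_map[name]     # step 2: allowed access + block addresses


class Client:
    def __init__(self, server: MetadataServer, device: SanDevice) -> None:
        self.server = server
        self.device = device
        self.grants: Dict[str, Tuple[str, List[int]]] = {}

    def open(self, name: str, mode: str) -> None:
        self.grants[name] = self.server.request_access(self, name, mode)  # step 1

    def read(self, name: str) -> bytes:
        _, addresses = self.grants[name]
        return b"".join(self.device.blocks[a] for a in addresses)          # step 3

    def revoke(self, name: str) -> None:
        self.grants.pop(name, None)


device = SanDevice([b"he", b"xx", b"llo"])
server = MetadataServer({"greeting": [0, 2]})
a, b = Client(server, device), Client(server, device)
a.open("greeting", "write")
data = a.read("greeting")
b.open("greeting", "read")   # the server calls back client a
```

Only the small request and the block list pass through the server; the file data itself travels directly between client and SAN device.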
SAN File Systems, SAN File Sharing Systems (5)
Lock Mechanism
- Provided by the server at a central location
- Various granularity: file, record, byte…
- Some implementations use SMB or NFS semantics
- The server needs to be protected because it represents a SPOF (single point of failure)
Cache Coherency
- Some implementations deliver cache coherency with a traditional validate/invalidate mechanism, others don't offer a cache at all
SAN File Systems, SAN File Sharing Systems (6)
[Figure: Layer diagram. Hosts (with LVM) running applications get file metadata over the LAN from the FS/NAS controller, which holds the file system metadata, then access the data directly with block accesses over the storage network (SN) to the disk array. The FS controller can also be a NAS server. Block layer: host, network, and device block-aggregation.]
Cluster File Systems (also called Shared Data Cluster)
A cluster FS allows a FS and its files to be shared
- All nodes understand the physical (on-disk) FS structure
The file system may be mounted by all the nodes
- Single FS image: same data view from all nodes
Examples
- HP CFS (TruCluster)
- HP/Cal.Soft. Monster FS
- IBM GPFS
- MACROIMPACT SANique CFS
- POLYSERVE Matrix Server
- REDHAT GFS (Sistina FS)
- SANBOLIC MelioFS
- SUN Global File Service
- VERITAS CFS
- LUSTRE (/ OSD)
Cluster File Systems (2)
Example: Web server farm
How?
- Shared VM/FS
- Possible load balancer in front
Benefits
- Increased throughput
- More effective use of servers
- Failure is transparent
- High SLAs
[Figure: Two hosts in a cluster, each running a web server on top of the cluster file system and the cluster volume manager, coordinate through heartbeat and lock management and reach the shared disks over the storage network.]
Cluster File Systems (3)
Asymmetric implementation
- Master node mounts the file system, manages logging and locking; any node could be a master node
- Other nodes are clients for logging and locking management
- All nodes have direct read/write access to file data
- Some implementations support automatic failover of the master node, others need additional failover software
  - On master failure, the system log and all locks have to be recovered
  - On any other failure, release/recover the locks held by the failed node
Symmetric implementation ("peer-to-peer")
- All nodes mount and access the file system
- Lock management is symmetric across the data sharing domain
- Logging is usually per node, coordinated by lock management; on failure, recover that node's log and release/recover the locks it held
Cluster File Systems (4)
Lock Mechanism
- Distributed or Global Lock Management (DLM/GLM), controlled by the master node in the asymmetric model or by all nodes in the symmetric model, managed at the lock and file level
- Some implementations ship lock status to all nodes, others maintain a local lock repository
- Granularity varies: file, record, byte…
Cache Coherency – Single File System Image
- Every modification is seen by all nodes as soon as it occurs in the data sharing domain
Concurrent vs. serial data access
- Concurrent: multiple systems access the data simultaneously
- Serial: one system at a time uses and accesses the data
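Byte-level lock granularity and the release of a failed node's locks can be sketched with a single in-memory lock table. This corresponds loosely to the asymmetric case, where one node manages all locks; `LockManager`, `acquire`, and `release_node` are names chosen for this example, and only exclusive locks are modelled.

```python
from typing import List, Tuple

Lock = Tuple[str, str, int, int]  # (node, file, start offset, end offset exclusive)


class LockManager:
    """Toy lock manager with byte-range granularity and exclusive locks only."""

    def __init__(self) -> None:
        self.locks: List[Lock] = []

    def acquire(self, node: str, file: str, start: int, end: int) -> bool:
        for holder, locked_file, s, e in self.locks:
            overlaps = locked_file == file and start < e and s < end
            if overlaps and holder != node:
                return False          # conflicting range held by another node
        self.locks.append((node, file, start, end))
        return True

    def release_node(self, node: str) -> int:
        """Drop every lock held by a failed node; return how many were released."""
        before = len(self.locks)
        self.locks = [lock for lock in self.locks if lock[0] != node]
        return before - len(self.locks)


manager = LockManager()
granted_a = manager.acquire("node-a", "/data/log", 0, 100)
denied_b = manager.acquire("node-b", "/data/log", 50, 150)
granted_b = manager.acquire("node-b", "/data/log", 100, 200)
released = manager.release_node("node-a")
retry_b = manager.acquire("node-b", "/data/log", 50, 150)
```

Two nodes can work on disjoint ranges of the same file concurrently; once the locks of a failed node are released, the range it blocked becomes available to the survivors.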
Cluster File Systems (5)
[Figure: Layer diagram combining the approaches. Applications on hosts sit on the LAN. File/record layer elements shown: NAS head, NAS server, and hosts with LVM running a cluster FS (optionally on a shared LVM), a distributed FS, or a SAN FS. The storage network (SN) connects them to a disk array with RAID. Block layer: host, network, and device block-aggregation.]
Purposes:
- Load spreading across peers (scalability)
- Alternate paths (high availability, scalability)