A Label-Based File System


Page 1: A Label-Based File System

1

A Label-Based File System

Shang Rong Tsai

Department of Information Management

Chang-Jung University

Page 2: A Label-Based File System

2

Outline

• Background and Introduction

• Network Storage/SAN/NAS

• Object-Based Storage

• A Label-Based File System

• Epilogue

Page 3: A Label-Based File System

3

The Epoch of Data Explosion

• Electronic data are growing continuously
• Data and storage management are becoming more and more important
  – Continuous service and data availability
  – Data backup and restore
  – Storage and data sharing
  – Efficiency in data and storage management
    • Expected to use less manpower
    • Flexibility in system configuration
    • Efficient expansion of storage capacity

Page 4: A Label-Based File System

4

Personal Data

• Capacity of disks is growing

• Huge numbers of files on a single disk
  – Documents downloaded from the Internet
  – Photos from digital cameras

• It is sometimes hard to find a file

• More ways to locate files are needed

• Giving labels to files for locating

Page 5: A Label-Based File System

5

Network Storage Systems

• Storages are moving to networks for sharing and efficient management

• Such demands drove the emergence of the SAN (Storage Area Network)
• NAS (Network Attached Storage) has been in use for longer
• Object-Based Storage Systems
  – Both SAN and NAS have their own merits and drawbacks.
  – SAN supports data access at block level, which is not well suited to applications that share data
  – NAS makes the server a bottleneck and scales poorly
  – The object-based approach can potentially eliminate their drawbacks

• Network storage technologies offer a new platform for networking and storage people to play a new game.

Page 6: A Label-Based File System

6

Conceptual Model of SAN

[Figure: three tiers. Clients connect to hosts over a network; hosts connect to storage (disks and tape) over the SAN. Labels in the figure: Linux, Windows, SunOS.]

Page 7: A Label-Based File System

7

What is a SAN

• A SAN is a high-speed network (traditionally Fibre Channel) that connects storage devices to servers (hosts). The network essentially replaces the storage bus of traditional shared-bus storage systems, which increases the possible scale of sharing, enables failover, and so on.

• Access to SAN storage is at block level

Page 8: A Label-Based File System

8

Why SANs (1/2)

• From the communication aspect, a SAN can bypass possible communication bottlenecks. It enables communication between
  – Server-to-server
  – Server-to-storage (typical model)
  – Storage-to-storage (e.g. for backup without servers' intervention)

Page 9: A Label-Based File System

9

Why SANs (2/2)

• Improvements to application availability: Storage is accessible through multiple data paths for better reliability, availability, and serviceability.

• Higher application performance: Storage processing is off-loaded from servers

• Virtualized storage: Storage on a SAN can be flexibly configured as logical volumes of any size, so storage space is easily shared at configuration time

• Data backup to remote sites: enabled for disaster protection

• Simplified centralized management: Single image of storage systems simplifies management.

Page 10: A Label-Based File System

10

Storage Virtualization (1/2)

• SNIA defines storage virtualization as: “The act of integrating one or more (back end) services or functions with additional (front end) functionality for the purpose of providing useful abstractions. Typically, virtualization hides some of the back-end complexity, or adds or integrates new functionality with existing back end services. Examples of virtualization are the aggregation of multiple instances of a service into one virtualized service, or to add security to an otherwise insecure service. Virtualization can be nested or applied to multiple layers of a system.”

Page 11: A Label-Based File System

11

Storage Virtualization (2/2)

• To be more practical, storage virtualization is the aggregation of physical storage from multiple network storage devices into a single logical storage device that is managed and used by a central host.

• The Logical Volume Manager (LVM), which has been in use for many years, is essentially an application of storage virtualization.

• With storage virtualization we can easily and flexibly configure the size of a logical disk.

• For some applications or sites, the amount of storage required grows at unprecedented rates. Consider what happens when a disk partition becomes full.
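
The idea of presenting several physical disks as one resizable logical disk can be sketched as a simple address mapping. The following is only a minimal illustration, assuming a concatenating (linear) volume; the class name `LinearVolume` and the disk names are hypothetical and are not taken from the slides or from any real LVM implementation.

```python
from bisect import bisect_right
from itertools import accumulate


class LinearVolume:
    """Toy logical volume that concatenates physical disks end to end."""

    def __init__(self, disks):
        # disks: list of (name, size_in_blocks) tuples
        self.disks = list(disks)
        # ends[i] is the first logical block *after* disk i
        self.ends = list(accumulate(size for _, size in self.disks))

    @property
    def size(self):
        return self.ends[-1] if self.ends else 0

    def extend(self, name, size_in_blocks):
        """Grow the logical disk by appending another physical disk."""
        new_end = self.size + size_in_blocks
        self.disks.append((name, size_in_blocks))
        self.ends.append(new_end)

    def locate(self, logical_block):
        """Map a logical block number to (disk name, block on that disk)."""
        if not 0 <= logical_block < self.size:
            raise IndexError("logical block out of range")
        i = bisect_right(self.ends, logical_block)
        start = self.ends[i - 1] if i else 0
        return self.disks[i][0], logical_block - start


vol = LinearVolume([("diskA", 100), ("diskB", 50)])
vol.extend("diskC", 200)          # the "partition is full" case: just add a disk
print(vol.size, vol.locate(120))
```

When the logical disk fills up, `extend` adds capacity without changing any existing logical block address, which is the property the slide is pointing at.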

Page 12: A Label-Based File System

12

SAN connectivity

• Traditionally, SAN used Fibre Channel technology to implement the storage networks

• Fibre Channel SANs support high-bandwidth storage traffic at 200 MB/s, with enhancements to 10 Gb/s expected in the near future. The higher rate will mostly be used for inter-switch links (ISLs).

• iSCSI SAN (SCSI over TCP/IP) is a relatively new approach for storage networks.

• Fibre Channel over IP (FCIP)
• FCIP can be used as a WAN bridging transport for both FC and iSCSI based campus SANs

Page 13: A Label-Based File System

13

iSCSI

• iSCSI stands for Internet Small Computer Systems Interface

• iSCSI is a protocol for encapsulating SCSI commands on a TCP/IP network

• The iSCSI protocol enables universal access to storage devices and Storage Area Networks over a TCP/IP network

Page 14: A Label-Based File System

14

SCSI Protocol Layers

• SCSI command layer
  – Generic commands (for all devices)
  – Device specific commands

• Transport Layer

• Physical Layer (Connectivity Layer)

Page 15: A Label-Based File System

15

SCSI Protocol Layer

[Figure: layered stack. SCSI applications (e.g. file systems) sit on the SCSI commands layer: SCSI block commands, SCSI stream commands, and SCSI commands for other types of devices, all built on the SCSI generic (primary) commands. Below it, the SCSI transport layer offers Parallel SCSI, SCSI over Fibre Channel, and iSCSI over TCP/IP. At the physical layer these run on the parallel SCSI bus, Fibre Channel, and TCP / IP / Layer 2 (Ethernet) respectively.]

Page 16: A Label-Based File System

16

iSCSI PDU (from SNIA)

Page 17: A Label-Based File System

17

Overview of iSCSI

• iSCSI provides initiators and targets with unique names and a discovery method.

• The iSCSI protocol establishes communication sessions between initiators and targets, and provides methods for them to authenticate one another.

• An iSCSI session may contain one or more TCP connections and provides recovery in case of connection failures.

• SCSI CDBs (Command Descriptor Blocks) are passed from the SCSI command layer to the iSCSI transport layer. The iSCSI transport layer encapsulates each SCSI CDB in an iSCSI Protocol Data Unit (PDU) and forwards it to the Transmission Control Protocol (TCP) layer.

• iSCSI provides the SCSI command layer with a reliable transport.
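
To make the encapsulation step concrete, here is a rough sketch of packing a SCSI READ(10) CDB into the 48-byte Basic Header Segment of an iSCSI SCSI Command PDU. The field layout follows my reading of the iSCSI specification (RFC 7143) and is simplified: no additional header segments, no data segment, no digests, and the LUN is passed as an already-encoded 8-byte field. The function names are illustrative only and do not come from the slides.

```python
import struct

OP_SCSI_COMMAND = 0x01   # iSCSI opcode for a SCSI Command PDU
FLAG_FINAL = 0x80        # F bit: last PDU of the command
FLAG_READ = 0x40         # R bit: data flows target -> initiator
SCSI_READ_10 = 0x28      # SCSI READ(10) operation code


def read10_cdb(lba, num_blocks):
    """Build a 10-byte READ(10) CDB: opcode, flags, LBA, group, length, control."""
    return struct.pack(">BBIBHB", SCSI_READ_10, 0, lba, 0, num_blocks, 0)


def scsi_command_pdu(cdb, lun, task_tag, expected_len, cmd_sn, exp_stat_sn):
    """Wrap a CDB in a 48-byte iSCSI Basic Header Segment (no data segment)."""
    if len(cdb) > 16:
        raise ValueError("CDBs longer than 16 bytes need an extra header segment")
    if len(lun) != 8:
        raise ValueError("lun must be the encoded 8-byte LUN field")
    return struct.pack(
        ">BB2xB3s8sIIII16s",
        OP_SCSI_COMMAND,
        FLAG_FINAL | FLAG_READ,
        0,                          # TotalAHSLength
        (0).to_bytes(3, "big"),     # DataSegmentLength (no data in this PDU)
        lun,
        task_tag,                   # Initiator Task Tag
        expected_len,               # Expected Data Transfer Length in bytes
        cmd_sn,
        exp_stat_sn,
        cdb.ljust(16, b"\x00"),     # CDB field is 16 bytes, zero padded
    )


pdu = scsi_command_pdu(read10_cdb(2048, 8), bytes(8), 1, 8 * 512, 1, 1)
print(len(pdu), pdu.hex())
```

A real initiator would then write this PDU to the TCP connection of an established iSCSI session; login, sequencing, and error recovery are left out here.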

Page 18: A Label-Based File System

18

What is NAS

• Network Attached Storage (NAS) is basically a LAN attached file server that provides shared file access using a file sharing protocol such as Network File System (NFS) or Common Internet File System (CIFS)

• Access to NAS is at file level.

Page 19: A Label-Based File System

19

Why NAS

• NAS technology has been used for decades to share files, which saves storage space and keeps data consistent (in contrast to copying files, as with FTP).

• Data sharing in NAS is at file level, which matches the semantics that applications expect. In contrast, sharing in a SAN is at block level, which makes read/write data sharing very difficult (if not impossible) for applications. (Applications recognize files, not blocks.)

Page 20: A Label-Based File System

20

What is an Object-Based Storage

• Storage devices that operate at object level
• Traditional storage devices (such as DAS and SAN) operate at block level
• Objects are typically files, which matches the semantic level at which applications manipulate data.

• In traditional systems, files are mapped to blocks before they are stored in storages.

Page 21: A Label-Based File System

21

Why Object-Based Storage (1/2)

• Drawbacks of NAS
  – Most of the processing for file access is done on the file server => poor scalability
  – It is difficult to distribute users’ files across multiple servers while preserving a single global file namespace.
    • We may delegate management of a subset of the whole file namespace to each file server; however, the load distributed to each server may be very uneven.

Page 22: A Label-Based File System

22

Why Object-Based Storage (2/2)

• Drawbacks of SAN
  – SANs operate at block level, so sharing data between applications may still need upper-layer software.
    • File read/write sharing
    • Record/file locking

• Object-Based Storage can potentially eliminate both the drawbacks of NAS and SAN

Page 23: A Label-Based File System

23

The Value of Objects (from SNIA)

• Better security via capabilities
  – Each object can have its own security domain
  – All I/O is authorized by the device
• Easier to share data
  – Files and records can be stored as objects
  – Low-level metadata managed by device
• Opportunities for intelligence
  – Attribute-based learning for resource allocation
    • Better caching, pre-fetching and staging of data
  – Self-configuring storage w/ continuous reorganization
    • Layout objects to best serve client requests

Page 24: A Label-Based File System

24

Two Basic Components in File Service

• Directory Service
  – Provides the global file name space visible to users and applications
  – Maps a file pathname to a unique file id
  – May need the flat file system to access directories
• Flat File System
  – Given the file id, returns the file contents
  – Stores file attributes
  – Manages file allocation on disks

Page 25: A Label-Based File System

25

Typical Handling in File Access

• File pathname lookup: open /a/b/c
  – Symbolic file name => file id (inode in Unix)
  – For each component of the path, get the file id and then the file contents, until the file id of the target is obtained
• Get the attributes of the file (including its location on disk)
• Ready for read/write
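
The component-by-component lookup can be sketched as follows. This is a toy model, assuming directories are stored in the flat file system as ordinary objects whose contents map names to file ids; the class names `FlatFileSystem` and `DirectoryService` are made up for the illustration.

```python
class FlatFileSystem:
    """Toy flat file service: file id -> contents."""

    def __init__(self):
        self._files = {}
        self._next_id = 1

    def create(self, contents):
        fid = self._next_id
        self._next_id += 1
        self._files[fid] = contents
        return fid

    def read(self, fid):
        return self._files[fid]


class DirectoryService:
    """Toy directory service: pathname -> file id, reading each directory
    along the way through the flat file system."""

    def __init__(self, ffs, root_id):
        self.ffs = ffs
        self.root_id = root_id

    def lookup(self, path):
        fid = self.root_id
        for component in path.strip("/").split("/"):
            if not component:
                continue
            directory = self.ffs.read(fid)      # a dict: name -> file id
            if not isinstance(directory, dict):
                raise NotADirectoryError(component)
            if component not in directory:
                raise FileNotFoundError(path)
            fid = directory[component]
        return fid


ffs = FlatFileSystem()
c = ffs.create(b"hello")
b = ffs.create({"c": c})
a = ffs.create({"b": b})
root = ffs.create({"a": a})
names = DirectoryService(ffs, root)
print(ffs.read(names.lookup("/a/b/c")))
```

Each path component costs one directory read, which is why the directory service "may need the flat file system to access directories".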

Page 26: A Label-Based File System

26

How to Restructure the file system to fit the Object-Based Storage Architecture ?

• We may (as Lustre did) partition the functions into
  – MDS (Metadata Server)
  – OSD (Object Storage Device)
• What is the overall system architecture?
• Which data and management functions are delegated to the MDS?
• Which data and management functions are delegated to the OSDs?
• Important observation: typically, about 90% of the file processing in a system is done by the OSDs and 10% by the MDS => the loads need to be balanced

Page 27: A Label-Based File System

27

Traditional structure vs. Object-Based structure

Page 28: A Label-Based File System

28

Lustre

• A system developed at CMU; a very early working system demonstrating the concept of object-based storage
• Modifies the Linux kernel
• A small number of MDSs
• A large number of OSDs
• Mainly targets applications in large-scale computing
  – Large numbers of users
  – File service for high-performance computing

• Good Scalability

Page 29: A Label-Based File System

29

Lustre Architecture (from CFS Inc.)

[Figure: Lustre clients (1,000 with Lustre Lite, up to tens of thousands) connect over GigE or QSW Net to the metadata servers, MDS 1 (active) and MDS 2 (failover), and to the Lustre Object Storage Targets, OST 1 to OST 7. The OSTs are Linux OST servers with disk arrays, or third-party OST appliances reached through a SAN.]

Page 30: A Label-Based File System

30

Lustre Components and Functions (1/2)

• Basic Components
  – Client Filesystems
    • Interface to the local file system (VFS) on clients
    • Act as MDS client
    • Act as OST client
  – Metadata Servers
    • All metadata operations (creating new directories, files, and symbolic links, or acquiring and updating inodes) are handled by the MDS.
  – Object Storage Targets
    • All file I/O operations are directed to the OSTs.

Page 31: A Label-Based File System

31

Lustre Components and Functions (2/2)

• The role of the client filesystem is to provide a directory tree, subdivided into filesets, which provides cluster-wide Unix file sharing semantics.

• The client filesystem interacts with the meta-data servers for meta-data handling, i.e. for the acquisition and updates of inodes and directory information.

• File I/O, including the allocation of blocks, striping, and security enforcement, is contained in the protocol between the client filesystem and the object storage targets.

• A third protocol exists between the OST and the MDS, largely for pre-allocation and recovery purposes.
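
Striping is the part of this division of labor that lets file I/O spread over many storage targets. The sketch below shows generic round-robin (RAID-0 style) striping arithmetic; it is an assumption for illustration and is not taken from Lustre's actual layout code. The stripe size and target count are arbitrary parameters.

```python
def locate_stripe(offset, stripe_size, num_targets):
    """Map a file byte offset to (target index, offset inside that target's object)."""
    stripe_no = offset // stripe_size        # which stripe unit of the file
    target = stripe_no % num_targets         # round-robin over the targets
    row = stripe_no // num_targets           # full rounds before this unit
    return target, row * stripe_size + offset % stripe_size


def split_request(offset, length, stripe_size, num_targets):
    """Split a byte range into per-target (target, object offset, length) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        target, obj_off = locate_stripe(offset, stripe_size, num_targets)
        chunk = min(stripe_size - offset % stripe_size, end - offset)
        pieces.append((target, obj_off, chunk))
        offset += chunk
    return pieces


MiB = 1 << 20
print(split_request(MiB - 100, 300, MiB, 3))
```

With a layout like this, the client can compute which targets to contact on its own, so the metadata server stays off the data path.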

Page 32: A Label-Based File System

32

SNIA T10 Specification

• A specification developed for Object-based Disk

• It defines a SCSI command set that provides efficient peer-to-peer operation of input/output logical units that manage the allocation, placement, and accessing of variable-size data-storage containers, called objects.

Page 33: A Label-Based File System

33

Some Operations defined in T10

• Format OSD - defines the OSD structure on the device
• Create Object Group - defines a set in which to create objects
• Create Object - creates an object and returns the object ID (fid)
• Open fid
• Read (fid, starting byte, length)
• Write (fid, starting byte, length)
• Close fid
• Get attributes of an object
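
The shape of this object interface, as opposed to a block interface, can be illustrated with an in-memory stand-in. This is only a sketch of the semantics listed above (create, byte-range read and write, attributes); it does not implement the T10 command encoding, and the class and method names are hypothetical.

```python
class ToyOSD:
    """In-memory stand-in for an object storage device."""

    def __init__(self):
        self._objects = {}      # fid -> bytearray holding the object data
        self._attrs = {}        # fid -> dict of user-supplied attributes
        self._next_fid = 1

    def create_object(self, **attributes):
        """Create an empty object and return its object ID (fid)."""
        fid = self._next_fid
        self._next_fid += 1
        self._objects[fid] = bytearray()
        self._attrs[fid] = dict(attributes)
        return fid

    def write(self, fid, start, data):
        """Write data at a byte offset, zero-filling any gap before it."""
        obj = self._objects[fid]
        if start > len(obj):
            obj.extend(bytes(start - len(obj)))
        obj[start:start + len(data)] = data
        return len(data)

    def read(self, fid, start, length):
        """Read up to `length` bytes starting at byte `start`."""
        return bytes(self._objects[fid][start:start + length])

    def get_attributes(self, fid):
        """Return stored attributes plus the size maintained by the device."""
        return {**self._attrs[fid], "size": len(self._objects[fid])}


osd = ToyOSD()
fid = osd.create_object(owner="alice")
osd.write(fid, 0, b"hello world")
print(osd.read(fid, 6, 5), osd.get_attributes(fid))
```

Note that the caller never sees block addresses: allocation and placement stay inside the device, which is what makes per-object management possible.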

Page 34: A Label-Based File System

34

Possible Intelligent functions

• OSDs serve and manage data at object level (objects most often represent files). This enables OSDs to enforce management intelligently on a per-object basis
  – Automatic replication/backup
  – QoS
  – Caching/prefetching
  – Optimal layout
  – Preprocessing/postprocessing (such as data compression/decompression, data filtering)

Page 35: A Label-Based File System

35

How to support the intelligent functions ?

• Extensible file attributes would be a good way
• Users can define their own file attributes. A special ‘code’ attribute can be assigned to a particular file. Setting the code attribute on a file in effect installs the code on the OSD, which then executes the intelligent functions within the OSD.
• How to set the code attribute in practice?
  – Consideration for platform (OSD) dependence
  – Sharing the code on the OSD
• How to command the OSDs to execute the intelligent functions automatically?

Page 36: A Label-Based File System

36

Label-Based File System (LBFS)

• A project in cooperation with the Industrial Technology Research Institute
• A file system using an object-based structure
• Features
  – Capability to give labels to files for easy locating
  – Flexibility in restructuring the namespace
  – Object-based structure

Page 37: A Label-Based File System

37

Why LBFS?

• The number of files on a single disk is growing
• Traditional directory-based file systems cannot satisfy the requirements for locating files
  – Sometimes we forget which directories the files reside in
  – Content-based file search takes time
  – Typical file attributes are not enough
• The namespace of the whole file system cannot be reorganized flexibly
  – Can each user have his own file namespace over a single set of files?

Page 38: A Label-Based File System

38

Overview of LBFS (1/2)

[Figure: a client, the MDS, and OSDs connected by a network. The MDS provides the namespace and metadata service, shown as Category A and Category B; the OSDs provide the object access service.]

Page 39: A Label-Based File System

39

Overview of LBFS (2/2)

• The MDS provides metadata and namespace services
• The MDS provides a browsing structure called a Category
• OSDs provide object storage
• System abstractions
  – General Objects (GO)
  – Attributes of file objects
  – Labels
  – Collections
  – Categories

Page 40: A Label-Based File System

40

General Object and Label (1/2)

• A GO is the basic unit for storing a file object
• We plan a higher-level object called an XML object (XO) for encapsulating an XML information unit

• A label is basically a keyword attached to a GO by users

• Searching for a file by labels is much faster than by content matching

Page 41: A Label-Based File System

41

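General Object and Label (2/2)

[Figure: a GO, e.g. 01.jpg, consists of attributes and data. The attributes (file attributes + labels) comprise general attributes (Object Name, Object Size, Time Stamp, ...) and indexing attributes (Label1, Label2, ...); the data part holds the file contents.]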

Page 42: A Label-Based File System

42

GO Metadata

Typical file metadata
  id               Unique ID of GO
  name             Name of GO
  size             Size of GO
  uid              Owner ID
  gid              Group ID
  mode             Access mode of GO
  atime            Last access time
  ctime            Creation time
  mtime            Last modified time

GO extended metadata
  contentType      Content-type of GO
  referenceCount   Reference count of GO
  objects          GO data objects list
  labels           List of labels attached to GO
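
As an illustration, the metadata fields above could be held in a record like the one below. The field names come from the slide; the Python types and default values are assumptions made for the sketch, since the slides do not specify them, and `add_label` is a hypothetical helper.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GOMetadata:
    # Typical file metadata
    id: str
    name: str
    size: int = 0
    uid: int = 0
    gid: int = 0
    mode: int = 0o644
    atime: float = 0.0
    ctime: float = 0.0
    mtime: float = 0.0
    # GO extended metadata
    contentType: str = "application/octet-stream"
    referenceCount: int = 0
    objects: List[str] = field(default_factory=list)   # data objects on OSDs
    labels: List[str] = field(default_factory=list)    # labels attached by users

    def add_label(self, label):
        """Attach a label once; repeated labels are ignored."""
        if label not in self.labels:
            self.labels.append(label)


go = GOMetadata(id="GO_1", name="01.jpg", contentType="image/jpeg")
go.add_label("Picture")
print(go)
```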

Page 43: A Label-Based File System

43

Collection (1/3)

• A collection is logically a container of objects
• A collection logically contains a set of symbolic links, each of which links to a file object
• Each collection has an expression defined for it that specifies the set of objects contained in the collection

[Figure: a collection with i-expression "Picture" links to the objects labeled Picture; the objects labeled Doc and Music are not linked.]

Page 44: A Label-Based File System

44

Collection (2/3)

• When users create an object and give it a label, the object will be “included” in the corresponding collection automatically

[Figure: creating objects and labeling. A user creates objects labeled Picture or Music. The Picture objects appear in Collection A (i-expression: Picture) and the Music objects appear in Collection B (i-expression: Music).]

Page 45: A Label-Based File System

45

Collection (3/3)

• In traditional file systems, a directory and the files under it are bound in a fixed way

• In an LBFS, a collection and the file objects in it are bound in a flexible way, by matching the expression assigned to the collection against the labels attached to the objects

• The expression is called an i-expression
  – AND label-name
  – OR label-name
  – NOT label-name
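
The matching of i-expressions against labels can be sketched as below. The slides do not give the concrete syntax or evaluation order, so this sketch makes several assumptions: an i-expression is a list of (operator, label) terms applied left to right, NOT is treated as "AND NOT", and a null (empty) expression selects no objects. The function names are hypothetical.

```python
def matches(i_expression, labels):
    """Return True if an object carrying `labels` belongs to the collection.

    i_expression: list of (operator, label) pairs, applied left to right,
    e.g. [("AND", "Music"), ("AND", "Classical"), ("NOT", "Pop")].
    """
    labels = set(labels)
    result = None
    for op, label in i_expression:
        if op not in ("AND", "OR", "NOT"):
            raise ValueError(f"unknown operator: {op}")
        term = (label not in labels) if op == "NOT" else (label in labels)
        if result is None:
            result = term               # first term seeds the result
        elif op == "OR":
            result = result or term
        else:                           # AND, and NOT treated as AND NOT
            result = result and term
    return bool(result)                 # empty expression -> False (assumption)


def collect(i_expression, objects):
    """objects: dict of object name -> labels; returns matching names, sorted."""
    return sorted(n for n, ls in objects.items() if matches(i_expression, ls))


objects = {
    "a.mp3": ["Music", "Classical", "Western"],
    "b.mp3": ["Music", "Pop"],
    "c.jpg": ["Picture"],
}
print(collect([("AND", "Music"), ("NOT", "Pop")], objects))
```

Because membership is computed from labels, relabeling an object moves it between collections without touching the object itself, which is the flexible binding described above.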

Page 46: A Label-Based File System

46

Category (1/3)

• A Category is logically a namespace of file objects. It is organized as a tree in which leaf nodes are objects and non-leaf nodes are collections

[Figure: a Category tree whose leaves are GOs.]

Page 47: A Label-Based File System

47

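Category (2/3)

[Figure: a Music Category organized over a set of GOs stored on OSDs.

Collection tree and i-expressions:
  Music (i-expression: Null)
    Classical (i-expression: Null)
      Chinese (i-expression: Music + Chinese + Classical)
      Western (i-expression: Music + Western + Classical)
    Pop (i-expression: Music + Pop)
    Selected (i-expression: Music + Selected)

Labels on the GOs shown in the figure:
  Music, Chinese, Classical, Selected
  Music, Western, Classical, Selected
  Music, Chinese, Classical
  Music, Western (two objects)
  Music, Pop
  Music, Pop, Selected
  Photo, Scenery
  Photo, People, Lin Chi-ling
  Photo, Scenery, Alishan
  Document, Paper
  Document, Slides]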

Page 48: A Label-Based File System

48

Category (3/3)

• Multiple Categories can be organized to bind a single set of objects

[Figure: a single set of Photo objects carrying the labels George, Mary, and/or Family, with three Categories organized over the same objects.]

Page 49: A Label-Based File System

49

Overview of the prototype

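[Figure: MDS prototype components: Networking, MDS, Virtual Partition Manager, OSD Manager, Security Manager, GO Metadata Service, Metadata Storage Pool.]

• Networking: uses the Metadata Service Protocol with clients and the iSCSI protocol with OSDs
• MDS: provides the MDS's external services
• VP Manager: manages the virtual partitions (VPs), including deciding object placement according to the Storage Policy
• OSD Manager: manages the OSDs
• Security Manager: provides the functions of the T10 OSD Security Model, including issuing credentials
• GO Metadata Service: manages GO metadata
• Metadata Storage Pool: where the metadata is actually stored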

Page 50: A Label-Based File System

50

Metadata Storage Pool

• The test prototype uses a database to store information about Categories, Collections, Labels, and so on.

• The Metadata Storage Pool has a well-defined interface (metadata storage API), which makes it easy to switch to a different storage medium in the future.
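
One way to read the "metadata storage API" idea is as an abstract interface with a replaceable backend. The sketch below is hypothetical: the slides do not list the actual API functions or the database used, so the method names and the SQLite table are assumptions chosen only to show the separation between interface and storage medium.

```python
import sqlite3
from abc import ABC, abstractmethod


class MetadataStore(ABC):
    """Hypothetical metadata storage API; the MDS would depend only on this."""

    @abstractmethod
    def add_label(self, go_id, label): ...

    @abstractmethod
    def labels_of(self, go_id): ...

    @abstractmethod
    def find_by_label(self, label): ...


class SQLiteMetadataStore(MetadataStore):
    """One possible backend: a relational table of (GO id, label) pairs."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS go_label ("
            "go_id TEXT NOT NULL, label TEXT NOT NULL, "
            "PRIMARY KEY (go_id, label))"
        )

    def add_label(self, go_id, label):
        with self.db:   # commits on success, rolls back on error
            self.db.execute(
                "INSERT OR IGNORE INTO go_label VALUES (?, ?)", (go_id, label)
            )

    def labels_of(self, go_id):
        rows = self.db.execute(
            "SELECT label FROM go_label WHERE go_id = ? ORDER BY label", (go_id,)
        )
        return [r[0] for r in rows]

    def find_by_label(self, label):
        rows = self.db.execute(
            "SELECT go_id FROM go_label WHERE label = ? ORDER BY go_id", (label,)
        )
        return [r[0] for r in rows]


store = SQLiteMetadataStore()
store.add_label("GO_1", "Music")
store.add_label("GO_2", "Music")
print(store.find_by_label("Music"))
```

Swapping the storage medium then means writing another `MetadataStore` subclass; code that only calls the interface is unaffected.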

Page 51: A Label-Based File System

51

Metadata Service Protocol

• Metadata Service Protocol is designed for exchanging messages between Clients and MDS

• Messages are XML-based

Element Name   Description                       Attribute Name   Description
<Function>     Function for the MDS to execute   name             Function name
<ArgList>      Function arguments
<Return>       Content returned by the MDS       name             Function name
                                                 code             Error code

Page 52: A Label-Based File System

52

Message Example

• Client request

<?xml version="1.0" encoding="utf-8" ?>

<Function name="collection_list">

<ArgList>

<catId>Category_4</catId>

</ArgList>

</Function>

• MDS response

<?xml version="1.0" encoding="utf-8" ?>

<Return function="collection_list" code="">

<![CDATA[Category_4 Collection_35 Video {} Category_4 Collection_36 Music {}]]>

</Return>
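
A client could build and parse messages of this shape with a standard XML library. The following sketch uses Python's `xml.etree.ElementTree`; the helper names `build_request` and `parse_response` are hypothetical, and the response parser accepts either a `function` attribute (as in the example message) or a `name` attribute (as in the element table), since the slides show both.

```python
import xml.etree.ElementTree as ET


def build_request(function, **args):
    """Serialize a <Function> request with its <ArgList> children."""
    root = ET.Element("Function", name=function)
    arg_list = ET.SubElement(root, "ArgList")
    for key, value in args.items():
        ET.SubElement(arg_list, key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def parse_response(xml_text):
    """Return (function name, error code, payload text) from a <Return> message."""
    root = ET.fromstring(xml_text)
    if root.tag != "Return":
        raise ValueError("not a <Return> message")
    function = root.get("function") or root.get("name")
    return function, root.get("code"), (root.text or "").strip()


request = build_request("collection_list", catId="Category_4")
response = (
    '<Return function="collection_list" code="">'
    "<![CDATA[Category_4 Collection_35 Video {}]]>"
    "</Return>"
)
print(request)
print(parse_response(response))
```

The payload inside the CDATA section is returned as plain text here; how the client splits it into collection records is not specified in the slides.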

Page 53: A Label-Based File System

53

OSD Implementation

• Uses the OSD developed in last year's academic research project
  – Derived from the Intel iSCSI/OSD implementation
  – Operates on objects using T10 commands
  – Runs on Linux

Page 54: A Label-Based File System

54

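Client Implementation

• Networking: communicates with the MDS and the OSDs using the Metadata Service Protocol and the iSCSI protocol
• OSD Client: client-side software for the OSD
• Cache: cache management for GOs, Categories, Collections, and Labels
• MDS Client: client-side software for the MDS
• Security: security controls such as user accounts and login
• LBOFS: the library the system provides for developers to interface with LBFS
• Label Based Object FS Manager: the LBFS user interface provided to end users

[Figure: client components. The Label Based Object FS Manager sits on top of the LBOFS API; LBOFS contains the OSD client, MDS client, Cache, Security, and Networking modules.]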

Page 55: A Label-Based File System

55

LBOFS API – Account & Label

Function Name                      Description
lbofs_login (userName,passwd)      Log in to a Virtual Partition
lbofs_logout                       Log out
lbofs_label_add (vpId,lblName)     Add a Label
lbofs_label_del (vpId,lblId)       Delete a Label
lbofs_label_list (vpId)            List the user's Labels in the VP

Page 56: A Label-Based File System

56

LBOFS API – Category

Function Name                                  Description
lbofs_category_add (vpId,catName)              Create a Category
lbofs_category_del (vpId,catId)                Delete a Category
lbofs_category_list (vpId)                     List all Categories in the VP
lbofs_category_get (vpId,catId,buf)            Get Category metadata
lbofs_category_rename (vpId,catId,catName)     Rename a Category
lbofs_category_sharable (vpId,catId,share)     Set Category sharing
lbofs_category_query (vpId,condition)          Query the Categories in the VP

Page 57: A Label-Based File System

57

LBOFS API – Collection

Function Name                                              Description
lbofs_collection_add (vpId,catId,colParentId,coName)       Create a Collection
lbofs_collection_del (vpId,catId,colId)                    Delete a Collection
lbofs_collection_list (vpId,catId)                         List all Collections in a Category
lbofs_collection_list_children (vpId,catId,parentId)       List the Collections and GOs one level below a Collection
lbofs_collection_get (vpId,catId,colId)                    Get Collection metadata
lbofs_collection_rename (vpId,catId,colId,colName)         Rename a Collection
lbofs_collection_set_parent (vpId,catId,colId,parentId)    Set the parent ID of a Collection
lbofs_collection_set_expr (vpId,catId,colId,expr)          Set the expression of a Collection
lbofs_collection_query (vpId,catId,condition)              Query the Collections in a Category

Page 58: A Label-Based File System

58

LBOFS API – GO

Function Name                                              Description
lbofs_go_add (vpId,goName)                                 Create a GO
lbofs_go_del (vpId,goId)                                   Delete a GO
lbofs_go_get (vpId,goId,buf)                               Get GO metadata
lbofs_go_rename (vpId,goId,newGoName)                      Rename a GO
lbofs_go_label_add (vpId,goId,lblId,lblName)               Add a Label to a GO
lbofs_go_label_del (vpId,goId,lblId)                       Remove a Label from a GO
lbofs_go_chmod (vpId,goId,mode)                            Change the access permissions of a GO
lbofs_go_chgrp (vpId,goId,grpId)                           Change the user group of a GO
lbofs_go_set_time (vpId,goId,atime,ctime,mtime)            Set the timestamps of a GO
lbofs_go_query (vpId,condition)                            List the GOs that match a condition
lbofs_go_read_data (vpId,goId)                             Read GO data
lbofs_go_write_data (vpId,goId)                            Write GO data
lbofs_go_set_attribute (vpId,goId,page,attribute,value)    Set a GO attribute
lbofs_go_flush (value)                                     Force data on the OSD to be written out
lbofs_go_lock (vpId,goId)                                  Lock a GO
lbofs_go_unlock (vpId,goId)                                Unlock a GO

Page 59: A Label-Based File System

59

Label Based Object FS Manager (1/4)

• Exporting Windows files to LBOFS

• Labeling the exported files

[Figure: files in the local file system are exported to LBOFS.]

Page 60: A Label-Based File System

60

Label Based Object FS Manager (2/4)

Page 61: A Label-Based File System

61

Label Based Object FS Manager (3/4)

Page 62: A Label-Based File System

62

Label Based Object FS Manager (4/4)

Page 63: A Label-Based File System

63

Epilogue (1/2)

• Network storage technologies are getting more attention from the IT industry for data sharing and for effective management of huge, dynamically growing amounts of data.

• SAN and NAS both have roles to play and are currently the major approaches to network storage.

• SAN suffers from difficulty in direct data sharing.
• NAS suffers from difficulty in system scalability.
• Object-Based Storage emerges as a new approach to solve both problems.
• Object-Based Storage offers new opportunities for intelligent storage devices, which potentially opens up many research topics.

Page 64: A Label-Based File System

64

Epilogue (2/2)

• LBFS uses labels to help locate files quickly
• LBFS uses collections to flexibly bind file objects to file containers
• LBFS provides the category abstraction so that various browsing structures can fit individual needs
• We plan to modify the current prototype to serve as a system for storing documents of long-term value, with the following capabilities:
  – Automatic file backup
  – Label-based file search