
2008 16th IEEE International Conference on Networks (ICON 2008), New Delhi, India, 12-14 December 2008



Optimizing iSCSI Storage Network: A Direct Data Transfer Scheme Using

Connection Migration

Veena Tyagi Center for Development of Advanced Computing (CDAC)

Mumbai, India [email protected]

Chanchal Gupta Symantec Software India Pvt. Ltd.

Pune, India [email protected]

Abstract-This paper describes the design and performance evaluation of a direct data transfer scheme between client applications and storage devices in iSCSI storage area network based distributed file systems. This is done by using the “TCP-Migrate Option” to establish the iSCSI session between server and storage devices. The migration-enabled TCP connection is then migrated to the client machine, providing a direct data path between client and storage device. This connection migration is completely transparent to the applications on client, server and storage devices. We evaluate the architecture using the distributed file system NFS, but the approach is flexible enough to be integrated with any distributed file system. The analytical model and qualitative analysis suggest this approach improves scalability (by removing the server as a bottleneck) and performance (through shorter data paths) and is especially useful for fast delivery of streaming media and file transfer applications.

I. INTRODUCTION

Distributed file systems provide remote access to common file storage in a networked environment. They enable users of groups of computers to operate as though they were sharing a single large file system [1].

With today’s mature IP network and the need to support steadily growing amounts of digital content, iSCSI (Internet Small Computer Systems Interface) storage is emerging as the future solution in networked storage [3, 4]. iSCSI is an end-to-end protocol for transporting storage I/O block data over an IP network [2, 3, 4]. It encapsulates disk access requests (in the form of SCSI CDB commands) into TCP packets, and transmits the SCSI commands and block-level data over IP networks. It extends the SAN network to a remote area and enables new applications like data mirroring, remote backup and remote management [5]. It also unifies the storage and data networks, thus greatly reducing management cost [2, 3]. The protocol is used on servers (iSCSI initiators), storage devices (iSCSI targets) and protocol transfer gateway devices. The general deployment scenario of iSCSI storage network is shown in Fig.1.

Classification of Applications – In today’s computing world, there are different types of applications, e.g., voice, real-time and non-real-time data, multimedia services, internet banking and e-commerce. Based on their data requirements, we classify these applications into the following categories [6]:

Class A Applications: Applications that have high data requirements but are not real time; for example, web browsing, telnet sessions etc.
Class B Applications: Applications that involve voluminous data transfer and also expect good response time; for example, web-based file transfer and data sharing applications, streaming multimedia services etc.
Class C Applications: Applications that do not involve large data transfer but expect good response time; for example, internet banking, e-commerce etc.

Motivation – The applications of classes A and B involve large volumes of data transfer, and the real-time applications of classes B and C are performance-sensitive [6].

Figure 1. General deployment scenario for iSCSI storage area network based distributed file system

It can be seen from Fig. 1 that an iSCSI based storage system increases the number of network crossings and the packet processing overhead in the data path between client application and block storage, which degrades the performance of the networked storage. From the client’s perspective, it results in poor end-to-end service availability.

To address the problem of efficient and real-time transfer of large data volumes in geographically distributed applications, we propose a direct data transfer scheme using connection migration at the TCP layer. The goal of this paper is to demonstrate that direct data transfer between storage and client applications not only enhances performance (through a reduced number of network crossings in the data path) but also offloads the server.

978-1-4244-3805-1/08/$25.00 © 2008 IEEE ICON 2008

The rest of this paper is organized as follows. Section II surveys related work in the area of efficient data transfer in network-attached storage. Section III describes the proposed architecture. Section IV details the analytical model and the performance evaluation of the proposed approach. Finally, Section V provides concluding remarks and directions for future work.

II. RELATED WORK

Efficient data transfer to client applications has long been an active research topic. Related work has aimed at offloading the server through direct data transfer to the client in network-attached storage. Network SCSI (NetSCSI) and Network-Attached Secure Disks (NASD) are network-attached storage architectures [8]. In these architectures storage is attached to the server over a LAN, where the SCSI command interface provides data connectivity. Neither architecture addresses iSCSI, which connects storage to the server over an IP network.

NetSCSI is a network-attached storage architecture that makes minimal changes to the hardware and software of SCSI disks. A file manager translates file system requests from the client into SCSI commands for its disks. However, rather than returning data to the file manager to be forwarded to the client, NetSCSI disks send data directly to the client. Since the network ports on the disks may be connected to a hostile and open network, protecting the integrity of the file system structure on disk requires a second port to a private, file-manager-owned network [8].

With network-attached secure disks, the goal of minimal change from the SCSI interface is relaxed. In this architecture, network-attached disks are connected directly to the network. The focus is on selecting a command interface that reduces the number of client-storage interactions that must be relayed through the file manager, thus offloading more of the file manager’s work. Common, data-intensive operations, such as reads and writes, go straight to the disk, while less-common operations, including namespace and access control manipulations, go to the file manager. This removes the file manager as the “middleman” in data transfers. The client can be given tokens, or capabilities, by the file manager so that all subsequent communication between the client and storage goes directly to the disk. This architecture remedies the problem of the file server becoming a bottleneck and allows more flexibility than the NetSCSI solution [8].

III. END-TO-END ARCHITECTURE

In this section, we describe our end-to-end architecture for fast and direct delivery of data from the storage device to client applications. There are three main components in the system: the MIG-server module, the MIG-client module and the TCP connection migration mechanism for the iSCSI session. Fig. 2 shows the direct data transfer architecture for an iSCSI based storage area network integrated with the distributed file system NFS. The MIG-server and MIG-client modules reside at the file server and the client respectively and provide support for migrating the TCP connection (established between server and storage device) from the server to the client machine. In order to have minimal impact on the underlying operating system, these modules are designed as Loadable Kernel Modules. They are responsible for tracking incoming and outgoing file requests/replies, saving (restoring) the state of the migrating connection during the migration phases and exchanging information about the migrating TCP connection.

A. TCP Connection Migration Mechanism for the iSCSI Session

Fig. 3 shows the TCP connection migration from server to client. The file server, on receiving a file request from the client, initiates an iSCSI session (through the iSCSI initiator) with the storage device (iSCSI target) using the ‘TCP-Migrate Option’ [7]. The ‘TCP-Migrate Option’ is used to facilitate the migration of the TCP connection for the established iSCSI session to the client machine.

Figure 2. End-to-end architecture for direct data transfer scheme in iSCSI based storage area network integrated with NFS

During TCP connection establishment with the ‘TCP-Migrate Option’, a token T is negotiated between server and storage device, which is transferred to the client during migration. The iSCSI session [4] goes through the three standard phases of iSCSI session establishment (initial login phase, security authentication phase and operational negotiation phase) on the server, and after that the TCP connection for the iSCSI session is migrated to the client machine for direct data transfer.

Only the established TCP connection is migrated to the client, not the iSCSI session. During migration the following information is transferred to the client:

1) File handle
2) File inode: includes the block mapping for the file on the storage device.
3) TCP connection state: includes the TCP sequence number space, the ack number space, the connection token T negotiated during connection establishment, etc.

On receiving the connection migration information from the server, the client reinitiates the migrated TCP connection with the storage device using the connection token T (negotiated during TCP connection establishment and transferred to the client during migration). After the connection is re-established, the full-featured phase of the iSCSI session takes place and the file is transferred directly from the storage device to the client, eliminating the server from the data path.
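The state items listed above can be pictured as a small record handed from MIG-server to MIG-client. The following Python sketch is purely illustrative: the field names and the JSON encoding are our assumptions, and TCP-Migrate itself is a research TCP option with no standard user-space API.

```python
# Hypothetical sketch of the state transferred during connection migration.
# Field names are illustrative; the real modules are in-kernel components.
from dataclasses import dataclass, asdict
import json


@dataclass
class MigrationState:
    file_handle: int   # file handle for the requested file
    block_map: list    # file inode: block mapping on the storage device
    snd_seq: int       # TCP send sequence number of connection C2
    rcv_ack: int       # TCP ack number of connection C2
    token: bytes       # token T negotiated during connection establishment


def serialize(state: MigrationState) -> bytes:
    """Encode the state for transfer over the migration connection C3."""
    d = asdict(state)
    d["token"] = d["token"].hex()  # bytes are not JSON-serializable
    return json.dumps(d).encode()


def deserialize(raw: bytes) -> MigrationState:
    """Decode the state on the MIG-client side."""
    d = json.loads(raw.decode())
    d["token"] = bytes.fromhex(d["token"])
    return MigrationState(**d)
```

The client uses the decoded sequence/ack numbers and token T to resume C2 against the storage device, so no new iSCSI login is needed.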

B. MIG-CLIENT Module

This module sits below the VFS and listens on a well-defined port for TCP connection migration requests from the MIG-server module. It tracks all outgoing file requests and incoming replies. It also maintains an array of mappings from file handles to the respective TCP connections (with the state of the iSCSI session) migrated from the server as part of TCP connection migration. On receiving a file request from a client application through the VFS, it searches for the file mapping in the array. On success, it directly retrieves the file from the storage device using the migrated TCP connection (it does not establish a new iSCSI session or TCP connection but renews the migrated TCP connection using connection token T).

Figure 3. TCP connection migration mechanism for the iSCSI session; connection C, initially established between server and storage, migrates to the client

Otherwise (if it does not find the mapping), it establishes a new connection with the server and requests the file. This results in a new iSCSI session between the iSCSI initiator and the storage device and, after the login phase, migration of the established TCP connection for that session to the client for direct data transfer. After a successful file transfer, it notifies the MIG-server module and deletes the mapping from the array.
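The lookup-then-fallback behaviour of MIG-client described above can be sketched in user-space Python. All names and callbacks here are assumptions made for illustration; the actual module is an in-kernel component sitting below the VFS.

```python
# Illustrative sketch of the MIG-client lookup path: a file-handle -> migrated-
# connection mapping, with fallback to the server on a miss.
class MigClient:
    def __init__(self, fetch_via_server, fetch_direct):
        # file handle -> migrated TCP connection state (with iSCSI session state)
        self.mappings = {}
        self._via_server = fetch_via_server   # triggers new iSCSI session + migration
        self._direct = fetch_direct           # renews migrated connection with token T

    def install_mapping(self, handle, conn_state):
        """Called when a TCP connection is migrated from the server."""
        self.mappings[handle] = conn_state

    def read_file(self, handle):
        conn = self.mappings.get(handle)
        if conn is not None:
            data = self._direct(conn)         # direct path: storage -> client
            del self.mappings[handle]         # cleanup after successful transfer
            return data
        return self._via_server(handle)       # miss: go through the server
```

A miss on the first request goes via the server, which then migrates the connection; subsequent reads of the same handle use the direct path until the transfer completes and the mapping is deleted.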

C. MIG-SERVER Module

This module is designed to reside in the kernel space of the server and sits below the VFS on the server machine. It tracks all incoming and outgoing file requests/replies at the server to learn the IP address of the client, and the file it requested, for which the TCP connection is to be migrated. It also maintains a mapping of file handles to the actual file inodes on the server system. During migration, this mapping is sent to the client, so the client knows the actual block mapping of the file on the storage devices and directly sends SCSI commands through iSCSI to read/write the data without further server intervention. On receiving a file request, it initiates a new iSCSI session and contacts the MIG-client module on the well-defined port for TCP connection migration. On notification from the client of a successful file transfer, it closes the TCP connection and the iSCSI session and does the necessary cleanup.

D. An Example

Fig. 4 shows a sequence diagram for a sample scenario in which the client requests a file X from the server (iSCSI initiator) and the file is transferred directly from the storage device to the client using the TCP connection migration mechanism for the iSCSI session established between server and storage device. The client establishes a TCP connection C1 with the server in the standard fashion and requests a file X (step 1). The Virtual File System on the server resolves file X on the storage device (step 2) and establishes an iSCSI session (through the iSCSI initiator) and a TCP connection C2 using TCP-Migrate [7] with the iSCSI storage device in step 3. After completion of the iSCSI login phase (step 4), the MIG-server module starts the migration of the established TCP connection C2 for the iSCSI session to the MIG-client module on the client. In step 5, the MIG-server module establishes a new TCP connection C3 with the MIG-client module and transfers the state information for connection C2 to it (step 6). The state comprises the TCP sequence and ack numbers of C2, the connection token T negotiated in step 3, the file inode (which includes the block mappings for the file on the storage device) and the file handle for the requested file X. After successful migration, the MIG-client module re-establishes the TCP connection with the iSCSI storage device in step 7 using the state transferred in step 6. On this connection, the file is transferred directly from the storage device to the client in step 8. After the complete file transfer, MIG-client notifies MIG-server of file transfer completion and closes connection C3 (step 9). MIG-server closes the iSCSI connection C2 and signals the server module of request completion (step 10). The server module subsequently notifies the client of file completion and closes connection C1 (step 11).

Figure 4. Sequence diagram for a sample scenario of direct data transfer using TCP connection migration for the established iSCSI session: (a) iSCSI session and TCP connection establishment with TCP-Migrate and token T; (b) TCP connection migration with state and token T; (c) TCP connection resumption after migration using token T
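The eleven steps above can be encoded as a simple event trace, which makes it easy to check that the bulk file data crosses the network only once, on the storage-to-client edge. All names here are illustrative; this merely fixes the ordering of the protocol messages.

```python
# Minimal event-trace sketch of the eleven-step example (names illustrative).
STEPS = [
    (1,  "client",     "server",     "open C1, request file X"),
    (2,  "server",     "server",     "VFS resolves X on storage device"),
    (3,  "server",     "storage",    "open C2 with TCP-Migrate, negotiate token T"),
    (4,  "server",     "storage",    "iSCSI login phase on C2"),
    (5,  "MIG-server", "MIG-client", "open migration connection C3"),
    (6,  "MIG-server", "MIG-client", "transfer C2 state, token T, inode, handle"),
    (7,  "client",     "storage",    "resume C2 using token T"),
    (8,  "storage",    "client",     "direct transfer of file X"),
    (9,  "MIG-client", "MIG-server", "transfer complete, close C3"),
    (10, "server",     "storage",    "close C2"),
    (11, "server",     "client",     "notify completion, close C1"),
]


def trace():
    """Return the ordered message trace. Note that after step 6 the server
    exchanges only control messages; the payload flows only in step 8."""
    return [f"{n}: {src} -> {dst}: {msg}" for n, src, dst, msg in STEPS]
```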

IV. PERFORMANCE EVALUATION

The main objective of this section is to investigate the impact of the proposed architecture on different classes of applications in terms of response time and throughput. To provide a proof of concept, we present an analytical model for the proposed scheme and measure the performance through quantitative analysis based on that model. Fig. 5 shows the timing diagram for the analytical model.

A. Analytical Model

To develop an analytical model for the performance analysis, we consider an NFS client in one network that sends a read request for a file of size O to an NFS server in another network. The requested file is stored by the NFS server on an iSCSI storage disk. The network entities, the NFS server and the iSCSI storage disk, support the TCP-migrate implementation. The network bandwidth is assumed to be the same across all links. For our analysis we assume that the login phase between initiator and target is already over and that the iSCSI disk is mounted on the iSCSI client (initiator) with one live migration-enabled connection for the iSCSI session (since this phase is part of standard operation). When the read request comes to the NFS server, and subsequently to the iSCSI initiator via MIG-server, MIG-server initiates the TCP migration process with MIG-client. The performance is measured in terms of the following parameters:


1) Response time: This is calculated from the time at which the NFS client initiates a TCP connection with the server until the time at which the client receives the requested file in its entirety. For the purpose of calculating response time we ignore packet processing delay, as it is negligible compared to propagation delay in a WAN environment. The analysis assumes that the network is not congested and that there are no packet retransmissions due to packet drop/corruption.

The other notations are:

a. The dynamic TCP congestion window size is W.
b. The MSS (maximum segment size) is S bits.
c. The transmission rate of a link is R bps.
d. The size of the connection-migration-specific data is O'.

From Fig. 5, the TCP-migrate handoff latency Th is equal to the time to establish the TCP connection between MIG-server and MIG-client (3/2 RTT'), the transmission time of the migration-specific data (O'/R) and the sum of all stall times due to TCP slow start [9]. Th is given by

Th = C − B
   = (3/2)RTT' + O'/R + Σ_{i=1}^{k'−1} [S/R + RTT' − 2^(i−1)·(S/R)]        (1)

where B and C are the time instants marked in Fig. 5, and k' is defined as the number of windows that deliver O' and is given by

k' = min{k' : 2^0 + 2^1 + 2^2 + … + 2^(k'−1) ≥ O'/S}
   = min{k' : 2^k' − 1 ≥ O'/S}
   = min{k' : k' ≥ log2(O'/S + 1)}
   = ceil(log2(O'/S + 1))                                                   (2)

From Fig. 5, the response time Tr is equal to the time to establish the TCP connection between NFS client and NFS server (3/2 RTT), the TCP-migrate handoff latency (Th), the time to send the SYN segment for the handshake between MIG-client and iSCSI target (RTT''), the transmission time of the requested file (O/R) and the sum of all stall times due to TCP slow start. The response time Tr is given by

Tr = (3/2)RTT + Th + RTT'' + O/R + Σ_{i=1}^{k−1} [S/R + RTT'' − 2^(i−1)·(S/R)]        (3)

where k is defined as the number of windows that deliver O and, similarly to (2), is given by

k = ceil(log2(O/S + 1))        (4)

Figure 5. TCP timing diagram for the analytical model of the proposed scheme. Points A–D mark, respectively: the NFS client initiating the TCP connection; MIG-server opening the migration connection and sending the migration information; O' delivered (state transfer complete); and O delivered (file transfer complete)

The inferences drawn from (1) and (3) are as follows:

a) For TCP slow start, the total stall time causes a significant delay. In a WAN environment, where geographical distances are much greater, the number of times TCP falls from the steady state due to packet retransmission would be more than one.

b) O' in (1) is very small compared to O, and Th can be reduced further by keeping the TCP segment size at its maximum possible value so that O' is delivered with the minimum k'.

c) With a small Th and a large O (large read/write data), the response time depends directly on O/R and the stall time. The effect of stall time on overall response time becomes even greater as the packet retransmission rate increases.


In the light of points a, b and c, we conclude that the response time in our proposed architecture is lower than in the case where O is transported twice over the network (first from the iSCSI storage to the NFS server and then from the NFS server to the NFS client). The gain in overall response time is even greater in a WAN environment with large geographical distances, where packet retransmissions are more frequent.
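Equations (1)-(4) can be checked numerically with a short script. This is a sketch under stated assumptions: all names are ours, and we clamp each stall term [S/R + RTT − 2^(i−1)·S/R] at zero once the congestion window grows large enough (the formulas leave this floor implicit).

```python
# Numerical sketch of equations (1)-(4); parameter values are illustrative.
from math import ceil, log2


def num_windows(size_bits, mss_bits):
    """Equations (2)/(4): slow-start windows needed to deliver size_bits."""
    return ceil(log2(size_bits / mss_bits + 1))


def stall_sum(size_bits, mss_bits, rate_bps, rtt_s):
    """Sum of slow-start stall times in (1)/(3); each term clamped at zero
    (our assumption: no negative stalls once the window is large enough)."""
    s_over_r = mss_bits / rate_bps
    k = num_windows(size_bits, mss_bits)
    return sum(max(0.0, s_over_r + rtt_s - (2 ** (i - 1)) * s_over_r)
               for i in range(1, k))


def handoff_latency(o_prime, S, R, rtt_p):
    """Equation (1): Th = 3/2 RTT' + O'/R + stall times."""
    return 1.5 * rtt_p + o_prime / R + stall_sum(o_prime, S, R, rtt_p)


def response_time(O, o_prime, S, R, rtt, rtt_p, rtt_pp):
    """Equation (3): Tr = 3/2 RTT + Th + RTT'' + O/R + stall times."""
    th = handoff_latency(o_prime, S, R, rtt_p)
    return 1.5 * rtt + th + rtt_pp + O / R + stall_sum(O, S, R, rtt_pp)
```

Plugging in, for example, O' = 8 kbit, O = 1 Mbit, S = 1 kbit and R = 1 Mbps shows the handoff latency Th dominated by RTT' while the overall Tr is dominated by O/R and the stall times, consistent with inferences a-c above.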

2) CPU overload at server: For our analysis, this is defined as the CPU cycles consumed by the iSCSI initiator to read/write blocks over the network, calculated as the sum of all time spent on interrupt handling, packet processing and application-to-operating-system context switching. Published measurements show that the average read/write CPU overhead at an initiator for an I/O size of 8 KB is more than 40%, and that the overhead increases with I/O size [10]. In our proposed architecture, data is transferred directly to the client, so the server CPU overhead due to block reads/writes over the network is offloaded to the client CPU. This improves server scalability and removes the server as a bottleneck.


B. Qualitative Analysis

The effect of the proposed architecture can be analysed in terms of the following parameters:

1) Delay due to TCP migration: This is the total time taken by a TCP peer to acquire a new IP address, to transport migration-specific information to the newly acquired IP address, and to complete the TCP-migrate signalling messages. Due to this delay (see (1)), our proposed architecture is most appropriate for applications with long-lived TCP connections and large data requirements, such as audio/video streaming and file sharing. Applications that do not have high data requirements will not see much performance gain and will experience roughly the same performance.


2) Fast: By providing a direct data path between client and storage, overall overhead and delay are reduced, resulting in fast delivery of data to client applications.

3) Transparency: An iSCSI application running over TCP requires that, if the connection is broken, the mechanism to resume it be transparent to the iSCSI application. TCP-migrate fulfils this requirement.

4) Security: The proposed architecture is secure from the iSCSI session establishment point of view, because the login and authentication phases complete between server and storage before the TCP connection is migrated to the client. Another security consideration is that an application may have different security requirements in different network environments. For example, transferring a file from an IP storage disk to the application server may be covered by existing security permissions, but if the same file is transferred directly to a client in another network, additional protection may be required. To address this problem, a security mechanism needs to be in place that supports file transfer from the storage network to all networks.

5) Robust: With the proposed scheme, the server is eliminated from the data path and the main processing (i.e., data fetching cycles) is offloaded to the client. This removes the server as a bottleneck and reduces the chances of failure.

The above analysis shows that our architecture is appropriate where the iSCSI storage network is attached to application servers such as FTP servers and streaming media servers, and where there is a requirement for continuous data transfer.

V. CONCLUSION

We have proposed an end-to-end architecture for direct data transfer from storage device to client application in an iSCSI storage network, leveraging the TCP connection migration mechanism. Our framework is end-to-end and transparent to storage devices, client and server applications.

We presented the analytical framework of our architecture for performance measurement in terms of response time and CPU overload for applications such as file transfer and streaming media. We analysed our approach qualitatively against parameters such as fast data delivery, transparency, security and robustness. We have also presented the performance challenges that must be addressed in order to integrate an iSCSI storage area network with a distributed file system.

Storage area networks are becoming an integral part of enterprise storage solutions because they provide resource sharing, storage capacity scaling, and performance benefits. With the emergence of Gigabit Ethernet technology, it is now possible to construct IP storage area networks which leverage an organization’s existing IP infrastructure. Convergence of iSCSI storage network and distributed file system can fulfil the future requirements of Internet applications in terms of storage and speed.

REFERENCES
[1] J. H. Howard et al., “Scale and performance in a distributed file system,” ACM Transactions on Computer Systems, vol. 6, no. 1, pp. 51-81, Feb. 1988.
[2] J. Satran et al., “iSCSI Specification Internet Draft,” http://www.ietf.org/internetdrafts/draft-ietf-ips-iscsi-20.txt, Jan. 2003.
[3] P. J. Hunter, “Introduction to IP storage, SNIA-IP Storage Forum,” www.snseurope.com, Oct. 2004.
[4] http://www.iscsistorage.com/ipstorage.htm
[5] Y. Lu and D. Du, “Performance Evaluation of iSCSI-based Storage Subsystem,” IEEE Communications Magazine, pp. 164-201, Aug. 2003.
[6] S. Mohanty and I. F. Akyildiz, “Performance Analysis of Handoff Techniques Based on Mobile IP, TCP-Migrate, and SIP,” IEEE Transactions on Mobile Computing, vol. 6, no. 7, pp. 731-747, Jul. 2007.
[7] A. C. Snoeren and H. Balakrishnan, “An End-to-End Approach to Host Mobility,” Proc. International Conference on Mobile Computing and Networking, pp. 155-166, Aug. 2000.
[8] G. A. Gibson et al., “File server scaling with network attached secure disks,” Proc. ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, pp. 272-284, 1997.
[9] J. F. Kurose and K. W. Ross, Computer Networking, Pearson Education, 2005.
[10] H. M. Khosravi, A. Joglekar and R. Iyer, “Performance characterization of iSCSI processing in a server platform,” 24th IEEE International Performance, Computing, and Communications Conference, pp. 99-107, Apr. 2005.