Archive Solutions With Dell EMC Isilon Scale-Out NAS © 2017 Dell Inc. or its subsidiaries.

ARCHIVE SOLUTIONS WITH DELL EMC ISILON SCALE-OUT NAS

Abstract

This white paper outlines the principles and concepts involved in implementing an enterprise-class archive repository using Dell EMC Isilon scale-out storage. It also describes key Isilon technology and features used to implement an efficient and secure archive solution.

December 2017

WHITE PAPER

TABLE OF CONTENTS

1.0 Executive summary
  1.1 About this White paper
  1.2 Assumptions
  1.3 Industry trends

2.0 Archive vs. Backup
  2.1 Archiving objectives
  2.2 Archive technologies

3.0 Dell EMC Isilon Archive Solutions Architecture
  3.1 Hardware
  3.2 Software
  3.3 Archiving with Automated Storage Tiering
  3.4 Archive server load balancing with Isilon SmartConnect
  3.5 High availability, data protection and security

4.0 Archive sizing
  4.1 What to understand about sizing an archive
  4.2 Sizing examples
  4.3 Scaling the estimate for future growth and planning
  4.4 Other sizing considerations
    4.4.1 Information on the maximum number of connections
    4.4.2 Maximum number of total files in the file system
    4.4.3 Isilon SyncIQ for archive data replication

5.0 Conclusion

TAKE THE NEXT STEP

6.0 Appendix
  6.1 Archive application file system ACL requirements
  6.2 References

1.0 Executive summary

As data continues to grow, simply adding more capacity to primary storage no longer makes sense. It’s expensive, inefficient, and doesn’t meet the growing need to correctly classify, manage, and protect data in a way that allows quick, efficient access. As a result, archive systems are an increasingly important business requirement in modern enterprise environments.

Archiving applications can efficiently manage growing data stores to contain storage costs and satisfy both compliance and e-discovery requirements. Dell EMC Isilon scale-out network-attached storage (NAS) provides a disk-based archive that scales from terabytes (TB) to petabytes (PB) of capacity. With Isilon, archival information benefits from massive scalability and the robustness of the Dell EMC Isilon OneFS operating system, with proven capabilities for protecting data and optimizing the flow of information within an organization.

This white paper outlines the principles and concepts for deploying an Isilon storage cluster as an enterprise-class archive repository. It includes architectural explanations of the technologies and features associated with providing NAS-based file services, as well as technical recommendations for sizing Isilon storage clusters based on appropriate cluster limits for archive applications.

1.1 About this White paper

This guide provides technical considerations when planning archive systems based on Isilon storage in an enterprise environment. It assumes the use of third-party archive software.

1.2 Assumptions

This guide assumes the reader has an understanding and working knowledge of the following:

• NAS storage protocols

• Dell EMC Isilon scale-out storage architecture and the OneFS operating system

• File-system management concepts and practices

While this guide is intended to provide a consolidated reference point for systems administrators and managers looking to deploy archive software on an Isilon storage cluster, it is not intended to be the authoritative source of information on the technologies and features used to provide and support a file-services platform.

1.3 Industry trends

Data archiving is the process of identifying and moving inactive data out of current production (primary storage) systems and into long-term archival storage systems. Implementing and maintaining an efficient and secure data archive is increasingly important for organizations across a wide range of industries and public sector agencies.

The rapid growth of unstructured data that most enterprises are experiencing is straining primary storage resources and pressuring IT budgets. Moving inactive data out of primary storage optimizes the performance of primary storage resources. Archival systems, in turn, store that information much more cost-effectively while keeping it online, which avoids the retrieval cost associated with tape media.

Many enterprises are subject to varying regulatory requirements that mandate long-term data retention. Some organizations are also looking to preserve and protect content for historical purposes. Archive retention periods of 10 to 15 years are not uncommon for many businesses, while some industries are mandated to retain specific data for upwards of 100 years.

In certain cases, archive data is purposely not deduplicated, compressed, or changed in any manner to effectively meet stringent compliance requirements or to avoid potential litigation issues. Archiving can help e-discovery solutions and is an important way to comply with regulatory and corporate governance requirements.

2.0 Archive vs. Backup

The term “data archiving” is sometimes confused with or used interchangeably with “data backup.” This is often the result of assuming that a backup can be a substitute for an archive if needed. In reality, these two data retention strategies have distinct data protection objectives.

Backups, whether to disk or tape, typically have relatively short lifecycles—measured in days, weeks, or months between full copies. They are primarily used to restore data that may have been lost, corrupted, or destroyed. A data backup must usually be reconstituted from a proprietary backup format to other media at a different location before it can be used.

Unlike a backup that is used to restore recently lost or deleted data, archiving is a systematic approach to providing structure to unstructured data, usually laying it out in a predictable directory format, based on repeatable application algorithms. It enables the storing, management, retrieval, and eventual deletion of data as appropriate throughout its entire lifecycle. Table 1 summarizes the differences between archive and backup.

| Archive | Backup |
| --- | --- |
| A primary copy of information | A secondary copy of information |
| Used for information retrieval | Used for operational recovery |
| Improves operational efficiency by removing fixed content and duplicate data from the operational environment | Improves availability by enabling applications to be restored to specific points in time |
| Typically long term (years, decades, or forever) | Typically short term (weeks or months) |
| Data typically maintained for analysis or compliance as a managed repository | Data typically overwritten on a periodic basis (daily, weekly, monthly, etc.) and stored in an altered state which needs to be restored |
| Useful for compliance: enforces retention policies by typically storing data in its native form | Does not meet the needs of regulatory compliance, though some are forced to use it this way |

Table 1: Archive and Backup Comparison

2.1 Archiving objectives

Many enterprises are seeing 50% growth in data each year, but little or no growth in storage hardware budgets. Not all data is of equal value and its value can change. Many organizations find that their managed data is too performance-sensitive to move to tape storage, but not time-critical enough to justify the cost of keeping it on high-performance storage. As an organization’s data footprint grows, managed data becomes more heterogeneous, with different data subsets having divergent performance and protection requirements.
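To make the budget pressure concrete, the compounding effect of the 50% annual growth figure can be worked through in a few lines. This is a minimal sketch, not a sizing tool: the 100 TB starting capacity is a hypothetical value chosen only for illustration, and the function name is an assumption.

```python
def projected_capacity_tb(initial_tb: float, annual_growth: float, years: int) -> float:
    """Return the capacity after compounding a fixed annual growth rate."""
    return initial_tb * (1.0 + annual_growth) ** years


# Hypothetical starting point: 100 TB of managed data growing 50% per year.
for year in range(6):
    print(f"year {year}: {projected_capacity_tb(100.0, 0.50, year):.1f} TB")
```

At this rate the data footprint more than doubles every two years, while a flat hardware budget buys roughly the same capacity each year.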

Data archiving objectives will vary by organization but will typically include one or more of the following:

• Reclaim expensive existing primary storage

• Reduce on-going primary storage acquisition costs

• Move static data out of the recurring backup process

• Secure and protect data for long-term retention

• Satisfy applicable regulatory and governance requirements

• Provide reliable availability of data when needed

Data archive solutions can be used to manage growing data stores efficiently, optimize the use of primary storage resources, protect data for long-term retention, and greatly reduce overall storage costs.

2.2 Archive technologies

An effective archiving solution typically comprises a mix of software that will likely include:

• Archiving software that automates the movement of data from primary storage to archival storage based on policies established in the data classification and rationalization process. Data classification and rationalization (policy tuning) are functions of the organization’s data mining requirements and legal compliance needs. Archive software can delete files at the end of their retention period. It may also include features such as data compression or single-instance storage to maximize disk utilization. Archiving application software generally utilizes NFS or SMB protocols to move content from primary storage to an archive repository.

• E-discovery software, using archiving software as a base, employs advanced search features that enable users and administrators to quickly search all files, emails, texts, and other data related to a specific topic, for use in data mining services or in response to legal inquiries. Some applications combine e-discovery and archive into purpose-built platforms such as an email archive solution or document and records management solutions.

Data retention policies can be established and enforced through archiving and e-discovery software by ensuring that all files of significance are flagged appropriately to prevent them from being altered or deleted.
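The policy-driven selection step that archiving software performs can be sketched as a walk over a primary file system that picks out files not modified within a threshold. This is an illustrative sketch only: the function name and the 365-day threshold in the usage note are assumptions, and commercial archiving products apply far richer classification rules than file age alone.

```python
import os
import time
from pathlib import Path
from typing import Iterator, Optional


def archive_candidates(
    root: str, min_age_days: float, now: Optional[float] = None
) -> Iterator[Path]:
    """Yield files under root whose last-modified time is older than min_age_days."""
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # st_mtime is the last-modified timestamp in seconds since the epoch.
            if path.stat().st_mtime < cutoff:
                yield path
```

Calling `archive_candidates("/some/share", 365)` on a hypothetical NFS or SMB mount point would list the files untouched for a year, which an archive job could then copy to the repository.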

3.0 Dell EMC Isilon Archive Solutions Architecture

Dell EMC Isilon archive solutions combine Isilon scale-out NAS storage platforms with archive software from Dell EMC and/or third-party providers.

Figure 1: Isilon Archive Solutions Architecture

Each element of the Isilon archive solution is described in the following sections.

3.1 Hardware

Dell EMC Isilon offers 3 tiers of hardware storage platforms – Isilon All-Flash, Isilon Hybrid and Isilon Archive – that can be combined to deliver the right blend of performance and capacity for a wide range of workloads, including file shares, home directories, archiving, and Big Data analytics.

For primary storage workloads and applications, Isilon offers:

• Dell EMC Isilon F800 All-Flash: The Isilon F800 All-Flash scale-out NAS platform delivers extreme performance and massive scalability for the most demanding file applications and workloads. The Isilon F800 is ideal for I/O-intensive operations and high-performance computing (HPC) applications required for a wide range of industries including Media and Entertainment, Life Sciences, Electronic Design & Automation (EDA), and Financial Services.

• Dell EMC Isilon Hybrid Storage: Available in 3 models – Isilon H400, H500 and H600 – Isilon Hybrid storage provides a highly versatile NAS storage platform that strikes a balance between performance, capacity and economy to support a wide range of enterprise applications and workloads including file shares, home directories, and data analytics.

For nearline storage, active archiving, and high density, deep archiving, Isilon Archive storage platforms include:

• Dell EMC Isilon A200: Designed for highly efficient, large-capacity storage, the Isilon A200 is ideal for nearline storage and active archiving workloads. It features 4 nodes and 60 SATA drives in a single 4U chassis and is well suited to low-cost active archiving where data retrieval, while infrequent, is expected.

• Dell EMC Isilon A2000: A high-density storage platform that is an ideal choice for deep archive and archive use cases with a capacity requirement of over 2.5 PB. The Isilon A2000 houses 4 nodes and 80 SATA drives in a single 4U chassis and is ideal for inexpensive long-term archiving.

Table 2 summarizes the design elements and considerations that determine the relative suitability of the Isilon A200 and Isilon A2000 platforms.

| Design element | Isilon A200 more suitable | Isilon A2000 more suitable |
| --- | --- | --- |
| Type of archive | Active archive: data at rest but accessed periodically | High-density, deep archive: long-term data retention with minimal access |
| Archive size | Typically < 2.5 PB | Typically > 2.5 PB |
| Rack considerations | If normal depth racks preferred over deep racks | If deep racks preferred over normal racks |
| Weight consideration | When weight of the nodes is a concern | When weight of nodes is not a concern |
| Power consideration | Low line or high line power requirements | Only high line power is acceptable |

Table 2: Isilon A200 and Isilon A2000 Comparison

Isilon A200 and A2000 Archive platforms can be combined into a single cluster that includes Isilon primary storage platforms (Isilon F800 All-Flash and Isilon Hybrid storage platforms). In this way, primary storage workloads and archive workloads can be supported in a single Isilon cluster. This provides an opportunity for implementation of a policy-based, automated storage tiering solution that automatically moves data to the appropriate storage tier and helps to optimize storage resources and lower costs.

3.2 Software

All nodes in an Isilon cluster are powered by the Dell EMC Isilon OneFS operating system. OneFS combines the three layers of traditional storage architectures (file system, volume manager, and data protection) into one unified software layer, creating a single intelligent file system that spans all nodes within the cluster. This allows the Isilon cluster to facilitate the archive process automatically through policy-based file placement on specified storage tiers. With its modular, single file system architecture, OneFS enables Isilon storage systems to provide a massively scalable platform that is highly efficient and simple to manage.

Isilon storage solutions can scale from as small as 18 TB to over 68 PB in a single file system. To increase capacity, new nodes can be added to an existing Isilon cluster within a minute and without disruption. With the Isilon AutoBalance feature of OneFS, there is no need to move data manually – it is done automatically and transparently to system users.

The Isilon OneFS operating system enables:

• Independent, linear scalability of performance and capacity

• A single point of management for large and rapidly growing data repositories

• High reliability and high availability with state-of-the-art data protection.

The OneFS operating system manages aggregation of multiple Isilon node types into one namespace and the combination of all available resources from every node in the cluster. An internal, Ethernet-based or InfiniBand-based network functions as a virtual backplane between the nodes and enables any node in the cluster to service data for any client in the cluster (i.e., any other node), regardless of where the data actually resides. A cluster may comprise different types of Isilon storage tiers (including Isilon All-Flash, Isilon Hybrid and Isilon Archive storage platforms), with the nodes within each tier suited to a particular workflow and participating in different activities within the same, unified OneFS file system. This mixed-node approach is often the preferred choice for environments in which different workloads will coexist on the same storage array. It also enables enterprises to optimize storage resources by implementing a policy-based, automated tiered storage strategy using Dell EMC Isilon SmartPools and Isilon CloudPools software.

OneFS also provides symmetric multiprocessing (SMP) capabilities that enable the system to move tasks between processors, which results in extremely efficient workload balancing. The combination of these features also results in very high overall bandwidth and performance capabilities for the cluster—an important capability when the Isilon cluster is supporting primary and archive workloads simultaneously.

Dell EMC Isilon software used for management and data protection purposes is summarized in Table 3.

| Category | Software | Description |
| --- | --- | --- |
| Data protection | Dell EMC Isilon SnapshotIQ | Protect data efficiently and reliably with secure, near-instantaneous snapshots, while incurring little to no performance overhead, and speed the recovery of critical data with near-immediate on-demand snapshot restores. |
| Data protection | Dell EMC Isilon SyncIQ | Replicate and distribute large mission-critical data sets to multiple shared storage systems in multiple sites for disaster recovery protection. Simple failover and failback to increase data availability. |
| Data protection | Dell EMC Isilon SmartLock | Protect data against accidental, premature, or malicious alteration or deletion with Isilon’s software-based approach to WORM. Also helps meet stringent compliance and governance needs, such as SEC 17a-4 requirements. |
| Management | Dell EMC Isilon SmartPools | Implement a highly efficient, automated tiered storage strategy to optimize storage performance and efficiency. |
| Management | Dell EMC Isilon CloudPools | Enable seamless tiering of data to a choice of public or private cloud providers. |
| Management | Dell EMC Isilon SmartConnect | Enable client connection load balancing and the dynamic NFS failover and failback of client connections across storage nodes to optimize the use of cluster resources. |
| Management | Dell EMC Isilon SmartDedupe | Increase efficiency and reduce storage capacity requirements by up to 35 percent with deduplication of redundant data across multiple sources. |
| Management | Dell EMC Isilon InsightIQ | Performance monitoring and reporting tools to maximize the performance of your Isilon cluster. |

Table 3: Isilon Data Protection and Management Software

3.3 Archiving with Automated Storage Tiering

Many enterprise environments have workloads or applications that require the use of active data while archiving older data. A mechanism is required to migrate stale primary data to archive storage, which should happen without any disruption to users and applications.

Isilon SmartPools software simplifies management and lowers storage costs with a transparent, policy-based, automated tiering approach. It lets organizations optimize storage resources and automatically move older, unused data to economical archive storage. SmartPools is integrated with the Isilon OneFS operating system to allow a single point of management, with a single scalable file system that offers multiple tiers of performance, depending on the data.

With SmartPools, storage administrators can automatically match storage resources with specific data and application requirements. It also simplifies management by eliminating the need for manual data migrations. SmartPools moves data among tiers based on the enterprise requirements without sacrificing data protection, application performance, or uptime. Administrators can also use defined policies to move data based on age, type, owner, location, or other criteria, from one tier to another.
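The idea of policy-based placement can be illustrated with a small first-match rule evaluator. This sketch is purely conceptual and does not represent the SmartPools interface or OneFS policy syntax: the `FileInfo` and `TierPolicy` types, the tier names, the paths, and the thresholds are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class FileInfo:
    path: str
    age_days: int
    owner: str


@dataclass(frozen=True)
class TierPolicy:
    name: str
    matches: Callable[[FileInfo], bool]  # predicate over file attributes
    target_tier: str


def select_tier(info: FileInfo, policies: List[TierPolicy], default_tier: str) -> str:
    """Return the tier of the first policy that matches, else the default tier."""
    for policy in policies:
        if policy.matches(info):
            return policy.target_tier
    return default_tier


# Hypothetical policies: criteria based on age and file type, as described above.
POLICIES = [
    TierPolicy("older-than-one-year", lambda f: f.age_days > 365, "archive"),
    TierPolicy("video-files", lambda f: f.path.endswith(".mov"), "hybrid"),
]

print(select_tier(FileInfo("/data/projects/report.doc", 400, "alice"), POLICIES, "all-flash"))
```

Because the evaluator only returns a tier name, the file's path never changes, which mirrors the point that tiering inside a single file system needs no stubs or links.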

SmartPools is tightly integrated with Isilon OneFS, so all data, regardless of physical location, are in the same single file system. This means that SmartPools data movements are completely transparent to the end user application, removing management, backup, and other issues related to stub-based tiering architectures such as those present in hierarchical storage management (HSM) implementations.

With Isilon SmartPools, data protection levels can be set on a per-directory, storage tier, or even a per-file basis. Regardless of which type of Dell EMC Isilon storage nodes are used in the cluster, SmartPools can be used to control the data protection level. SmartPools also provides an option to configure performance profiles, so different types of data are actually laid out on disk differently to optimize performance for different types of workloads.

Figure 2 illustrates a mixed-node, single file system deployment in an Isilon cluster for online and archive data using SmartPools.

Figure 2: Isilon cluster with SmartPools for automated tiered storage

Automated tiering with Isilon provides the following advantages:

• Isilon SmartPools simplifies management with automatic, policy-based data movement within a single namespace, single file system without complex links, stubs, or manual data migration.

• Enables storage consolidation by allowing the support of multiple applications and workloads with varying performance requirements on a single storage system.

• Optimizes storage resources by automatically aligning them with application needs.

• Adapts seamlessly to workflow changes and provides workflow isolation.

• Isilon CloudPools extends automated data tiering to the cloud with a choice of public and private cloud storage options.

3.4 Archive server load balancing with Isilon SmartConnect

Enterprise archive environments typically engage several, if not dozens of, archive servers. These servers are often set to move data to the archive vault based on certain file criteria, usually last-modified time. The working sets that the archive software or policies monitor are unlikely to all come due at the same time, but it is possible, so take this into consideration based on the number of archive servers that will be deployed. Dell EMC Isilon SmartConnect™ software optimizes network throughput by enabling intelligent client-connection load balancing and is used to automatically manage Isilon cluster access.

SmartConnect manages client connection load balancing through a single host name to the cluster nodes. This provides optimal utilization of the cluster’s available network interfaces and network system resources. By leveraging an organization’s existing DNS infrastructure, SmartConnect provides universal compatibility with all client types, eliminating the need for complicated connection management on the client side.

To a client system, the cluster appears as a single network element. SmartConnect automatically balances incoming client connections across all available interfaces on the Isilon storage cluster. This improves performance on the cluster by distributing the workload evenly across multiple network paths and multiple nodes.
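The effect of spreading incoming connections across node interfaces can be sketched with a simple rotation. Round-robin is used here only as an illustrative policy, since this paper does not describe the balancing algorithm; the function name, server names, and addresses are hypothetical.

```python
from collections import Counter
from itertools import cycle
from typing import Dict, List


def assign_connections(clients: List[str], node_ips: List[str]) -> Dict[str, str]:
    """Hand each connecting client the next node address in rotation."""
    rotation = cycle(node_ips)
    return {client: next(rotation) for client in clients}


# Hypothetical archive servers connecting through one cluster host name.
servers = [f"archive-server-{i}" for i in range(1, 7)]
nodes = ["10.0.0.11", "10.0.0.12", "10.0.0.13"]

assignment = assign_connections(servers, nodes)
print(Counter(assignment.values()))  # connections per node
```

With six archive servers and three nodes, each node ends up with two connections, so no single node interface carries the whole vaulting load.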

For a Dell EMC Isilon storage cluster that hosts multiple concurrent workloads in addition to the organization’s archive, SmartConnect gives administrators the ability to partition workloads, by type, across the available node interfaces in a cluster. By maintaining multiple SmartConnect pools and minimizing the number of pools that overlap on a particular node interface, administrators can maintain sufficient network bandwidth for critical workloads on dedicated interface connections.

3.5 High availability, data protection and security

The Isilon OneFS operating system provides scale-out data protection through its Dell EMC Isilon FlexProtect™ feature. FlexProtect utilizes advanced technology to provide redundancy and availability capabilities far beyond those of traditional RAID. FlexProtect uses Forward Error Correction (FEC) to create an n-way, redundant fabric that scales as nodes are added to the cluster, providing 100 percent data availability even with up to four simultaneous node failures. This goes far beyond the maximum level of RAID commonly in use today, which is the double-failure protection of RAID 6. Additional details are provided in the High Availability & Data Protection with Dell EMC Isilon Scale-Out NAS white paper.
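The trade-off between failure tolerance and usable capacity in any erasure-coded layout can be worked out with simple arithmetic: a stripe of N data units plus M FEC units survives the loss of any M units and leaves N/(N+M) of raw capacity usable. The sketch below is generic; the stripe widths are hypothetical and are not a description of the actual OneFS on-disk layout, which depends on cluster size and the configured protection level.

```python
def usable_fraction(data_units: int, fec_units: int) -> float:
    """Fraction of raw capacity left for data in an N+M erasure-coded stripe."""
    return data_units / (data_units + fec_units)


# Hypothetical stripe widths, chosen only to compare protection levels.
for data_units, fec_units in [(8, 2), (16, 2), (16, 4)]:
    print(
        f"{data_units}+{fec_units}: tolerates {fec_units} failures, "
        f"{usable_fraction(data_units, fec_units):.0%} usable"
    )
```

Wider stripes amortize the protection overhead, which is one reason tolerance of four failures becomes more affordable as a cluster grows.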

For data backup and disaster recovery protection, you can easily copy and replicate data to remote sites. Dell EMC Isilon SnapshotIQ software enables fast and efficient data backup and allows you to take point-in-time copies of data with a choice of snapshot intervals and RPO time options. For multi-site disaster recovery protection, Dell EMC Isilon SyncIQ can be used to replicate data to your choice of local and remote sites. SyncIQ supports both LAN and WAN networks to replicate over short or long distances, providing protection from both site-specific and regional disasters. Superna Eyeglass, available through Dell EMC Select, can fully automate and orchestrate disaster recovery.

Security and compliance options from Isilon include:

• Role-based access control (RBAC) and secure access zones to limit data access.

• Immutable storage through the write once, read many (WORM) locking capability of Isilon SmartLock software. In this way, SmartLock protects your archived data against accidental, premature, or malicious alteration or deletion.

• Data at rest encryption options from Isilon storage platforms that include self-encrypting drives to prevent data theft.

• Integrated file system auditing to identify unauthorized data access attempts.

• Security Technical Implementation Guide (STIG) hardening, CAC/PIV Smartcard authentication, and FIPS OpenSSL support.

Together, these capabilities provide a comprehensive data protection and security solution for your data archive.

4.0 Archive sizing

There are many factors to consider in sizing a cluster appropriately for an archive workload. Archive environments will often have many archive servers moving files based on time criteria to the Isilon cluster. Having many archive servers access the cluster is a perfect use case for SmartConnect. SmartConnect will load balance the servers across all nodes of the cluster (while accessing the same file system for continuity). For sizing exercises, it is essential that this practice is implemented so that no single-node bottleneck is reached artificially.

The potential to have many servers moving data simultaneously to the cluster can cause “cliff” events where the application can run into a requirement to move thousands of files all at once. Regardless of your deployment scenario, you must use peak values for your sizing estimate. If, for example, you have 10 archive servers on different vault schedules, you must still plan for the worst-case scenario: that all are undergoing a vault operation simultaneously. That will be the assumption in the following sections.

This type of event requires consideration about the number of connections that can be handled, file system IO, bandwidth, etc. The following sizing section will address these considerations with Isilon clusters as well as provide general guidelines for elements such as maximum file counts and replication of the archive, among others.
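The worst-case assumption can be sketched as a simple sum over all archive servers. The following Python fragment is a minimal illustration only: the `ArchiveServer` class, the server names, and every per-server value are hypothetical placeholders, not measured or published figures.

```python
from dataclasses import dataclass


@dataclass
class ArchiveServer:
    name: str
    files_per_second: float  # peak vaulting rate (hypothetical value)
    avg_file_size_kb: float  # average archived file size (hypothetical value)
    connections: int         # connections opened during a vault run (hypothetical value)


def peak_load(servers):
    """Worst case: every server vaults at the same time, so all demands add up."""
    return {
        "files_per_second": sum(s.files_per_second for s in servers),
        "throughput_kb_per_s": sum(
            s.files_per_second * s.avg_file_size_kb for s in servers
        ),
        "connections": sum(s.connections for s in servers),
    }


# Ten archive servers with identical, made-up profiles.
servers = [ArchiveServer(f"vault{i:02d}", 50, 128, 8) for i in range(10)]
peak = peak_load(servers)
print(peak)
```

Sizing against `peak` rather than against any single server's schedule reflects the assumption that all vault operations coincide.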

4.1 What to understand about sizing an archive

There are three base scenarios to consider when sizing an archive:

• You have an existing Isilon cluster and wish to add an archive tier to it through expansion via node addition.

• You will be deploying a new Isilon scale-out cluster for archive purposes only, and using third-party archiving software such as Veritas Enterprise Vault or Dell EMC SourceOne for File Systems.

• You will be deploying a new Isilon scale-out cluster that will be used for both working and archive data sets.

For scenarios one and three above, SmartPools should be implemented. For each of these deployment models, consider the archive tier or the archive target cluster to be composed of either Isilon A200 or Isilon A2000 Archive storage platforms. That’s because:

• Archives are typically composed of capacity-intensive data sets.

• Archives usually have lower performance and throughput requirements than active working data sets (still size for peak vaulting events).

• Archives generally don’t have a strong metadata caching requirement.

• Archives are backed up on a different schedule than primary data.

Six key metrics are usually gathered for evaluating the number of nodes required to meet a given archive workload:

• The number of files that will be written to the file system

• The average file size that will be moved in an archiving operation

• The number of archive servers

• The number of connections expected from an archive server

• The expected capacity of the archive

• Growth requirements

The combination of I/O and average file size gives a good idea of the aggregate throughput the cluster can expect to receive. This helps plan SmartConnect zones and which interfaces need to be used.

Transfers or migrations that occur internally to an Isilon storage cluster use a back-end network with a choice of 2 InfiniBand connections or 2 10GbE (SFP) connections for each of the 4 nodes in the 4U chassis. In this way, there is no external network interface bandwidth consumed. Interface/bandwidth requirements, therefore, are more of a concern for archive target clusters. For this reason, it is usually best to start with identifying I/O requirements.

The number of archive servers and the expected connection counts help determine whether the cluster needs more nodes than the archive capacity requirement alone would call for.

Expected capacity and estimated growth help determine how many nodes and at what density are needed from a raw storage perspective. All of these factors need to be balanced such that the most stringent requirement is the sizing metric of choice.
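One way to picture "the most stringent requirement wins" is to compute a node count for each metric and take the largest. This is a sketch under stated assumptions: the requirement figures and the per-node capabilities below are invented placeholders, not Isilon specifications, and the metric names are illustrative.

```python
import math


def nodes_required(requirement, per_node_capability):
    """Smallest whole number of nodes that meets one requirement."""
    return math.ceil(requirement / per_node_capability)


def size_cluster(requirements, per_node):
    """Return (node count, limiting metric, per-metric node counts)."""
    per_metric = {
        metric: nodes_required(requirements[metric], per_node[metric])
        for metric in requirements
    }
    limiting = max(per_metric, key=per_metric.get)
    return per_metric[limiting], limiting, per_metric


# Hypothetical workload requirements and hypothetical per-node capabilities.
requirements = {"capacity_tb": 400, "throughput_mb_s": 900, "connections": 300}
per_node = {"capacity_tb": 100, "throughput_mb_s": 150, "connections": 100}

nodes, limiting, detail = size_cluster(requirements, per_node)
print(nodes, limiting, detail)
```

With these placeholder numbers, throughput rather than capacity dictates the node count, which is the situation the preceding paragraphs warn about.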

Once a profile of performance requirements is available, it becomes a trivial matter to predict future performance values based on these results because performance grows linearly with node addition, as each Isilon node carries a “unit” of storage/processor/bandwidth.

4.2 Sizing examples

Using different scenarios based on the metrics mentioned above, we can review a couple of different deployment models to ascertain how scenarios might play out.

In the first example, we take a look at the number of files, average file size, and how they correlate to bandwidth. Figure 2 depicts a scenario in which 10,000 files are pushed from three archive servers to an Isilon A200 that contains 4 nodes within the 4U chassis, at a rate of 100 files per second. In this example, the average file size for the archive is measured to be 128 KB.

Figure 2: Archive sizing example 1

The bandwidth requirement for a target cluster would be, on average, (100 files/s)*(128 KB/file) = 12,800 KB/s. It would best be balanced over the cluster interfaces with SmartConnect managing the traffic across the archive target nodes. Three archive servers are not expected to drain the connection pool for a four-node cluster (the cluster is designed to handle many client connections). An internal cluster tier migration would not need to take this into consideration because it would be handled by the back-end IB network, but there might be contention for disk I/O.
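The arithmetic for this first example can be reproduced in a few lines of Python. The inputs come from the example itself; the conversion of 1 MB = 1024 KB and the even spread across four nodes are assumptions made here for illustration.

```python
files_total = 10_000      # files pushed in the vaulting event
files_per_second = 100    # aggregate rate from the three archive servers
avg_file_size_kb = 128    # measured average file size
nodes = 4                 # nodes in the A200 chassis

throughput_kb_s = files_per_second * avg_file_size_kb
throughput_mb_s = throughput_kb_s / 1024        # assumes 1 MB = 1024 KB
duration_s = files_total / files_per_second
data_moved_mb = files_total * avg_file_size_kb / 1024
per_node_mb_s = throughput_mb_s / nodes         # assumes an even SmartConnect spread

print(f"Aggregate throughput: {throughput_kb_s} KB/s ({throughput_mb_s} MB/s)")
print(f"Event duration: {duration_s} s for {data_moved_mb} MB")
print(f"Per-node share: {per_node_mb_s} MB/s")
```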

The second example takes a look at capacity, bandwidth based on source file breakdowns, and the relative write capability of the archive target cluster.

Assumptions:

• 200 TB of archive source data

• File breakdown: 80/20 (80% 10+ MB files and 20% 4 KB files)

• Target cluster is an Isilon A200 that contains 4 nodes within a single 4U chassis

Archive Sizing Example 2 – 200 TB Distributed Archive Content

Note that 200 TB of archive data is within the capacity of a single Isilon A200 system configured with 8 TB SATA drives. However, the aggregate throughput of a 4-node cluster of this size may mean that it is undersized for your deployment.

The following values are for demonstration purposes only and do not reflect true performance. Assume a typical mixed workload profile for the nodes as follows:

• Write 15 MB/s (small files)

• Write 520 MB/s (large files)

The read values are not of great importance here (unless the source is another Isilon A200 cluster), so writes to the target cluster will be the values to use. In this case, the entire archiving event occurs at once. Requirements for writes to the target will, of course, vary. For example, writing 40 TB of small files at 15 MB/s takes about 31 days. Writing 160 TB of large files to the target at 520 MB/s takes about 3.5 days, for a total transfer time of about 34.5 days. This estimate assumes the maximum write capacity at the target and is impractical due to the time involved, the system impact, and, potentially, the network bandwidth requirements. However, expect the performance of the cluster to grow linearly, so if another A200 system is added to the cluster, transfer time will be cut in half.
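The transfer-time estimate can be checked with a short calculation. This sketch uses the demonstration write rates from the example and assumes decimal units (1 TB = 1,000,000 MB); the function name is illustrative. The results agree with the rounded figures in the text.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(data_tb, rate_mb_s):
    """Days needed to write data_tb at rate_mb_s, assuming 1 TB = 1,000,000 MB."""
    return data_tb * 1_000_000 / rate_mb_s / SECONDS_PER_DAY


total_tb = 200
small_tb = total_tb * 20 // 100   # 20% of the data is 4 KB files
large_tb = total_tb * 80 // 100   # 80% of the data is 10+ MB files

small_days = transfer_days(small_tb, 15)    # demonstration small-file write rate
large_days = transfer_days(large_tb, 520)   # demonstration large-file write rate
total_days = small_days + large_days

# Linear-scaling assumption: a second A200 system halves the transfer time.
total_days_two_systems = total_days / 2

print(f"Small files: {small_days:.1f} days")   # about 30.9
print(f"Large files: {large_days:.1f} days")   # about 3.6
print(f"Total: {total_days:.1f} days, or {total_days_two_systems:.1f} with two systems")
```

The small files account for roughly 90 percent of the elapsed time despite being only 20 percent of the capacity, which is why the file-size breakdown matters as much as the total volume.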

The third example below is similar to the second one; however, the archive capacity requirement is much higher – in this case, over 2 PB.

Assumptions:

• 2.5 PB of archive source data

• The workload is deep archive (long term data storage with minimal access)

• Physical space to install the hardware is a concern

• File breakdown – 80/20 (80% 10MB+ files and 20% 4KB files)

• Target cluster consists of Isilon A2000 systems (4 nodes per each 4U chassis) with a capacity of 800TB per chassis or Isilon A200 systems with a capacity of 480TB per chassis

The Isilon A2000 is specifically designed for deep archive workloads. It provides massive scalability while lowering operating costs. Because each system holds 80 10 TB SATA drives in a 4U form factor, it is ideally suited to locations where physical (datacenter) space is limited. The deep archive use case in this example can be satisfied using five A2000 systems. This would provide a usable capacity of 3 PB and a storage efficiency of over 83%. A conservative approach of using 85% of the usable capacity will still cover the requirement of 2.5 PB. A comparable usable archive capacity can be achieved using an Isilon A200 cluster comprised of 8 systems (32 nodes). The relative advantages of the Isilon A2000-based solution in this scenario are reflected in Table 4.

| Solution Attributes | Isilon A2000 Cluster | Isilon A200 Cluster |
| --- | --- | --- |
| Number of chassis required (4 nodes per chassis) | 5 chassis (20 nodes) | 8 chassis (32 nodes) |
| Total Rack Units | 20 | 32 |
| Aggregate Typical Power Consumption | 5600 Watts | 8500 Watts |
| Aggregate Typical Thermal Rating | 19000 BTU/hr | 28800 BTU/hr |

Table 4: Solution Comparison: 3 PB Archive with Isilon A2000 or A200 Cluster

In this example, the archive solution using Isilon A2000 nodes requires less physical space, less power, and less cooling.
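To put the comparison in relative terms, the Table 4 figures can be turned into percentage reductions. The sketch below uses only the values quoted in this example; the dictionary keys and function name are illustrative.

```python
# (A2000 cluster, A200 cluster) values as quoted in Table 4.
table4 = {
    "rack_units": (20, 32),
    "power_watts": (5600, 8500),
    "thermal_btu_hr": (19000, 28800),
}


def reduction_pct(a2000_value, a200_value):
    """Percentage by which the A2000 figure is lower than the A200 figure."""
    return (a200_value - a2000_value) / a200_value * 100


savings = {name: reduction_pct(*values) for name, values in table4.items()}
for name, pct in savings.items():
    print(f"{name}: {pct:.1f}% lower with A2000")

# Conservative fill check from the text: 85% of 3 PB usable versus 2.5 PB required.
usable_pb = 3.0
planned_pb = usable_pb * 0.85
meets_requirement = planned_pb >= 2.5
print(f"Planned capacity: {planned_pb:.2f} PB, meets requirement: {meets_requirement}")
```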

4.3 Scaling the estimate for future growth and planning

Isilon clusters deployed with four or more nodes demonstrate performance growth that is mostly linear with node addition for a given working set. That is, if you size an archive cluster at 8 nodes (based on performance and/or capacity) and the performance requirement or expected growth changes by 25 percent, an addition of 2 nodes will supply the incremental increase, assuming you are load balancing the cluster with SmartConnect. Similarly, a cluster sized at 16 nodes that needs a 25 percent increase in performance for the same working set will need an addition of 4 nodes to the cluster.
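Under the linear-scaling assumption described above, the number of nodes to add is simply the current node count multiplied by the required growth, rounded up to a whole node. A minimal sketch (the function name is illustrative):

```python
import math


def nodes_to_add(current_nodes, growth_pct):
    """Whole nodes needed for growth_pct more performance, assuming linear scaling."""
    return math.ceil(current_nodes * growth_pct / 100)


print(nodes_to_add(8, 25))    # the 8-node case from the text
print(nodes_to_add(16, 25))   # the 16-node case from the text
print(nodes_to_add(10, 25))   # a fractional result rounds up to a whole node
```

Rounding up matters for cluster sizes that do not divide evenly: a 10-node cluster needing 25 percent more performance would need 3 nodes, not 2.5.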

4.4 Other sizing considerations

The following sections briefly discuss other elements that you may want to keep in mind when assessing an archive deployment.

4.4.1 Information on the maximum number of connections

If multiple archive servers experience an archiving cliff event at the same time, many connections can be opened against the cluster at once. The number of SMB sessions to a single node is limited by the amount of memory available to a single process, due to per-connection overhead. SmartConnect is essential for making the most efficient use of cluster memory resources.

4.4.2 Maximum number of total files in the file system

Archives can grow quite large over time, and each cluster has a limit. The maximum number of files in a single directory is the same as the maximum number of files for the cluster; however, that is not a practical number. Limits depend on total storage capacity of the cluster, not the number of nodes (although obviously the use of more nodes typically translates to higher capacity). An Isilon cluster can scale to file counts in the billions.

4.4.3 Isilon SyncIQ for archive data replication

If archive replication is required, there are many considerations needed to determine both the effective replication timelines and whether it is even viable given the replication bandwidth. Replication assumptions include:

1. An existing Dell EMC Isilon source cluster composed of either single or mixed node types

• Replication strategy: Disaster Recovery (DR)

  o DR allows for dissimilar performance profiles between the source, which is the primary working set, and the target

  o The target is typically less highly performing and is used to restore data in the event that the primary is destroyed

    ▪ Isilon A200 or Isilon A2000 systems can be used here

• Replication strategy: Business Continuance (BC)

  o BC requires similar performance on the replication target as on the source, which is the primary working set

  o In a BC scenario, the target location is made into the active site during a primary site failure

    ▪ A run book and cutover process are required

To calculate transfer times, use the second example in section 4.2 as a reference.

5.0 Conclusion

Dell EMC Isilon scale-out NAS for archives is designed to provide the most efficient utilization of capacity, reduce overall storage footprint, and deliver significant savings. Its support for standard NAS protocols provides flexibility in selecting the right archive applications for your business. Once data is archived off of the primary storage, Isilon incorporates a number of features to help ensure archived files are stored as efficiently as possible:

• Single file-system simplicity and single volume simplicity with 80 percent or greater storage utilization. And, unlike most other systems, an Isilon cluster’s storage efficiency, availability, and performance improve as its capacity scales.

• Isilon SmartPools automatic, policy-based tiering capabilities can be used to store data in either high-performance or high-capacity storage pools based on the data’s relevance and value to the organization.

• Isilon SmartConnect manages load balancing of many archive servers.

• Isilon’s simplified scale-out architecture enables consolidation of file data, reduces the number of locations across the organization where data is stored, decreases management overhead, and streamlines archive operations.

Additional features of Isilon archive solutions are summarized in Table 5.

| Feature | Benefit |
| --- | --- |
| Massive scalability and high density storage options | Archives may start small, but inevitably will grow. Isilon clusters can start as small as 36 TB and grow to over 68 PB in a single file system, adding performance, availability, and storage efficiency as they grow. |
| Cost-effective, disk-based solution that offers accessibility and data integrity advantages over tape | For long-term data preservation and immediate access, an Isilon disk-based archive solution offers the greatest flexibility and eliminates risks. |
| High-performance NAS access enables easy consolidation of archives across multiple application and compute environments | Performance that scales as capacity grows enables high-volume throughput and interoperability and flexibility with multi-protocol support. |
| Automated, policy-based tiering to minimize storage costs | SmartPools and CloudPools software automatically manage storage tiering to help ensure that data is stored in the most effective tier for maximum protection and cost savings. |
| Automated load balancing | Isilon SmartConnect provides a robust and automatic method for all of your archive servers to be load balanced across all nodes of the cluster to most effectively utilize all the available resources. |
| Robust data protection | Isilon storage is highly resilient and offers a wide array of data protection and security options to safeguard your data. |

Table 5: Key Isilon scale-out NAS features and benefits for archive

Dell EMC Isilon archive solutions provide organizations with a highly efficient, massively scalable, and secure disk-based archive that protects data for long-term retention while optimizing the use of primary storage resources.

TAKE THE NEXT STEP

Contact your Dell EMC sales representative or authorized reseller to learn more about how Isilon scale-out NAS storage solutions can benefit your organization.

Shop Dell EMC Isilon to compare features and get more information.

6.0 Appendix

6.1 Archive application file system ACL requirements

Many applications will not function “out of the box” with default share ACLs. This is not a Dell EMC Isilon system limitation; it is a property of secure systems. There are a few share and ACL changes that need to occur for most third-party applications to function properly with their vaults. See the References section for an ACL link.

6.2 References

The following documents provide additional and relevant information.

• High Availability & Data Protection with Dell EMC Isilon Scale-Out NAS: An overview of Dell EMC Isilon data protection technology

• Dell EMC Isilon OneFS Operating System: An overview of Dell EMC Isilon OneFS operating system

• Dell EMC Isilon Archive Scale-Out NAS Storage: Specification sheet for Dell EMC Isilon Archive scale-out storage

• Dell EMC Isilon SmartConnect: A white paper discussing the Isilon cluster load-balancing mechanism

• Next Generation Storage Tiering with Dell EMC Isilon SmartPools: A white paper describing Isilon SmartPools functionality and configuration

• Best Practices for Data Replication with Dell EMC Isilon SyncIQ: A white paper describing Dell EMC Isilon replication technology

© 2017 Dell Inc. or its subsidiaries. All Rights Reserved. Dell, EMC and other trademarks are trademarks of Dell Inc. or its subsidiaries. Other trademarks may be trademarks of their respective owners. Reference Number: H11224.3
