
Huawei OceanStor UDS Massive Storage System Technical White Paper

Issue 1.1

Date 2014-06

HUAWEI TECHNOLOGIES CO., LTD.

Copyright © Huawei Technologies Co., Ltd. 2013. All rights reserved.

No part of this document may be reproduced or transmitted in any form or by any means without

prior written consent of Huawei Technologies Co., Ltd.

Trademarks and Permissions

and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.

All other trademarks and trade names mentioned in this document are the property of their respective

holders.

Notice

The purchased products, services and features are stipulated by the contract made between Huawei and

the customer. All or part of the products, services and features described in this document may not

be within the purchase scope or the usage scope. Unless otherwise specified in the contract, all

statements, information, and recommendations in this document are provided "AS IS" without warranties,

guarantees or representations of any kind, either express or implied.

The information in this document is subject to change without notice. Every effort has been made in the

preparation of this document to ensure accuracy of the contents, but all statements, information, and

recommendations in this document do not constitute a warranty of any kind, express or implied.

Huawei Technologies Co., Ltd.

Address: Huawei Industrial Base

Bantian, Longgang

Shenzhen 518129

People's Republic of China

Website: http://enterprise.huawei.com


Contents

1 Executive Summary

2 Introduction

3 Solutions

3.1 Product Composition

3.2 Product Features

3.2.1 Exascale Scalability

3.2.2 SoD Architecture

3.2.3 High Security and Reliability

3.2.4 Low TCO

4 Experience

4.1 Solution 1: Massive Resource Pool

4.1.1 Typical Needs and Problems Facing Customers

4.1.2 Solution

4.1.3 Software and Hardware Configurations

4.1.4 Benefits

4.2 Solution 2: Centralized Backup

4.2.1 Typical Needs and Problems Facing Customers

4.2.2 Solution

4.2.3 Software and Hardware Configurations

4.2.4 Benefits

4.3 Solution 3: Web Disk

4.3.1 Typical Needs and Problems Facing Customers

4.3.2 Solution

4.3.3 Software and Hardware Configurations

4.3.4 Solution Network

4.3.5 Benefits

4.4 Solution 4: Centralized Active Archiving

4.4.1 Typical Needs and Problems Facing Customers

4.4.2 Solution

4.4.3 Software and Hardware Configurations

4.4.4 Solution Network

4.4.5 Benefits

5 Conclusion

6 Acronyms and Abbreviations


1 Executive Summary

As the IT industry develops, the amount of data soars at an unprecedented speed, and a new kind of storage system is required to store massive data reliably. Massive storage systems have emerged to meet this need. This document describes the HUAWEI UDS massive storage system (UDS for short) in terms of product composition, application scenarios, and advantages. With large capacity, high reliability, and outstanding scalability, the UDS brings unique value to customers.


2 Introduction

As the IT industry evolves, people's lives become closely tied to IT. Data, the cornerstone and most important asset of the IT industry, is growing at an unprecedented speed, and big data has become a clear trend. All industries call for the secure and reliable storage of massive data.

HUAWEI UDS massive storage system is developed to address problems and challenges

facing customers. The UDS features:

Industry-leading scale-out distributed storage architecture and the distributed hash table

(DHT) algorithm

Diversified external interfaces compatible with Amazon Simple Storage Service (S3)

interfaces

Multi-level data protection technologies such as Multiple Copies (MC) and Erasure Code

(EC)

With large capacity, high reliability, easy maintenance, and flexible scalability, the UDS

applies to scenarios of massive data storage and centralized backup, and supports an exascale

capacity and a secure, reliable, efficient, and converged architecture.


3 Solutions

Based on industry trends and a thorough understanding of customer needs, Huawei releases

the UDS, a massive storage system designed specifically for the big data market. The UDS

employs the DHT-based scale-out storage architecture and multiple data protection

technologies such as EC and MC to ensure data security, and provides unified external

interfaces for the access of multiple types of services, meeting massive data storage

requirements.

3.1 Product Composition

The UDS consists of access nodes (A-Nodes for short) and universal distributed storage nodes (UDSNs). A-Nodes are used for data scheduling, that is, distributing data requests from upper-layer services to UDSNs. UDSNs are used for data storage. To meet the requirements of massive data storage, A-Nodes and UDSNs are deployed in high availability (HA) clusters, namely, the access cluster and the storage cluster. Figure 3-1 shows the components deployed in a UDS cabinet.

Figure 3-1 Components deployed in a UDS cabinet


A-Nodes: two T3200 servers

UDSNs: up to seven (flexibly deployed based on the load-bearing capability of the equipment room and power consumption requirements)

Access switches: two (S6724 for the enterprise market or S6324 for the carrier market), full 10GE

An A-Node is used to process and control access requests initiated by clients, establish object

transmission channels, and manage metadata. A-Nodes can be clustered. When the number of concurrent access requests is large, new A-Nodes can be added to improve request processing capabilities, thereby eliminating data processing bottlenecks.

Figure 3-2 shows the appearance of an A-Node.

Figure 3-2 Appearance of an A-Node

A-Node specifications:

Disk type: SATA, SAS, NL-SAS, and SSD
Max. number of disks: 12
Max. capacity per disk: 4 TB
AC power supplies: 100 V to 127 V or 200 V to 240 V, 1+1 power supply redundancy
Power consumption: 350 W without service disks; 650 W maximum
Dimensions: 86.1 mm x 446 mm x 585 mm (2 U)
Weight: 18.5 kg (unloaded)

A UDSN is used to store, replicate, and ensure consistency of data and metadata. A UDSN

contains innovative smart disks. Unlike traditional disks, smart disks combine disk drives and


CPUs to provide improved data processing capabilities. The storage capacity of the UDS can

be expanded by adding UDSNs.

Figure 3-3 shows the appearance of a UDSN.

Figure 3-3 Appearance of a UDSN

UDSN specifications:

Disk type: SATA
Max. number of disks: 75
Max. capacity per disk: 4 TB
AC power supplies: 100 V to 127 V or 200 V to 240 V, 2+2 or 1+1 power supply redundancy
Max. power consumption: 1350 W
Dimensions: 176.5 mm x 446 mm x 790 mm (4 U)
Weight: 45.2 kg (unloaded); 97.7 kg (fully loaded)

The UDS provides massive data storage capabilities through A-Node and UDSN clusters and cross-cabinet capacity expansion. UDS cabinets are connected by service switches (S6724 for the enterprise market or S6324 for the carrier market) and core switches over a full 10GE network, as shown in Figure 3-4.


Figure 3-4 UDS system network diagram

3.2 Product Features

As a massive storage system, the UDS applies to big data scenarios where upper-layer services vary and the amount of data soars. To meet the increasing requirements for data security, the UDS provides a secure, reliable, massive, efficient, and converged storage architecture to cope with challenges in the big data era.

3.2.1 Exascale Scalability

For customers who do not want to purchase a large storage capacity during initial deployment but will expand the system capacity as their services grow, the UDS reduces initial investment and provides flexible capacity scalability. For customers whose service systems

carry heavy workloads, the UDS provides a massive storage resource pool to eliminate

storage capacity bottlenecks.

Based on the elastic DHT, DHT-based one-off addressing, and key technologies such as

decentralized architecture, stateless access cluster, and metadata hashing, the UDS provides

massive storage capacities scalable to exabyte level.

3.2.1.1 DHT

The UDS uses the DHT-based hash algorithm to divide and address the address space of all storage units and then maps the divided address space to the DHT ring. Each storage unit stores data as objects and can be located by its address space. When a data object is read or written, it is located on its storage unit in a single hash-based addressing step (one-off addressing).


A DHT ring has an infinite address space and is elastic in size (as shown in Figure 3-5) by

changing the partition size. Theoretically, a DHT ring supports an extremely large number of

storage units, laying a solid foundation for exascale capacity expansion.

Figure 3-5 Elastic DHT ring that supports infinite node expansion

The UDS provides the following advantages based on the DHT:

1 Metadata is evenly distributed in a virtual space comprising all physical nodes, enabling near-infinite storage expansion.

2 Data is accessed between any two nodes in an equivalent, point-to-point manner, avoiding the latency of central-node index queries and eliminating performance bottlenecks.

3 Storage capacity can be gradually expanded on demand.

The DHT technical principles are as follows:

Each storage unit (smart disk) corresponds to a physical node and has a unique ID.

In the UDS, data has a key and is stored by the key's hash value. The hash value of a key

corresponds to a storage unit.

Hash values of all keys reside in the integer range [0, 2^32 − 1]. When the UDS is being initialized, this integer range is divided into multiple same-size partitions, each of which contains the same number of hash values. Each partition therefore represents an equal share of the hash space.

The capacity of each physical node is usually divided into 20 to 40 partitions.

Each partition corresponds to a virtual node. Data in a partition is stored onto the

corresponding virtual node.

The UDS maintains and updates a mapping table between partitions (or virtual nodes)

and physical nodes.

The DHT ring is an integer range of 0 to 2^128. Each virtual node is mapped to the DHT ring and each data key is mapped to a virtual node. The data identified by a key is stored on the corresponding virtual node.


After the UDS is expanded, for example, when new physical nodes are added, the number of hash space partitions remains unchanged but the mappings between virtual nodes and physical nodes are updated automatically. The DHT ring has an infinite address space and therefore supports unlimited virtual nodes. By adjusting the mappings between virtual nodes and physical nodes, an unlimited number of physical nodes can be added.

Figure 3-6 uses a storage cluster with four physical nodes (each physical node contains five

virtual nodes) as an example to describe the DHT technical principles.

Figure 3-6 DHT technical principles

The hash space (0 to 2^32 − 1) is divided into N same-size partitions. In the preceding figure, the hash space is divided into 20 partitions from P0 to P19. Each partition contains the same number of hash values.

The hash value of each key is mapped to a partition. For example, the hash value of key k1 is

mapped to partition P0.

A to T in the preceding figure represent 20 virtual nodes. Data in a partition is stored onto the

corresponding virtual node. For example, data represented by the key whose hash value is

mapped to P0 is stored to virtual node A. Similarly, data represented by the key whose hash

value is mapped to P1 is stored to virtual node B.

Physical nodes 1, 2, 3, and 4 that represent physical storage units (smart disks) provide

persistent data processing capabilities. A physical node has a mapping relationship with

virtual nodes. Usually, a physical node corresponds to multiple virtual nodes. This mapping

relationship is similar to that between partitions and physical nodes.

The number of partitions is determined when the UDS is initialized and remains unchanged after the number of physical nodes increases. A change in partition quantity would change the number of hash values in each partition. As a result, data in each partition


and node will be relocated. Therefore, the number of hash partitions is kept unchanged to

avoid data relocation.

Physical nodes can be added or removed online based on capacity requirements. The number

of partitions does not change with that of physical nodes in the SoD cluster, but mappings

between partitions and physical nodes are automatically updated after the number of physical

nodes is changed. As shown in the preceding figure, physical nodes 1, 2, 3, and 4 correspond

to five virtual nodes (partitions) respectively. After the new physical node 5 is added, each

physical node corresponds to four virtual nodes (partitions).

Figure 3-7 Mappings between physical nodes and partitions after a physical node is added

After a new physical node is added, four partitions are allocated to the new physical node.

Therefore, 1/5 data of the cluster is migrated to the new node.

Based on the DHT algorithm, if the UDS has M physical nodes and a new node is added,

1/(M+1) data of the UDS is migrated after partitions are reallocated to all physical nodes.

Similarly, if a node is faulty or removed from the UDS, 1/M data of the UDS is migrated after

the partition reallocation. Figure 3-8 shows mappings between physical nodes and partitions

after physical node 4 is removed.

Figure 3-8 Mappings between physical nodes and partitions after a physical node is removed
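
To make the partition remapping concrete, the following minimal sketch models the fixed-partition mapping table in Python. It is an illustration only, not UDS code: the partition count of 20, the node names, and the rebalancing helper are assumptions made for the example.

# Simplified model of the fixed-partition DHT mapping: the partition count
# never changes; only the partition-to-node table is updated when physical
# nodes join or leave, so only about 1/(M+1) (or 1/M) of the data moves.
from collections import defaultdict

PARTITIONS = 20  # fixed when the UDS is initialized (P0 to P19)

def rebalance(mapping, nodes):
    """Move as few partitions as possible so the fixed partitions are
    spread evenly over the current physical nodes."""
    new_map = {p: n for p, n in mapping.items() if n in nodes}
    per_node = defaultdict(list)
    for p, n in new_map.items():
        per_node[n].append(p)
    quota = PARTITIONS // len(nodes)
    pool = [p for p in range(PARTITIONS) if p not in new_map]   # orphaned partitions
    for n in nodes:
        pool += per_node[n][quota:]                             # excess partitions
    for n in nodes:                                             # fill nodes below quota
        while sum(1 for o in new_map.values() if o == n) < quota and pool:
            new_map[pool.pop()] = n
    return new_map

nodes = ["node1", "node2", "node3", "node4"]
before = rebalance({}, nodes)                    # initial layout: 5 partitions per node
after = rebalance(before, nodes + ["node5"])     # add a fifth physical node
moved = sum(1 for p in before if before[p] != after[p])
print(moved / PARTITIONS)                        # 0.2, roughly 1/(M+1) of the data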

3.2.1.2 Decentralized Architecture

The UDS has two logical clusters: the access cluster and the storage cluster, which consist of

A-Nodes and UDSNs respectively. An A-Node provides access to the object-based storage

service. It also processes and controls access requests initiated by clients, establishes object

transmission channels, and manages metadata. A UDSN is used to store, replicate, and ensure

consistency of data and metadata.

Figure 3-9 shows the DHT algorithm-based equivalent point-to-point data access between

A-Nodes and UDSNs. In the UDS, an A-Node can directly access any UDSN for data

read/write based on the DHT-based addressing. Unlike traditional storage systems, this way of

data access in the UDS does not rely on central nodes, which shortens the latency of data

index query and eliminates access bottlenecks.


Figure 3-9 Decentralized architecture for equivalent point-to-point access

3.2.1.3 Smart Disk

The UDS uses smart disks as storage units, which are also regarded as physical nodes. A

smart disk contains a disk drive, energy-saving Advanced RISC Machines (ARM) chip,

small-capacity memory, and Ethernet ports. Each smart disk is allocated a dedicated IP

address to connect to switches and communicate with other smart disks in a distributed and

interconnected network, as shown in Figure 3-10. The UDS capacity can be expanded by

adding smart disks, enabling fine-grained capacity expansion at the disk level.

Figure 3-10 Decentralized architecture for equivalent point-to-point access

Each smart disk has fixed data access throughput. Therefore, the throughput of the UDS can

linearly grow with the number of smart disks. For details, see the HUAWEI OceanStor UDS

Massive Storage System Technical White Paper — Smart Disks.

3.2.1.4 Stateless Cluster

In the UDS, A-Nodes are networked in the access cluster. Based on the object-based storage

technology and the DHT algorithm-based one-off addressing, A-Nodes that are loosely

coupled with UDSNs can work as stateless service nodes. An A-Node can process any service


requests allocated to it after load balancing. Unlike traditional storage systems where the

number of nodes used for processing service requests is limited due to state synchronization

and locking mechanisms, the UDS can have an unlimited number of A-Nodes in its access

cluster theoretically, eliminating architecture bottlenecks that hinder exabyte-level capacity

expansion.

3.2.1.5 Metadata Hashing

The UDS does not have dedicated metadata nodes. Instead, metadata services are provided by

A-Nodes, which distribute metadata slices evenly onto UDSNs in the same way as common

data based on the DHT algorithm. When the number of concurrent access requests soars,

metadata service requests are distributed to A-Nodes in the access cluster for load balancing

and A-Nodes can be added on demand to improve the request processing capability,

preventing a bottleneck from occurring.

3.2.1.6 MDC

The UDS provides the Multiple Data Center (MDC) feature to centrally schedule and manage

multiple DCs across regions. To meet different capacity requirements, the UDS can be

expanded from several terabytes to exabytes on demand.

Figure 3-11 Centralized scheduling and management of multiple DCs across regions

As shown in Figure 3-11, the UDS can synchronize data between cross-regional DCs,

customize data copy policies based on service level agreements (SLAs), and preferentially

access data on the nearest DC. The MDC feature ensures exascale capacity expansion in terms

of scalability, reliability, and operability.

3.2.2 SoD Architecture

Sea of Disks (SoD) is an innovative and decentralized storage architecture dedicated to

processing massive unstructured data that is much more frequently read than written. With

the DHT algorithm-based addressing, a large number of power-saving and cost-effective

smart disks are consolidated into a decentralized cluster with unified software and hardware.


Based on the SoD architecture, the UDS provides outstanding availability, scalability, and

maintainability. High performance is delivered while costs and energy consumption are cut

down.

The SoD architecture comprises the access cluster and the storage cluster.

The access cluster consists of A-Nodes that process external requests. A-Nodes have powerful

computing capabilities. Therefore, the access cluster provides computing-intensive services

such as request access, user authentication, data slicing, data aggregation, and data routing.

The storage cluster consists of UDSNs whose computing capability is inferior to that of

A-Nodes. A UDSN comprises power-saving and energy-efficient smart disks. Each smart disk

provides key-value interfaces. All user data is stored in the storage cluster as data slices after

being processed by the access cluster.

Data slices and partitions in the storage cluster are divided and routed based on the DHT.

The DHT determines the location where a data slice is stored. Therefore, the storage cluster

can be regarded as a DHT ring.

3.2.2.1 I/O Process

The data write process is as follows:

1 Request access: The client sets up connections with an A-Node of the UDS and transmits

data to that A-Node.


2 Storage policy selection: The A-Node determines the data storage policy based on preset

configurations.

3 Data slicing: If the amount of data transmitted from the client exceeds 1 MB, the A-Node

divides the data into multiple slices of 1 MB each.

4 Data route: The A-Node writes the data slices into the storage cluster based on the DHT.

The data read process is as follows:

1 Request access: The client sets up connections with an A-Node of the UDS and sends a data read request to that A-Node.

2 Data routing: The A-Node locates the partition where the requested data resides based on

the DHT, and obtains the address of the smart disk where the partition resides.

3 Data repair: If any data slice is damaged, the A-Node repairs the data slice based on the

specified data storage policy.

4 Data aggregation: The A-Node aggregates data slices to the original data and sends the

data to the client.

Buffers are reserved in A-Nodes for data slicing and aggregation.

1 Data write: After dividing data into slices, an A-Node buffers some data slices and writes

data slices to different UDSNs to speed up data write.

2 Data read: An A-Node anticipates the range where the data requested by the client

resides and then reads data slices consecutively from smart disks onto the buffer to speed

up data read.


To achieve the optimal throughput with the minimum resources, A-Nodes automatically adjust buffer sizes and the volume of data concurrently read from or written to smart disks

based on connection speeds and data volume of clients.
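
The slicing and aggregation steps above can be illustrated with a short sketch; the 1 MB slice size comes from the write process described earlier, while the function names are illustrative and not UDS APIs.

# Illustration of the slicing step: payloads larger than 1 MB are cut into
# 1 MB slices before being routed to UDSNs, and the slices are concatenated
# again on the read path.

SLICE_SIZE = 1024 * 1024  # 1 MB, as described in the write process

def slice_object(data: bytes):
    """Cut an object into 1 MB slices (the last one may be shorter)."""
    return [data[i:i + SLICE_SIZE] for i in range(0, len(data), SLICE_SIZE)]

def aggregate(slices):
    """Re-assemble the original object from its slices (read path)."""
    return b"".join(slices)

payload = b"x" * (3 * SLICE_SIZE + 100)            # a 3 MB + 100 byte object
slices = slice_object(payload)
assert len(slices) == 4 and aggregate(slices) == payload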

3.2.2.2 Request Access

The access cluster of the UDS provides standard S3 interfaces and rich S3 ecosystems

(including tools, development packages, and third-party software integration).

Amazon S3 is the de facto standard in the cloud storage field. Based on the HTTP protocol,

Amazon S3 provides mature interfaces complying with the Representational State Transfer

(REST) architectural style. S3 interfaces are inherently easy to use, reliable, stateless, and easy to access over networks.

S3 interfaces define a data model consisting of three layers: user, bucket, and object.

1 A user in the UDS can own and manage buckets and objects.

2 A bucket is the container of objects, similar to a folder in a file system. A bucket can

contain multiple objects but no other buckets.

3 An object is a set of data, similar to a file in a file system.

The UDS data model abandons the traditional nested directory structure. A single bucket is

able to house hundreds of millions of objects and is easy to expand. This flat storage structure

is highly suitable for unstructured data.
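
Because the access cluster exposes Amazon-S3-compatible interfaces, the user/bucket/object model can be exercised with any standard S3 client. The sketch below uses the boto3 library against a hypothetical UDS endpoint; the endpoint URL, credentials, bucket name, and object key are placeholders, not values from this document.

import boto3

# Any S3-compatible client can talk to the UDS access cluster (sketch only;
# endpoint, credentials, and names below are placeholders).
s3 = boto3.client(
    "s3",
    endpoint_url="https://uds.example.com",    # hypothetical UDS access address
    aws_access_key_id="EXAMPLE_AK",            # the user's AK
    aws_secret_access_key="EXAMPLE_SK",        # the user's SK
)

s3.create_bucket(Bucket="backups")                                    # a bucket is a container of objects
s3.put_object(Bucket="backups", Key="2014/06/db.dump", Body=b"...")   # an object is addressed by a flat key
obj = s3.get_object(Bucket="backups", Key="2014/06/db.dump")
print(obj["Body"].read())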

The UDS defines a user model consisting of three layers: account, group, and user.

1 An account owns resources in the UDS, and corresponds to an individual user, enterprise,

or organization.

2 A user uses resources in the UDS. An account can create multiple users and grant users permissions for different resources. Usually, a user corresponds to an employee of an enterprise or organization.

3 A group is a collection of users. An account can create multiple groups, add users to different groups, and grant groups permissions for different resources. One user can belong to multiple groups. Users in a group inherit all permissions of the group. Usually, a group corresponds to a sub-department or sub-organization of an enterprise.

3.2.2.3 Storage Policy

The UDS provides flexible storage policies. A storage policy determines the reliability,

availability, security, and space occupation of data. Upon receiving data access requests from

users, the access cluster of the UDS reads user configurations from the user server and then

determines a data storage policy accordingly.

1. Multiple Copies (MC)

The MC storage policy generates multiple copies for a piece of data. Each data copy is stored

onto different physical nodes in the same storage cluster. Even if some data copies are

completely damaged or lost, users can still access the other data copies. This storage policy

provides high redundancy and reliability, but consumes large storage space.

The MC storage policy works based on a quorum mechanism. This mechanism defines a

group of replication parameters, which are called NWR for short.


N: indicates the number of data copies. A piece of data has N copies in the UDS.

W: indicates the number of data copies that must be successfully written to the UDS. Only after W data copies are successfully written does the UDS return a write success message to the user.

R: indicates the number of data copies that must be successfully read from the UDS. Only after R copies of the requested data are successfully read is the data returned to the user.

The default MC policy adopted by the UDS is NWR = 322, that is, each piece of data has three copies, and both reads and writes require at least two copies to succeed. This storage policy strikes a balance between reliability and data consistency and applies to most reliability-demanding scenarios.

With this storage policy, the UDS's access cluster slices user data into pieces, and each piece

is replicated to multiple copies. The data copies are then written onto different data partitions

of physical nodes in the DHT.

The UDS is environment-aware, storing data copies onto physical environments independent

from each other to improve data reliability and availability. For example, copies of the same

data are stored onto different storage cabinets, enclosures, and physical nodes, to tolerate

more fault scenarios.
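
A minimal sketch of the quorum bookkeeping behind the NWR parameters follows; the replica transport itself is omitted and the helper names are illustrative, not UDS APIs.

# Quorum bookkeeping behind the NWR parameters (sketch, not UDS code).
N, W, R = 3, 2, 2   # default policy NWR = 322: 3 copies, write/read quorums of 2

def write_acknowledged(successful_replica_writes: int) -> bool:
    """A write is acknowledged to the user once W replicas report success."""
    return successful_replica_writes >= W

def read_complete(successful_replica_reads: int) -> bool:
    """A read is returned to the user once R copies have been obtained."""
    return successful_replica_reads >= R

# W + R > N guarantees that the read and write quorums overlap, so every
# read sees at least one copy from the latest successful write.
assert W + R > N
print(write_acknowledged(2), read_complete(1))   # True False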

2. Erasure Code (EC)

The EC storage policy generates redundant data for a piece of data. If a piece of data is

partially damaged or lost, the UDS can use its redundant data to reconstruct or repair the

damaged data. The EC storage policy ensures high data reliability and consumes less storage

space, striking a balance between reliability and economy.

After the access cluster divides data into slices, consecutive M slices comprise an EC group.

Based on the EC storage policy, the UDS generates N parity data slices for the EC group. The

data slices and parity slices are stored onto a consecutive group of data partitions in the

storage cluster. In doing so, the data slices are stored onto different physical nodes, improving

the data reliability.

As long as the number of damaged data slices does not exceed N, the access cluster is able to

restore the damaged slices using the other ones.
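
The idea behind the EC policy can be shown with a deliberately simplified example: a single XOR parity slice (in effect M:1), which can rebuild any one lost slice. The real UDS uses configurable M:N codes such as 15:6; the helpers below are illustrative only.

# Simplified EC illustration: one XOR parity slice protects an EC group of
# equally sized data slices and can rebuild any single lost slice.

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(data_slices):
    """Generate one parity slice for an EC group."""
    parity = data_slices[0]
    for s in data_slices[1:]:
        parity = xor_bytes(parity, s)
    return parity

def rebuild(surviving_slices, parity):
    """Recover the single missing data slice from the survivors and parity."""
    missing = parity
    for s in surviving_slices:
        missing = xor_bytes(missing, s)
    return missing

group = [b"AAAA", b"BBBB", b"CCCC"]                          # M = 3 data slices
parity = make_parity(group)                                  # N = 1 parity slice
assert rebuild([group[0], group[2]], parity) == group[1]     # slice 1 lost and rebuilt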

3. Data stored in different DCs for cross-region disaster recovery

The Multiple Data Center (MDC) policy is configured on a per-bucket basis. If an MDC

policy is enabled for a bucket, the access cluster writes the data from the bucket to the local

storage cluster and its data copies to UDS systems in other data centers.

The MDC policy supports asynchronous replication. Data is first written onto the local

storage cluster and then a background asynchronous replication task is initiated to replicate

the data to a remote data center. If this task fails, the UDS initiates the task again after a

periodic background scan.

For details, see the HUAWEI OceanStor UDS Massive Storage System Technical White Paper

— Reliability.

3.2.2.4 Data Routing

The UDS routes data slices based on the DHT.

After the UDS is initialized, mappings between data partitions of the storage cluster and physical nodes are determined and recorded. At the same time, the UDS maps the data partitions evenly to a hash space residing in the range [0, 2^32 − 1]. The next number after 2^32 − 1 wraps back to 0; therefore, the hash space is a ring.


When reading a data slice, an A-Node first uses the consistent hashing algorithm to calculate the hash value of the data slice based on its key. The hash value also lies in the hash space. Therefore, both data slices and data partitions reside on the same logical hash ring. The consistent hashing algorithm stores each data slice onto the first data partition next to the data slice in the counterclockwise direction. After obtaining a data partition, the A-Node can locate the physical node where the data partition resides, thereby completing the routing of the data slice.

If the MC storage policy is used, N copies of a data slice are stored. After locating the first

data partition, the A-Node locates the other N-1 data partitions clockwise along the logical

ring. The other N-1 data partitions are where the other N-1 data copies are stored.

If the EC storage policy is used, an A-Node locates the first data partition for the first data

slice of the EC group and then addresses other data partitions according to specific addressing

rules. For details, see the HUAWEI OceanStor UDS Massive Storage System Technical White Paper — Reliability.
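
The routing described in this section can be sketched as a lookup on a sorted ring of partition boundaries followed by placement of the remaining copies on the following partitions. The hash function, partition count, and partition-to-node mapping below are simplified assumptions for illustration, not UDS internals.

# Sketch of DHT routing: hash a slice key into the [0, 2^32 - 1] space,
# find the partition whose range contains it, then place the N - 1 other
# copies on the following partitions around the ring.
import bisect, hashlib

RING_SIZE = 2 ** 32
NUM_PARTITIONS = 20
boundaries = [p * (RING_SIZE // NUM_PARTITIONS) for p in range(NUM_PARTITIONS)]
partition_to_node = {p: "node%d" % (p % 4 + 1) for p in range(NUM_PARTITIONS)}  # illustrative

def hash_key(key: str) -> int:
    """Map a slice key into the hash space."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:4], "big")

def route(key: str, copies: int = 3):
    """Return (partition, node) pairs holding the N copies of a slice."""
    first = bisect.bisect_right(boundaries, hash_key(key)) - 1
    parts = [(first + i) % NUM_PARTITIONS for i in range(copies)]
    return [(p, partition_to_node[p]) for p in parts]

print(route("bucket1/object.jpg#slice0"))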

3.2.2.5 Data Repair

If data slices are found damaged when clients read data on the UDS, A-Nodes repair the data

slices based on the storage policy. This is done to ensure the correctness and reliability of the

data read by the clients.

MC

If the data slice that is stored based on the MC policy is damaged, A-Nodes attempt to

read one of its copies to repair the data.

EC

If the data slice that is stored based on the EC policy is damaged, A-Nodes attempt to

read the EC group where the data slice resides. The intact data slices and parity data

slices in the EC group are used to repair the damaged data slice. For details, see the

HUAWEI OceanStor UDS Massive Storage System Technical White Paper — Reliability.

MDC

If the data slice that is stored based on the MDC policy is damaged, A-Nodes attempt to

read the desired data slice from the backup data center and use the data slice to repair the

damaged one.

In addition to read repair, the UDS constantly scans and verifies data in the cluster and

restores damaged data in the background if any errors are detected. Background repair is

classified into two levels: object-level and slice-level.

Object-level

A-Nodes constantly scan data in the cluster, calculate the digest of the object, and

compare the digest with the correct digest stored in the metadata. If the object data is

incorrect, the UDS repairs the object data using the repair mechanism applied to data

read.

Slice-level

Smart disks of the UDS constantly scan the data that they carry and repair incorrect data

slices by using the anti-entropy mechanism. For details about the anti-entropy

mechanism, see the HUAWEI OceanStor UDS Massive Storage System Technical White

Paper — Smart Disks.

3.2.2.6 Cluster Management

1. Access cluster


The access cluster of the UDS is stateless and idempotent.

Stateless: A-Nodes do not use data layout information to process requests. Each request

can be processed by any A-Node. Therefore, requests are independent from each other.

Idempotent: A client can send the same request multiple times without adverse impact. Therefore, users can resend a failed request until it is successfully processed. Most of the UDS's external interfaces are idempotent.

The access cluster is decentralized and can be regarded as a loosely coupled set of stateless A-Nodes.

Adding or removing A-Nodes from the access cluster has no impact on the other nodes in the

cluster.

The access cluster's DNS implements load balancing and health checks on A-Nodes in the

cluster. Faulty A-Nodes can be discovered in a timely manner and removed from the address

list.

2. Storage cluster

The storage cluster is also decentralized.

UDSNs synchronize information with each other using the gossip protocol. Inspired by

the form of gossip seen in social networks, the gossip protocol is used to transmit

information in large-scale clusters. When being initialized, each UDSN in the storage

cluster obtains the cluster information that records the status of each node in the cluster.

A UDSN periodically selects another UDSN at random and synchronizes cluster information with it. After synchronization, the cluster information on both nodes is combined and updated (a minimal merge sketch follows at the end of this section).

When synchronizing cluster information using the gossip protocol, a UDSN uses a phi (φ) accrual fault detector to check the status of other UDSNs. The fault detector anticipates the time window of the next synchronization from a UDSN based on previous synchronizations. If cluster information from a UDSN is not synchronized as anticipated, the fault detector considers the UDSN to be faulty.

When an A-Node attempts to write data onto a UDSN that is unavailable due to faults,

the data will be written onto another UDSN temporarily and then written back to the

intended UDSN after the faults are rectified. This data write process is called the hinted

handoff mechanism. For details about the hinted handoff mechanism, see the HUAWEI OceanStor UDS Massive Storage System Technical White Paper — Smart Disks.

When a UDSN detects another UDSN to be faulty, it records the fault in a local status

table. The access cluster periodically obtains status tables from UDSNs. When detecting

that a UDSN is faulty for a long time, the access cluster removes the UDSN from the

storage cluster and updates cluster information. The updated cluster information is then

transmitted to all UDSNs in the storage cluster through the gossip protocol.

After a UDSN is removed from the storage cluster, data partitions on this UDSN are

automatically migrated to the other UDSNs in the storage cluster. According to the

consistent hashing algorithm, only 1/N of the data needs to be migrated after a UDSN is removed from a storage cluster consisting of N UDSNs.

When a small number of UDSNs are added to the storage cluster, data partitions on the

other UDSNs in the storage cluster are automatically migrated to the newly added

UDSNs. According to the consistent hashing algorithm, only 1/(N+1) of the data needs to be migrated after a UDSN is added to a storage cluster consisting of N UDSNs.
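
The gossip exchange described above can be modeled as merging two per-node views, keeping whichever entry is newer for each node. The version numbers and status values below are illustrative assumptions, not the actual UDS cluster-information format.

# Gossip exchange sketch: each UDSN keeps a view of the cluster
# (node -> (version, status)); after an exchange, both peers keep the
# newer entry for every node they know about.

def merge_views(a: dict, b: dict) -> dict:
    merged = dict(a)
    for node, (version, status) in b.items():
        if node not in merged or version > merged[node][0]:
            merged[node] = (version, status)
    return merged

view1 = {"udsn1": (8, "up"), "udsn2": (3, "up"),   "udsn3": (5, "up")}
view2 = {"udsn1": (8, "up"), "udsn2": (4, "down"), "udsn4": (1, "up")}

synced = merge_views(view1, view2)
print(synced)   # both peers now see udsn2 as down and learn about udsn4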

3.2.3 High Security and Reliability

As IT develops, data gradually becomes the most important asset of a company, and data loss can have an unpredictable adverse impact. Therefore, increasing importance is


attached to data protection. The UDS employs multi-tenant, Multiple Copies (MC), Erasure

Code (EC), and multiple data consistency technologies to ensure data security and reliability

in massive data storage scenarios.

3.2.3.1 High Reliability Clusters

1. Storage cluster

In the storage cluster, data on any faulty smart disk (caused by a man-made or mechanical

error) can be recovered to other smart disks automatically. A faulty smart disk can be removed

from the UDS system without affecting data availability or the data slices on other smart disks. Unlike

a traditional storage system whose RAID group degrades when a member disk is faulty, the

UDS works correctly when any disk is faulty.

Immediate maintenance of a faulty disk is not needed in the UDS. Instead, all faulty smart

disks can be replaced in one batch and need to be maintained only after certain conditions are met,

for example:

The failure rate or the capacity usage reaches the preset threshold, and critical alarms are

generated to prompt users to replace disks in batches or expand the system capacity. For

example, the disk failure rate reaches 6% or the capacity usage exceeds 80%. You can

configure the disk failure rate and capacity usage based on site requirements. This

threshold-triggered maintenance prolongs the maintenance period and reduces

maintenance costs.

All damaged or slow disks are batch replaced based on a preset periodic maintenance

schedule. This schedule-based maintenance reduces the possibility of system faults. After

smart disks are replaced, data slices are evenly relocated to all smart disks based on the

intelligent balancing algorithm, prolonging disk lifespan, lowering disk failure rate,

reducing data loss, and improving system reliability.

2. Access cluster

All A-Nodes form a distributed access cluster behind load balancers. Instead of controlling or

saving layout information about data and metadata, A-Nodes use the DHT-based hash

algorithm to calculate data storage locations.

This is a breakthrough in storage structure. Data layout controlled and recorded by control

nodes or engines in traditional storage architecture is no longer required in the UDS

structure where A-Nodes determine data routes based on rules. This change greatly simplifies

data processing and resolves the bottleneck in cluster scalability and reliability. Nodes in a

cluster are freed from complex synchronization and lock mechanisms that restrict a cluster's

node quantity and affect node consistency and reliability.

The decentralized architecture adopted by the UDS eliminates adverse impact on system

availability when any A-Node is faulty due to human or mechanical errors.

3.2.3.2 Multi-Level Data Protection

1. Smart disk level: disk lifecycle management

Focusing on how to lower disk failure rates and control impacts brought by disk faults, the

UDS disk lifecycle management adopts end-to-end lifecycle management technologies such

as disk detection, disk repair, disk failure control, and pre-reconstruction.

The smart disk lifecycle management greatly reduces disk failure rates, prolongs disk lifespan,

and has the following advantages:


Automatic hardware control: The UDS employs the self-managed smart disk architecture

that enables self-monitoring, self-management, and selective data synchronization of

smart disks, remarkably improving system reliability while lowering hardware failure

rates.

Lowered disk failure rate: With the disk lifecycle management, the UDS obtains disk

status in real time, separates and repairs physical and logical bad sectors in a timely

manner, and implements weight management of smart disks, striking a balance between

system data and service loads and prolonging disk lifespan.

2. UDSN level: EC redundancy algorithm

The Erasure Code (EC) redundancy algorithm developed by Huawei is a superset of

traditional RAID. Different from traditional storage systems using RAID consisting of fixed

member disks, the UDS uses EC to consolidate all disks into a unified storage resource pool.

Each time data is written onto or read from the UDS, disks are automatically selected at

random and form a temporary RAID group onto which data blocks and parity blocks

are written. Compared with traditional RAID, temporary RAID improves the overall system

performance and resource utilization.

The innovative DHT-based EC algorithm enables the UDS to provide:

Higher data durability: When the EC policy is configured to M:N = 15:6, data durability

can reach thirteen nines, minimizing data loss risks.

Faster data reconstruction speed: The UDS distributes data objects to different smart

disks based on the hash algorithm. Multiple disks are involved in the reconstruction,

increasing the capacity for concurrent reconstruction and reducing the time for

reconstructing 1 TB data to four hours. This has greatly reduced the reconstruction time

and improved system reliability.

Improved reconstruction efficiency: Data reconstruction is based on objects and only

damaged objects are reconstructed. Undamaged objects or empty regions are not

processed. In this way, the data reconstruction rate is greatly increased.

Global hot spare and batch disk replacement: The UDS uses global hot spare space

instead of hot spare disks for data reconstruction. When the failure rate of smart disks

reaches the threshold, the UDS will send alarms for disk replacement. In the UDS,

immediate maintenance is not needed, reducing the quantity of spare parts and the number of maintenance cycles and saving maintenance manpower.

Eliminated data recovery restrictions and improved EC recoverability: Compared with

the traditional RAID algorithm, the refined EC reconstruction algorithm can effectively

eliminate restrictions on the quantity of damaged data. Any damaged data block can be

recovered fully, which enhances data durability and system reliability.

More flexible data redundancy and improved disk utilization: The available EC policy

configuration for tenants is as follows: M (the number of data blocks) = {3, 6, 9, 12, 15}, N (the number of parity blocks) = {1, 2, 3, 6}. The EC policy, which is defined based on user environments, has a direct impact on data durability and provides users with flexible redundancy ratio configuration and more choices of data durability. The EC algorithm can also flexibly control disk utilization and reduce costs. For example, when M is 12 and N is 3, the disk utilization is about 80% (12/(12+3)).

Enhanced storage management efficiency and lowered costs: Users do not need to spend

much time on storage planning because all the disks automatically form a unified storage

pool. Users only need to insert new disks to expand the system capacity. The UDS

automatically distributes data to each disk.

Intelligent reconstruction of EC groups and lighter system load: The UDS intelligently

determines the range and size of the damaged data and temporarily reconstructs EC


groups to refine the data to be recovered, which reduces system load and improves

recovery efficiency.

3. Data center level: cross-regional disaster recovery and failover

The UDS provides the Multiple Data Center (MDC) feature that enables users to access the

massive storage system on the nearest DC, maximizing resource utilization and reducing

investment in storage. The UDS also provides flexible cross-regional data redundancy

policies to prevent data loss in the event of a disaster or unexpected fault occurring on the

active DC. Upon the breakdown of an active DC, the UDS quickly resumes services of the

active DC on a standby DC, minimizing the service interruption duration and ensuring service

continuity. The UDS also provides the load balancing technology for resource management,

enabling users to maximize their resource utilization and improve the return on investment

(ROI).

The multi-tenant-based MDC feature has the following advantages:

Unified resource scheduling: Multiple DCs are globally virtualized and consolidated to a

unified resource pool for improving resource utilization. In addition, the MDC feature

uses policy-based scheduling to ensure preferential data access on the nearest DC.

SLA policy-based control: SLA policies are used to control DR paths, number of DR

backups, and quality of service (QoS), supporting customers' service choice decisions

and maintaining an optimal balance between services and resources.

Cross-regional DR: The MDC feature uses HTTP/REST interfaces to perform DR

among DCs. Data stored on a local DC can be backed up and verified on a

remote DC. Data is transmitted between DCs over optimized networks, remarkably

enhancing the DR efficiency. DCs back up for each other to optimize resource

utilization.

For details, see the HUAWEI OceanStor UDS Massive Storage System Technical White Paper

— Reliability.

3.2.3.3 Continuous Data Detection and Repair

1. Short-term fault handling

The UDS defines a special intermediate state: short-term fault state. If the UDS detects a

short-term fault, it will start the fault diagnosis mechanism and try to recover the fault. If the

recovery fails and the fault persists for more than X (X is user-configurable, for example, 15

minutes), the fault goes to the permanent fault state and the node where the fault occurs exits

from the DHT ring. The UDS then starts the data recovery policy to recover damaged data.
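
The short-term fault state can be pictured as a small state machine that only escalates to a permanent fault after the configurable window X expires; the class and threshold below are an illustrative sketch, not UDS code.

# Sketch of the short-term fault window: a fault stays "short-term" and is
# retried in place; only if it persists longer than X does it become
# permanent, at which point the node leaves the DHT ring and recovery starts.
import time

FAULT_WINDOW = 15 * 60   # X, user-configurable (for example, 15 minutes)

class FaultTracker:
    def __init__(self):
        self.first_seen = {}            # node -> time the fault was first detected

    def report_fault(self, node, now=None):
        now = time.time() if now is None else now
        self.first_seen.setdefault(node, now)
        if now - self.first_seen[node] > FAULT_WINDOW:
            return "permanent"          # remove node from the DHT ring, start recovery
        return "short-term"             # keep trying to recover the fault in place

    def report_recovered(self, node):
        self.first_seen.pop(node, None)  # fault cleared within the window

tracker = FaultTracker()
print(tracker.report_fault("udsn3", now=0))        # short-term
print(tracker.report_fault("udsn3", now=20 * 60))  # permanent after 20 minutes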

The short-term fault handling technology has the following advantages:

Improved data and performance stability: Only the faults that cannot be detected and

recovered by the short-term handling technology are considered permanent faults and

require adjustment of the DHT ring, reducing the workload for data recovery and

migration.

Transparent to upper-layer services and enhanced business continuity: The short-term

fault handling mechanism is invisible to upper-layer services, ensuring business

continuity and durability.

Efficient system performance utilization and increased data recovery efficiency: After a

fault is considered as a permanent fault, the UDS starts the data rebalancing process and

evenly distributes damaged data to multiple nodes through effective data distribution

control.


2. Data integrity

The UDS provides multi-level data integrity protection measures and supports four-level data

repair at the track, slice, object, and data center levels.

The data integrity protection measures effectively ensure end-to-end data durability and have

the following advantages:

Full integrity protection and improved data durability: The UDS provides end-to-end

repair measures for damaged data at four levels from track, partition, object, to data

center.

Progressive recovery measures and reduced recovery resource usage: Based on

the degree of data damage, the UDS tries to recover the data in a range as small as

possible to reduce resources used in data recovery.

3. End-to-end consistency

The UDS supports the end-to-end data consistency check at the application level, object level,

slice level, and physical level.

With the four-level data consistency check, the UDS ensures that no silent errors occur

during the data writing process (from the time data is sent by end users to the time data

is written onto the disk). At each level, once data inconsistency is detected, the UDS can

repair or resend the data quickly.

The end-to-end data consistency check greatly improves data security and has the following

advantages:

Data will not be damaged in storage and transmission, ensuring data correctness.

Possible malicious data tampering from internal personnel of cloud storage service

providers is prevented, increasing data security.

3.2.3.4 End-to-End Data Security

The UDS ensures data security in terms of data transfer, data integrity, identity authentication,

data access control, and data encryption.

1. Data transfer

The UDS provides object-based storage interfaces that are compatible with Amazon S3

interfaces and supports Representational State Transfer (REST) interfaces. Users can upload

SSL-encrypted data to the UDS in a DC using a Huawei or third-party terminal.

2. Identity authentication

The UDS uses access key ID (AK) and secret access key (SK) to authenticate user identities.

The keyed-hash message authentication code algorithm (HMAC) is used in authentication.

Based on the HMAC algorithm, a key and a message are input and a message digest is output.

Each client user has a pair of AK and SK. The AK is public and identifies a unique user. The

SK is used for calculating signatures. Client users are required to keep the SK safe. An

operation request sent by a client user contains the user's AK and a signature calculated using

the SK (the signature is calculated using HMAC-SHA1). Upon receiving the request,

the UDS looks up the SK associated with the AK and calculates a signature using that SK. Then the

UDS compares the obtained signature with the one in the request. If the two signatures are

consistent, the authentication succeeds. Figure 3-12 shows the process of identity

authentication.


Figure 3-12 Identity authentication
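
The AK/SK check can be sketched with the standard HMAC-SHA1 primitive. The string-to-sign below is simplified and does not reproduce the exact S3 canonical request format; the AK, SK, and request are placeholders.

import base64, hashlib, hmac

def sign(secret_key: str, string_to_sign: str) -> str:
    """HMAC-SHA1 the string to sign with the SK and base64-encode the digest."""
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()

# Client side: sign the request with the SK and send AK + signature.
ak, sk = "EXAMPLE_AK", "EXAMPLE_SK"
request = "GET\n/backups/2014/06/db.dump\nTue, 03 Jun 2014 09:00:00 GMT"
signature = sign(sk, request)

# Server side: look up the SK stored for this AK, recompute, and compare.
stored_sk = {"EXAMPLE_AK": "EXAMPLE_SK"}[ak]
print(hmac.compare_digest(signature, sign(stored_sk, request)))   # True when they match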

3. Object access control

The UDS provides a flexible and secure data access mechanism that allows users to set

different access control policies based on bucket and object configurations. Available access

control policies are: READ, WRITE, READ_ACP (users are granted the permission to read

the access control policy), WRITE_ACP, and FULL_CONTROL.

4. Static data encryption

The current version of UDS does not support data encryption in the cloud. If sensitive data is

to be stored in the cloud, you are advised to upload the data after encrypting it locally. In the

cloud scenario, keys for encrypted data are kept on clients. Figure 3-13 shows the process of

data encryption in a non-cloud scenario.

Figure 3-13 Data encryption in a non-cloud scenario
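
Since the current version does not encrypt data in the cloud, sensitive data can be encrypted on the client before upload. The sketch below uses the Fernet recipe from the Python cryptography package as one possible client-side approach; it is an assumption for illustration, not a UDS feature, and the key never leaves the client.

# One possible client-side approach (sketch): encrypt locally before upload
# and keep the key on the client, so the UDS never sees plaintext or key.
from cryptography.fernet import Fernet

key = Fernet.generate_key()                  # kept by the client, never uploaded
cipher = Fernet(key)

plaintext = b"sensitive business records"
ciphertext = cipher.encrypt(plaintext)       # this is what gets uploaded to the UDS

# ... later, after downloading the object back from the UDS ...
assert cipher.decrypt(ciphertext) == plaintext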

5. Data integrity

The UDS uses digital signatures to ensure data integrity during transfer. The current version

of the UDS supports object integrity signatures, and later versions will support slice integrity


signatures. The integrity of a data slice is automatically verified by the UDS and the integrity

verification of a data object must be supported by client applications.

6. Data durability

The UDS provides 99.999% data availability and 99.9999% data durability.

3.2.4 Low TCO

3.2.4.1 Energy Saving

Apart from servers and network devices that consume about half of a DC's energy, storage

devices also consume a large portion of a DC's energy. As the number of storage devices

increases, more equipment room space is occupied and a larger amount of energy will be

consumed.

With 4 TB enterprise-class disks equipped with ARM chips, each 4 U 75-slot UDSN in the

UDS provides a 300 TB capacity. A single UDS cabinet can house up to 525 disks and

provides a 2.1 PB capacity. Compared with x86 servers providing the same computing or

storage capacity, the UDS halves the CPU power consumption and equipment room space

occupation, with an average power consumption of 4.2 W/TB, ranking among the best in the industry.

Moreover, the UDS employs the intelligent CPU frequency control and intelligent fan speed

control technologies to maintain lower power consumption of idle storage units. With the

high-density and power-saving design, the UDS lowers the power consumption of an

equipment room by 45%, providing a comprehensive energy-saving storage solution.

3.2.4.2 Automated Management

The UDS provides a graphical management system (as shown in Figure 3-14) that

automatically manages topologies, alarms, configurations, performance, logs, and users.

Moreover, the UDS can be deployed and upgraded automatically, without manual intervention or service interruption. Automatic upgrade and deployment keep services transparent and simplify system deployment, upgrade, and capacity expansion, greatly improving management efficiency while lowering management costs.

In the UDS, the minimum maintenance unit can be an ARM-based smart disk. Disk failures have minor impact on services, so immediate maintenance is not needed. Instead, faulty smart disks can be replaced in batches after the disk failure rate reaches a preset threshold. This zero-touch maintenance reduces the quantity of spare parts and the number of maintenance cycles, and reduces the workload of maintenance personnel.


Figure 3-14 Graphical management page

3.2.4.3 Open Interfaces

The UDS provides object-based storage interfaces that are compatible with Amazon S3 interfaces and supports HTTP/HTTPS Representational State Transfer (REST) interfaces. Through these open interfaces, the UDS opens its storage space to various types of customer applications. The UDS's underlying storage space can be accessed using standard protocols, regardless of data storage location and data format. Moreover, as a carrier for multi-tenant services and multiple instances, the UDS provides customers with different levels of SLA and QoS tailored to different scenarios, enhancing customer competitiveness.
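
As an illustration of the REST access model, the sketch below uploads and retrieves an object over HTTP using the widely used requests library. The endpoint URL, bucket, and object names are placeholders, and the unauthenticated calls are a simplification; a real request would carry the AK/SK signature described earlier.

# Sketch of object PUT/GET over the S3-compatible REST interface.
# Endpoint and names are placeholders; authentication headers are omitted.
import requests

endpoint = "https://uds.example.com"
url = endpoint + "/example-bucket/logs/app-2014-06-01.log"

# Upload (PUT) an object.
resp = requests.put(url, data=b"2014-06-01 00:00:01 INFO service started\n")
resp.raise_for_status()

# Download (GET) the same object.
resp = requests.get(url)
resp.raise_for_status()
print(resp.content.decode("utf-8"))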


4 Experience

As IT develops, the amount of unstructured data grows exponentially and new types of services keep emerging, placing demanding requirements on the reliability and openness of storage systems. Traditional storage systems fail to meet these requirements. Based on an in-depth analysis of customer needs, Huawei launched the UDS, a massive storage system that helps customers resolve existing or pending issues in storing huge amounts of data. Apart from providing massive storage capacity, the UDS supports access by multiple types of services through interfaces compatible with Amazon S3 interfaces. With this broad compatibility, the UDS can be tailored to various application scenarios and has success stories in multiple vertical industries. Currently, the UDS is used in solutions such as the massive resource pool solution and the centralized backup solution.

4.1 Solution 1: Massive Resource Pool

4.1.1 Typical Needs and Problems Facing Customers

As the IT industry develops by leaps and bounds, various applications keep emerging and have become closely related to our daily life. People are accustomed to handling most routine matters on the Internet, for example, sending emails, visiting online communities, and paying bills online. These changes bring the following new situations to the storage industry:

 The data volume is growing rapidly and is estimated to reach 35 ZB by 2020.
 Data that needs to be stored is shifting from traditional structured data (database-type data) to unstructured data (such as electronic bills).
 Data sources are increasingly diversified, covering services such as SMS, microblogs, medical images, and scientific data.
 Diversified data storage poses higher requirements on data reliability.
 Different storage systems are used to store data from different types of services, and storage vendors provide complex storage management systems in their own interests. As a result, customers have to assign a large number of IT personnel to maintain heterogeneous storage systems and networks, causing the TCO to soar.
 The maintenance cost remains high as the number of storage devices increases.
 Storage devices work for long periods at high power consumption, and measures must be taken to control temperature and dissipate heat in equipment rooms, leading to a constant rise of electricity fees in data centers.


To cope with the preceding challenges, customers require a new generation of storage system that works in a new pattern. Next-generation storage systems are expected to have the following features:

 Massive data storage
 Unlimited capacity expansion without affecting performance
 Storage for data of various services
 High reliability
 Thin provisioning for flexible space expansion
 Simple storage management
 High efficiency and energy saving

4.1.2 Solution

The UDS provides the massive resource pool solution to resolve the preceding problems and meet the preceding needs. The solution is oriented to various upper-layer services and provides customers with storage capacity for various types of service data. The solution requires that upper-layer services comply with its interface specifications, as described below:

 The UDS provides unified storage space to store data of various services, including files, videos, and images.
 The UDS adopts the scale-out architecture for flexible capacity expansion, allowing the system capacity to be easily expanded from the minimum capacity of 448 TB to exabyte level for massive data storage.
 Performance improves in line with capacity expansion, preventing performance bottlenecks caused by growing data.
 Online capacity expansion is supported to ensure hitless services.
 With its high-density design, the UDS provides 30% more device space than x86 peers, and the energy consumption of UDS equipment rooms is 30% lower than that of equipment rooms accommodating traditional storage devices.
 All UDS nodes are clustered, and multiple data protection technologies (such as MC and EC) and automatic fault detection and repair technologies are employed to ensure system reliability.
 With its external interfaces compatible with Amazon S3 interfaces, the UDS can interconnect with multiple services (such as file and image services) in various scenarios.
 Each service has its own storage space needs. The UDS supports thin provisioning, which allows flexible space expansion, meeting the storage needs of different services while preventing the waste of storage space caused by fixed-quota space assignment.
 The UDS employs S3 authentication, data encryption, and access control to ensure data security.
 The UDS provides device administrators with a web UI management tool that manages cloud storage devices and upper-layer services in a unified manner. Administrators no longer need to use different tools to manage different services. This unified management approach frees administrators from heavy management work and lowers the TCO. Service operators no longer need to manage complex storage devices and can concentrate on service operation, creating more value.

Figure 4-1 shows the massive resource pool scenario.


Figure 4-1 Massive resource pool scenario

4.1.3 Software and Hardware Configurations

This section provides a list that describes the devices, interfaces, and software to be

configured.

[Sample is omitted.]

Table 4-1 Software and hardware configurations of the massive resource pool solution

Location: Equipment room in XXX DC

Hardware/Software | Model | Quantity | Remarks
A-Node | RH2288 | 3 | -
UDSN | UDSN | 14 | -
SATA disk | 2 TB | 224 | -
Service switch | S6748 | 2 | -
Management switch | S3728 | 1 | -
Cabinet | - | 1 | -
Optical module | - | 4 | -
Distributed storage software | - | 1 | -
System management software | - | 1 | -
License | - | 10 | -

4.1.4 Benefits

The UDS massive resource pool solution enables customers to reliably store massive data and simplifies complex storage management, creating greater value. The benefits of the solution include:

 Massive capacity: The UDS provides storage for massive data, with a storage capacity of up to exabyte level.
 Security and reliability: To ensure multi-level security and reliability from storage nodes and cabinets to data centers, the UDS employs user authentication, data transmission encryption, data integrity checks, MC, and EC to protect data in an all-around way.
 Low TCO: The UDS employs power-saving ARM chips to reduce energy consumption per unit of capacity, and green technologies such as disk spin-down and intelligent fan speed control to lower the power consumption of the entire system. Besides, a unified web management page is provided to centrally manage all massive storage devices. This unified management approach simplifies storage management and cuts investment in IT personnel training and device maintenance, reducing the TCO.

4.2 Solution 2: Centralized Backup

4.2.1 Typical Needs and Problems Facing Customers

Backup is an important means of data protection. It is widely used in various application scenarios and vertical industries, for example, file backup, database backup, banking data backup, and transportation data backup. The media used by backup systems vary: backup media can be tapes, virtual tape libraries (VTLs), CD-ROMs, and disk arrays. All mainstream storage vendors have launched backup products, for example, Symantec's NetBackup and Backup Exec and CommVault's Simpana. However, traditional backup systems have the following disadvantages:

1. Dedicated personnel must be arranged to manage and maintain the tapes of physical tape libraries.
2. Physical tape libraries must be maintained periodically.
3. Data can only be recovered to the state of the last tape backup.
4. Data on physical tapes can only be accessed sequentially, causing long backup and recovery windows.
5. The power consumption of physical tape libraries is low, but that of VTLs is high.
6. Tape libraries, particularly VTLs, have limited capacities.
7. Neither physical nor virtual tape libraries can be expanded without limit; new devices have to be purchased.
8. Tape drives and robot arms of physical tape libraries may be damaged, and storage array engines and disks of VTLs may become faulty.
9. Physical tape libraries must be relocated with care.

To cope with the preceding challenges, customers require a new backup solution that has the following capabilities:

 High reliability: Storage of backup data is secure and reliable, and backup data is highly available for recovery.
 Ease of management: A GUI-based web management system manages all backup tasks and hardware devices in a unified manner. Self-checks are initiated periodically, and alarms are reported automatically once faults are detected. A backup task-oriented service process is set up to simplify backup and recovery.
 Easy expansion: Capacity can be expanded from the initial minimum configuration to large capacities without affecting performance.
 Low cost: Massive data can be backed up at low power consumption and low TCO.

4.2.2 Solution

The UDS provides the centralized backup solution to resolve the preceding problems and meet the preceding needs. The solution is oriented to massive data scenarios, providing customers with a comprehensive backup solution that ensures data security. The solution requires that upper-layer services comply with its interface specifications. Currently, the UDS has passed the interface tests of Symantec's NetBackup and CommVault's backup software, and it will participate in interface tests of more backup software later. The centralized backup solution is described as follows:

 The UDS adopts the scale-out architecture for flexible capacity expansion, allowing the system capacity to be easily expanded from the minimum capacity of 448 TB to exabyte level for massive data storage.
 Performance improves in line with capacity expansion, preventing performance bottlenecks caused by growing data.
 The underlying massive storage space interworks with upper-layer backup software to form massive data backup solutions.
 All UDS nodes are clustered, and multiple data protection technologies (such as MC and EC) and automatic fault detection and repair technologies are employed to ensure system reliability.
 The underlying massive resource pool provides highly reliable and low-cost storage space.
 Backup resources are allocated on demand to make full use of storage capacity.
 Data can be recovered to a specific point in time without sequential access, saving recovery time and improving recovery efficiency.
 Upper-layer backup services support various backup types, such as files, databases, and applications.
 The entire solution can be deployed automatically and quickly across regions.
 Backup resources across regions can be managed and scheduled in a unified manner.
 Solution components and services can be managed centrally.

Figure 4-2 shows the scenario of the centralized backup solution.


Figure 4-2 Scenario of the centralized backup solution

4.2.3 Software and Hardware Configurations

This section provides a list that describes the devices, interfaces, and software to be configured.

[Sample is omitted.]

Table 4-2 Software and hardware configurations of the centralized backup solution

Location: Equipment room in XXX DC

Hardware/Software | Model | Quantity | Remarks
A-Node | RH2288 | 3 | -
UDSN | UDSN | 14 | -
SATA disk | 2 TB | 224 | -
Service switch | S6748 | 2 | -
Management switch | S3728 | 1 | -
Cabinet | - | 1 | -
Optical module | - | 4 | -
Distributed storage software | - | 1 | -
Desktop data backup software | - | 1 | -
System management software | - | 1 | -
License | - | 10 | -

Backup servers are provided by customers and are not listed in the preceding table.

4.2.4 Benefits

The UDS backup solution meets customers' requirements for massive data backup by combining a massive storage system with upper-layer backup software. It also reliably protects customers' data through diversified data reliability mechanisms. The solution is economical, efficient, and easy to manage, saving costs and creating greater value for customers.

 Massive capacity: The UDS provides storage for massive data, with a storage capacity of up to exabyte level.
 Low TCO: The UDS employs power-saving ARM chips to reduce energy consumption per unit of capacity, and green technologies such as disk spin-down and intelligent fan speed control to lower the power consumption of the entire system. Compared with VTLs, the UDS enables the same backup software to provide a larger storage capacity without adding new backup software for capacity expansion, lowering the TCO.
 High reliability: The UDS provides multiple data protection technologies such as MC, EC, and MDC to ensure data reliability. Data can be recovered when any storage device, cabinet, or data center is faulty, minimizing the recovery point objective (RPO). Besides, all nodes in the UDS are deployed in clusters (such as the service cluster, switch cluster, and storage cluster), eliminating single points of failure. Data can be restored upon loss, and the recovery time is not affected by any single device fault, minimizing the recovery time objective (RTO).
 Ease of management: A GUI-based web management system manages all backup tasks and hardware devices in a unified manner. Self-checks are initiated periodically, and alarms are reported automatically once faults are detected. A backup task-oriented service process is set up to simplify backup and recovery.
 High efficiency: Storage capacity is allocated on demand and expanded dynamically, making full use of storage space and improving storage efficiency.

4.3 Solution 3: Web Disk

4.3.1 Typical Needs and Problems Facing Customers

With the in-depth development of the information society, individuals and enterprises have a growing need to share and exchange information. Existing information sharing platforms fail to meet the challenges brought by the increasing amount of data and diversified data storage services. The challenges are as follows:

 Individual and enterprise data grows rapidly, and more data types keep emerging.
 Enterprises are in urgent need of collaborative office and data backup capabilities.
 Individual or enterprise data can hardly be accessed from mobile terminals such as mobile phones.
 Enterprise branches are scattered across locations, making data sharing difficult, and the diverse data sharing methods cannot be managed centrally.
 Data security must be ensured when individual and enterprise data is accessed over the Internet.
 An all-in-one solution is called for to store individual and enterprise data.

To cope with the preceding challenges, customers require a new data access solution that has the following features:

For cloud service providers:

− Provides multiple access methods, such as data access from browsers, PCs, and mobile phone clients.
− Supports routine O&M, self-service, metering and billing, and interconnection with network management systems.

For large- and medium-sized enterprises, governments, and scientific institutions:

− Implements desktop data protection.
− Provides a unified platform to store mission-critical information assets.
− Ensures that data can be accessed securely anytime, anywhere.
− Meets the requirements of on-demand data sharing among employees and branches.

4.3.2 Solution

The UDS provides the web disk solution to resolve the preceding problems and meet the preceding needs. The solution uses object-based storage technology to provide end users with online storage services over IP networks. The solution is cost-effective, flexibly scalable, and provides strong consolidation capabilities. It offers individual users customized, large-capacity personal storage space that is secure, fast, and easy to use, and provides enterprise users with secure, reliable, economical, and easy-to-use web disk services that can be quickly deployed for production and collaborative office work. The web disk solution is described as follows:

 The underlying storage system consists of loosely coupled A-Nodes and UDSNs. A-Nodes are used for data scheduling, that is, distributing data requests from upper-layer services to UDSNs; UDSNs are used for data storage. A-Nodes and UDSNs are deployed in high-availability clusters. 4 U 75-slot UDSNs support enterprise-class SATA disks.
 With the scale-out architecture, the capacity of the UDS can be flexibly expanded from the initial minimum of 300 TB to exabyte level. Performance grows in line with capacity, which eliminates performance bottlenecks caused by data growth. Moreover, the UDS capacity can be expanded online without interrupting services.
 Multiple access methods are provided, and the web disk clients support multiple operating systems and browsers.
 File sharing and synchronization are supported to meet the data sharing requirements of multiple users and groups.
 Interfaces compatible with Amazon S3 interfaces are provided for interconnection with third-party services.
 A comprehensive operation management system provides multi-level administrator management, self-services such as web disk service subscription, activation, and termination, and metering of traffic and capacity for billing.


 Comprehensive web disk management tools are provided to centrally manage underlying storage resources, covering system monitoring, logs, and alarms, simplifying administrators' management work.

Figure 4-3 shows the scenario of the web disk solution.

Figure 4-3 Scenario of the web disk solution

4.3.3 Software and Hardware Configurations

[Sample is omitted.]

Table 4-3 Software and hardware configurations of the web disk solution

Location: Equipment room in XXX DC

Hardware/Software | Model | Quantity | Remarks
A-Node | T3200 | 2 | -
UDSN | UDSN | 4 | -
Smart disk | 4 TB | 300 | -
Service switch | S6724 (for enterprise markets)/S6324 (for carrier markets) | 2 | -
Cloud storage service - system data node | - | 4 | -
Cloud storage service - computing node | - | 4 | -
Service switch for web disks | S5700 (for enterprise markets)/S5300 (for carrier markets) | 2 | -
Cabinet | - | 2 | -
Optical module | - | 12 | -
Optical fiber | - | 12 | -
UDS massive storage system software – per terabyte license (0 TB to 500 TB) | - | 500 | -
UDS massive storage system software – per terabyte license (501 TB to 1000 TB) | - | 500 | Increases with the system capacity. For details, see the quotation template.
UDS massive storage system software – per terabyte license (1001 TB to 5000 TB) | - | 200 | Increases with the system capacity. For details, see the quotation template.
HUAWEI cloud storage web disk – user access subsystem software license per node | - | 4 | -
HUAWEI cloud storage web disk – data storage subsystem software license per node | - | 4 | -

4.3.4 Solution Network

This section provides the topology of the solution. Figure 4-4 shows an example topology. You

can modify the topology based on actual networks and upper-layer services.

Figure 4-4 Network diagram of the web disk solution

4.3.5 Benefits

The UDS web disk solution customizes massive storage systems for web disk services and client management, meeting customers' requirements for an integrated E2E solution. The web disk solution applies not only to carriers for external operation but also to enterprises for internal use.

For cloud service providers:

− Enhanced customer loyalty
− More revenue streams from new services
− Web-based storage platforms that integrate various services to improve competitiveness

For medium- and large-sized enterprises, governments, and scientific institutions:

− E2E data security
− A cross-region file sharing platform for higher working efficiency
− Permission- and domain-based management and organization structure import for agile adaptation to service changes
− Resource statistics and report query that support O&M decision-making

4.4 Solution 4: Centralized Active Archiving

4.4.1 Typical Needs and Problems Facing Customers

Data must be archived when its amount reaches a certain scale. Both carriers and enterprises face the following problems in data archiving:

 Archive systems cannot be expanded in line with growing data.
 Data is archived offline, making management, query, and access inefficient.
 Archive media are prone to damage caused by environmental and climatic conditions.

To address the preceding problems, customers need a new archive solution that ensures:

 Online archive data that is available at any time
 Easy management, query, and archiving
 Less expensive archive media

4.4.2 Solution

The UDS provides the active archive solution to resolve the preceding problems and meet the preceding needs. Active archiving means that data can be archived online; active data refers to the hotspot data within the archived data. In this solution, archived data remains online.

Archiving software can be provided by customers or third-party vendors. Huawei has a partner that provides the active archiving software.

 A-Nodes and UDSNs are loosely coupled in the UDS. A-Nodes schedule data and distribute upper-layer data requests to UDSNs, and UDSNs are responsible for data storage. Both A-Nodes and UDSNs can be deployed in clusters. High-density 4 U 75-slot UDSNs support enterprise-class SATA disks.
 With the scale-out architecture, the capacity of the UDS can be flexibly expanded from the initial minimum of 300 TB to exabyte level. Performance grows in line with capacity, which eliminates performance bottlenecks caused by data growth. Moreover, the UDS capacity can be expanded online without interrupting services.
 The UDS also provides automatic management tools for system administrators. These tools centrally manage all storage devices and their upper-layer services, simplifying administrative work and cutting the TCO.


Figure 4-5 Scenario of the active archive solution

4.4.3 Software and Hardware Configurations

This section provides a list that describes the devices, interfaces, and software to be

configured.

[Sample is omitted.]

Table 4-4 Software and hardware configurations of the active archive solution

Location: Equipment room in XXX DC

Hardware/Software | Model | Quantity | Remarks
Service node | T3200 | 2 | -
A-Node | T3200 | 2 | -
UDSN | UDSN | 4 | -
Smart disk | 4 TB | 300 | -
Service switch | S6724 (for enterprise markets)/S6324 (for carrier markets) | 2 | -
Cabinet | - | 1 | -
Optical module | - | 12 | -
Optical fiber | - | 12 | -
UDS massive storage system software – per terabyte license (0 TB to 500 TB) | - | 500 | -
UDS massive storage system software – per terabyte license (501 TB to 1000 TB) | - | 500 | Increases with the system capacity. For details, see the quotation template.
UDS massive storage system software – per terabyte license (1001 TB to 5000 TB) | - | 200 | Increases with the system capacity. For details, see the quotation template.

4.4.4 Solution Network

This section provides the topology of the solution. Figure 4-6 shows an example topology. You can modify the topology based on actual networks and upper-layer services.

Figure 4-6 Network diagram of the active archive solution


4.4.5 Benefits

 Archive media are less expensive than primary storage media, giving full play to storage media with different price/performance ratios and cutting down the TCO.
 Data is accessible at any time, and data migration is transparent to ongoing services.
 Archived data can be managed, queried, and accessed quickly and easily.
 Storage systems are flexibly scalable to cope with explosive data growth.


5 Conclusion

Massive data has become an irreversible trend in IT development, and all industries call for secure and reliable storage of massive data. To address customers' problems and challenges, Huawei launched the UDS massive storage system.

With an industry-leading scale-out distributed storage architecture and the DHT algorithm, the UDS eliminates the storage engines (controllers) used in traditional storage systems and distributes data across storage nodes. This architecture removes the performance bottlenecks caused by storage controllers. Besides, the UDS supports hitless capacity expansion up to exabyte level, and system performance grows linearly with capacity. The UDS has a significant cost advantage over traditional storage systems once the data amount reaches a certain scale. With these outstanding features, the UDS is well suited to massive data storage scenarios.

With its broad compatibility, the UDS can be used in diversified solutions tailored to different scenarios. Various types of upper-layer applications can access the UDS and use its underlying massive storage space through external interfaces compatible with Amazon S3 interfaces.

Data security is the focus of massive data scenarios. The UDS provides various data

protection mechanisms such as MC and EC to ensure data security, boosting customers'

confidence in the cloud era.


6 Acronyms and Abbreviations

Table 6-1 Acronyms and abbreviations

Acronym/Abbreviation | Full Spelling
UDS | Universal Distributed Storage
SoD | Sea of Disks
A-Node | Access Node
UDSN | Universal Distributed Storage Node
ACL | Access Control List
AK | Access Key ID
SK | Secret Key