An Oracle White Paper April 2010 Sun SPARC Enterprise M3000 Server Architecture


Introduction
  Sun SPARC Enterprise M3000 Server Overview
  Meeting the Needs of Commercial and Scientific Computing
System Architecture
  System Component Overview
  System Outline
System Bus Architecture—Jupiter Interconnect
  Jupiter Interconnect Architecture
  Performance of Sun SPARC Enterprise M3000 Server
SPARC64 VII Processor
  SPARC64 VII Overview
  SPARC64 VII Microarchitecture
  Details of the Microarchitecture
  Cache System
  Reliability, Availability, and Serviceability Functions
I/O Subsystem
  I/O Subsystem Architecture
Reliability, Availability, and Serviceability
  Redundant and Hot-Swap Components
  Advanced Reliability Features
  Error Detection, Diagnosis, and Recovery
System Management
  eXtended System Control Facility
  Oracle Enterprise Manager Ops Center Software
Oracle Solaris 10
  Observability and Performance
  Availability
  Security
  Virtualization and Resource Management
Conclusion

Introduction

Organizations now rely on technology more than ever before. Today, computer systems play a critical role in every function from product design to customer order fulfillment. In many cases, business success depends on the continuous availability of IT services. Once required only in pockets of the data center, mainframe-class reliability and serviceability are now essential for systems throughout an enterprise. In addition, powering data center servers and keeping services running through a power outage are significant concerns. Environmental considerations also play a key role: demand to reduce the load on the environment is driving requirements in areas such as power conservation and miniaturization. New computer systems that consume less power and emit fewer greenhouse gases can play an essential role in protecting the environment.

Although availability is a top priority, costs must also remain within budget, and operational familiarity must be maintained. To deliver networked services as efficiently and economically as possible, organizations look to maximize use of every IT asset through consolidation and virtualization strategies. As a result, modern IT system requirements reach far beyond simple measures of compute capacity. Organizations need highly flexible servers with built-in virtualization capabilities and associated tools, technologies, and processes that work to optimize server use. With budgets still in mind, new computing infrastructures must also help protect current investments in technology and training.


A High-Performance, High-Reliability, Ecologically Sustainable Server: Introducing the Sun SPARC Enterprise M3000 Server

Oracle's Sun SPARC Enterprise servers are highly reliable, easy-to-manage, vertically scalable systems with all the benefits of traditional mainframes, but without the associated cost, complexity, or vendor lock-in (Figure 1). In fact, Sun SPARC Enterprise servers deliver mainframe-class system architecture at open system prices.

The Sun SPARC Enterprise M3000 server is the entry-class model of the Sun SPARC Enterprise family. It has many of the characteristics of the other Sun SPARC Enterprise servers and shares benefits common to them, such as operability and manageability. With symmetric multiprocessing (SMP), a memory subsystem of up to 32 GB, and a high-throughput I/O architecture, the server can support core business operations. Further, the server runs the powerful Oracle Solaris 10 operating system (OS) and includes leading virtualization technologies. Through the innovative Oracle Solaris Containers virtualization technology, the server brings sophisticated resource control to an open systems platform.

The server combines high performance, high quality, and ecological sustainability with a resilient system architecture, the advanced functions of Oracle Solaris 10, a compact form factor (two rack units [2U] in a rack cabinet), and the top CPU power in the entry class of servers. Moreover, Sun SPARC Enterprise servers offer improved performance over the previous generations of Sun servers, with a clear upgrade path that protects existing investments in software, training, and data center practices. By taking advantage of the Sun SPARC Enterprise M3000 server, IT organizations can create a more powerful infrastructure, optimize hardware use, and increase application availability, resulting in lower operational costs and risks.

Figure 1. Sun SPARC Enterprise M3000, M4000, M5000, M8000, and M9000 servers include many features to help improve uptime, application performance, and data center efficiency.


Sun SPARC Enterprise M3000 Server Overview

The Sun SPARC Enterprise M3000 server offers numerous power, reliability, and energy-saving characteristics useful to enterprises. The server features an SMP design that uses the latest generation of SPARC64 processors connected to memory and I/O by a new high-speed, low-latency system interconnect, which delivers exceptional throughput to software applications. Characteristics of the Sun SPARC Enterprise M3000 server are listed in Table 1.

Also architected to reduce unplanned downtime, this server includes stellar reliability, availability, and serviceability (RAS) capabilities to avoid outages and reduce recovery times. Design features such as high-performance CPU and data path integrity, Memory Extended ECC, end-to-end data protection, hot-swappable components, fault-resilient power options, and hardware redundancy boost the reliability of this server.

The environment-conscious design of the Sun SPARC Enterprise M3000 server targets the reduction in energy consumption that enterprises and data centers require. By adopting the SPARC64 VII processor, which achieves low power consumption while delivering high performance, together with a structural design that improves cooling efficiency and cooling control, the server saves power and space and operates quietly, reducing the load on the environment.

TABLE 1. CHARACTERISTICS OF SUN SPARC ENTERPRISE M3000 SERVER

Enclosure                   Two rack units
SPARC64 VII processors      2.52 GHz
                            5 MB Level 2 cache
                            Four cores
Memory                      Up to 32 GB
                            Eight DIMM slots
Internal I/O slots          Four PCIe
External I/O chassis        None
Internal storage            Serial Attached SCSI
                            Up to four drives
Dynamic system domains      Maximum of one
External I/O connections    One SAS port

Meeting the Needs of Commercial and Scientific Computing

Suiting a wide range of computing environments, the Sun SPARC Enterprise M3000 server provides the availability features needed to support commercial computing workloads along with the raw performance demanded by the high-performance community (Table 2).


TABLE 2. SAMPLE WORKLOADS FOR THE SUN SPARC ENTERPRISE M3000 SERVER

Adaptive services Business processing (ERP, CRM, OLTP, batch)

Database

Decision support

Data mart

Web services

System and network management

Application development

Application services

Scientific engineering

System Architecture

Continually challenged to do more with less, IT organizations realize that meeting processing requirements with fewer, more powerful systems holds economic advantages. In the Sun SPARC Enterprise M3000 server, the system interconnect, processors, memory subsystem, and I/O subsystem work together to create a reasonably priced, high-performance platform.

System Component Overview

The design of the Sun SPARC Enterprise M3000 server specifically focuses on delivering high reliability, outstanding performance, and true SMP throughput. The characteristics and capabilities of every subsystem work toward this goal. The high-bandwidth system bus, powerful SPARC64 VII processor chip, high-density memory option, and high-speed PCI Express (PCIe) slots provide not only reliable performance for enterprise applications, but also high uptime and throughput.

System Interconnect

Based on mainframe technology, the Jupiter system interconnect enables high performance and reliability for the Sun SPARC Enterprise M3000 server. The system controller provides point-to-point connections among the CPU, memory, and I/O subsystems. The system interconnect delivers as much as 17 GB/sec of peak bandwidth, offering true SMP throughput. Additional technical details about the system interconnect are found in the section titled "System Bus Architecture—Jupiter Interconnect."

SPARC64 VII Processor

The Sun SPARC Enterprise M3000 server uses the SPARC64 VII processor developed by Fujitsu. The SPARC64 VII processor, which has a multicore and multithreading architecture, builds on several decades of experience in the mainframe computer field in the pursuit of excellence in reliability and speed. It is manufactured with 65 nm process technology and has a maximum power consumption of 135 watts. Additional technical details about the SPARC64 VII processor are found in the section titled "SPARC64 VII Processor."

Memory

The memory subsystem of the Sun SPARC Enterprise M3000 server accommodates up to 32 GB of memory. The server uses DDR2 DIMMs with two-way memory interleave to enhance system performance. Available DIMM sizes include 1 GB, 2 GB, and 4 GB. Further details about the memory subsystem of the Sun SPARC Enterprise M3000 server are listed in Table 3.

TABLE 3. SUN SPARC ENTERPRISE M3000 SERVER MEMORY SUBSYSTEM SPECIFICATIONS

Maximum memory capacity    32 GB
DIMM slots                 8
Bank size                  4 DIMMs
Number of banks            2
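The figures in Table 3 are mutually consistent, as the short check below shows. It is only an arithmetic sketch; the constant and function names are illustrative and do not come from any Oracle tool.

```python
# Values taken from the text and Table 3.
DIMM_SLOTS = 8
BANK_SIZE_DIMMS = 4
BANKS = 2
DIMM_SIZES_GB = (1, 2, 4)


def max_capacity_gb(slots=DIMM_SLOTS, dimm_sizes=DIMM_SIZES_GB):
    """Capacity with every slot holding the largest available DIMM."""
    return slots * max(dimm_sizes)


# Two banks of four DIMMs account for all eight slots.
slots_from_banks = BANKS * BANK_SIZE_DIMMS
print(slots_from_banks, max_capacity_gb())
```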

Beyond performance, the memory subsystem of the Sun SPARC Enterprise M3000 server is built with reliability in mind. ECC protection is implemented for all data stored in main memory, and the following advanced features foster early diagnosis and fault isolation that preserve system integrity and raise application availability:

Memory patrol. Memory patrol periodically scans memory for errors. This proactive function prevents the use of faulty areas of memory before they can cause system or application errors, improving system reliability.

Memory Extended ECC. The Memory Extended ECC function provides single-bit error correction, supporting continuous processing despite events such as burst read errors, which are sometimes caused by memory device failures.

PCI Express Technology

The Sun SPARC Enterprise M3000 server uses a PCI bus to provide high-speed data transfer within the I/O subsystem. To support PCIe expansion cards, the server uses a PCIe physical layer (PCIe PHY) ASIC to manage the implementation of the PCIe protocol. PCIe technology doubles the peak data transfer rates of the original PCI technology and reaches a maximum throughput of 20 Gb/sec. In fact, PCIe was developed to accommodate high-speed interconnects such as Fibre Channel, InfiniBand, and Gigabit Ethernet. Additional technical details about the I/O subsystem are found in the section titled "I/O Subsystem."
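The 20 Gb/sec figure can be related to usable payload bandwidth with a little arithmetic. The sketch below assumes a first-generation PCIe link (2.5 GT/s per lane with 8b/10b line encoding) that is eight lanes wide. The lane count and generation are assumptions, chosen because they reproduce a 20 Gb/sec raw rate; this paper does not state them. The function name `pcie_link_bandwidth` is illustrative only.

```python
def pcie_link_bandwidth(lanes=8, gt_per_s=2.5):
    """Return (raw Gb/s, payload GB/s) for one direction of a PCIe 1.x link.

    Assumptions (not stated in the paper): 8 lanes, 2.5 GT/s per lane,
    and 8b/10b encoding, which carries 8 payload bits in every 10 line bits.
    """
    raw_gbps = lanes * gt_per_s          # line rate in gigabits per second
    payload_gbps = raw_gbps * 8 / 10     # remove 8b/10b encoding overhead
    payload_gbytes = payload_gbps / 8    # bits to bytes
    return raw_gbps, payload_gbytes


raw, payload = pcie_link_bandwidth()
print(f"raw {raw} Gb/s, payload {payload} GB/s per direction")
```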


Service Processor eXtended System Control Facility

Simplifying management of computer systems leads to higher availability levels for hosted applications. With this in mind, the Sun SPARC Enterprise M3000 server includes the eXtended System Control Facility (XSCF). The XSCF consists of a dedicated processor that is independent of the server and runs the XSCF Control Package (XCP) to provide remote monitoring and management capabilities. This service processor regularly monitors environmental sensors, provides advance warning of potential error conditions, and executes proactive system maintenance procedures as necessary. As long as power is supplied to the server, the XSCF continuously monitors the platform, even when the system is inactive.

The XCP enables audit administration; hardware control capabilities; hardware status monitoring, reporting, and handling; automatic diagnosis; and domain recovery. Additional technical details about the XSCF and XCP are found in the section titled "System Management."

Power and Cooling

The Sun SPARC Enterprise M3000 server uses separate modules for power and cooling. Sensors placed throughout the system measure the temperatures at processors and key ASICs as well as the exhaust temperature. Hardware redundancy in the power and cooling subsystems, combined with environmental monitoring, keeps the server operating even under power or fan fault conditions.

Fan Unit

The Sun SPARC Enterprise M3000 server uses fully redundant, hot-swap fans as the primary cooling system (Table 4). If a single fan fails, the XSCF detects the failure and switches the remaining fans to high-speed operation to compensate for the reduced airflow. The server can operate normally under these conditions, allowing ample time to service the failed unit. Replacement of fan units can occur without interrupting application processing.

TABLE 4. POWER AND COOLING SPECIFICATIONS OF SUN SPARC ENTERPRISE M3000 SERVER

Fan units         Two fan units
                  Two 80 mm fans
                  1+1 redundant
Power supplies    470 watts of rated power
                  Two units
                  1+1 redundant
                  Single-phase
Power cables      Two power cables
                  1+1 redundant


Power Supply

The use of redundant power supplies and power cables adds to the fault resilience of the Sun SPARC Enterprise M3000 server (Table 4). Power is supplied to the server by redundant hot-swap power supplies, enabling continuous server operation even if a power supply fails. Because the power units are hot-swappable, they can be replaced during system operation.

Optional Dual Power Feed

Although organizations can control most factors within the data center, utility outages are often unexpected. The consequences of a loss of electrical power can be devastating to IT operations. To help organizations reduce the impact of such incidents, the Sun SPARC Enterprise M3000 server is dual power feed capable. The AC power subsystem in this server is completely duplicated, so the server can optionally receive power from two external, independent AC power sources. With a dual power feed, the server continues to operate even after a single power grid failure. Although the dual power feed system and the redundant power supply system are not compatible with each other, the redundancy of either one increases system availability.

Operator Panel

The Sun SPARC Enterprise M3000 server features an operator panel, which has the following functions:

Displaying server status

Storing server identification and user setting information

Changing between operational and maintenance modes

Turning on power supplies for a domain

During server startup, the LED status indicators on the front panel show the state of the XSCF and of server operation (Figure 2).


Figure 2. The server operator panel of the Sun SPARC Enterprise M3000.

System Outline

The Sun SPARC Enterprise M3000 server is an economical, high-power compute platform with enterprise-class features. This server is designed to reliably carry data center workloads that undertake core business operations. The Sun SPARC Enterprise M3000 server enclosure measures 2U and supports one processor chip and 32 GB of memory. The processor is a four-core SPARC64 VII chip. In addition, the server features four short internal PCIe slots, four internal disk drives, one internal DVD drive, and an external SAS port for attaching additional storage or tape devices. Two power supplies and two fan units power and cool the server. A component diagram and front and rear views of the Sun SPARC Enterprise M3000 server are found in Figure 3 and Figure 4, respectively.

Figure 3. Sun SPARC Enterprise M3000 server components.


Figure 4. Front and rear views of the Sun SPARC Enterprise M3000 server.

System Bus Architecture—Jupiter Interconnect

The ability to deliver fast, predictable performance for a broad set of CPU applications rests largely on the capabilities of the system bus. The Sun SPARC Enterprise M3000 server uses a system interconnect designed to deliver consistent low latency. The Jupiter system bus benefits IT operations by delivering balanced and predictable performance for application workloads.

Jupiter Interconnect Architecture

The Jupiter interconnect design maximizes the overall performance of the Sun SPARC Enterprise M3000 server. Implemented as point-to-point connections that use packet-switched technology, this system bus provides fast response times by transmitting multiple data streams. Packet switching allows the interconnect to operate at a much higher systemwide throughput by eliminating "dead" cycles on the bus. All routes are unidirectional, contention-free paths with multiplexed addresses, data, and control plus ECC in each direction.

System controllers within the Jupiter interconnect architecture direct traffic among CPUs, memory, and I/O subsystems.


System Interconnect Reliability Features

The built-in redundancy and reliability features of the Sun SPARC Enterprise M3000 server system interconnect enhance the stability of this server. The Jupiter interconnect protects against loss or corruption of data with full ECC protection on all system buses and in memory. When a single-bit data error is detected in a CPU, memory, or an I/O controller, hardware corrects the data and performs the transfer.
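The single-bit correction described above can be illustrated with a textbook Hamming(7,4) code. This is only a sketch of the general principle. The paper does not say which ECC code the Jupiter interconnect or the memory subsystem uses, and server-class ECC operates on much wider words. The function names `encode` and `correct` are illustrative.

```python
def encode(data):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7).

    Parity bits sit at positions 1, 2 and 4; data bits at 3, 5, 6 and 7.
    """
    c = [0] * 8                      # index 0 unused so indices match positions
    c[3], c[5], c[6], c[7] = data
    c[1] = c[3] ^ c[5] ^ c[7]        # covers positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]        # covers positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]        # covers positions with bit 2 set
    return c[1:]


def correct(codeword):
    """Return (data bits, syndrome). A nonzero syndrome is the flipped position."""
    c = [0] + list(codeword)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos          # XOR of the positions of all set bits
    if syndrome:
        c[syndrome] ^= 1             # flip the single faulty bit back
    return [c[3], c[5], c[6], c[7]], syndrome


word = [1, 0, 1, 1]
sent = encode(word)
received = list(sent)
received[4] ^= 1                     # inject a single-bit error at position 5
recovered, where = correct(received)
print(recovered == word, where)
```

In this toy code a clean codeword yields a syndrome of zero, and any single flipped bit yields its own position, so the receiver can repair the word and continue the transfer.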

Sun SPARC Enterprise M3000 System Interconnect Architecture

The Sun SPARC Enterprise M3000 system is implemented within a single motherboard. This server design features one logical system board with one system controller. The system controller is connected to CPUs, memory, and the I/O controller (PCIe bridge), as shown in Figure 5.

Figure 5. Sun SPARC Enterprise M3000 server system interconnect.

Performance of Sun SPARC Enterprise M3000 Server

The high bandwidth and overall design of the Jupiter system interconnect maximize the performance of the Sun SPARC Enterprise M3000 server. Theoretical peak system throughput, I/O bandwidth numbers, and STREAM benchmark results for the Sun SPARC Enterprise M3000 server are found in Table 5.

TABLE 5. PERFORMANCE OF SUN SPARC ENTERPRISE M3000 SERVER

Theoretical system bandwidth at peak time (a) (GB/sec)    17
Theoretical I/O bandwidth at peak time (b) (GB/sec)       4
Triad results of STREAM benchmark (GB/sec)                4.5
Copy results of STREAM benchmark (GB/sec)                 5.6

(a) The theoretical system bandwidth at peak time is calculated by multiplying the bus width by the bus frequency between the system controller and memory.
(b) The theoretical I/O bandwidth at peak time is calculated by multiplying the bus width by the bus frequency between the system controller and PCI bridge.

SPARC64 VII Processor

The SPARC64 Series consists of SPARC processors developed by Fujitsu for UNIX servers. The SPARC64 V delivered high-reliability technology consistent with the mainframe class, along with a frequency exceeding 1 GHz. The SPARC64 VI achieved high throughput by using the SPARC64 V as a base and incorporating a two-core by two-thread architecture. The latest processor, the SPARC64 VII, improves throughput further by incorporating a four-core architecture and by modifying the multithreading mechanism. The Sun SPARC Enterprise M3000 server uses this SPARC64 VII processor.

SPARC64 VII Overview

The SPARC64 VII is the latest processor developed by Fujitsu for the SPARC64 Series. It uses 65 nm technology and has an operating frequency of 2.5 GHz (listed as 2.52 GHz for this server in Table 1). The chip measures 21.3 mm by 20.9 mm and has four built-in cores with a shared 5 MB Level 2 (L2) cache configuration. The operating power consumption is 135 watts.

Fujitsu designed the SPARC64 VII for increased throughput while maintaining the high performance and high reliability that have been realized with the existing SPARC64 VI. For increased throughput, the number of built-in cores has been increased from two to four, and the multithreading mechanism to be used has been changed from vertical multithreading (VMT) to simultaneous multithreading (SMT). The L2 cache is configured to be shared by the four cores, and the throughput has been doubled so that data can be supplied to the four cores. Also, especially with the field of high-performance computing in mind, an intercore high-speed synchronization mechanism called hardware barrier has been implemented.


SPARC64 VII Microarchitecture

This section provides an overview of the microarchitecture of the SPARC64 VII. Although the basic structure of the core pipeline of the SPARC64 VII is the same as that of the SPARC64 VI, it uses SMT technology instead of VMT technology to implement multithreading. As shown in Figure 6, the SPARC64 VI processor uses VMT technology to run two threads per core, but only one thread is active at any given time. Within the VMT model, a latency event or specific trigger must occur for processing to switch over to the alternate thread. With SMT technology, both threads within each core of the SPARC64 VII processor can execute simultaneously. As a result, the SPARC64 VII offers the potential to achieve greater throughput and performance. As shown in Figure 7, two threads can be executed simultaneously on each of the four cores.

Figure 6. SPARC64 VI VMT processing mode.

In the SMT design, Fujitsu focused on eliminating interference between threads as much as possible. As a rule, the chip isolates the hardware resources of one thread from those of the other when both threads are running. In contrast, when either thread is idle, the other thread can use the resources of both threads, with some exceptions. Thus, the chip has been designed to provide higher performance than in single-thread operation. Both threads share the core pipeline. However, the pipeline is controlled so that a stall in one thread does not block processing in the other thread. In the instruction fetch, instruction decode, and commit stages, one of the two threads is selected in each cycle.
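The difference between the two multithreading models can be seen in a toy simulation. The sketch below is a deliberately simplified model, not a description of the real pipeline. Each thread is a list of instruction latencies in cycles, a latency above 1 stands for a long-latency event such as a cache miss, and the SMT mode lets every ready thread issue in every cycle (ignoring the per-cycle thread selection in the fetch, decode, and commit stages). The function name `run` and the sample latencies are hypothetical.

```python
def run(threads, simultaneous):
    """Return the cycle at which all threads finish on one toy core.

    threads: list of lists of per-instruction latencies (cycles).
    simultaneous=False models VMT: one active thread, switched only when it
    blocks or finishes. simultaneous=True models SMT: every ready thread issues.
    """
    n = len(threads)
    pc = [0] * n            # index of the next instruction for each thread
    ready_at = [0] * n      # cycle at which each thread may issue again
    active = 0              # thread that currently owns the pipeline (VMT only)
    cycle = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        runnable = [t for t in range(n)
                    if pc[t] < len(threads[t]) and ready_at[t] <= cycle]
        if simultaneous:
            issuing = runnable
        else:
            if runnable and active not in runnable:
                active = runnable[0]        # switch on a latency event
            issuing = [active] if active in runnable else []
        for t in issuing:
            ready_at[t] = cycle + threads[t][pc[t]]
            pc[t] += 1
        cycle += 1
    return max(ready_at)


thread_a = [1, 1, 4, 1]     # third instruction is a 4-cycle miss
thread_b = [1, 4, 1, 1]     # second instruction is a 4-cycle miss
vmt_cycles = run([thread_a, thread_b], simultaneous=False)
smt_cycles = run([thread_a, thread_b], simultaneous=True)
print(vmt_cycles, smt_cycles)
```

For these two sample threads the model finishes in 10 cycles in VMT mode and 7 cycles in SMT mode, because SMT overlaps the useful work of one thread with both the work and the stalls of the other.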


Figure 7. SPARC64 VII SMT processing mode.

Details of the Microarchitecture

As shown in Figure 8, a core of the SPARC64 VII is divided into the instruction fetch block and instruction execution block. The instruction fetch block contains the primary cache dedicated for instructions (L1I cache), and the instruction execution block contains the primary cache for operands (L1D cache).


Figure 8. Functional diagram of the SPARC64 VII core.

Instruction Fetch Block

The instruction fetch block operates independently of the instruction execution block. Following branch prediction, it loads the series of instructions that are expected to be executed into the instruction buffer (IBUF). The IBUF has a capacity of 256 bytes and can store up to 64 instructions. When both threads are running, the IBUF is divided evenly between the threads. If instruction execution stalls, instruction fetch continues until the IBUF is full. Conversely, if instruction fetch pauses for some reason, such as a cache miss, execution can continue for as long as the IBUF contains instructions. An instruction fetch can start in every cycle, and each fetch brings in 32 bytes, or eight instructions. Instruction execution throughput is up to four instructions per cycle, so instruction fetch provides twice the throughput of instruction execution. By decoupling instruction fetch from instruction execution, the IBUF conceals the latency of the large-capacity primary instruction cache.
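The decoupling effect of the IBUF can be sketched as a bounded queue between a producer and a consumer. The capacity, fetch width, and execution width below come from the text. The single-thread view, the cycle-by-cycle model, and the function name `simulate` are simplifying assumptions made for illustration.

```python
IBUF_CAPACITY = 64   # instructions (256 bytes at 4 bytes per instruction)
FETCH_WIDTH = 8      # instructions fetched per cycle (32 bytes)
EXEC_WIDTH = 4       # instructions consumed by execution per cycle


def simulate(fetch_ok, exec_ok):
    """Return (IBUF occupancy after each cycle, total instructions executed).

    fetch_ok[i] and exec_ok[i] say whether fetch and execution can make
    progress in cycle i (False models a fetch pause or an execution stall).
    """
    occupancy = 0
    executed = 0
    history = []
    for can_fetch, can_exec in zip(fetch_ok, exec_ok):
        if can_exec:
            taken = min(EXEC_WIDTH, occupancy)
            occupancy -= taken
            executed += taken
        if can_fetch:
            occupancy += min(FETCH_WIDTH, IBUF_CAPACITY - occupancy)
        history.append(occupancy)
    return history, executed


# Ten cycles with execution stalled, then ten cycles with fetch paused.
history, executed = simulate([True] * 10 + [False] * 10,
                             [False] * 10 + [True] * 10)
print(history[7], executed, history[-1])
```

In this scenario the buffer fills to 64 instructions by the eighth cycle of the stall, and execution then proceeds at full width for all ten cycles of the fetch pause, leaving 24 instructions still buffered.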

Instruction Execution Block

A core of the SPARC64 VII is divided into the instruction fetch block and instruction execution block. The instruction execution block operates independently of the instruction fetch block.


Instruction Decode and Issue

In the instruction decode and instruction issue stages, the four instructions in the Instruction Word Register (IWR) are decoded simultaneously, and the resources required for execution (the various reservation stations, the fetch port and store port, and the register update buffers) are determined. The hardware then checks whether those resources are free. If they are, the resources are allocated, each instruction is assigned an instruction identifier (IID) ranging from 0 to 63, and the instructions are issued. In other words, the maximum number of in-flight instructions is 64. When both threads are running, the maximum number of in-flight instructions for each thread is 32. In each cycle, instructions from one of the two threads are decoded, and the threads alternate.

When an instruction is issued, its IWR slot is released. For the instruction in any slot of the IWR, there are no restrictions on the allocation of resources such as reservation stations. Also, there are no restrictions on instruction-type combinations. Therefore, as long as there are free resources, instructions can be issued. Even if there is insufficient space for four instructions, as many instructions as possible are issued in program order. By eliminating issue stall conditions as much as possible in this way, the processor sustains a high degree of issue parallelism for any binary code.
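The issue rule described above (check resources, assign IIDs from 0 to 63, and issue as many instructions as possible in program order) can be sketched as follows. This is a simplified model. Each instruction is represented only by the name of the one reservation station it needs, whereas real instructions may need several resources. The function name `issue_cycle` and its parameters are hypothetical.

```python
MAX_IN_FLIGHT = 64   # IIDs range from 0 to 63
DECODE_WIDTH = 4     # instructions held in the IWR


def issue_cycle(iwr, free, next_iid, in_flight):
    """Issue the longest in-order prefix of the IWR that has free resources.

    iwr: resource name needed by each decoded instruction, in program order.
    free: dict mapping resource name to its number of free entries (mutated).
    Returns a list of (iid, resource) pairs for the issued instructions.
    """
    issued = []
    for resource in iwr[:DECODE_WIDTH]:
        window_full = in_flight + len(issued) >= MAX_IN_FLIGHT
        if window_full or free.get(resource, 0) == 0:
            break                      # younger instructions must wait: order is kept
        free[resource] -= 1
        iid = (next_iid + len(issued)) % MAX_IN_FLIGHT
        issued.append((iid, resource))
    return issued


free_entries = {"RSE": 2, "RSF": 1, "RSA": 0}
result = issue_cycle(["RSE", "RSF", "RSA", "RSE"], free_entries,
                     next_iid=62, in_flight=10)
print(result)
```

Here the third instruction finds no free RSA entry, so only the first two instructions are issued and the fourth waits behind it, even though an RSE entry is still free.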

Instruction Execution

A decoded instruction is registered in a reservation station. The SPARC64 VII has reservation stations for integer operations (reservation station for execution [RSE]) and reservation stations for floating-point operations (RSF). The RSEs and RSFs are each divided into two queues, one per execution unit. In other words, four reservation stations are provided for operations: RSEA, RSEB, RSFA, and RSFB. Each instruction stored in a reservation station is dispatched to the execution unit that corresponds to that reservation station, in the order in which its source operands become ready. Therefore, four operations can be dispatched simultaneously. Basically, the oldest instruction that can be dispatched (oldest ready) is selected from the instructions in a reservation station. However, when an operation uses as a source operand a register that is to be updated by a load instruction, the operation is dispatched speculatively before the result of the load is obtained; this is called speculative dispatch. The execution stage then determines whether the speculatively dispatched instruction succeeded. Speculative dispatch conceals the latency of the cache access pipeline, increasing the utilization of the execution units. In addition to the RSEs and RSFs, there are reservation stations for branch instructions (RSBR) and reservation stations for calculating addresses for load/store instructions (reservation station for address generation [RSA]).
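The "oldest ready" selection across the four operation reservation stations can be sketched as below. The model is an assumption-laden simplification: age is approximated by the IID (wraparound is ignored), operand readiness is a single flag, and speculative dispatch is not modeled. The `Entry` class and the function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    iid: int               # lower IID is treated as older (wraparound ignored)
    sources_ready: bool    # True once all source operands are available


def select_oldest_ready(station):
    """Pick the oldest entry whose operands are ready, or None."""
    ready = [entry for entry in station if entry.sources_ready]
    return min(ready, key=lambda entry: entry.iid) if ready else None


def dispatch_cycle(stations):
    """Dispatch at most one instruction per reservation station per cycle."""
    picks = {}
    for name, station in stations.items():
        entry = select_oldest_ready(station)
        if entry is not None:
            station.remove(entry)
            picks[name] = entry.iid
    return picks


stations = {
    "RSEA": [Entry(3, False), Entry(5, True), Entry(9, True)],
    "RSEB": [Entry(4, False)],
    "RSFA": [Entry(6, True)],
    "RSFB": [],
}
print(dispatch_cycle(stations))
```

In the example, RSEA skips the older instruction 3 because its operands are not ready and dispatches instruction 5 instead, which is the out-of-order behavior the text describes. With one pick per station, at most four operations leave in a cycle.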

Instruction Commit

All results of instructions that are executed out of order are stored once in the GPR Update Buffer (GUB) and FPR Update Buffer (FUB) work registers, which are not visible to software.


To ensure the instruction order in a program, registers such as the general-purpose registers (GPR) and floating-point registers (FPR) and memory are updated in program order in the commit stage. In addition, control registers such as the PC are updated at the same time in the commit stage. Precise interrupts are guaranteed, and processing in execution can always be canceled. The above method is called a synchronous update method, which not only makes it easier to re-execute instructions after a branch prediction error, but also contributes to increased RAS, as explained later in this document. The maximum number of instructions that can be committed at one time is four. The instruction commit stage is shared by the two threads, and either thread is selected in each cycle to execute commit processing.

Cache System

As shown in Figure 9, the cache memory of the SPARC64 VII has a two-layer structure, consisting of a medium-capacity primary cache (L1 cache) and a high-capacity secondary cache (L2 cache).

Figure 9. SPARC64 VII processor core and cache design.

The L1 cache consists of a cache dedicated for instructions (L1I cache) and a cache dedicated for operands (L1D cache). Each of these caches has a capacity of 64 KB, uses the two-way set associative method, and has a block size of 64 B. The L1D cache is divided into eight banks on the four-byte address boundaries, and two operands can be accessed at one time. The L1 cache uses virtual addresses for cache indexes and physical addresses for cache tags. In the virtually indexed, physically tagged (VIPT) method, consistency can be lost if the same area of memory is accessed using different virtual addresses, because different indexes are used for registration (synonym problem). Through coordination with the L2 cache, the SPARC64 VII resolves the synonym problem with hardware.

The L2 cache has a maximum capacity of 5 MB, uses a 10-way set associative method, has a block size of 256 B, and is shared by the four cores. It adopts a two-bank interleaved structure, so 64 B of data can be read in each cycle. The bus for sending data that is read from the L2 cache to the L1 cache has a width of 32 B per two cores, and the bus for sending data from the L1 cache to L2 cache has a width of 16 B per core.
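The cache parameters given above determine the number of sets and the address-bit split for each cache. The sketch below is illustrative arithmetic only. The 8 KB page size is an assumption (a base page size commonly used on SPARC systems) and is not stated in this paper, and all function and variable names are hypothetical.

```python
KB = 1024
MB = 1024 * KB


def cache_geometry(capacity_bytes, ways, block_bytes):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets = capacity_bytes // (ways * block_bytes)
    offset_bits = block_bytes.bit_length() - 1   # log2 of the block size
    index_bits = (sets - 1).bit_length()         # log2 of the set count
    return sets, offset_bits, index_bits


# Figures taken from the text: 64 KB, two-way, 64 B blocks (L1);
# 5 MB, 10-way, 256 B blocks (L2).
l1_sets, l1_offset_bits, l1_index_bits = cache_geometry(64 * KB, 2, 64)
l2_sets, l2_offset_bits, l2_index_bits = cache_geometry(5 * MB, 10, 256)

# ASSUMPTION: 8 KB pages. Index bits above the page offset come from the
# virtual page number, which is what makes synonyms possible in a VIPT cache.
PAGE_BYTES = 8 * KB
page_offset_bits = PAGE_BYTES.bit_length() - 1
synonym_bits = max(0, l1_offset_bits + l1_index_bits - page_offset_bits)
synonym_positions = 2 ** synonym_bits
```

Under these assumptions the L1 cache has 512 sets and the L2 cache has 2,048 sets. Two L1 index bits lie above the page offset, so one physical line could be registered at up to four different L1 index positions, which is the situation the hardware synonym handling has to cover.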

The cache update policies of the L1 cache and L2 cache are both write-back. That is, stored data is written into only one level of the cache hierarchy. In the write-back method, cache-missed lines are always loaded into the cache, so that store operations can be completed by updating one cache level. When a cache miss occurs, the old data must first be brought from memory into the cache, even for a store; when a cache hit occurs, however, the store operation is completed on the cache alone. In general, because store operations are quite frequent, the write-back method has an advantage because it can reduce intercache traffic and memory access traffic.
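The traffic argument can be made concrete with a toy model. The sketch below counts writes forwarded to the next level for a stream of stores. Write-through is not discussed in this paper and is used here only as a baseline for comparison. The model assumes an unbounded cache in which every dirty block is written back exactly once, and all names are hypothetical.

```python
def count_lower_level_writes(store_addresses, policy, block_bytes=64):
    """Count writes sent to the next level of the hierarchy for a store stream.

    Simplifying assumption: the cache never evicts during the stream, and
    every dirty block is written back once at the end.
    """
    dirty_blocks = set()
    writes = 0
    for addr in store_addresses:
        block = addr // block_bytes
        if policy == "write-through":
            writes += 1                 # every store is forwarded immediately
        elif policy == "write-back":
            dirty_blocks.add(block)     # store completes in this cache level only
        else:
            raise ValueError(f"unknown policy: {policy}")
    if policy == "write-back":
        writes += len(dirty_blocks)     # one write-back per dirty block
    return writes


# Eight stores that touch only two 64-byte blocks.
stores = [0, 8, 16, 24, 64, 72, 0, 8]
write_through_writes = count_lower_level_writes(stores, "write-through")
write_back_writes = count_lower_level_writes(stores, "write-back")
```

For this store stream the baseline forwards eight writes, while the write-back model forwards two, one per dirty block.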

Meanwhile, because the write-back method keeps the latest data in the cache, if an error occurs in the relevant processor, there is a risk that the error could affect not only the internal operation of the processor but also the entire system. The SPARC64 VII has powerful RAS functions to cope with this problem.

Also, a new hardware barrier mechanism has been implemented in the SPARC64 VII. The hardware barrier mechanism synchronizes the cores in the CPU chip with each other, and faster synchronization processing can be implemented compared with a conventional synchronization process realized by software. This mechanism is especially useful in the high-performance computing area.

Reliability, Availability, and Serviceability Functions

In the SPARC64 VII, RAS functions comparable to those of mainframe computers have been implemented. With these RAS functions, errors are reliably detected, their effect is kept within a limited range, recovery processing is tried, error logs are recorded, and software is notified. In other words, the basics of RAS functions are thoroughly implemented. Through the implementation of the RAS functions, the SPARC64 VII provides high reliability, high availability, high serviceability, and high data integrity as a processor for mission-critical UNIX servers.

Reliability, Availability, and Serviceability of Internal RAMs

Among the parts of a processor, RAM has the highest error occurrence frequency. Error detection and correction methods for the SPARC64 VII processor are highlighted in Table 6. In the SPARC64 VII, because any one-bit error in RAM can automatically be corrected by hardware without intervention by software, it does not affect software.


TABLE 6. ERROR DETECTION AND CORRECTION METHOD FOR INTERNAL RAMS

TYPE                          ERROR DETECTION AND PROTECTION METHOD   ERROR CORRECTION METHOD
L1 instruction cache   Data   Parity                                  Invalidation and reread
                       Tag    Parity + duplication                    Rewrite of duplicated data
L1 data cache          Data   SECDED ECC                              One-bit error correction using ECC
                       Tag    Parity + duplication                    Rewrite of duplicated data
L2 cache               Data   SECDED (a) ECC                          One-bit error correction using ECC
                       Tag    SECDED ECC                              One-bit error correction using ECC
Instruction TLB               Parity                                  Invalidation
Data TLB                      Parity                                  Invalidation
Branch history                Parity                                  Recovery from branch prediction failure

(a) SECDED: Single error correction and double error detection.
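To show what SECDED means in practice, the sketch below implements a textbook extended Hamming code over a 4-bit value. The code width and bit layout are assumptions chosen for brevity. This paper does not describe the actual ECC layout used in the SPARC64 VII, and the function names are hypothetical.

```python
def encode(nibble):
    """Encode a 4-bit value as an 8-bit extended Hamming codeword (list of bits).

    Index 0 holds the overall parity bit; indexes 1..7 are the classic
    Hamming(7,4) positions, with check bits at positions 1, 2, and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos          # XOR of set positions locates a single flip
    overall = sum(code) % 2          # 0 when the overall parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        code[syndrome] ^= 1          # single-bit error (syndrome 0 is the parity bit)
        status = "corrected"
    else:
        return "uncorrectable", None  # even number of flips: detected, not corrected
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return status, nibble


word = encode(0b1011)
one_flip = list(word)
one_flip[5] ^= 1
two_flips = list(word)
two_flips[5] ^= 1
two_flips[6] ^= 1
```

Decoding `one_flip` recovers the original value with status `corrected`, while decoding `two_flips` reports `uncorrectable`, which mirrors the single-correction, double-detection behavior listed in the table.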

For the L1 cache, L2 cache, and TLB, degradation can be performed separately in way units. Error occurrence counts are made for each function unit. If the error occurrence count per unit time exceeds the upper limit, degradation is performed and the relevant way is not subsequently used. Hardware performs degradation automatically; at the same time, it also performs the required operation to ensure the continuity of coherency automatically. More specifically, hardware automatically performs the following: (1) operation that writes back to the L2 cache all the dirty lines in the way of the L1D cache to be degraded, and (2) operation that writes back to memory the dirty lines in the way of the L2 cache to be degraded. The degradation of a way is performed without adversely affecting software, and software operation is free from any effect except for a slowdown of the processing speed.
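The degradation policy described above (count errors per way, degrade a way when the count per unit time exceeds a limit, and write back dirty lines first) can be sketched as a small software model. The class name, the threshold, and the time window are hypothetical; the paper does not give the actual limits, and in the processor this logic is performed by hardware.

```python
class WayDegrader:
    """Toy model of per-way error counting and automatic way degradation."""

    def __init__(self, ways, threshold, window):
        self.active = set(range(ways))   # ways still in use
        self.threshold = threshold       # ASSUMED upper limit of errors per window
        self.window = window             # ASSUMED length of the unit-time window
        self.errors = {w: [] for w in range(ways)}
        self.flushed = []                # ways whose dirty lines were written back

    def report_error(self, way, now):
        """Record an error on a way; return True if the way was degraded."""
        if way not in self.active:
            return False
        recent = [t for t in self.errors[way] if now - t < self.window]
        recent.append(now)
        self.errors[way] = recent
        if len(recent) > self.threshold:
            self.write_back_dirty_lines(way)   # preserve coherency before removal
            self.active.discard(way)           # the way is not used afterwards
            return True
        return False

    def write_back_dirty_lines(self, way):
        # Placeholder for the write-back step (L1D to L2, or L2 to memory).
        self.flushed.append(way)


degrader = WayDegrader(ways=2, threshold=2, window=10)
burst = [degrader.report_error(0, t) for t in (0, 1, 2)]
sparse = [degrader.report_error(1, t) for t in (0, 100, 200)]
```

In this run, three errors in quick succession degrade way 0, while the same number of errors spread over a long period leaves way 1 in service.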

Reliability, Availability, and Serviceability of Internal Registers and Execution Units

The SPARC64 VII also provides error protection for registers and execution units, making doubly sure that data integrity is guaranteed (Table 7). For integer architecture registers, ECC has been introduced with the SPARC64 VII to increase reliability. If an error occurs, the ECC circuit corrects the error. Parity bits have been added to the floating-point architecture registers and other registers. Also, the parity prediction circuit, residue check circuit, and other circuits have been added to the execution unit to propagate parity information to output results. In the unlikely event that a parity error occurs, it is detected, and hardware automatically re-executes the instruction to attempt recovery as described below. This function is called instruction retry.

TABLE 7. ERROR DETECTION AND PROTECTION METHOD FOR INTERNAL REGISTERS AND EXECUTION UNITS

TYPE                                                                                 ERROR DETECTION / PROTECTION METHOD
Register         Integer register                                                    SECDED ECC
                 Floating-point register                                             Parity
                 PC, PSTATE, etc.                                                    Parity
                 Computation input-output register                                   Parity
Execution unit   Addition and subtraction, division, shift, and graphic operations   Parity prediction
                 Multiplication                                                      Parity prediction + residue check
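The residue check listed for multiplication can be illustrated in a few lines. The idea is that the residue of a product equals the product of the operand residues, so a cheap side computation can validate the result of the full multiplier. The modulus 3 used here is an assumption chosen for illustration. The paper does not state which modulus the SPARC64 VII uses, and the function names are hypothetical.

```python
MODULUS = 3  # ASSUMPTION: illustrative modulus, not taken from the paper


def residue(x):
    return x % MODULUS


def checked_multiply(a, b, multiplier=lambda x, y: x * y):
    """Multiply with a residue check; raise ArithmeticError on a mismatch."""
    product = multiplier(a, b)
    expected = (residue(a) * residue(b)) % MODULUS
    if residue(product) != expected:
        raise ArithmeticError("residue check failed")
    return product


def faulty_multiplier(x, y):
    """Model a multiplier whose result has one bit flipped."""
    return (x * y) ^ 1


good = checked_multiply(12345, 6789)

try:
    checked_multiply(12345, 6789, multiplier=faulty_multiplier)
    fault_detected = False
except ArithmeticError:
    fault_detected = True
```

A single flipped bit changes the result by a power of two, which is never a multiple of 3, so a modulo-3 check always catches this kind of error. In the processor, a detected mismatch would lead to the instruction retry described in the next section.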

Synchronous Update Method and Instruction Retry

As shown in the explanation of the instruction execution block, the SPARC64 VII uses the synchronous update method. When an error is detected, all the instructions being executed at this time are canceled. Interim results before commitment can be discarded, and only results updated by instructions that have been completed without encountering any errors remain in programmable resources. Therefore, not only can errors be prevented from destroying programmable resources, but hardware can also perform an instruction retry after error detection. Even in the case of a hang, because stalled instructions can be discarded once and then retried from the beginning, there is a possibility of recovery.

Instruction retry is triggered by an error and is automatically started. A retry is performed instruction by instruction to increase the chance of normal execution. When the execution is completed normally, the state automatically returns to the normal execution state. During this period, no software intervention is required, and if the instruction retry succeeds, the error does not affect software. An instruction retry is repeated until the number of retry times reaches the threshold, and when the threshold is exceeded, the occurrence of the error is reported to the software by an interrupt. Operational flow is shown in Figure 10.
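The retry policy described above can be modeled as a simple loop. This is a software sketch of behavior that the processor performs in hardware. The `HardwareError` exception, the threshold value, and the `make_flaky` helper are hypothetical and exist only to exercise the loop.

```python
class HardwareError(Exception):
    """Stands in for an error detected by the parity or residue circuits."""


def run_with_retry(execute, retry_threshold):
    """Return ('ok', result, retries) or ('interrupt', None, retries).

    Uncommitted results are assumed to be discarded before each retry, as
    guaranteed by the synchronous update method.
    """
    retries = 0
    while True:
        try:
            return "ok", execute(), retries
        except HardwareError:
            if retries >= retry_threshold:
                # Threshold reached: report the error to software.
                return "interrupt", None, retries
            retries += 1


def make_flaky(failures):
    """Build an operation that fails a given number of times, then succeeds."""
    state = {"left": failures}

    def execute():
        if state["left"] > 0:
            state["left"] -= 1
            raise HardwareError("transient error")
        return 42

    return execute


transient = run_with_retry(make_flaky(2), retry_threshold=3)
persistent = run_with_retry(make_flaky(10), retry_threshold=3)
```

A transient fault that clears after two attempts is invisible to the caller apart from the retry count, while a persistent fault ends in the interrupt path once the threshold is reached.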

Figure 10. Instruction retry by hardware after error detection.

Increased Serviceability

The SPARC64 VII has error-checking mechanisms in a variety of locations. If an error occurs, the system is notified of the error through a dedicated interface. On receipt of this notification, the XSCF firmware collects error logs through the dedicated interface and analyzes them. This series of operations does not affect software and is performed in the background.

With the mechanism described above, a system in which the SPARC64 VII is mounted can identify the location and type of a failure quickly and accurately while continuing operation. Thus, the system can obtain information useful for preventive maintenance to increase serviceability.

I/O Subsystem

A growing reliance on computer systems for every aspect of business operations brings along a need to store and process ever-increasing amounts of information. Powerful I/O subsystems are crucial to effectively moving and manipulating these large data sets. The Sun SPARC Enterprise M3000 server delivers exceptional I/O expansion and performance, enabling organizations to scale systems and accommodate evolving data storage needs.

I/O Subsystem Architecture

The use of PCI Express (PCIe) technology is crucial to the performance of the I/O subsystem within the Sun SPARC Enterprise M3000 server. A PCIe bridge supplies the connection between the main system and all I/O components, such as PCIe slots and internal drives (Figure 11). The PCIe bus also enables the connection of external I/O devices by using the internal PCIe slots.

Figure 11. Sun SPARC Enterprise M3000 server I/O subsystem architecture.


Sun SPARC Enterprise M3000 Server I/O Subsystem

In the Sun SPARC Enterprise M3000 server, a single PCIe bridge mounted on the motherboard connects all I/O components to the system controllers. The Sun SPARC Enterprise M3000 server has four PCIe slots.

I/O Devices

Along with a disk device directly integrated into it, the Sun SPARC Enterprise M3000 server supports one internal DVD drive and four internal SAS 2.5-inch hard disk drives. The Sun SPARC Enterprise M3000 server also supports one external SAS port, which can be connected to any SAS storage or tape device.

Reliability, Availability, and Serviceability

Reducing downtime, both planned and unplanned, is critical for IT services. System designs must include mechanisms that foster fault resilience, quick repair, and even rapid expansion, without impacting the availability of key services. Specifically designed to support complex, network computing solutions and stringent high-availability requirements, the system in the Sun SPARC Enterprise M3000 server includes redundant, hot-swap system components; diagnostic and error recovery features throughout the design; and built-in remote management features. The advanced architecture of this reliable server enables high levels of application availability and rapid recovery from many types of hardware faults, simplifying system operation and lowering costs for enterprises.

Redundant and Hot-Swap Components

Today's IT organizations are challenged by the pace of nonstop business operations. In a networked global economy, revenue opportunities remain available around the clock, forcing planned downtime windows to shrink and, in some cases, disappear entirely. To meet these demands, the Sun SPARC Enterprise M3000 server employs built-in redundant and hot-swap hardware to help mitigate the disruptions caused by individual component failures or changes to system configurations. In fact, these systems are able to recover from hardware failures, often with no impact to users or system functionality.

The Sun SPARC Enterprise M3000 server features redundant, hot-swap power supplies and fan units. Also, redundant internal storage can be created by combining hot-swap disk drives with disk mirroring software. If a fault occurs, these duplicated components can enable continued operation. Depending upon the component and type of error, the system could continue to operate in a degraded mode or could reboot, with the failure automatically diagnosed and the relevant component automatically configured out of the system. In addition, hot-swap hardware within the Sun SPARC Enterprise M3000 server speeds service and allows for the replacement or addition of components, without stopping the system.


Advanced Reliability Features

Advanced reliability features included within the components of the Sun SPARC Enterprise M3000 server increase the overall stability of this platform. In addition, advanced CPU integration and guaranteed data path integrity provide for autonomous error recovery by the SPARC64 VII processor, reducing the time to initiate corrective action and subsequently increasing uptime.

XSCF and the predictive self-healing feature in Oracle Solaris further enhance the reliability of Sun SPARC Enterprise servers. The implementation of XSCF and predictive self-healing for Sun SPARC Enterprise servers enables the constant monitoring of all CPUs and memory. Depending upon the nature of the error, persistent CPU soft errors can be resolved by automatically offlining a thread, core, or entire CPU. In addition, a memory page retirement capability enables memory pages to be taken offline proactively, in response to multiple corrections to data access for a specific memory DIMM.

Error Detection, Diagnosis, and Recovery

The Sun SPARC Enterprise M3000 server features important technologies that correct failures early and keep marginal components from causing repeated downtime. Architectural advances that inherently increase reliability are augmented by the error detection and recovery capabilities within the server hardware subsystems. Ultimately, the following features work together to raise application availability:

End-to-end data protection detects and corrects errors throughout the system, ensuring complete data integrity.

State-of-the-art fault isolation enables the server to isolate errors within component boundaries and offline only the relevant resources instead of whole components. This feature applies to CPUs (cores), memory, and I/O devices.

Constant environment monitoring provides a historical log of all pertinent environmental and error conditions.

The host watchdog feature periodically checks the operation of software, including the domain operating system. This feature also uses the XSCF firmware to trigger error notification and recovery functions.

Periodic component status checks are performed to determine the status of many system devices to detect signs of an impending fault. Recovery mechanisms are triggered to prevent system and application failures.

Error logging, multistage alerts, electronic field-replaceable unit identification information, and system fault LED indicators all contribute to rapid problem resolution.


System Management

Providing hands-on, local system administration for server systems is no longer realistic for many organizations. Around-the-clock system operation, disaster recovery hot sites, and geographically dispersed organizations lead to requirements for remote management of systems. One of the many benefits of Oracle servers is the support for lights-out data centers, enabling expensive support staff to work at any location with network access. The Sun SPARC Enterprise M3000 system design, combined with the powerful XSCF, XSCF Control Package (XCP), and system management software, enables administrators to remotely execute and control nearly any task. These management tools and remote functions lower administrative loads, saving organizations time and reducing operational expenses.

eXtended System Control Facility

The XSCF is the core technology of remote monitoring and management capabilities in the Sun SPARC Enterprise M3000 server. The XSCF consists of a dedicated processor that is independent of the server system and runs the XCP. The Domain to Service Processor Communication Protocol (DSCP) is used for communication between the XSCF and the server. The DSCP runs on a private TCP/IP-based or PPP-based communication link between the service processor and each domain. As long as input power is supplied to the server, the XSCF constantly monitors the system, even when the domain is inactive.

The XSCF regularly monitors the environmental sensors, provides advance warnings of potential error conditions, and executes proactive system maintenance procedures as necessary. For example, the XSCF can initiate a server shutdown in response to temperature conditions that might lead to physical system damage. The XCP running on the service processor enables administrators to remotely control and monitor a domain as well as the platform itself. Using a network or serial connection to the XSCF, operators can effectively administer the server from anywhere on the network. Remote connections to the service processor run separately from the operating system and provide the full control and authority of a system console.

DSCP Network

The DSCP service provides a secure TCP/IP and PPP-based communications link between the service processor and each domain. Without this link, the XSCF cannot communicate with the domain. The service processor requires one IP address dedicated to the DSCP service on the XSCF side of the link and one IP address on the domain side.

eXtended System Control Facility Control Package

The XCP enables users to control and monitor the server system quickly and effectively. The XCP provides a command-line interface (CLI) and Web browser user interface that gives administrators and operators access to all system controller functions. Password-protected accounts with specific administration capabilities also provide system security for domain consoles. Communication between the XSCF and individual domains uses an encrypted connection based on secure shell (SSH) and Secure Sockets Layer (SSL), enabling secure, remote execution of commands provided by the XCP.

The XCP provides the interface for the following key server functions:

Audit administration including the logging of interactions between the XSCF and the domains

Monitoring and control of power to the components inside the Sun SPARC Enterprise M3000 server

Interpretation of hardware information presented, and notification of impending problems such as high temperatures or power supply problems, as well as access to the system administration interface

Integration with the fault management architecture of Oracle Solaris 10 to improve availability through accurate fault diagnosis and predictive fault analysis

Execution and monitoring of diagnostic programs, such as the OpenBoot PROM (OBP) and power-on self-test (POST)

Role-Based System Management

The XCP supports role-based system access control through the organization of users into groups. Different privileges are assigned to each group. Privileges allow a user to perform a specific set of actions on a specific set of hardware, including physical components, domains, or physical components within a domain. In addition, a user can possess multiple, different privileges on any number of domains.

Platform Management

Oracle Enterprise Manager Ops Center software, as well as third-party tools, offers advanced management functions that complement the capabilities of the XCP. To simplify integration, the XSCF can communicate with system management tools by enabling an SNMP agent on the service processor. The network interface on the service processor facilitates data transfer to SNMP managers within third-party management applications. SNMP V1, V2, and V3 and concurrent access from multiple SNMP managers are supported.

The service processor SNMP agent can export the following types of information to an SNMP manager:

System information such as chassis ID, platform type, total number of CPUs, and total memory

Hardware configuration


Domain status

Power status

Environmental status

The service processor SNMP agent can supply system and fault event information using public management information bases (MIBs). The XSCF supports the configuration of the following two MIBs (configuration commands can be found in Table 8):

XSCF extension MIB (SP-MIB). Provides information on the status and configuration of the platform. For fault events, the SP-MIB sends a trap with basic fault information.

Fault Management MIB (FM-MIB). Records fault event data. The FM-MIB provides the same detailed information as the FMA MIB in an Oracle Solaris domain. This data can help service technicians diagnose failures.

TABLE 8. SERVICE PROCESSOR SNMP AGENT CONFIGURABLE FOR ONE OR BOTH MIBS

MIB                 CONFIGURATION COMMAND
SP traps only       setsnmp enable SP_MIB
FMA traps only      setsnmp enable FM_MIB
SP and FMA traps    setsnmp enable

Oracle Enterprise Manager Ops Center Software

Controlling a rapidly changing IT infrastructure requires intelligent management tools and an ability to provision servers efficiently. Oracle Enterprise Manager Ops Center is a highly scalable data center management platform that provides organizations with systems lifecycle management and automation processes to help manage data center requirements such as server consolidation, compliance reporting, and rapid provisioning. This management platform helps enterprises to provision and administer both physical and virtual data center assets. Oracle Enterprise Manager Ops Center provides a single console to help discover, provision, update, and manage globally dispersed heterogeneous IT environments, which may include Oracle and non-Oracle hardware running Windows, Linux, and Oracle Solaris operating systems. When used in conjunction with the Sun SPARC Enterprise M3000 server, this enterprise platform can automate the knowledge necessary for patch lifecycle management and maintenance. Oracle Enterprise Manager Ops Center can help system administrators automate software installations, simulation, rollback, compliance checking, reporting, and many other related activities. In addition, Oracle Enterprise Manager Ops Center can be used to discover the embedded service tag technology in the service processor and the domain running on the Sun SPARC Enterprise M3000 server.

Oracle Solaris JumpStart can be implemented in this management solution to provision Oracle Solaris onto the Sun SPARC Enterprise M3000 server. Oracle Enterprise Manager Ops Center helps facilitate and control administrative actions from a central location to ensure accountability and auditing. These automation capabilities can be used for knowledge-based change management in conjunction with existing configuration management investments. Taking advantage of Oracle Enterprise Manager Ops Center can help organizations create a more reliable environment that offers considerable cost savings through reduced maintenance and faster recovery from downtime.

Oracle Solaris 10

With mission-critical business objectives on the line, enterprises need a robust operating environment with the ability to optimize the performance, availability, security, and use of hardware assets. In a class by itself, Oracle Solaris 10 offers many innovative technologies to help IT organizations improve operations and realize the full potential of Sun SPARC Enterprise servers.

Observability and Performance

IT organizations need to make effective use of the power of hardware platforms. Oracle Solaris 10 supports near-linear scalability proportional to the number of CPUs (cores) and memory addressability that reaches well beyond the physical memory limits of even Oracle's largest server. The following advanced features of Oracle Solaris 10 provide IT organizations with the ability to identify potential software tuning opportunities and maximize raw system throughput:

Oracle Solaris DTrace is a powerful tool that provides a true system-level view of application and kernel activities, even those running in a Java Virtual Machine. DTrace software safely instruments the running operating system kernel and active applications without rebooting the kernel or recompiling (or even restarting) software. By using this feature, administrators can view accurate and concise information in real time and highlight patterns and trends in application execution. The dynamic instrumentation that DTrace provides enables organizations to reduce the time to diagnose problems from days and weeks to minutes and hours, resulting in faster data-driven fixes.

The highly scalable, optimized TCP/IP stack in Oracle Solaris 10 lowers overhead by reducing the number of instructions required to process packets. This technology also provides support for large numbers of connections and enables server network throughput to grow linearly with the number of CPUs and network interface cards. By taking advantage of the Oracle Solaris 10 network stack, organizations can significantly improve application efficiency and performance.

The memory handling system of Oracle Solaris 10 provides multiple page size support to enable applications to access virtual memory more efficiently, improving performance for applications that use large memory intensively.

The Oracle Solaris 10 multithreaded execution model plays an important role in enabling Sun SPARC Enterprise servers to deliver scalable performance. Improvements to the threading capabilities in Oracle Solaris 10 occur with every release, resulting in performance and stability improvements for existing applications without recompilation.

Availability

The ability to rapidly diagnose, isolate, and recover from hardware and application faults is paramount for meeting the needs of nonstop business operations. Longstanding features of Oracle Solaris provide for system self-healing. For example, the kernel memory scrubber constantly scans physical memory, correcting any single-bit errors to reduce the likelihood of those problems turning into uncorrectable double-bit errors. Oracle Solaris 10 takes a big leap forward in self-healing with the introduction of the fault manager and service manager features. With these features, business-critical applications and essential system services can continue uninterrupted in the event of software failures, major hardware component breakdowns, and software misconfiguration problems.

Fault manager reduces complexity by automatically diagnosing faults in the system and initiating self-healing actions to help prevent service interruptions. The fault manager diagnosis engine produces a fault diagnosis once discernible patterns are observed from a stream of incoming errors. Following error identification, fault manager provides information to agents that know how to respond to specific faults. Problem components can be configured out of a system before a failure occurs, and in the event of a failure, this feature performs automatic recovery and application restart. For example, an agent designed to respond to a memory error might determine the memory addresses affected by a specific failure and remove the affected locations from the available memory pool.

Service manager software converts the core set of services packaged with the operating system into first-class objects that administrators can manipulate with a consistent set of administration commands. Using service manager, administrators can take actions on services including start, stop, restart, enable, disable, view status, and snapshot. Service snapshots save the complete configuration of a service, giving administrators a way to roll back any erroneous changes. Snapshots are taken automatically whenever a service starts, which helps reduce risk by guarding against erroneous changes. Because service manager is integrated with fault manager, when a low-level fault is found to impact a higher-level component of a running service, fault manager can direct service manager to take appropriate action.

In addition to handling error conditions, efficiently managing planned downtime greatly enhances availability levels. Tools included with Oracle Solaris 10, such as Oracle Solaris Flash and Oracle Solaris Live Upgrade, can help enterprises achieve more-rapid and consistent installation of software, upgrades, and patches, leading to improved uptime.

Oracle Solaris Flash enables IT organizations to quickly install and update systems with an Oracle Solaris 10 configuration tailored to enterprise needs. This technology provides tools for system administrators to build custom rapid-install images (including applications, patches, and parameters) that can be installed at a data rate close to the full speed of the hardware.

Oracle Solaris Live Upgrade provides mechanisms to upgrade and manage multiple on-disk instances of Oracle Solaris 10. This technology enables system administrators to install a new operating system on a running production system without taking it offline, with the only downtime for the application being the time necessary to reboot the new configuration.

Security

Today's increasingly connected systems create benefits and challenges. While the global network offers opportunities to increase revenue, enterprises must pay close attention to security concerns. The most secure operating system on the planet, Oracle Solaris 10 provides features previously found only in the military-grade Trusted Solaris operating system. These capabilities enable the strong controls required by governments and financial institutions, but also benefit all enterprises focused on security concerns and requirements for auditing capabilities.

The user rights management and process rights management capabilities in Oracle Solaris work in conjunction with Oracle Solaris Containers to enable multiple applications to securely share the same domain. Security risks are reduced by granting users and applications only the minimum capabilities needed to perform assigned duties. Better yet, unlike other solutions on the market, no application changes are required to take advantage of these security enhancements.

The security policy in Oracle Solaris 10 can be extended with labeling features previously available only in highly specialized operating systems or appliances. These extensions deliver true multilevel security within a commercial-grade operating system, which benefits civilian organizations with specific regulatory or information protection requirements.

Oracle Solaris 10 provides features that fortify platforms against compromise. Firewall protection technology included in the Oracle Solaris 10 distribution protects individual systems against attack. In addition, file integrity checking and digitally signed binaries within Oracle Solaris 10 enable administrators to verify that platforms remain untouched by hackers. Secure remote access capabilities also increase security by centralizing the administration of system access across multiple operating systems.

Virtualization and Resource Management

The economic need to maximize the use of every IT asset often necessitates consolidating multiple applications onto single server platforms. Virtualization techniques take consolidation strategies one step further by helping organizations create administrative and resource boundaries between applications within each domain on a server. By taking advantage of Oracle Solaris Containers and Oracle Solaris Resource Manager software, organizations can improve resource use and reduce downtime without additional software licensing expenses.

Oracle Solaris Containers

Containers provide a breakthrough approach to virtualization and software partitioning, supporting the creation of many private execution environments within a single instance of Oracle Solaris (Figure 13). Within the Container model, each environment holds a unique identity and maintains resource and namespace isolation. Administrators can configure separate LAN or virtual LAN connections with exclusive IP stacks for individual Containers, creating secure separation of network traffic. By supporting fine-grained control over the assignment of system rights and resources, Containers can ease consolidation efforts.

Applications within Containers are isolated, preventing processes in one Container from monitoring or affecting processes running in another Container. Even a superuser process cannot view or affect activity in other Containers. Software fault and security isolation features in Oracle Solaris Containers prohibit poorly behaved applications from impacting other Containers. This isolation supports better administrative control, helping organizations eliminate error propagation, unauthorized access, and unintentional intrusions.

Figure 13. Containers isolate applications using flexible software mechanisms.

Hosting multiple applications on one system helps organizations use expensive resources to greater effect. Using Containers can lead to lower costs by helping IT organizations harness and provision otherwise idle compute power into a secure, isolated runtime environment for new deployments. For example, a database, Web server, and batch application each running on its own system can be consolidated onto a single server configured to give each access to one-third of the available system resources. That same server can be automatically reconfigured so that the Web server receives 75 percent of the network bandwidth during peak load conditions. When applied to test and development environments, Containers can minimize the need for dedicated test systems and facilitate the implementation of multiple deployment scenarios with ease. At the end of a testing cycle, administrators can also rapidly duplicate validated configurations for production deployment. With the ability to dynamically allocate resources, Containers help improve resource use without increasing the number of operating system instances to manage.

Oracle Solaris Resource Manager

Resource management tools address the needs of consolidation efforts, which require soft resource boundaries between applications. With no privileges to access underlying hardware, resource management software leverages operating system controls to govern the use of CPU, memory, and I/O. Oracle Solaris Resource Manager software enables system administrators to set and enforce policies that guarantee a share of CPU cycles and virtual memory space to individual applications. Administrators can also set upper limits on process count, number of logins, and connect time for each system user ID. In addition, Oracle Solaris Resource Manager can be used along with other virtualization technologies to further define resource rights for each virtualized boundary. In fact, Oracle Solaris Resource Manager enables the dynamic allocation of processors and individual processor cores to a Container. The power to define and readily adjust compute resource levels within virtualized environments helps enterprises improve hardware use and better guarantee the quality of service for individual applications.
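The idea of guaranteeing a share of CPU cycles can be made concrete with a short calculation. The Python sketch below assumes a simple proportional model in which each workload's guaranteed fraction is its share count divided by the total of all shares. The function name, workload names, and share values are hypothetical, and the sketch does not model the scheduler itself.

```python
def cpu_entitlements(shares):
    """Return each workload's guaranteed CPU fraction under proportional sharing."""
    total = sum(shares.values())
    if total <= 0:
        raise ValueError("at least one workload needs a positive share")
    return {name: value / total for name, value in shares.items()}


# Three consolidated workloads with equal shares: one-third of the CPU each.
equal = cpu_entitlements({"database": 1, "web": 1, "batch": 1})

# Raising the Web server's share during peak load shifts the guarantee
# without reconfiguring the other workloads: 6 of 8 shares is 75 percent.
peak = cpu_entitlements({"database": 1, "web": 6, "batch": 1})
```

Because the guarantee is relative rather than absolute, an administrator adjusts one number to rebalance the system, and the other workloads' fractions follow from the new total.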

Conclusion

To support the high demand for reliability, manageability, and reduced environmental loads in data centers, infrastructures need to provide ever-increasing performance and capacity along with power conservation and miniaturization. Outfitted with the SPARC64 VII processor developed to provide high performance and low power consumption, a large memory capacity, a reliable architecture, and a system monitoring feature, Oracle's Sun SPARC Enterprise M3000 server delivers new levels of power, availability, and ease-of-use to enterprises. Organizations using this server can open the door to a new environment, fostering greater business opportunities and gaining a strategic asset in the quest to get ahead and stay ahead of the competition.

Sun SPARC Enterprise M3000 Server Architecture April 2010 Oracle Corporation World Headquarters 500 Oracle Parkway Redwood Shores, CA 94065 U.S.A. Worldwide Inquiries: Phone: +1.650.506.7000 Fax: +1.650.506.7200 oracle.com

Copyright 2008, 2009, 2010, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. UNIX is a registered trademark licensed through X/Open Company, Ltd. 0110