ServerArchitectures SMPSymmetricMulti Processors

Server ArchitecturesSMP: Symmetric Multi-Processors

Ren J. ChevanceJanuary 2005

Page 2

RJ Chevance

Forewordn This presentation is an introduction to a set of presentations

about server architectures. They are based on the following book:

Serveurs Architectures: Multiprocessors, Clusters, Parallel Systems, Web Servers, Storage Solutions

Ren J. ChevanceDigital Press December 2004 ISBN 1-55558-333-4

http://books. elsevier.com/

This book has been derived from the following one:

Serveurs multiprocesseurs, clusters et architectures parallles

Ren J. ChevanceEyrolles Avril 2000 ISBN 2-212-09114-1

http://www.eyrolles.com/

The English version integrates a lot of updates as well as a new chapter on Storage Solutions.

Contact: www.chevance.com [email protected]

Page 3

RJ Chevance

Organization of the Presentationsn Introductionn Processors and Memoriesn Input/Outputn Evolution of Software Technologies Symmetric Multi-Processors (this document)

SMP Architecture Overview Microprocessors and SMP Operating Systems and SMP SMP with moderate number of processors (8 processors) SMP with Many Processors (16 processors): Nearly UMA Systems, CC-

NUMA, COMA Performance Improvement Provided by SMP Advantages and disadvantages of SMP architecture Partitioning: Virtual Machines, Hardware Partitioning, Logical Partitioning

n Cluster and Massively Parallel Machinesn Data Storagen System Performance and Estimation Techniquesn DBMS and Server Architectures n High Availability Systemsn Selection Criteria and Total Cost of Possessionn Conclusion and Prospects

Page 4

RJ Chevance

SMP Architecture Overviewn Symmetric MultiProcessor Architecture

With SMP, all processorsmay accessto all system resources (memory, I/O). A SMP is operating under the control of one Operating System managing all system resources.

The native programming model is the sharing of memory between processes.

Processor Processor Processor

System ControllerMemory

I/O Controller I/O Controller

Processor -memory-I/O interconnect

I/O bus

or

Systme Operating System

DBMS(s)Middleware

Applications

Hardware resources(processors, memory, controllers and peripherals)

b) SoftwareViewof a Symmetrical Multiprocessora) Hardwareview of a Symmetrical Mutiprocessor

Page 5

RJ Chevance

SMP Architecture Overview(2)

n SMP Programming and Execution Model

Process

Processor

Process

Process

Processor

Processus

Processor

Process

Processor

Process

Processor

Process

a) - Uniprocessor b) - Tightly-Coupled Multprocessor /SMP

Page 6

RJ Chevance

SMP Architecture Overview(3)n Lightweight processes (threads)

o Threads are sharing process resourceso Cost of thread switching is lower than process

switching

Single-threadedUser Process

2GBSharedLibrariesSharedMemory

Stack

Data

Code (text)0

Proc. Context

Multi-threadedUser Process

Kernel codeand data

4GB

2GB

OperatingSystemAddressSpace

u area: Status and control informationassociated with the process

UserAddress Space

2GBSharedLibraries

SharedMemory

Data

Code (text) 0

Proc. Context

Stack

Thread 0 Thread n

Main process

Proc. Context

Stack

Page 7

RJ Chevance

Microprocessors and SMP

n Microprocessors must integrate specific features to operate in an SMP environment

n Cache Coherence - Illustration

Store '4' ->A

Memory

(1)

Processor P1 P2 P3

Cache

Load A Load A

b

A=1

A=1A=1

Memory

(2)

Processor P1 P2 P3

Cache

Load A Load A

b

A=1

A=1A=1A=4

Page 8

RJ Chevance

Microprocessors and SMP(2)n Cache Coherence Protocol (e.g. MESI)

n Memory Consistencyn Synchronization Mechanisms

Invalid Shared

ExclusiveModified

Read miss

LB

LB

U

U

Read miss

External write

External write

External read

Externalread

Write hit

Write hit

Read Hit

Read hit

Read hit

Write hit

External write

U indicates an update of the memoryLB indicates loading a blockinto the cache

Page 9

RJ Chevance

Operating Systems and SMPn Operating systems must be designed for SMP

(e.g. Windows NT) or adpated to SMP (e.g. Unix)n e.g. transforming an OS for SMP:

o MP Safe Operation then MP efficiento Transforming kernel processes into threadso Pre-emptive kernelo Locking to prevent simultaneous access to shared

resources (e.g. data structures)l Granularity, number of locks, duration of locksl Dead locks

o Allocating processes/threads to processors (affinity)o

n Need constant tuning to adapt to changing characteristics such as the number of processors,.

Page 10

RJ Chevance

SMP with moderate number of processorsn By 2003, we can assume that up to 8 is a moderate number

of processors (no need to use sophisticated technologies). This limit is growing with time as microprocessor vendors integrate more and more system capability into their silicon.

n Internal Structure of the AMD Opteron Processor (according to AMD)

DDR MemoryController

DDR MemoryController

OpteronProcessor

Core

OpteronProcessor

Core

L1Instruction

Cache

L1Instruction

Cache

L1Data

Cache

L1Data

Cache

L2Cache

L2Cache

HyperTransportHyperTransport

Page 11

RJ Chevance

SMP with moderate number of processors(2)

n 4 and 8 processor HyperTransport-basedSMP Systems (according to AMD)

HyperTransportTo AGP Bridge

HyperTransportTo AGP Bridge

HyperTransportTo PCI-X bridgeHyperTransportTo PCI-X bridge

HyperTransportTo PCI-X bridgeHyperTransportTo PCI-X bridge SouthbridgeSouthbridge

OpteronOpteron

OpteronOpteron

OpteronOpteron

OpteronOpteron

8xAGP8x

AGP

AGP optional

a) 4 way SMP a) 8 way SMP

OpteronOpteron

OpteronOpteron

OpteronOpteron

OpteronOpteron

OpteronOpteron

OpteronOpteron

OpteronOpteron

OpteronOpteron

PCI Adapters Memory

Page 12

RJ Chevance

SMP with moderate number of processors(3)n Overview of IBMs X-Architecture (IBM)

Proc. Proc.

Proc. Proc.Mem

ory

Proc.Proc.

Proc.Proc.

Mem

ory

Proc. Proc.

Proc. Proc.Mem

ory

Proc.Proc.

Proc.Proc.

Mem

ory

Page 13

RJ Chevance

SMP with moderate number of processors(4)n Architecture of a 4-Way Itanium-2 Based SMP Server, the

SR870NB4 (Source INTEL)

Itanium 2Itanium 2 Itanium 2Itanium 2 Itanium 2Itanium 2 Itanium 2Itanium 2

System Bus 6.4 GB/s

Scalable NodeController

SNC

Scalable NodeController

SNC6.4 GB/s

DMHDDR

DDR

DMHDDR

DDR

DMHDDR

DDR

DMHDDR

DDR

Memory128 GB

FirmwareHub

I/O HubI/O Hub

Scalability Ports

LegacyI/O Hub

PCI-XBridge

Hub Interface1 GB/s per link

PCI-XBridge

Page 14

RJ Chevance


n Architecture of a 4-Way Xeon MP-based Server (according to ServerWorks)

Pentium 4Xeon MP

Pentium 4Xeon MP

Pentium 4Xeon MP

Pentium 4Xeon MP

Pentium 4Xeon MP

Pentium 4Xeon MP

Pentium 4Xeon MP

Pentium 4Xeon MP

GC HEGC HE

CIOB-XCIOB-XIMBus

CIOB-XCIOB-X

CIOB-XCIOB-X

PCI-X

DDR 200 DDR 200

DDR 200 DDR 20016 B

CSB5Legacy I/O

CSB5Legacy I/O 32-bit PCI

Page 15

RJ Chevance


n Crossbar-based Architectures: Sun Fire 3800 Architecture (Source: Sun)

Processor& E Cache

Memory8 GB

CPU Data Switch

Processor& E Cache

Memory8 GB

2.4 GB/sec2.4 GB/sec


Processor& E Cache

Memory8 GB

CPU Data Switch

Processor& E Cache

Memory8 GB



Address Repeater

Board Data Switch

4.8 G /sec

4.8 GB/sec

DatapathController

150 millionaddresses/sec

4.8 GB/sec

Page 16

RJ Chevance

SMP with Many Processors 16 processorsn Because of the physical and electrical constraints, larger SMP

systems must use a hierarchy of interconnect mechanisms; in arecursive manner, a large SMP tends to be a collection of smaller systems, each of which is itself an SMP

n Such a hierarchy means that memory access times differ - when aprocessor accesses its own memory, it will see a much shorter access time than when it accesses memory in a distant module. This non-uniformityof memory access times can have a majoreffect on system behavior and on the software running on it

n The degree of uniformity allows us to distinguish two classes of SMP systems.o Systems in which access to any memory takes an identical amount

of time, or very nearly identical are called(respectively) UMA - forUniform Memory Access - or nearly UMA .

o Systems in which there is a noticeable disparity in access times - afactor greater than 2 - are referred to as CC-NUMA, for Cache -Coherent Non-Uniform Memory Access

n This classification is purely empirical, and its practical significance is that in a distinctly NUMA system, software must -for maximum efficiency - take account of the varying accesstimes. The amount of work needed for this adaptation tends toincrease with the NUMA ratio.

Page 17

RJ Chevance

Nearly UMA Systems

n Fujitsu Prime Power System architecture (simplified) (after Fujitsu)

ProcessorsMemoryPCI cardsL1 Crossbar

System card

Up to 8 cards(32 processors)

L2 Crossbar

Cabinet(32 processors)

M

PPPP

PCI

L1

L2 L2

L2L2

Memory latency: 300 ns when a L2 crossbar is part of thaccess path (one may assume 200 ns for local access time)

Page 18

RJ Chevance

Nearly UMA Systems(2)

n HP Superdomeo HP Superdome Cell (after HP)

CellController

MemoryUp to 16 GB

PA

-860

0

PA

-860

0

PA

-860

0

PA

-860

0

Cro

ssba

r

I/OController

AS

IC PCI

AS

IC PCI

AS

ICPCI

AS

ICPCI

1.6 GB/s

6.4 GB/s

Page 19

RJ Chevance


n HP Superdome Architecture (after HP)

I/O

CellC

ross

bar

Cro

ssba

r

Cro

ssba

r

I/O

Cell

Cro

ssba

r

BackplaneBackplane

CabinetCabinet

Memory latencies for a fully-loaded systemare said to be:

within-cell: 260 ns; on the same crossbar: 320 to 350 ns; within the same cabinet: 395 ns; between cabinets: 415 ns.

Page 20

RJ Chevance


n IBM pSeries 690o Structure of Power4 MCM (after IBM)

CPUCore

CPUCore

L3

Dir

.

Shared L2

Chip-Chip Comm.

CPUCore

CPUCore

L3

Dir

.

Shared L2

Chip-Chip Comm.

CPUCore

CPUCore

L3

Dir

.

Shared L2

Chip-Chip Comm.

CPUCore

CPUCore

L3

Dir

.

Shared L2

Chip-Chip Comm.

Multi-chip Module Boundary

L3MemoryCtrl

L3MemoryCtrl

Mem

ory

Mem

ory

GX Bus

GX Bus

L3 MemoryCtrl

L3 MemoryCtrl

Mem

ory

Mem

ory

GX Bus

GX Bus

Page 21

RJ Chevance


o Architecture of IBM pSeries 690 (after IBM)

P PL2

P PL2

P PL2

P PL2

GX

GX

GX GX

L3L3

L3L3

L3L3 L3L3

PPL2

PPL2

PPL2

PPL2

GX

GX

GXGX

L3 L3

L3 L3

L3 L3L3 L3

P PL2

P PL2

P PL2

P PL2

GX

GX

L3L3

L3L3

L3L3 L3L3

PPL2

PPL2

PPL2

PPL2

GX

GX

L3 L3

L3 L3

L3 L3L3 L3

GX GX GXGX

GX Slot

GX Slot

GX Slot

MemorySlot

MemorySlot

MemorySlot

MemorySlot MemorySlot

MemorySlot

MemorySlot

MemorySlot

MCM 0MCM 1

MCM 2 MCM 3

Page 22

RJ Chevance


n z900 Hardware Architecture (after IBM)

PU01

L1

PU02

L1

PU03

L1

PU04

L1

PU05

L1

PU06

L1

PU07

L1

PU08

L1

PU09

L1

L2 cache control chip and L2 cache data chips (16 MB shared )

PU00

L1

MBA

1

MBA

0

STISTI

Memorycard

0

Memorycard

2

Clock

PU0B

L1

PU0C

L1

PU0D

L1

PU0E

L1

PU0F

L1

PU10

L1

PU11

L1

PU12

L1

PU13

L1

L2 cache control chip and L2 cache data chips (16 MB shared )

PU0A

L1

MBA

2

MBA

3

STISTI

Memorycard

1

Memorycard

3

CCE 0 CCE 1

ETR

ETR

Page 23

RJ Chevance


n Architecture of a system with 16 Itanium 2 processors (after INTEL)

Itanium2

Itanium2

Itanium2

Itanium2

SNC

Mem

ory

Fir

mw

are

Hu

b

Itanium2

Itanium2

Itanium2

Itanium2

SNC

Mem

ory

Fir

mw

are

Hu

b

ScalabilityPort Switch

I/O HubLegacy I/OController Hub

PCI-X PCI-X

Itanium2

Itanium2

Itanium2

Itanium2

SNC

Mem

ory

Fir

mw

are

Hu

b

Itanium2

Itanium2

Itanium2

Itanium2

SNC

Mem

ory

Fir

mw

are

Hu

b

ScalabilityPort Switch

I/O HubLegacy I/OController Hub

PCI-X PCI-X

The system is based on 4-processor building blocks, using - here - the Scalability Port Switch and its Scalability Ports.These provide both connection and coherence, and offer a reasonable NUMA factor - Intel indicates a value of 2.2 for this system.

Page 24

RJ Chevance


n Structure Suns 3800(8), 4800(12) and 6800(24) Systems (after Sun)

Fireplane address router

Fireplane data crossbar

CPU/MemoryBoard 4

CPU/MemoryBoard 5

CPU/MemoryBoard 6

CPU/MemoryBoard 3

CPU/MemoryBoard 2

CPU/MemoryBoard 1

I/O Assembly 1I/O Assembly 1




Sun Fire 3800 Server

Sun Fire 4800/4810 Server

Sun Fire 6800 Server

Local memory access time: 180 ns, access to memory on another card takes 240 ns

Page 25

RJ Chevance

CC-NUMA Architecture

n Early CC-NUMA systems were based upon 4-way building blocks (Pentium-based) and used derivatives of the SCI (Scalable Coherent Interface). Early systems were characterized by poor performance

n Principles of CC-NUMA Architecture

Proc.

Cache

Memory

E/SI/O

Coherentnetwork i/f

Module

Coherent Network Interconnect

Proc.

Cache

MmoireMmoire

MmoireMmoire

Memory

E/SE/S

E/SE/S

I/O

Physical Configuration Equivalent Logical Configuration

Proc.

Cache

Memory

E/SI/OCoherentnetwork i/f

Module

Page 26

RJ Chevance

CC-NUMA Architecture(2)

n CC-NUMA performance is very sensitive to locality of information. Large NUMA factor (e.g. above 5) requires extensive software optimizations

n Some examples of possible optimizations whenNUMA factor is large:o Affinity scheduling: arranging that a process, once

suspended, is rescheduled on the same processor on which it last executed in the expectation that the caches contain useful state from the earlier execution

o When allocating memory, prioritize local allocation (i.e., allocation within the module on which the program is executing) above distant allocation

o When a process has had pages paged out, allocate memory for the pages when paged back in on the same module as the process is executing

Page 27

RJ Chevance

CC-NUMA Architecture(3)

n General Architecture of SGI Origin 3000 System (after SGI)

Possessor A Possessor B

HubChip

Memoryand

Directory

I/OCrossbar

Module 0

Scalable Interconnect Network : NUMAlink = 3.2 GB/sec.

Module 1 Module 255

Page 28

RJ Chevance

CC-NUMA Architecture(4)n 32 and 64 processor SGI Origin 3000

configurations (after SGI)

R

M

M

R M

M

RM

M

R

M

M R

M

M

R M

M

RM

M

R

M

M R

M

M

R

M

M

R M

M

RM

M

R

M

M R

M

M

R M

M

RM

M

R

M

M R

M

M

R

M

M

R M

M

RM

M

R

M

M R

M

M

R M

M

RM

M

R

M

M R

M

M

32 processors

64 processors

Page 29

RJ Chevance

CC-NUMA Architecture(5)n SGI Origin 3000 - 128-processor Configuration

(after SGI)

R

M

M

R M

M

RM

M

R

M

M R

M

M

R M

M

RM

M

R

M

M R

M

M

R

M

M

R M

M

RM

M

R

M

M R

M

M

R M

M

RM

M

R

M

M R

M

M

R

M

M

R M

M

RM

M

R

M

M R

M

M

R M

M

RM

M

R

M

M R

M

M

R

M

M

R M

M

RM

M

R

M

M R

M

M

R M

M

RM

M

R

M

M R

M

M

R

RR

R

RR

R

R128 processors

Hierachical Fat Bristled Hypercube

Page 30

RJ Chevance

COMA Architecturen COMA Architecture (Cache Only Memory Architecture)

aimed at large SMP: data doesnt have a fixed place in memory (that is, has no fixed address in the memory of a module), and the memory of each module acts as a cache. Data is moved, by hardware, to where it is being accessed, and data that is shared for reading may therefore be resident in several modules at any one time

n Schematic diagram of a COMA architecture

E/SE/S

E/SE/S

I/O

Cache

Proc. Cache(Memory)

Physical Configuration Equivalent Logical Configuration

Proc.

Cache

Memory

E/SI/OCoherent network i/f

Module

Coherent Network Interconnect

Proc.

Cache

Memory

E/SI/OCoherent network i/f

Module

Page 31

RJ Chevance

COMA Architecture(2)n Only one Company (KSR now disappeared) has introduced a

COMA system to the market placen Advantages:

o because memory is managed as a cache, data is always local to the module using it

o many copies of data used read-only may exists simultaneously

n Disadvantages:o the hardware is more complexo directory sizes increase, as does the time necessary to make

allocate data

n As originally described, COMA required a specific microprocessor architecture in the area of memory management (e.g. KSR) and so a very large investment

n S-COMA (Simple COMA), a project at Stanford University), is using standard microprocessors

Page 32

RJ Chevance

SMP, CC-NUMA, COMA: a Summary

n The traditional UMA SMP architecture provides the best character istics as regards memory access time and the simplicity of the required cache coherence protocols

n CC-NUMAs data uniqueness property (0 or 1 copies of an object in memory at any time) simplifies the coherence protocol at the expense of increased access time to data in other modules

n COMAs ability to replicate data copies improves effective access time but results in a greater complexity in the needed cache coherence protocol

VM Extracts0 or 1 copies,managed by

the VM system

VirtualMemory

(VM)Implementation

of VM

Classic SMP CC-NUMA COMA

VM Extracts0..N copies

Managed by cache coherence



Memory



VM Extracts0 or 1 copies,managed by

the VM system

VM Extracts0 or 1 copies

VM Extracts0 or 1 copies



Cache-coherentinterconnect

Proc.

Cache

Proc.

Cache

Proc.

Cache

Proc.

Cache

Proc.

Cache

Proc.

Cache

Proc.

Cache

Proc.

Cache

Proc.

Cache

Proc.

Cache

Memory Memory MemoryMemory

Page 33

RJ Chevance

Performance Improvement Provided by SMP n A number of factors prevents an N-processor

SMP having N times the performance of a uniprocessor e.g.o Contention for the bandwidth of the processor-

memory interconnect increases with the number of processors

o Contention for shared memory increases with the number of processors

o Maintaining cache coherence means that cache line invalidations, and movement of data from cache to cache, must occur

o Reduction of cache hit rate as a result of more context switches

o An increase in the number of instructions which must be executed to implement a given function because of the existence of locks and the contention for them

n Very few data publicly available about scalability of SMP

Page 34

RJ Chevance

Performance Improvement Provided by SMP(2)n Evolution of the behavior of a system multiprocessor

(transactional application and DBMS) according to the successive versions of software showing the importance of improvements in the OS and the DBMS (the hardware being unchanged)

0

1

2

3

4

5

1 2 3 4 5 6 7 8

Number of Processors

Rel

ativ

e P

erfo

rman

ce

Version i+1

Version i

Page 35

RJ Chevance

Advantages and disadvantages of SMP architecture

Advantages Disadvantages. Scalability of the system at a moderate cost. . Existence of at least one single point of failure - the

operating system. Simple performance increase by adding a card or a processor module

. Maintenance or update activities on the hardware and the OS generally require the whole system to be stopped

. Multiprocessor effectiveness - the system performance increases (within definite limits) as the num ber of processors grows

. Limitation of the number of processors because of access contention in the hardware (e.g., the bus) and software (operating system and DBMSs).

. Simplicity of the programming model . Upgrade capability limited by rapid system obsolescence versus processor generations

. Application transparency - single processor applications continue to work (although only multi- threaded applications can benefit from the architecture)

. Complexity of the hardware.

. Availability: if one processor fails, the system may be restarted with fewer processors

. Adaptation and expensive tuning of the operating system.

. The ability to partition the system . Necessary application adaptation to benefit from the performance available.

Page 36

RJ Chevance

Partitioningn Partitioning divides a single physical SMP system into a number of

independent logical SMPs, each logical SMP running its own OS. Examples of situations where partitioning is useful:o when consolidating several servers into a single system (for investment

rationalization, for example), while maintaining the apparent independence of the servers;

o sharing the same systems between independent activities whose balance changes over time

n Example of system partitioning:

Proc.

Cache

Proc.

Cache

Proc.

Cache

Memory

SMP Physical Configuration

Proc.

Cache

Proc.

Cache

Production System

Proc.

Cache

Proc.

Cache

Test System forNew Versions

(Operating Systemand/or Applications)

Proc.

Cache

Proc.

Cache

DevelopmentSystem

MemoryMemoryMemory

Page 37

RJ Chevance

Partitioning(2)n Techniques of Partitioning

o Virtual Machines (introduced in the late 60s for mainframes and widely used, being now used for servers based upon standard technologies)l Virtual Machine - Principles of Operation

l Each virtual machine has its own address space and Operating System (all can be different)

l The Hpervisor executes privilegedmode instruction issued by guest OS onto real resources

Hypervisor

Proc.Proc. PhysicalPage

PhysicalPage Storage

OperatingSystem

OperatingSystem

HardwareResources

Virtual Machine 1 Virtual

Machine N

App

licat

ion

App

licat

ion

App

licat

ion

App

licat

ion

Page 38

RJ Chevance

Partitioning(3)n Hardware Partitioning

o Consists of defining, for a given physical system configuration, a number of logical systems, each with its own allocation of hardware resources (processor, memory, controllers and peripherals)

n Logical Partitioningo Logical partitioning does not require dedication of the physical resources to

logical systems; rather, the resources are managed by a special operating system which dynamically multiplexes the physical resources correctly between the logical systems

Proc.Proc. Physical Page

Physical Page Storage

OperatingSystem

OperatingSystem

HardwareResources

LogicalMachine 1

LogicalMachine 1

App

licat

ion

App

licat

ion

App

licat

ion

App

licat

ion

Virtual SystemAbstraction Layer

Virtual SystemAbstraction Layer

System Abstraction Layer

Hyp

ervi

sor

Res

ourc

eM

anag

emen

t

Page 39

RJ Chevance

Partitioning(4)

n Dynamic partitioning is key for the flexibility of the operation (otherwise the whole system must be stopped, reconfigured and restarted). Virtual Machines and Logical Partitioning facilitate dynamic partitioning

n Workload Managers:o Complements an OS by ensuring that a given

application (or set of applications) can be afforded a guaranteed slice of the actual physical systems resources (e.g, physical memory, processor time.)

o A Resource Manager reserves sufficient resources to ensure that critical work may be performed as required, while allowing a system so support multiple concurrent applications. When the critical work does not use all pre -allocated resources, the unused resource slots may be used by other applications

Page 40

RJ Chevance

Server Consolidationn The pressure on costs (TCO reduction), information managers are

looking at reducing the number of servers deployed. Several approaches are proposed such as:o physical concentration were several servers are regrouped in the same

location possibly in a cluster configuration (offering high availability if needed);

o concentration were all applications distributed on several servers are supported on a single system (i.e. in a way similar to the traditional mainframe) with technologies such as:l Virtual machine approach l Workload management were a single operating system supports

simultaneously several applications. Resource sharing between applications is being managed according to a set of stated rules.

o data centric were data is supported by a single (high availability) system, offering to application servers the data service (application servers does not support data) ;

o application integration were various applications and associated data are concentrated on a smaller set of servers in order to facilitate the cooperation between applications.

n The choice of a solution depends heavily on the characteristics of the existing configuration and the set of constraints. Obviously, reducing the number of servers contributes to reduce the TCO. Both cost and business continuity issues may prevent to go further than physical concentration.

Documents

ServerArchitectures SMPSymmetricMulti Processors