Fabric Area Overview
Infrastructure: Electricity, Cooling, Space
Network
Batch system (LSF, CPU server)
Storage system (AFS, CASTOR, disk server)
Purchase, Hardware selection, Resource planning
Installation, Configuration + monitoring, Fault tolerance
Prototype, Testbeds
Benchmarks, R&D, Architecture
Automation, Operation, Control
Coupling of components through hardware and software
GRID services !?
Agenda
– Building Fabric
– Batch Subsystem
– Storage subsystem
– Installation and Configuration
– Monitoring and control
– Hardware Purchase
Building Fabric I
B513 was constructed in the early 1970s and the machine room infrastructure has evolved slowly over time.
– Like the eye, the result is often not ideal…
Current Machine Room Layout
Problem: Normabarres run one way, services run the other…
[Figure: current machine room layout, with four areas labelled "Services".]
With the preparations for LHC we have the opportunity to remodel the infrastructure.
Future Machine Room Layout
– 18m double rows of racks: 12 shelf units or 36 19” racks
– 9m double rows of racks for critical servers
– Aligned normabarres
– Loads shown: 528 box PCs, 105kW; 1440 1U PCs, 288kW; 324 disk servers, 120kW(?)
Remodelling goals:
– Arrange services in clear groupings associated with power and network connections.
» Clarity for general operations plus ease of service restart should there be any power failure.
– Isolate critical infrastructure such as networking, mail and home directory services.
– Clear monitoring of planned power distribution system.
Just “good housekeeping”, but we expect to reap the benefits during LHC operation.
Building Fabric II
Beyond good housekeeping, though, there are building fabric issues that are intimately related to recurrent equipment purchase.
– Raw power: We can support a maximum equipment load of 2.5MW. Does the recurrent additional cost of blade systems avoid investment in additional power capacity?
– Power efficiency: Early PCs had power factors of ~0.7 and generated high levels of 3rd harmonics. Fortunately, we now see power factors of 0.95 or better, avoiding the need to install filters in the PDUs. Will this continue?
– Many sites need to install 1U or 2U rack mounted systems for space reasons. This is not a concern for us at present but may become so eventually.
» There is a link here to the previous point: the small power supplies for 1U systems often have poor power factors.
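The power figures on these slides can be turned into a quick back-of-envelope check. The sketch below only uses the standard relation between real power, apparent power and power factor (S = P / PF), plus the load figures quoted for the future machine room layout. The function names are illustrative, and the 120kW disk server figure is marked uncertain in the slides themselves.

```python
def watts_per_box(total_kw: float, boxes: int) -> float:
    """Average real power per system, in watts."""
    if boxes <= 0:
        raise ValueError("boxes must be positive")
    return total_kw * 1000.0 / boxes


def apparent_power_kva(real_kw: float, power_factor: float) -> float:
    """Apparent power (kVA) the distribution must carry for a given real load."""
    if not 0.0 < power_factor <= 1.0:
        raise ValueError("power factor must be in (0, 1]")
    return real_kw / power_factor


# Load figures as quoted on the layout slide: (systems, kW).
loads = {
    "box PCs": (528, 105.0),
    "1U PCs": (1440, 288.0),
    "disk servers": (324, 120.0),  # flagged "(?)" in the source
}

for name, (count, kw) in loads.items():
    print(f"{name}: {watts_per_box(kw, count):.0f} W per system; "
          f"{apparent_power_kva(kw, 0.70):.0f} kVA at PF 0.70, "
          f"{apparent_power_kva(kw, 0.95):.0f} kVA at PF 0.95")
```

The same real load needs roughly a third more apparent power at a power factor of 0.7 than at 0.95, which is why the power factor of small 1U supplies matters for the distribution system.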
Fabric Architecture
Level of complexity, with the physical and logical coupling (hardware; software) between levels:
– CPU, Disk
» Coupling: motherboard, backplane, bus, integrating devices (memory, power supply, controller, ...); operating system, driver
– PC; storage tray, NAS server, SAN element
» Coupling: network (Ethernet, fibre channel, Myrinet, …), hubs, switches, routers; batch system, load balancing, control software, hierarchical storage systems
– Cluster
» Coupling: wide area network; Grid middleware
– World wide cluster
Batch Subsystem
Looking purely at batch system issues, TCO is reduced as the efficiency of node usage increases. What are the dependencies?
– The load characteristics
» Not much we in IT can do here!
– The batch scheduler
» LSF is pretty good here, fortunately.
– Chip technology
» Take hyperthreading, for example. Tests have shown that, for HEP codes at least, hyperthreading wastes 20% of the system performance when running two tasks on a dual processor machine, and there are no clear benefits to enabling hyperthreading when running three tasks. What is the outlook here?
– Processors/box
» At present, a single 100baseT NIC would support the I/O load of a quad processor CPU server. Quad processor boxes would halve the cost of networking infrastructure, but they come at a hefty price premium (Xeon MP vs Xeon DP, heftier chassis, …). What is the outlook here?
» And total system memory becomes an issue.
– The operating system
» Linux is getting better, but things such as processor affinity would be nice. Relationship to hyperthreading…
– Others?
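To make the processor affinity wish concrete, here is a minimal sketch of what pinning a task to one logical CPU looks like. It assumes a Linux host where Python's `os.sched_setaffinity` and `os.sched_getaffinity` are available. The helper name `pin_current_process` is illustrative, and nothing here describes how LSF or the batch nodes in the talk were actually configured.

```python
import os


def pin_current_process(cpu: int) -> set:
    """Restrict the calling process to one logical CPU and return the new mask."""
    if not hasattr(os, "sched_setaffinity"):
        # The affinity calls are not provided on every platform.
        raise NotImplementedError("CPU affinity is not available on this platform")
    allowed = os.sched_getaffinity(0)  # pid 0 means the calling process
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)
```

A batch job wrapper could call `pin_current_process(n)` before starting the payload, so that two tasks on a hyperthreaded dual processor box do not end up sharing one physical processor. Which logical CPU numbers share a physical processor is hardware specific and is not addressed by this sketch.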
Storage subsystem
Simple building blocks:
– Processors: “desktop+” node == CPU server
– CPU server + larger case + 6*2 disks == Disk server
– CPU server + Fibre Channel interface + tape drive == Tape server
Storage subsystem: Disk Storage
TCO: Maximise available online capacity within fixed budget (material & personnel).
– IDE based disk servers are much cheaper than high end SAN servers. But are we spending too much time on maintenance?
» Yes, at present, but we need to analyse carefully the reasons for the current load. Complexities of Linux drivers seem under control, but numbers have exploded. And are some problems related to particular batches of hardware?
– Where is the optimum? Switching to fibre channel disks would reduce capacity by a factor of ~5.
» Naively, buy, say, 10% extra systems to cover failures. Sadly, this is not as simple as for CPU servers; active data on the servers must be reloaded elsewhere.
» Always have duplicate data? => purchase 2x required space. Still cheaper than SAN? How does this relate to …
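The trade-off between spare systems, duplicate data and fibre channel can be put into numbers for the material budget alone. The sketch below assumes a hypothetical budget and a normalised IDE cost of 1 per TB. The only figures taken from the slide are the factor of ~5 for fibre channel, the 10% spares and the 2x space for duplicated data. Personnel cost, which is the open question on the slide, is not modelled.

```python
def usable_capacity_tb(budget: float, cost_per_tb: float,
                       spare_fraction: float = 0.0, copies: int = 1) -> float:
    """Usable capacity for a material budget.

    spare_fraction: extra systems bought to cover failures (0.10 means 10%).
    copies: number of copies kept of every byte (2 means duplicate data).
    """
    if budget < 0 or cost_per_tb <= 0 or spare_fraction < 0 or copies < 1:
        raise ValueError("invalid input")
    raw_tb = budget / cost_per_tb
    return raw_tb / (1.0 + spare_fraction) / copies


BUDGET = 1000.0        # hypothetical, arbitrary currency units
IDE_COST = 1.0         # hypothetical, normalised cost per TB
FC_COST = 5.0 * IDE_COST  # "reduce capacity by a factor of ~5"

ide_spares = usable_capacity_tb(BUDGET, IDE_COST, spare_fraction=0.10)
ide_mirrored = usable_capacity_tb(BUDGET, IDE_COST, copies=2)
fc_plain = usable_capacity_tb(BUDGET, FC_COST)

print(f"IDE with 10% spares: {ide_spares:.0f} TB")
print(f"IDE with duplicate data: {ide_mirrored:.0f} TB")
print(f"Fibre channel: {fc_plain:.0f} TB")
```

Under these assumptions, fully duplicated IDE storage still offers 2.5 times the usable capacity of fibre channel for the same material spend, so the answer to "still cheaper than SAN?" depends mainly on the maintenance effort.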
Storage System: Tapes
The first TCO question is “Do we need them?” Disk storage costs are dropping…
[Figure: Disk Price/Performance Evolution. Price in SFr per GByte (logarithmic axis, 1 to 100) against time since Jan 2000 (axis ticks 1 to 38), with curves for 40, 60, 80, 120, 160, 180 and 200 GB disks and for a non-mirrored disk server. Annotations: “factor 6 in 3 years” and “factor 2.5 difference”.]
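The “factor 6 in 3 years” annotation can be converted into an annual rate and a halving time, assuming the price falls at a constant exponential rate over the period. That assumption is a simplification and is not stated on the slide.

```python
import math


def annual_factor(total_factor: float, years: float) -> float:
    """Per-year price reduction factor for a constant exponential decline."""
    if total_factor <= 1.0 or years <= 0.0:
        raise ValueError("expect total_factor > 1 and years > 0")
    return total_factor ** (1.0 / years)


def halving_time_years(total_factor: float, years: float) -> float:
    """Time for the price per GByte to halve at the same constant rate."""
    if total_factor <= 1.0 or years <= 0.0:
        raise ValueError("expect total_factor > 1 and years > 0")
    return years * math.log(2.0) / math.log(total_factor)


print(f"annual factor: {annual_factor(6.0, 3.0):.2f}")
print(f"halving time: {12.0 * halving_time_years(6.0, 3.0):.1f} months")
```

A factor of 6 in 3 years corresponds to a price drop of roughly 1.8x per year, or a halving about every 14 months.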
Disk storage costs are dropping, but:
– Disk servers need system administrators; idle tapes sitting in a tape silo don’t.
– With a disk only solution, we need storage for at least twice the total data volume to ensure no data loss.
– Server lifetime is 3-5 years; data must be copied periodically.
» Also an issue for tape, but the lifetime of a disk server is probably still less than the lifetime of a given tape media format.
Assumption today is that tape storage will be required.
Tape robotics is easy.
– Bigger means better cost/slot.
Tape drives: High end vs LTO
– TCO issue: LTO drives are cheaper than high end IBM and STK drives, but are they reliable enough for our use?
» cf. the IDE disk server area.
The real problem, though, is tape media.
– The vast majority of the data is accessed rarely but must be stored for a long period. Strong pressure to select a solution that minimises an overall cost dominated by tape media.
Storage System: Managed Storage
Should CERN build or buy software systems? How to measure the value of a software system?
– Initial cost:
» Build: Staff time to create required functionality
» Buy: Initial purchase cost of system as delivered plus staff time to install and configure for CERN.
– Ongoing cost:
» Build: Staff time to maintain system and add extra functionality
» Buy: License/maintenance cost plus staff time to track releases. Extra functionality that we consider useful may or may not arrive.
Choice:
– Batch system: Buy LSF.
– Managed storage system: Build CASTOR.
Use this model as we move on to consider system management software.
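The build-or-buy model above reduces to an initial cost plus an ongoing cost per year. The sketch below expresses that as a small comparison. The class name, the break-even helper and all numbers are hypothetical, and the model ignores the slide's caveat that bought systems may never deliver the extra functionality wanted.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Option:
    name: str
    initial_cost: float  # staff time and/or purchase price, in one common unit
    annual_cost: float   # maintenance staff time and/or licence fees per year

    def total_cost(self, years: float) -> float:
        if years < 0:
            raise ValueError("years must be non-negative")
        return self.initial_cost + self.annual_cost * years


def break_even_years(a: Option, b: Option) -> Optional[float]:
    """Years after which both options have cost the same, or None if never."""
    if a.annual_cost == b.annual_cost:
        return None
    t = (b.initial_cost - a.initial_cost) / (a.annual_cost - b.annual_cost)
    return t if t >= 0 else None


# Hypothetical figures, in arbitrary units, purely for illustration.
build = Option("build", initial_cost=10.0, annual_cost=1.0)
buy = Option("buy", initial_cost=2.0, annual_cost=3.0)

print(break_even_years(build, buy))
print(build.total_cost(6), buy.total_cost(6))
```

With these made-up numbers the higher initial cost of building is recovered once the system has run for longer than the break-even period. The real decision also depends on functionality, which a cost total does not capture.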
Installation and Configuration
Reproducibility and guaranteed homogeneity of system configuration are a clear way to minimise ongoing system management costs. A management framework is required that can cope with the numbers of systems we expect.
We faced the same issues as we moved from mainframes to RISC systems. Vendor solutions offered then were linked to hardware, so we developed our own solution.
Is a vendor framework acceptable if we have a homogeneous park of Linux systems?
– Being honest, why have we built our own again?
Installation and configuration is only part of the overall computer centre management:
[Figure: ELFms architecture, comprising the Node Configuration System, Monitoring System, Installation System and Fault Mgmt System.]
Systems provided by vendors cannot (yet) be integrated into such an overall framework. And there is still a tendency to differentiate products on the basis of management software, not raw hardware performance.
– This is a problem for us as we cannot ensure we always buy brand X rack mounted servers or blade systems.
– In short, life is not so different from the RISC system era.
Monitoring and Control
Assuming that there are clear interfaces, why not integrate a commercial monitoring package into our overall architecture?
Two reasons:
– No commercial package meets (met) our requirements in terms of, say, long term data storage and access for analysis.
» This could be considered self serving: we produce requirements that justify a build rather than buy decision.
– Experience has shown, repeatedly, that monitoring frameworks require effort to install and maintain, but don’t deliver the sensors we require.
» Vendors haven’t heard of LSF, let alone AFS.
» A good reason!
Hardware Management System
A specific example of the integration problem. Workflows must interface to local procedures for, e.g., LAN address allocation. Can we integrate a vendor solution? Do complete solutions exist?
[Figure: hardware management workflow. Steps shown, with the responsible group in brackets: Request New Machine Install [FIO/IS]; Decide New Identity [FIO/OPT]; Install [FIO/IS]; Request Physical Machine Install [FIO/OPT]; Physically Install Machine [DCS]; Connect to Network [CS]; Check and Update Information [FIO/OPT]; Request Network Connection [FIO/OPT]. Lanes: Remedy/HMS, FIO/OPT, FIO/IS, DCS, Remedy/PRMS, Remedy/DCS, CS. Actions in the lanes: Import Node Map; Raise Ticket; Retire Node; Move Machine; Perform db updates & checks; Install S/W & put in prod'n; Observe; Change Status; Req. n/w conn & dns entry; Update CS DB & DNS; Confirmation email; Close Ticket.]
Console Management
Done poorly now:
We will do better:
TCO issue: Do the benefits of a single console management system outweigh costs of developing our own? How do we integrate vendor supplied racks of preinstalled systems?
[Figure: console management architecture. Elements shown: user applications on client hosts (xxx, pcitfionnn, lxplusnnn); the CDB config service, holding the machine to port @ head node mapping and the user to machine authorisations; console servers 1 to 75, each with a server process, conf and log, connected over RS/232 to Machine n.1 … Machine n.44 (Machine 1.1 … Machine 1.44 up to Machine 75.1 … Machine 75.44); and a console log repository.]
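The two mappings held by the config service in the console management diagram (machine to port @ head node, and user to machine authorisations) amount to a pair of lookup tables. The sketch below shows one way a user application might resolve a console connection from them. All class, function and host names are hypothetical, and the 75 servers with 44 ports each are simply the figures shown in the diagram.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass(frozen=True)
class ConsolePort:
    head_node: str  # console server the machine's serial line is attached to
    port: int       # serial port number on that console server


@dataclass
class ConsoleConfig:
    ports: Dict[str, ConsolePort] = field(default_factory=dict)
    authorisations: Dict[str, Set[str]] = field(default_factory=dict)

    def lookup(self, user: str, machine: str) -> ConsolePort:
        """Return where to connect, if the user may access this machine."""
        if machine not in self.authorisations.get(user, set()):
            raise PermissionError(f"{user} is not authorised for {machine}")
        if machine not in self.ports:
            raise LookupError(f"no console port recorded for {machine}")
        return self.ports[machine]


def build_ports(n_servers: int = 75, ports_per_server: int = 44) -> Dict[str, ConsolePort]:
    """Generate a machine to port map using made-up host names."""
    return {
        f"machine-{s}.{p}": ConsolePort(f"console-server-{s}", p)
        for s in range(1, n_servers + 1)
        for p in range(1, ports_per_server + 1)
    }


config = ConsoleConfig(
    ports=build_ports(),
    authorisations={"operator": {"machine-75.44"}},  # hypothetical user
)
print(len(config.ports))
print(config.lookup("operator", "machine-75.44"))
```

Keeping both mappings in one configuration service means the user application needs no knowledge of the cabling. It asks where a machine's console is and whether the user is allowed to reach it.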
Hardware Purchase
The issue at hand: How do we work within our purchasing procedures to purchase equipment that minimises our total cost of ownership?
At present, we eliminate vast areas of the multi-dimensional space by assuming we will rely on ELFms for system management and CASTOR for data management. Simplified[!!!] view:
– CPU: White box vs 1U vs blades; install or ready packaged
– Disk: IDE vs SAN; level of vendor integration
HELP! Can we benefit from management software that comes with ready built racks of equipment in a multi-vendor environment?