Current Trends in Data Center COMMISSIONING RICHARD L SAWYER, Strategist - HP Critical Facilities ACG– Chicago April 2013 AGENDA: WHAT IS A DATA CENTER? DIRTY LITTLE SECRET RISK MITIGATION LEVERAGING COMMISSIONING USING FAILURE TO SUCCEED


Page 1: Current Trends in Data Center COMMISSIONING

Current Trends in Data Center COMMISSIONING

RICHARD L SAWYER, Strategist - HP Critical Facilities

ACG – Chicago, April 2013

AGENDA:
• WHAT IS A DATA CENTER?
• DIRTY LITTLE SECRET
• RISK MITIGATION
• LEVERAGING COMMISSIONING
• USING FAILURE TO SUCCEED

Page 2: Current Trends in Data Center COMMISSIONING

What is a Data Center?
• By NFPA 70: “Critical Operations Data System”
• By Clients: Wherever I process data.
• By Commissioning Agents: A power intensive critical space.

[Image: Liebert System 3 control panel with status indicators (Heating, Cooling, Dehumidification, Humidification) and alarm indicators (High Temperature, Low Temperature, Loss of Air Flow, High Humidity, Low Humidity, Change Air Filters, Local Alarm)]

Page 3: Current Trends in Data Center COMMISSIONING

Successful Data Center Operations Start with Commissioning
• Data Centers are designed to a certain availability expectation to meet business goals.
• Whether or not they meet the designed goal depends on the contractor.
• Commissioning is the only way to assure the availability of the design is achieved in practice!

Page 4: Current Trends in Data Center COMMISSIONING

It’s all about availability!

Tier 1
• Single Generator or No Generator
• Basic UPS for LAN Room, non-redundant
• Single Utility or on Radial line from Loop
• 99.671% Availability per Uptime Institute

Tier 2
• Generator
• N+1 UPS with redundant components
• Single Utility Feeders, N+1 Mechanical System
• 99.741% Availability per Uptime Institute

Tier 3 - Concurrently Maintainable
• N+1 Generator System
• N+1 UPS with redundant components
• One Active, One Passive Utility Source, N+1 Mechanical System
• 99.982% Availability per Uptime Institute

Tier 4 - Fault Tolerant
• 2N Generator System
• 2N UPS Systems
• Dual Active Utility Feeders, 2N Mechanical System, compartmentalization
• 99.995% Availability per Uptime Institute


Data Centers have specified design features.

These are investments to deliver a specified availability.
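The tier availability percentages are easier to compare when converted to expected downtime per year. The following is a minimal sketch, assuming an 8,760-hour year and treating each quoted percentage as a long-run average; the names `TIER_AVAILABILITY` and `annual_downtime_hours` are illustrative and do not come from the presentation.

```python
HOURS_PER_YEAR = 8760  # assumption: non-leap year

# Availability figures as quoted on the slide (per Uptime Institute)
TIER_AVAILABILITY = {
    "Tier 1": 0.99671,
    "Tier 2": 0.99741,
    "Tier 3": 0.99982,
    "Tier 4": 0.99995,
}


def annual_downtime_hours(availability):
    """Expected hours of downtime per year for a given availability fraction."""
    return (1.0 - availability) * HOURS_PER_YEAR


for tier, availability in TIER_AVAILABILITY.items():
    hours = annual_downtime_hours(availability)
    print(f"{tier}: {availability:.3%} available, about {hours:.2f} h/year down")
```

Under these assumptions, Tier 1 works out to roughly 29 hours of downtime per year and Tier 4 to under half an hour.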

Page 5: Current Trends in Data Center COMMISSIONING

The cost is huge: Availability is expensive!


1. Data center tier costs increase per sq. ft. (sq. m) costs.
2. As tier level increases, build cost rises.
3. Costs of Tier IV are almost double those of Tier II.

[Chart: Tier II, III, IV build costs ($/sq. ft.) related to power density. Vertical axis: $0 to $5,000 per sq. ft.; horizontal axis: 50 to 300 w/sf; series: Tier IV, Tier III, Tier II]

HP data, based on a 40,000 sq. ft. raised-floor data center.

A 20K sf Tier III data center costs $35 Million @ 150 w/sf

Page 6: Current Trends in Data Center COMMISSIONING

And the IT investment is even larger!

• A 20,000 square foot data center built to 150 watts/square foot can accommodate 800 racks of IT equipment @3.75 kW per rack.

• This is 3,000 to 10,000 servers, depending on architecture, form factor and configuration.

• The IT investment in hardware, software and service can amount to 5 to 8 times the data center facility investment.
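The arithmetic behind these bullets can be checked with a short sketch. It assumes the $35 million facility cost quoted on the previous slide for a 20K sf Tier III data center at 150 w/sf, and uses the 5x to 8x multiplier stated above; the function and constant names are illustrative only.

```python
FLOOR_AREA_SF = 20_000           # from the slide
POWER_DENSITY_W_PER_SF = 150     # from the slide
RACK_LOAD_KW = 3.75              # from the slide
FACILITY_COST_USD = 35_000_000   # assumption: figure quoted on the cost slide


def rack_capacity(area_sf, density_w_per_sf, rack_kw):
    """Number of racks the design power can support at a given rack load."""
    total_kw = area_sf * density_w_per_sf / 1000
    return int(total_kw // rack_kw)


racks = rack_capacity(FLOOR_AREA_SF, POWER_DENSITY_W_PER_SF, RACK_LOAD_KW)
print(f"Racks supported: {racks}")

# 3,000 to 10,000 servers spread over those racks
print(f"Servers per rack: {3_000 / racks:.2f} to {10_000 / racks:.2f}")

# IT investment at 5x to 8x the facility investment
low, high = 5 * FACILITY_COST_USD, 8 * FACILITY_COST_USD
print(f"IT investment: ${low / 1e6:.0f}M to ${high / 1e6:.0f}M")
```

With these inputs the design load is 3,000 kW, which supports the 800 racks stated on the slide.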


Can you safely assume the data center investment will work as designed from Day One?

Page 7: Current Trends in Data Center COMMISSIONING

Availability interdependency

End-to-end availability is the product of the availability of the IT Architecture times the availability of the Facility Infrastructure (FI).

(Tier 3 FI x MS Server) = Total availability

99.982% x 99.202% = 99.184%

IT architecture and facility infrastructure are interdependent in meeting the data center goal: the speed of IT recovery is dependent on the speed of facility recovery!

Formula: (Availability of IT) X (Availability of FI) = Total End-to-End Availability
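The formula can be reproduced in a few lines. This is a minimal sketch using the two figures from the slide; it assumes the two availabilities are independent (which is what multiplying them implies) and an 8,760-hour year for the downtime conversion. The function name is illustrative.

```python
def end_to_end_availability(*availabilities):
    """Multiply the availabilities of components that must all be up (series model)."""
    total = 1.0
    for availability in availabilities:
        total *= availability
    return total


facility = 0.99982   # Tier 3 facility infrastructure, from the slide
it_arch = 0.99202    # IT architecture figure, from the slide

total = end_to_end_availability(it_arch, facility)
print(f"End-to-end availability: {total:.3%}")  # 99.184%

# Assumption: 8,760 hours per year
print(f"Expected downtime: {(1 - total) * 8760:.1f} hours/year")
```

The combined figure is always lower than the weaker of the two inputs, which is why a high-tier facility cannot compensate for a fragile IT architecture, or the reverse.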

Page 8: Current Trends in Data Center COMMISSIONING

Dirty Little Secret: Data Centers Fail

Failure is:
• Expensive
• Inevitable
• Predictable
• Manageable
• Useful

Page 9: Current Trends in Data Center COMMISSIONING

5 YEAR PROBABILITY OF FAILURE

Failure is Inevitable

AFCOM 2007: “Understanding Tier Systems”, Tom Roberts, Rick Sawyer

Page 10: Current Trends in Data Center COMMISSIONING

Predictability of Failure

Page 11: Current Trends in Data Center COMMISSIONING

[Diagram: Option 2N one-line showing two utilities and two generators (G) feeding Primary Bus 1 and Primary Bus 2, a UPS with bypass on each bus, a static switch, PDU, and critical load]

Option 2N
• 2 Utilities
• 2 Generators
• 2 ATS
• 2 UPS Systems
• STS

MTBF = 315,766 hours
Availability = 99.9985%
Probability of Failure in 5 years = 12.95%

Failure is Predictable
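The 5-year figure is consistent with a constant-failure-rate (exponential) reliability model, where the probability of at least one failure in time t is 1 - exp(-t / MTBF). The slide does not state which model was used, so the sketch below is an assumption that happens to reproduce the quoted number; the function name is illustrative.

```python
import math

HOURS_PER_YEAR = 8760  # assumption: non-leap year


def probability_of_failure(mtbf_hours, mission_hours):
    """Probability of at least one failure within mission_hours,
    assuming a constant failure rate of 1 / mtbf_hours."""
    return 1.0 - math.exp(-mission_hours / mtbf_hours)


p_five_years = probability_of_failure(315_766, 5 * HOURS_PER_YEAR)
print(f"Probability of failure in 5 years: {p_five_years:.2%}")  # 12.95%
```

Under this model even a system with an MTBF of about 36 years has roughly a one-in-eight chance of failing within five years.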

Page 12: Current Trends in Data Center COMMISSIONING

Good News! Failure is Manageable

STRATEGY TO SURVIVE:
• Design to Survive
• Map Foreseeable Failures
• Develop SOP’s, MOP’s, EOP’s
• Commission! Test, Document, Train

Page 13: Current Trends in Data Center COMMISSIONING

Design to Survive

Using ITSM Capability Maturity Model to assess Facility Infrastructure Design

• Absence: No dedicated data center, processing is in office space
• Initial: Data Center is basic server or network room, in a dedicated space having minimal dedicated infrastructure systems
• Repeatable: Data center has dedicated cooling, generators, UPS, fire, security and monitoring systems
• Managed: Data center has concurrent maintainability features
• Defined: Data center systems have redundant features for resiliency (N+1)
• Optimizing: Fault tolerant system features (2N)

Page 14: Current Trends in Data Center COMMISSIONING

[Diagram: zoned data center floor plan showing CRAC units, hot and cold aisles, rack rows with PDUs and rack-based UPS, a central UPS with battery, heat rejection units, fire and security systems, EPO system, and a monitor with web link]

Zoned Availability - Scalable Mission Critical infrastructure using Central UPS and Rack based UPS for 2N redundancy

Site Availability – 99.995%

• Central UPS for one “N” side, scalable UPS System
• Rack based UPS Systems as needed for 2N redundancy

Page 15: Current Trends in Data Center COMMISSIONING

Map Foreseeable Failures

SPOF Matrix - Common Single Points of Failure

Check observed SPOFs found in the survey

Electrical

There is one utility supply with no standby generator.

Multiple generators are connected via a single paralleling switchgear

There is one transfer switch where the generator and utility are switched.

The UPS and Static Bypass are fed off of the same circuit breaker.

The UPS output distribution is controlled by one circuit breaker.

The UPS synchronization is controlled by one external circuit.

There is one electrical path to the critical load with no redundancy or automatic bypass provisions.

There is one step-down transformer in the critical electrical path, or step down transformers are in series if multiple.

There is one static switch in series with the UPS output.

All power is fed through one piece of supply electrical switchgear.

There is an EPO circuit that disconnects all electrical power.

There is a switchgear ground fault protection circuit that disconnects all electrical power distribution.

All power is fed through one piece of electrical distribution switchgear to the critical load

There is one set of electrical cables from utility supply to critical power supplies.

There is one set of electrical cables from critical power supply to critical power distribution.

The HVAC critical cooling system is supplied from one motor control center.

The HVAC critical cooling system is supplied from one piece of distribution switchgear.

The heat rejection system (i.e., cooling towers) is fed from one electrical distribution point.

Critical pumps are fed/controlled from one electrical distribution point.

HVAC

Water supply is from one distribution point.

The chilled water piping system is a single loop system.

The condenser water piping system is a single loop system.

The glycol piping system is non-redundant.

There are no redundant air handling units supplying the critical load areas.

The building management system can only be operated/controlled from a single point.

The building management system is required for default HVAC system operation.

The water treatment system is not monitored for free chlorine content or biological contamination.

There is only one method, or piece of equipment to provide adequate critical space cooling.

The heat rejection system is non-redundant.

The fire detection system interrupts air flow to the critical load spaces without verifying sensors.

There is an EPO circuit that interrupts cooling to the critical load.

There are common valves that can fail, interrupting chilled water, condenser water or supply water.
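Many of the checklist items above describe the same pattern: one component that every path to the critical load passes through. A minimal sketch of finding such components automatically is shown below. It models the distribution system as a directed graph and reports every component whose removal cuts the load off from all sources. The topology, component names, and function names are hypothetical and only illustrate the idea; they are not taken from the survey.

```python
from collections import deque


def reachable(graph, sources, target, removed=None):
    """Breadth-first search: can the target be reached from any source
    when the component `removed` is taken out of service?"""
    queue = deque(s for s in sources if s != removed)
    seen = set(queue)
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt != removed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def single_points_of_failure(graph, sources, target):
    """Return every component whose loss alone disconnects the target."""
    components = set(graph) | {n for outs in graph.values() for n in outs}
    candidates = components - {target}
    return sorted(
        c for c in candidates
        if not reachable(graph, sources, target, removed=c)
    )


# Hypothetical one-line: two sources, one transfer switch, two UPS modules
# sharing one output breaker and one PDU.
power_path = {
    "utility": ["ats"],
    "generator": ["ats"],
    "ats": ["ups_a", "ups_b"],
    "ups_a": ["output_breaker"],
    "ups_b": ["output_breaker"],
    "output_breaker": ["pdu"],
    "pdu": ["critical_load"],
}

print(single_points_of_failure(power_path, ["utility", "generator"], "critical_load"))
# ['ats', 'output_breaker', 'pdu']
```

In this example the redundant sources and UPS modules are not flagged, while the single transfer switch, output breaker, and PDU are, which matches several of the electrical items in the matrix.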

Page 16: Current Trends in Data Center COMMISSIONING

Test, Document, Train

Page 17: Current Trends in Data Center COMMISSIONING

Develop MOP’s, SOP’s, EOP’s

Maturity levels: Absence, Initial, Repeatable, Managed, Defined, Optimizing

• No operational processes formally in place or measured
• Maintenance and operations are not site specific or complete, ad hoc and depend on staff memory/knowledge
• Standard, Maintenance and Emergency Operating Procedures exist and are site specific
• Procedures are associated with asset management systems and are tracked to completion, effectiveness
• Documentation is complete, available, compliance is measured and trended
• Real time monitoring, continuous improvement features

[Figure labels: Automate Servers, Automate Networks, Automate Storage, Runbook Automation]

Page 18: Current Trends in Data Center COMMISSIONING

O&M MGE EPS 8000
UPS System A, Module 01

Based on best available data 05/11 - Verify against As-Builts

• Simplified One-Line power supply diagram

• Simplified One-Line UPS system diagram

– Normal power flow diagrams

– Emergency power flow diagrams

– Automatic Transfer Control diagram

• Location of equipment

• Start-Up and Shut-Down procedure

• Emergency response procedure

• Recommended maintenance practices

• Reference Engineering Prints

• Reference MGE EPS 8000 Operations and Maintenance Manual

Page 19: Current Trends in Data Center COMMISSIONING

Bypass Power Flow to UPS A01
For Maintenance on Modules or Module Failure Mode

[One-line diagram: UPS Systems A01 & B01 with Automatic Transfer Control and Load Bus Synchronization Control. Equipment tags shown: SG-3A01, SG-3A02, SG-3B01, SG-3B02, B-3A04, B-3A29, B-3B04, B-3B33, ATS-31A01, ATS-31B01, T-31A01 and T-31B01 (13.8 kV / 480V), CB-01A001, CB-01A002, CB-01B001, CB-01B002, SG-01A01, SG-01A02, SG-01B01, SG-01B02. Feeds: From SG-0A04. Outputs: To SG-01A03 Critical UPS Load A, To SG-01B03 Critical UPS Load B. Devices carry NO / NC state markings.]

Based on best available data 05/11 - Verify against As-Builts

Page 20: Current Trends in Data Center COMMISSIONING

Process for installing a new IT server

• Install in rack
• Order
• Delivery
• Physical Inspection
• Software verification
• Data test of software
• Burn-in functional test
• Firmware verification
• Network assignment
• Integration with existing systems
• Online production

Page 21: Current Trends in Data Center COMMISSIONING

Process for “installing” a new datacenter

• Construct
• Physical inspection
• Failure mode tests
• Design
• System-level tests
• Capacity tests
• Equipment startup
• Equipment tests
• Controls and monitoring tests
• “Pull-the-plug” integrated test
• Turn over to IT and Operations

Page 22: Current Trends in Data Center COMMISSIONING

The Value of Commissioning
• Assures design performance is achieved following construction
• Verifies performance levels
– Capacity
– Availability (redundancies)
• Provides documentation base for SOP’s, MOP’s, and EOP’s
• Opportunity for “hands-on” training of operations staff on events they may not see again for years!
– Video taping of procedures
– Monitoring and alarm testing with response procedures
– “New Employee” training guide development

IT investment is 3-5X the data center investment. Commissioning assures the IT architecture support systems work, and can be recovered quickly when they fail.

Page 23: Current Trends in Data Center COMMISSIONING

Leverage Facility Commissioning
1. Involve everyone: IT, management, vendors, contractor, engineers and operating staff.
2. Manage your documents – capture everything methodically.
3. Test everything that can be safely tested.
4. Video tape procedures, especially risk mitigation procedures for SPOF’s.

Know your data center!

Page 24: Current Trends in Data Center COMMISSIONING

Commissioning Trends

• Standardized procedures to test standardized systems
• Capacity testing to verify efficiency at all load levels
• Staff training during the commissioning process
• Video taping of test procedures for future training
• Integrated testing of raised floor areas before IT equipment is installed
• Digital data logging of system performance during commissioning to lower cost and provide better information.

Page 25: Current Trends in Data Center COMMISSIONING

Typical Integrated Test


[Diagram: one-line of utility, two generators (G), UPS modules with bypass, static switch, Primary Bus 1, PDU, and critical load]

• Load banks are installed to simulate critical load
• Static switch sources are failed to test performance
• UPS redundancy is tested by failing modules and system
• Utility is failed to test transfer switch and generator performance
• Generator capacity and redundancy are tested by failing units
• Digital meters record performance at critical load
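Digital meter records from an integrated test can be checked automatically against an acceptance limit. The sketch below is purely illustrative: the sample readings, the 480 V nominal value, and the 10% tolerance are hypothetical placeholders, not values from the presentation or from any standard, and a real test plan would take its limits from the project specification.

```python
def worst_deviation_pct(samples, nominal):
    """Largest deviation from nominal, in percent, over (time_s, value) samples."""
    return max(abs(value - nominal) / nominal * 100 for _, value in samples)


def within_tolerance(samples, nominal, tolerance_pct):
    """True if every sample stays within +/- tolerance_pct of nominal."""
    return worst_deviation_pct(samples, nominal) <= tolerance_pct


# Hypothetical readings (seconds, volts) at the critical load bus
# while the utility source is failed and the generator picks up.
readings = [(0.0, 480.0), (0.5, 471.0), (1.0, 462.0), (1.5, 476.0), (2.0, 481.0)]

NOMINAL_V = 480.0      # assumption for illustration
TOLERANCE_PCT = 10.0   # assumption for illustration

deviation = worst_deviation_pct(readings, NOMINAL_V)
result = "PASS" if within_tolerance(readings, NOMINAL_V, TOLERANCE_PCT) else "FAIL"
print(f"Worst deviation: {deviation:.2f}% -> {result}")
```

Keeping the raw samples alongside the pass/fail result preserves a baseline that later tests can be compared against.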

Page 26: Current Trends in Data Center COMMISSIONING

Things happen……

Page 27: Current Trends in Data Center COMMISSIONING

Use Failure as an Opportunity

• When you’re down, you’re down.
• Use the downtime to access, maintain or modify systems you can’t get to any other time
– Verify breaker operation – “retro commission”!
– Inspect and repair equipment in a powered down condition
– Tie in valves and breakers for future use
– Test systems and operations procedures

Plan recovery procedures to leverage downtime opportunity for maintenance, testing and training!

Page 28: Current Trends in Data Center COMMISSIONING

Summary
• Modern office buildings contain high power data center spaces
• Availability of those spaces is a key client demand
• Design can only do so much; performance must be proven through commissioning!
• Actual availability is an operational issue.
• Data center performance is contingent on a strong commissioning program from the start!

Page 29: Current Trends in Data Center COMMISSIONING

Questions?

Richard L. Sawyer
Strategist, HP Critical Facility [email protected]