A Case Study
Julie J. Smith, Wachovia Corporation
Superdome Implementation at a Glance
Agenda
- Business Challenges
- Strategy
- Consolidation
- Superdome
- Executive Summary
- Project Scope and Timeline
- Migration Philosophy
- Migration Phases
- Current Status
Business Challenges
- Last year First Union and Wachovia initiated a merger, creating the nation's fourth-largest financial services institution
- Corporate initiative to relocate remote data centers to a centralized location
- Meet and exceed OCC availability and disaster recovery objectives
- Increase stockholder value by reducing total cost of ownership, consolidating the current environment, and leveraging new technologies
Strategy
- Identify the projects that had the most to gain by:
  - consolidating to a new hardware platform and reducing support and maintenance costs
  - providing high availability
  - architecting a more robust disaster recovery platform
- A TCO study completed by HP helped identify which projects would benefit most from consolidation, upgrades, etc.
Why we chose to consolidate
- Reduce maintenance costs by eliminating older or obsolete hardware
- Reduce the number of servers to maintain while actually increasing total TPMs from 768,000 to 1,600,000
- Provide a robust hardware environment and increase high-availability components
- Provide a hardware foundation for disaster recovery initiatives
Why we chose Superdome
- Scalability
- High-availability features inherent to the hardware
- Performance data
Hardware Identified for Consolidation
- (6) V-Class servers
- (10) K-Class servers
- (4) N-Class servers
- (2) T-Class servers
Executive Summary
- Migrate 22 servers representing four business units to four Superdome servers (2 production and 2 DEV/DR). This includes over 50 applications residing on the 22 servers and approximately 15 terabytes of storage across production, development, and test.
- Facilitate the current data center move and enable applications to meet merger objectives.
Project Scope
- Install four Superdome 64000 servers in the new data centers
- Attach to the next-generation network infrastructure, which is completely redundant
- Attach to all SAN storage and reconfigure the I/O layout
- Implement a new backup strategy using TSM (a rough client-configuration sketch follows this list)
- Re-design the high-availability clusters and packages
- Migrate existing applications to Superdome
- Perform full disaster recovery tests for each project
- Decommission obsolete hardware
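The deck does not describe the TSM setup itself; purely as an illustrative sketch, a minimal Tivoli Storage Manager client stanza on an HP-UX partition could look roughly like the following. The server stanza name, address, and node name are hypothetical, not Wachovia's actual configuration.

```sh
# /opt/tivoli/tsm/client/ba/bin/dsm.sys -- illustrative only
# (server name, address, and node name below are invented for the example)
SERVERNAME        TSM_MASTER
   COMMMETHOD        TCPIP
   TCPPORT           1500
   TCPSERVERADDRESS  tsm-master.example.com
   NODENAME          sdprod_part1
   PASSWORDACCESS    GENERATE

# A scheduled or manual incremental backup of the partition's file systems
# would then be driven by the standard client command:
#   dsmc incremental
```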
Timeline (May to October)
Just six months....
Key milestones: moratorium, merger complete
Migration Philosophy
- Rely on what we know works: HP LVM, MC/ServiceGuard
- Make improvements that must be made:
  - Network
  - SAN
  - Disk and LVM layout
  - Backup/recovery software
  - MC/ServiceGuard
  - Basic DR implementation
  - Maintenance strategy
Phases of Migration
1. Discovery
2. Design
3. Implementation
4. Migration
5. Support and Maintenance
6. Future
Discovery
What can improve?
- performance
- availability
- disk and file system layout
- backup windows
- network infrastructure
- system standards
What works well? Hmm....
Design
- Four Superdomes in identical configurations
- 2 Superdomes in a production cluster at one location
- 2 Superdomes in a Dev/DR cluster at an alternate location
- 4 partitions each
- 4 MC/ServiceGuard clusters (see the command sketch after this list)
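The slides do not show the ServiceGuard commands used. As a minimal sketch, assuming classic MC/ServiceGuard command names and invented cluster, node, and package names, building one of the four partition clusters and a failover package follows roughly this flow:

```sh
# Illustrative only -- node and package names (sd1_part1, sd2_part1, pkg10)
# are hypothetical, not the actual Wachovia configuration.

# 1. Generate a cluster configuration template for the two partitions
#    (one on each SuperDome) that form this cluster:
cmquerycl -v -C /etc/cmcluster/cluster.ascii -n sd1_part1 -n sd2_part1

# 2. Edit cluster.ascii (heartbeat subnets, cluster lock, etc.), then apply:
cmapplyconf -v -C /etc/cmcluster/cluster.ascii

# 3. Generate a package template, edit it (volume groups, service,
#    failover subnet, control script), and apply it:
cmmakepkg -p /etc/cmcluster/pkg10/pkg10.conf
cmapplyconf -v -P /etc/cmcluster/pkg10/pkg10.conf

# 4. Start the cluster and run the package on its primary node:
cmruncl -v
cmrunpkg -n sd1_part1 pkg10
cmviewcl -v          # verify cluster and package status
```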
Production SuperDome Configuration
(The Test/DR SuperDomes have exactly the same configuration as the production SuperDomes.)

Each production SuperDome (SD #1 and SD #2) is a 64-way system with:
- 4 hardware partitions across 16 cells
- 44 active CPUs, 20 iCOD CPUs, 128 GB RAM

Each partition has:
- 2 x I/O chassis
- 2 x Core I/O cards
- 5 x Fibre Channel cards
- 3 x Ultra2 LVD SCSI cards
- 2 x 4-port 100BaseT cards (Partition 4 (HR) has 4 x 4-port 100BaseT cards)
- EMC storage subsystems (to be specified by the customer)

[Diagram: SD #1 and SD #2 attach over Fibre Channel to a Storage Area Network of four EMC Symmetrix 3830 arrays, and over 100BaseT to primary and stand-by Users LAN backbones, a management/console LAN hub (HP AdvanceStack 100Base-TX Class II repeaters), and an SMS PC console. Each of the four partition clusters has its own MC/SG heartbeat network. The peripheral cabinet for each SuperDome is not shown.]
Server Mapping to Superdome Partitions
Partition-based failover (four partitions per SuperDome)

SD #1 and SD #2 (each 64-way: 16 cells, 44 CPUs, 128 GB memory) are divided identically:
- Partition 1: 4 cells, 8 active CPUs
- Partition 2: 4 cells, 16 active CPUs
- Partition 3: 4 cells, 12 active CPUs
- Partition 4: 4 cells, 8 active CPUs

Legacy servers mapped onto the partitions, by business unit:
- Commercial: V2200/10-CPU, V2200/10-CPU
- EIM: V2600/10-CPU, K410/4-CPU; V2250/12-CPU, V2200/10-CPU, K410/4-CPU
- Finance: K570/6-CPU, K580/4-CPU; N4000/4-CPU, K410/4-CPU, K100/1-CPU
- HR: V2200/10-CPU, V2200/6-CPU; V2200/10-CPU
Storage Design

Current layout:
- Some direct-attached arrays, some attachment via SAN
- Outdated EMC TimeFinder version used for database replication
- Logical volumes are not striped and possibly sit on a single spindle
- Disk array is mirrored but not striped
- 2 fibre controllers on each host

Target layout:
- All SAN-attached storage
- Install the current EMC TimeFinder version and rewrite production scripts to replicate databases and back up file systems offline
- Stripe logical volumes using LVM distributed options to optimize across multiple spindles
- Continue with the mirrored array configuration
- 4 fibre controllers on each partition to attach to storage
Volume Group & Logical Volume Layout Scripts are used to build each
environment Run Inq and create device list Pvcreate Vgcreate with alternate links and
physical volume groups (PVG) Lvcreate (lvcreate –D y –s g –r N) Newfs & fsck Create vg map files & ftp to alt system
Naming standards Vgpkg10 /pkg
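The slide lists these build steps only by command name. A simplified shell sketch of the sequence, assuming HP-UX 11.x LVM, is shown below; the vgpkg10 name and the lvcreate flags come from the slide, while device paths, sizes, and the inq location are invented for the example.

```sh
#!/usr/bin/sh
# Illustrative sketch of the VG/LV build steps -- device files, sizes,
# and utility paths are assumptions, not the actual build script.

# 1. Identify EMC devices and build a device list:
/usr/local/bin/inq > /tmp/inq.out

# 2. Initialize the physical volumes:
pvcreate -f /dev/rdsk/c5t0d0
pvcreate -f /dev/rdsk/c7t0d0

# 3. Create the volume group and a physical volume group (PVG);
#    alternate links would be added as second paths to the same LUNs:
mkdir /dev/vgpkg10
mknod /dev/vgpkg10/group c 64 0x0a0000
vgcreate -g PVG0 /dev/vgpkg10 /dev/dsk/c5t0d0 /dev/dsk/c7t0d0

# 4. Create a distributed, PVG-strict logical volume (flags from the slide):
lvcreate -D y -s g -r N -L 4096 -n lvol1 /dev/vgpkg10

# 5. Build and check the file system:
newfs -F vxfs /dev/vgpkg10/rlvol1
fsck -F vxfs /dev/vgpkg10/rlvol1

# 6. Export a map file and ship it to the alternate system for vgimport:
vgexport -p -s -m /tmp/vgpkg10.map /dev/vgpkg10
# (ftp /tmp/vgpkg10.map to the alternate node, then vgimport -s -m ... there)
```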
Disaster Recovery Strategy
- Currently, all tape recovery comes from the TSM master at a remote location
- The future design is still being determined, but it is based on each business unit's priority and budget
- MetroCluster and Oracle replication are being evaluated
Current Status
- To date, 6 production applications are live
- Over 40 more to go between now and mid-October
- DR testing to be complete by October
- Performance tests are still being run, but results so far are very positive.... We're SMOKIN'!
Questions or Discussion
Thanks for attending