Infrastructure Reliability Common Systems Group Experience @ UW Madison Roger Hanson 5 Jan 2005


2

University of Wisconsin-Madison

3

Overview

• Basics – Redundant Hardware

• Test Environments

• Change Management

• Version control

• Testing processes

• Collaboration

• Service Management

4

Background

• MyUW Portal

• WiscMail campus Mail Service

• In Production in 2001

• New Complex Environments

– Layer 4 Switching

– Directory Enabled Systems

– ES Storage Area Networks

5

• Campus Portal

• Access to over 130 modules

• 1.8M Logins in Sept. 04

• 49K+ Unique Logins in Sept.

6

Hardware - Portal

[Diagram: MyUW portal hardware. Labels shown: Layer 4 Switch 1 and Layer 4 Switch 2; two Dell servers running Apache Web Server 1 and Apache Web Server 2, each with the WLS Plugin; Sun E280 servers hosting WLS Node 1 and WLS Node 2 (each labeled EFS and My UWMadison); WLS Administrative Server; Sun E450; Oracle Application Database.]

7

• Campus Mail system

• Nearly 90K accounts

• Daily Message Peak over 3M messages

• Service objective

– Never down

– Message delivery in less than 2 minutes
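
The two-minute delivery objective above can be expressed as a simple check. This is a minimal sketch, assuming per-message delivery latencies are available in seconds; the function name and the sample values are hypothetical and do not come from WiscMail.

```python
from typing import Iterable

# Objective taken from the slide: "Message delivery in less than 2 minutes".
DELIVERY_OBJECTIVE_SECONDS = 120


def fraction_within_objective(latencies_seconds: Iterable[float]) -> float:
    """Return the fraction of messages delivered in under the objective."""
    latencies = list(latencies_seconds)
    if not latencies:
        raise ValueError("no latency samples supplied")
    on_time = sum(1 for s in latencies if s < DELIVERY_OBJECTIVE_SECONDS)
    return on_time / len(latencies)


# Hypothetical sample: four messages, one of which took three minutes.
sample = [4.0, 35.5, 180.0, 90.0]
print(fraction_within_objective(sample))  # 0.75
```

Reporting the fraction rather than a pass/fail flag makes it possible to track how close the service runs to its objective over time.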

8

Hardware - Email

[Diagram: "WiscMail Service Design, Spring 2004" (Steve Kohlbeck, 12/16/04). Public-side and private-side Layer 4 switches sit in front of five clusters: SPAM/Filtering Cluster, SMTP/AV Cluster, Multiplexor Cluster, LDAP Cluster, and Store Cluster, with ESS storage attached.

Host labels shown: MAILST1 (happy), MAILST2 (sleepy), MAILST3 (sneezy), MAILST4 (dancer); MMP1 (prancer), MMP2 (vixen); SMTP1 (dasher), SMTP2 (donner), SMTP3 (blitzen), SMTP4 (hermes), SMTP5 (heimdall), SMTP99 (sunloan); SPAM-1 (mustang), SPAM-2 (spitfire), SPAM-3 (zero), SPAM-4 (corsair), SPAM-5 (hellcat), SPAM-6 (stuka); LDAP1 (liquefy), LDAP2 (aerate), LDAP3, LDAP4; lists hosts kodos and kang; AVGATE/oceanus; WISCMAIL (bashful).

Service names shown: Wiscmail.wisc.edu, smtp.wiscmail.wisc.edu (144.92.197.138), smtpauth.wiscmail.wisc.edu (144.92.197.134), admin.wiscmail.wisc.edu, filters.wiscmail.wisc.edu, lists.services.wisc.edu, spam.services.wisc.edu, newspam.services.wisc.edu, ldap.doit.wisc.edu.

Legend (WiscMail SMTP flows): 1. SMTP Inbound (MX); 2. SMTP Outbound; 3. SMTP Inbound (POP/IMAP Client); 4. SMTP Inbound (WebMail); 5. SMTP Post Filtering Loop; 6. SMTP Post Lists AV Scan.]

9

Basics – Redundant Hardware

• Clustered Server Environment

• Spares (Hot/Warm/Cold)

• Automated Load Balancing

• Automated failover
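
To illustrate how automated load balancing and failover fit together, here is a minimal sketch of a round-robin balancer that skips nodes failing a health check. It is not the Layer 4 switch logic used at UW-Madison; the class name, node names, and health-check callback are all hypothetical.

```python
from itertools import cycle
from typing import Callable, Iterator, List


class LoadBalancer:
    """Round-robin balancer that skips nodes failing a health check."""

    def __init__(self, nodes: List[str], is_healthy: Callable[[str], bool]):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = nodes
        self._is_healthy = is_healthy
        self._ring: Iterator[str] = cycle(nodes)

    def pick(self) -> str:
        """Return the next healthy node, trying each node at most once."""
        for _ in range(len(self._nodes)):
            node = next(self._ring)
            if self._is_healthy(node):
                return node
        raise RuntimeError("no healthy nodes available")


# Hypothetical node names; "node2" is marked as failed.
down = {"node2"}
lb = LoadBalancer(["node1", "node2", "node3"], lambda n: n not in down)
picks = [lb.pick() for _ in range(4)]
print(picks)  # ['node1', 'node3', 'node1', 'node3']
```

Traffic keeps flowing to the remaining nodes while one is down, which is the behavior a clustered environment with automated failover is meant to provide.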

10

Test Environments

• Test Cycle

– Test

– Development

– QA

– Production

• QA (also called Integrated Test Environment)

11

Change Management

• Use of Change Information System

– Tracking

– Notification

• Use of Code Migration Request process

– Files promoted

– Configuration steps

– Test process

– Backout plans
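
The four items of a Code Migration Request can be modeled as a record that reports which sections are still empty. This is a sketch under assumptions: the class, its field names, and the sample values are hypothetical and do not describe the actual UW-Madison form.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CodeMigrationRequest:
    """Hypothetical record mirroring the four items listed on the slide."""

    files_promoted: List[str] = field(default_factory=list)
    configuration_steps: List[str] = field(default_factory=list)
    test_process: str = ""
    backout_plan: str = ""

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are still empty."""
        sections = {
            "files_promoted": self.files_promoted,
            "configuration_steps": self.configuration_steps,
            "test_process": self.test_process.strip(),
            "backout_plan": self.backout_plan.strip(),
        }
        return [name for name, value in sections.items() if not value]


# Hypothetical request that has no backout plan yet.
request = CodeMigrationRequest(
    files_promoted=["portal/login.jsp"],
    configuration_steps=["restart application node"],
    test_process="run written test plan in QA",
)
print(request.missing_sections())  # ['backout_plan']
```

A check like this could block a promotion until every section, including the backout plan, has been filled in.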

12

Version Control

• Use CVS

– http://www.gnu.org/software/cvs/

– Develop in private or shared environments

– Code is published into repository

– Code is then copied to each environment (dev, test, QA, and prod)
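
The publish-then-copy flow can be sketched as a promotion step between environments. The sketch assumes code moves through the environments in the order listed on the slide, one step at a time; that ordering rule, the function names, and the tag names are assumptions for illustration, and no CVS commands are invoked.

```python
from typing import Dict, List, Optional

# Order as listed on the slide; strict one-step promotion is an assumption.
ENVIRONMENTS: List[str] = ["dev", "test", "qa", "prod"]


def next_environment(current: str) -> Optional[str]:
    """Return the environment after `current`, or None for the last one."""
    index = ENVIRONMENTS.index(current)  # raises ValueError for unknown names
    if index + 1 < len(ENVIRONMENTS):
        return ENVIRONMENTS[index + 1]
    return None


def promote(deployed: Dict[str, str], source: str) -> Dict[str, str]:
    """Copy the tag deployed in `source` to the next environment."""
    target = next_environment(source)
    if target is None:
        raise ValueError(f"{source} is the last environment")
    if source not in deployed:
        raise ValueError(f"nothing is deployed in {source}")
    updated = dict(deployed)
    updated[target] = deployed[source]
    return updated


# Hypothetical repository tags currently deployed per environment.
state = {"dev": "release-1-2", "test": "release-1-1"}
state = promote(state, "dev")
print(state)  # {'dev': 'release-1-2', 'test': 'release-1-2'}
```

Deploying only tagged revisions from the repository means each environment can be traced back to an exact version of the code.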

13

Testing Process

• Unit testing

• Integrated Testing (QA)

• Log analysis from testing

• Written test plans

• Load Tests

• Testing tools (Empirix)

• System Monitoring (Wiley Introscope)

14

Collaboration

• Wiki

• Document Repository/Sharing

• Email Lists

• IM

• E-mail

15

Service Management

• Major direction at UW to improve reliability

• CIO asking for five nines (99.999% availability) on key systems

• Consulting assistance

• Manage the service, not the servers

• Adopt the customer’s perspective
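
To put the five-nines request in concrete terms, the short calculation below converts an availability target into a yearly downtime budget. It assumes a 365-day year and counts all downtime, planned or not; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def allowed_downtime_minutes(nines: int) -> float:
    """Yearly downtime budget for an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001
    return MINUTES_PER_YEAR * unavailability


for n in (3, 4, 5):
    print(f"{n} nines: {allowed_downtime_minutes(n):.2f} minutes per year")
# 3 nines: 525.60 minutes per year
# 4 nines: 52.56 minutes per year
# 5 nines: 5.26 minutes per year
```

At five nines, a service may be unavailable for only about five minutes in an entire year, which is why the target drives investment in redundancy and process.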

16

Service Management

• Models

– Information Technology Infrastructure Library (ITIL)

– Based on processes from the UK's Central Computer and Telecommunications Agency (CCTA)

– Service Support processes

• Incident management

• Problem management

• Change management

• Release management

• Configuration management

17

Service Management

• Models

– Microsoft Operations Framework

• Combines ITIL processes with recommendations for technical processes

• http://www.microsoft.com/mof

18

Next steps

• Define service level objectives for key services

• Determine how to measure service reliability

• Engage Data Center staff
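
One possible way to measure service reliability against a service level objective is to compute availability from recorded outages. The sketch below assumes outages are logged as non-overlapping (start, end) pairs; the objective value, the reporting period, and the outage are made up for illustration.

```python
from datetime import datetime
from typing import Iterable, Tuple

Outage = Tuple[datetime, datetime]  # (start, end) of one service outage


def measured_availability(
    period_start: datetime, period_end: datetime, outages: Iterable[Outage]
) -> float:
    """Fraction of the period the service was up (outages must not overlap)."""
    period_seconds = (period_end - period_start).total_seconds()
    if period_seconds <= 0:
        raise ValueError("period_end must be after period_start")
    down_seconds = 0.0
    for start, end in outages:
        # Clip each outage to the reporting period before counting it.
        clipped_start = max(start, period_start)
        clipped_end = min(end, period_end)
        if clipped_end > clipped_start:
            down_seconds += (clipped_end - clipped_start).total_seconds()
    return 1.0 - down_seconds / period_seconds


OBJECTIVE = 0.999  # hypothetical service level objective
# Made-up data: one two-hour outage in a 30-day reporting period.
outages = [(datetime(2005, 4, 14, 2, 0), datetime(2005, 4, 14, 4, 0))]
result = measured_availability(datetime(2005, 4, 1), datetime(2005, 5, 1), outages)
status = "meets objective" if result >= OBJECTIVE else "misses objective"
print(f"{result:.4%} {status}")
```

Measuring from the outage log of the service as a whole, rather than from individual server uptime, matches the "manage the service, not the servers" direction.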

19

Observations

• Infrastructure complexity

– Teams of specialists

• Funding for environments

• Staffing

• Process costs

20

Questions

Roger Hanson Internet Infrastructure Applications

[email protected]