University of Washington Computing & Communications: Networking Update. Terry Gray, Director, Networks & Distributed Computing, University of Washington. UW Medicine IT Steering Committee, 16 January 2004 and 20 February 2004.

Page 1: ppt

University of Washington Computing & Communications

Networking Update

Terry Gray
Director, Networks & Distributed Computing

University of Washington

UW Medicine IT Steering Committee
16 January 2004

20 February 2004

Page 2: ppt

Outline

• In our last episode…
  – Context
  – Expanded Partnership
  – Recent Problems

• Today
  – Systemic Problems and Progress
  – Network Security Chronology
  – Design Issues

Page 3: ppt

Context: A Perfect Storm

• Increased dependency on network apps
• Decreased tolerance for outages
• Decades of deferred maintenance...
• Inadequate infrastructure investment
• Some old/unfortunate design decisions
• Some extraordinarily fragile applications
• Fragmented host management
• Increasingly hostile security environment
• Increasing legal/regulatory liability
• Importance of research/clinical leverage

Page 4: ppt

Key Elements of the Partnership

• Changed: C&C now responsible for...
  – In-building network implementation and operational support for med ctrs, clinics
  – Med center network design “for real”

• Not Changed: C&C still responsible for...
  – Network backbone, routers
  – Regional and Internet connectivity
  – SoM and Health Sciences networking

Page 5: ppt

Why the Partnership Makes Sense

• Consistency, interoperability, manageability
• Leverage C&C networking expertise
• Clinical/research hi-performance network needs
• 24x7 Network Operations Center (NOC)
• Advanced network management tools
• Avoid design/build organizational conflicts
• Beyond the network... hope to share distributed system architecture and network computing expertise

Page 6: ppt

Recent Problems

• Oct 29: Partial router failure reveals escalation procedure problems

• Oct 30: Security breach triggers connectivity and server problems

• Nov 12: 13 minute power outage triggers extended server outage

• Dec 12: Router upgrade uncovers wiring error, which triggers multicast storm

(None of these were related to the network transition, save perhaps timing of #4)

Page 7: ppt

System Elements

• Environmentals (Power, A/C, Physical Security)
• Network
• Client Workstations
• Servers
• Applications
• Personnel, Procedures, Policy, and Architecture

Failures at one level can trigger problems at another level; need Total System perspective

Page 8: ppt

Reasonable Questions

• What’s up with C&C’s alarm system vendor?
• If power was out for only 14 minutes, why was service out for multiple hours?
• What can we say about an app so fragile that a net interruption of a few seconds requires a server reboot?
• What can we say about thin clients built on top of thick (WinXP) operating systems?
• What can we say about a network where one wiring fault can disable most of the net?

Page 9: ppt

Systemic Problems and Progress

Page 10: ppt

Systemic Network Problems
(NB: these pre-date Tom et al.)

• Old infrastructure (e.g. Cat 3 wire)
• Non-supportable technologies (e.g. FDDI)
• Non-supportable (non-geographic) topology
• Expensive shortcuts (e.g. Cat 5 mis-terminated)
• Security based on individual IP addresses
• Subnets with clients and critical servers
• Documentation deficiency
  – Contact database
  – Device location database
  – Critical device registry

Page 11: ppt

Systemic General Problems

• Ever-increasing system complexity, dependencies
• Departmental autonomy
• Uncontrolled hosts
• Unreliable power and A/C in equipment rooms
• No net-oriented application procurement standards

• Are HA and DRBR expectations realistic?
• Are backup plans workable?

Page 12: ppt

Some Numbers

           UW Total              Health Sciences    Medical
           (incl UW Medicine)    (incl SoM)         Centers
Subnets    1022                  52                 145
Devices    70,000                >8,000             10,000

Page 13: ppt

Network Device Growth

Note: Most dips reflect lower summer use; last one is a measurement anomaly

Page 14: ppt

Network Traffic Growth (linear)

Page 15: ppt

Network Traffic Growth (log)

Page 16: ppt

Near-term Progress and Plans

• Agreement on standard maintenance window
• Created “Top 10” list -- creeping to Top 20 :)
• Static addressing work-around (success!)
• FDDI, VLAN elimination
• Subnet splits/upgrades (1500 computers)
• Equipment upgrades
• Router consolidation, dedicated subnets, separate med center backbone
• Equipment, outlet location database updates
• Initial wireless deployment

Page 17: ppt

Design Review and Cost Estimates

• Biggest cost: physical infrastructure & wireplant upgrades
• NetVersant engaged for cost estimation project
• Cisco engaged for network architecture review
• We recommend similar reliability/design assessment for servers, apps & procedures

Page 18: ppt

Design Issues

Page 19: ppt

Design Tradeoffs

• Networks = Connectivity; Security = Isolation
• Fault Zone size vs. Economy/Simplicity
• Reliability vs. Complexity
• Prevention vs. (Fast) Remediation
• Security vs. Supportability vs. Functionality

Differences in NetSec approaches relate to:
• Balancing priorities (security vs. ops vs. function)
• Local technical and institutional feasibility

Page 20: ppt

Tradeoff Examples

• Defense-in-depth conjecture (for N layers)
  – Security: MTTE (exploit) ≈ N**2
  – Functionality: MTTI (innovation) ≈ N**2
  – Supportability: MTTR (repair) ≈ N**2

• Perimeter Protection Paradox (for D devices)
  – Firewall value ≈ D
  – Firewall effectiveness ≈ 1 / D

• Border blocking criteria
  – Threat can’t reasonably be addressed at edge
  – Won’t harm network (performance, stateless block)
  – Widespread consensus to do it

• Security by IP address
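
The two scaling conjectures on this slide can be tabulated with a short Python sketch. This is only an illustration: the slide gives scaling relations, not constants, so every proportionality constant is assumed to be 1, and the function names are hypothetical.

```python
def defense_in_depth(n_layers: int) -> dict:
    """Relative MTTE, MTTI and MTTR for N layers under the N**2 conjecture.

    Assumption: the proportionality constant is 1 for all three quantities.
    """
    if n_layers < 1:
        raise ValueError("need at least one layer")
    scale = n_layers ** 2
    return {"mtte": scale, "mtti": scale, "mttr": scale}


def perimeter_paradox(n_devices: int) -> dict:
    """Relative firewall value (~D) and effectiveness (~1/D) for D devices.

    Assumption: both proportionality constants are 1.
    """
    if n_devices < 1:
        raise ValueError("need at least one device")
    return {"value": n_devices, "effectiveness": 1 / n_devices}


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, defense_in_depth(n))
    for d in (10, 1000):
        print(d, perimeter_paradox(d))
```

Under these assumptions, going from one layer to three multiplies the time to exploit by nine, and multiplies the time to innovate and the time to repair by the same factor, which is the tradeoff the conjecture points at.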

Page 21: ppt

Network Security Credo

• Focus first on the edge (Perimeter Protection Paradox)

• Add defense-in-depth as needed

• Keep it simple (e.g. Network Utility Model)

• But not too simple (e.g. offer some policy choice)

• Avoid
  – one-size-fits-all policies
  – cost-shifting from “guilty” to “innocent”
  – confusing users and techs (“broken by design”)

Page 22: ppt

Preserving the Net Utility Model

• What is it?

• Why important?

• Incompatible with perimeter security?

• Too late to save?

• NUM-preserving perimeter defense
  – Logical Firewalls
  – Project 172

• Foiled by static IP addressing…
  – Requires all hosts be reconfigured

Page 23: ppt

Lines of Defense

• Network isolation for critical services.

• Host integrity. (Make the OS net-safe.)

• Host perimeter. (Add host firewalling.)

• Server sanctuary perimeter.

• Network perimeter defense.

• Real-time attack detection and containment.

Page 24: ppt

Network Security Chronology

• 1990: Five anti-interoperable networks
• 1994: Nebula shows network utility model viable
• 1998: Defined border blocking policy
• 2000: Published Network Security Credo
• 2000: Added source address spoof filters
• 2000: Proposed med ctr network zone
• 2000: Proposed server sanctuaries
• 2001: Banned clear-text passwords on C&C systems
• 2001: Proposed pervasive host firewalls
• 2001: Developed logical firewall solution
• 2002: Developed Project-172 solution
• 2003: Slammer, Blaster… death of the Internet
• 2003: Developed flex-net architecture

Page 25: ppt

Next-Gen Network Architecture

• Parallel networks; more redundancy
• Supportable (geographic) topology
• Med center subnets = separate backbone zone
• Perimeter, sanctuary, and end-point defense
• Higher performance
• High-availability strategies
  – Workstations spread across independent nets
  – Redundant routers
  – Dual-homed servers
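
To see why redundant routers and dual-homed servers appear under high-availability strategies, here is a minimal Python sketch of the standard parallel-redundancy formula, 1 - (1 - a)^n. It assumes the n paths fail independently, which the wiring-fault incident described earlier shows is not guaranteed in practice. The function name and the 99% figure are hypothetical.

```python
def parallel_availability(component_availability: float, n_paths: int) -> float:
    """Availability of n redundant paths when any single working path suffices.

    Assumption: failures are independent, so the system is down only when
    all n paths are down at the same time.
    """
    if not 0.0 <= component_availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if n_paths < 1:
        raise ValueError("need at least one path")
    return 1.0 - (1.0 - component_availability) ** n_paths


if __name__ == "__main__":
    # Hypothetical figure: each router or path is up 99% of the time.
    for n in (1, 2, 3):
        print(f"{n} path(s): {parallel_availability(0.99, n):.6f}")
```

With independent failures, a second 99% path raises the combined figure to 99.99%. Any shared dependency (a common power feed, a common wiring closet) lowers the real figure, which is one reason the deck stresses independent nets.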

Page 26: ppt

Success Metrics

• Tom’s
  – Nobody gets hurt
  – Nobody goes to jail

• Terry’s
  – “Works fine, lasts a long time”
  – Low ROI (Risk Of Interruption)

• Steve’s
  – Four Nines or bust!

Page 27: ppt

Success Metrics II

• We all want:
  – High MTTF, Performance and Function
  – Low MTTR and support cost

• The art is to balance those conflicting goals
  – We are jugglers and technology actuaries
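
The link between MTTF, MTTR and uptime can be made concrete with the usual steady-state availability formula, MTTF / (MTTF + MTTR). The Python sketch below is illustrative only; the function name and the sample hours are hypothetical and are not figures from the deck.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: fraction of time the service is up."""
    if mttf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTTF must be positive and MTTR non-negative")
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical numbers: same failure rate, two different repair times.
    slow_repair = availability(mttf_hours=1000.0, mttr_hours=4.0)
    fast_repair = availability(mttf_hours=1000.0, mttr_hours=1.0)
    print(f"4-hour repairs: {slow_repair:.5f}")
    print(f"1-hour repairs: {fast_repair:.5f}")
```

The formula shows why the slide lists both goals: availability improves either by failing less often (higher MTTF) or by recovering faster (lower MTTR), and measures that help one can hurt the other.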

Page 28: ppt

Success Metrics III

• How many nines?

• Problem one: what to measure?
  – How do you reduce behavior of a complex net to a single number?
  – Difficult for either uptime or utilization metrics

• Problem two: data networks are not like phone or power services…
  – Imagine if phones could assume anyone’s number
  – Or place a million calls per second!
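
For a sense of scale on "how many nines", the sketch below converts a number of nines into a yearly downtime budget. It assumes a 365-day year and a single uptime number for the whole service, which is exactly the simplification the slide questions. The function name is hypothetical.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a 365-day year


def downtime_budget_minutes(nines: int) -> float:
    """Allowed downtime per year, in minutes, for an availability of N nines."""
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10.0 ** (-nines)  # e.g. 4 nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(f"{n} nines: {downtime_budget_minutes(n):.2f} minutes per year")
```

Four nines allows about 52.6 minutes of downtime per year and five nines about 5.3 minutes. The 13-minute power outage listed under Recent Problems would by itself use roughly a quarter of a four-nines budget and exceed a five-nines budget, before counting the extended server outage it triggered.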

Page 29: ppt

Concerns, Future Challenges

• Mitigating impact of closed networking:
  – Needs of the many vs. needs of the few
  – Pressure to make network topology match administrative boundaries
  – Complex access lists
  – False sense of security
  – Increased MTTR

• Next-generation threats: firewalls won’t help
• Security vs. High-Performance
• Wireless
• Balancing innovation, operations, & security

Page 30: ppt

Lessons

• Five 9s is hard (unless we only attach phones?)
• Even host firewalls don’t guarantee safety
• Perimeter firewalls may increase user confusion, MTTR
• Nebula existence proof: security in an open network
• Even so… defense-in-depth is a Good Thing
• It only takes one compromise inside to defeat a firewall
• Controlling net devices is hard -- hublets, wireless
• The cost of static IP configuration is very high
• Net reliability & host security are inextricably linked
• Never underestimate non-technical barriers to progress

Page 31: ppt

Questions? Comments?