CSE 190: Internet E-Commerce
Lecture 14: Operations

Operations

• Everything it takes to keep a web site up and running, 24x7
  – Deployment process
  – Monitoring (SNMP)
  – Build system
  – Link rot
  – Maintenance window
  – Load testing
  – Browser compliance
  – Log rotation
  – Database backups
  – Disk failure
  – Router failure
  – Robots
  – Staffing
  – Data centers
• Expense of running a high-availability site is comparable to running a physical storefront

Deployment Process

• Proceeds in three phases
  – Development
    • Within corporation, not accessible outside
  – Stage
    • Within internet environment
    • UAT run here
    • Only operations staff may access
  – Live
    • Accessible to outside world

Monitoring

• SNMP (Simple Network Management Protocol)
  – Used to monitor both hardware and software
  – Provides: counters, values, triggers, statistics
  – Remote control of services
  – Information stored in MIB (Management Information Base)
  – RMON sometimes used as alternative to SNMPv2
• Software
  – HP OpenView
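The counters and triggers listed above can be illustrated without any SNMP library. The following is a minimal sketch under stated assumptions: the helper names (counter_delta, rate_per_second, trigger_fired), the sample readings, and the threshold are all hypothetical and belong to no SNMP toolkit. It shows how two polls of an ever-increasing 32-bit counter, which wraps back to zero at 2^32, might be turned into a rate, and how a threshold trigger could be evaluated on that rate.

```python
# Hypothetical helpers for illustration; not part of any SNMP library.


def counter_delta(previous: int, current: int, bits: int = 32) -> int:
    """Increase between two readings of a counter that wraps at 2**bits."""
    modulus = 1 << bits
    return (current - previous) % modulus


def rate_per_second(previous: int, current: int, interval_s: float,
                    bits: int = 32) -> float:
    """Average increase per second between two polls."""
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    return counter_delta(previous, current, bits) / interval_s


def trigger_fired(rate: float, threshold: float) -> bool:
    """A simple trigger: fire when the rate exceeds the threshold."""
    return rate > threshold


if __name__ == "__main__":
    # Two polls of an assumed byte counter, 60 seconds apart.
    # The counter wrapped past 2**32 between the polls.
    rate = rate_per_second(previous=4_294_967_000, current=704, interval_s=60)
    print(f"{rate:.2f} bytes/s, trigger fired: {trigger_fired(rate, 10.0)}")
```

Taking the difference modulo 2^32 keeps the rate correct across a single wrap; a real monitoring package would also have to cope with device reboots that reset the counter.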

Maintenance Window

• Installation
  – Standard: J2EE standard web service descriptor (XML file with tarball of files)
  – InstallShield
  – Custom installation scripts
• Upgrades
  – Defined time on Friday or weekend to upgrade site, posted on web site
  – Process:
    • Front page linked to ‘Site down’
    • Load balancer redirected if appropriate
    • Application stops accepting new clients
    • (Pause) Application terminates all active sessions
    • Application upgraded
    • Sanity checks performed
    • Servers rebooted
    • Load balancer restored
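The upgrade process above is an ordered checklist, and one way to script it is as a list of named steps that halts at the first failure. This is only a sketch: the step names paraphrase the slide, while run_upgrade, placeholder, and UPGRADE_STEPS are hypothetical names, and each step is a stand-in where a real site would call its own load balancer and application tooling.

```python
from typing import Callable, List, Tuple

# A step is a label plus a callable that returns True on success.
Step = Tuple[str, Callable[[], bool]]


def run_upgrade(steps: List[Step]) -> List[str]:
    """Run steps in order and stop at the first one that reports failure.

    Returns the names of the steps that completed successfully.
    """
    completed: List[str] = []
    for name, action in steps:
        if not action():
            print(f"FAILED: {name}; stopping so an operator can intervene")
            break
        print(f"ok: {name}")
        completed.append(name)
    return completed


def placeholder() -> bool:
    """Stand-in for real work (hypothetical); always succeeds."""
    return True


UPGRADE_STEPS: List[Step] = [
    ("link front page to 'Site down'", placeholder),
    ("redirect load balancer if appropriate", placeholder),
    ("stop accepting new clients", placeholder),
    ("pause, then terminate active sessions", placeholder),
    ("upgrade application", placeholder),
    ("run sanity checks", placeholder),
    ("reboot servers", placeholder),
    ("restore load balancer", placeholder),
]

if __name__ == "__main__":
    run_upgrade(UPGRADE_STEPS)
```

Stopping on the first failure matters most at the sanity-check step: the load balancer is restored only if every earlier step succeeded.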

Link Rot

• Link rot: the continual process by which links become invalid over time
• Tracked with custom tools
• Best practice: Pages have permanent URLs
• Referral field (the HTTP Referer header):
  – Tracking this in logs shows who’s linking to what URL on your site
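A custom tool of the kind mentioned above could start from the access log. The sketch below assumes the log is in the Apache "combined" format, which records the Referer header after the status and size fields; the function name tally_referrers and the sample lines are hypothetical. It counts (referrer, requested path, status) triples, so entries with status 404 show which outside pages still link to URLs that no longer exist.

```python
import re
from collections import Counter

# One line of Apache combined log format:
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "[^"]*"'
)

# Made-up sample lines, for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [10/Oct/2000:13:55:36 -0700] "GET /catalog/old.html HTTP/1.0" '
    '404 209 "http://example.org/links.html" "Mozilla/4.08"',
    '192.0.2.2 - - [10/Oct/2000:13:56:01 -0700] "GET /index.html HTTP/1.0" '
    '200 2326 "-" "Mozilla/4.08"',
    '192.0.2.3 - - [10/Oct/2000:13:57:12 -0700] "GET /catalog/old.html HTTP/1.0" '
    '404 209 "http://example.org/links.html" "Mozilla/4.08"',
]


def tally_referrers(lines):
    """Count (referrer, path, status) triples, skipping empty referrers."""
    counts = Counter()
    for line in lines:
        match = COMBINED.match(line)
        if match is None:
            continue  # not a combined-format line
        referer = match.group("referer")
        if referer in ("", "-"):
            continue  # direct visit or header withheld
        key = (referer, match.group("path"), int(match.group("status")))
        counts[key] += 1
    return counts


if __name__ == "__main__":
    for (referer, path, status), n in tally_referrers(SAMPLE_LOG).most_common():
        print(f"{n:4d}  {status}  {path}  <-  {referer}")
```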

Load Testing

• Network load (60% bandwidth max)
  – Average page size (~20-30 KB)
• CPU load: Occurs at three or more levels
  – HTTP level
  – Application level
  – DB query level
  – Metrics: maximum number of simultaneous users, latency vs. users
• Memory usage (256 MB – 1 GB per machine)
• Disk I/O load
  – 1 Gb per machine typical
• Tools
  – Mercury Interactive: WinRunner
  – Segue: SilkTest
  – Rational: SiteLoad
  – Microsoft: WCAT
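The network-load figures above lend themselves to a quick back-of-the-envelope calculation. The sketch below is an illustration under assumptions: the 10 Mbit/s link speed is hypothetical, the page size is taken as 25,000 bytes (the midpoint of the 20-30 KB range), and the function name max_pages_per_second is made up. It estimates how many average pages per second fit within the 60% bandwidth ceiling.

```python
def max_pages_per_second(link_bits_per_s: float, page_bytes: int,
                         max_utilization: float = 0.60) -> float:
    """Pages per second that fit within the allowed share of the link."""
    if page_bytes <= 0:
        raise ValueError("page_bytes must be positive")
    usable_bits_per_s = link_bits_per_s * max_utilization
    return usable_bits_per_s / (page_bytes * 8)


if __name__ == "__main__":
    # Assumed values: 10 Mbit/s link, 25,000-byte average page.
    pages = max_pages_per_second(link_bits_per_s=10_000_000, page_bytes=25_000)
    print(f"about {pages:.0f} pages per second at 60% utilization")
```

The estimate ignores protocol overhead and any images or other resources fetched separately, so it is an upper bound on what the link can serve.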

Browser Compatibility

• Cost of testing proportional to the number of platforms you’re compatible with
• The same product isn’t the same on different operating systems
  – E.g. IE4.5 isn’t the same on Mac vs. Windows
• Incompatible DOMs between MS, Netscape, Mozilla
• Browser archive
  – http://browsers.evolt.org/

Robots

• Robots: Automatically traverse web pages to retrieve documents, link structure, data
• Used for:
  – Indexing
  – HTML validation
  – Link validation
  – Mirroring
• Problems:
  – Too much rapid access from single IP
  – May be indexing dynamic, obsolete data
• Robot exclusion file:

    # /robots.txt file for mysite.com

    User-agent: webcrawler
    Disallow:

    User-agent: lycra
    Disallow: /

    User-agent: *
    Disallow: /jsp
    Disallow: /logs
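To see how a well-behaved robot would read the exclusion file above, it can be fed to Python's standard urllib.robotparser module. This is a sketch rather than a description of any particular crawler: the helper build_parser, the agent name "otherbot", and the page URLs under mysite.com are assumptions made for illustration. An empty Disallow line permits everything, "Disallow: /" blocks the whole site, and the "*" record applies to any robot not named elsewhere.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
# /robots.txt file for mysite.com

User-agent: webcrawler
Disallow:

User-agent: lycra
Disallow: /

User-agent: *
Disallow: /jsp
Disallow: /logs
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content held in a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    checks = [
        ("webcrawler", "http://mysite.com/jsp/cart.jsp"),  # hypothetical URLs
        ("lycra", "http://mysite.com/index.html"),
        ("otherbot", "http://mysite.com/logs/access.log"),
        ("otherbot", "http://mysite.com/index.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if parser.can_fetch(agent, url) else "blocked"
        print(f"{agent:10s} {url:40s} {verdict}")
```

The file is advisory: it only restrains robots that choose to honor it, so it does not address the rapid-access problem caused by misbehaving clients.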

Failure Models

• Mean Time To Failure (MTTF) = average amount of time the system is up
• Mean Time Between Failures (MTBF) = average amount of time between failures
• Mean Time To Repair (MTTR) = average amount of time the system is down after it fails: active repair time (diagnostics and repair)
• Mean Down Time (MDT) = average amount of time the system is down after it fails: active repair time + preventive maintenance + logistics time (time spent waiting for personnel, etc.)
• Intrinsic availability = MTTF / (MTTF + MTTR)
• Operational availability = MTBF / (MTBF + MDT)

[Figure: failure rate over time. Hardware failure rate phases: burn-in, useful life, wear-out. Software failure rate phases: integration & test, useful life, obsolete.]
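The two availability ratios are easier to compare with numbers plugged in. The sketch below uses made-up figures (999 hours of uptime per failure, 1 hour of active repair, 4 hours of total down time) and hypothetical function names; it also converts an availability into expected hours of downtime per year, taking a year as 8,760 hours.

```python
HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def intrinsic_availability(mttf: float, mttr: float) -> float:
    """MTTF / (MTTF + MTTR): counts active repair time only."""
    return mttf / (mttf + mttr)


def operational_availability(mtbf: float, mdt: float) -> float:
    """MTBF / (MTBF + MDT): also counts maintenance and logistics time."""
    return mtbf / (mtbf + mdt)


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours of downtime in one year at the given availability."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    # Made-up figures, for illustration only.
    a_intrinsic = intrinsic_availability(mttf=999.0, mttr=1.0)
    a_operational = operational_availability(mtbf=999.0, mdt=4.0)
    for label, a in [("intrinsic", a_intrinsic), ("operational", a_operational)]:
        print(f"{label:12s} {a:.4%}  ~{downtime_hours_per_year(a):.1f} h down per year")
```

Because MDT includes waiting for personnel and preventive maintenance on top of repair time, operational availability comes out lower than intrinsic availability for the same uptime.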

When things go wrong

• Network operations
  – Software recovers from common failures
  – Network staff paged by email if server not available (via SNMP)
  – Usually rotating assignment
• Application developers may be called in if restarting servers, etc. fails completely, and only if it doesn’t look like a network problem

Data Centers

• Data centers: Host your machines on their own premises
  – Also called “colocation”
• Features
  – Security: controlled entrance, exit
  – Weather: maintained temperature, humidity
  – Power: Backup power, available circuits
  – Bandwidth: OC-192 connections
  – Monitoring: 24/7 staff, may reboot misbehaving machines
• Machines typically arranged in “cages”; 1U, 2U machines
• Server blades
• Examples
  – NTT / Verio
  – Exodus / Global Crossing