By: Swetha Kendyala

ske009@latech.edu

Software Rejuvenation

Introduction

• When software applications run continuously for long periods, their processes age: they slowly degrade in how effectively they use system resources. Process aging hurts performance and eventually causes the application to fail.

What is Software Rejuvenation?

• The act of gracefully terminating an application and immediately restarting it

• Goal: prevent unexpected failure by terminating the program before it suffers an error

Intended Use

• Software rejuvenation is primarily indicated for servers where applications are intended to run indefinitely without failure

Why do applications fail?

• Process aging: gradual degradation of application performance over time that may lead to premature program termination

Causes

• Memory leaks
• Unreleased file locks
• File descriptor leaks
• Etc.

Software Rejuvenation

• Periodic preemptive rollback of continuously running applications to prevent failures in the future

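Transition Model for SW without Rejuvenation

[Figure: state transition diagram. S0 (robust) → SP (failure probable) at rate r2; SP → SF (failed) at rate λ; SF → S0 at rate r1.]

Transition Model for SW with Rejuvenation

[Figure: state transition diagram. As above, plus SP → SR (rejuvenation) at rate r4 and SR → S0 at rate r3.]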

Downtime and cost without rejuvenation

• $P_f = \dfrac{1}{1 + \frac{r_1}{\lambda} + \frac{r_1}{r_2}}$

• $\text{Downtime}_{w/o\,r}(L) = P_f \cdot L$

• $\text{Cost}_{w/o\,r}(L) = P_f \cdot L \cdot c_f$

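Downtime and cost with rejuvenation

• $P_p = \dfrac{1}{1 + \frac{\lambda}{r_1} + \frac{r_4}{r_3} + \frac{\lambda + r_4}{r_2}}$

• $P_f = \frac{\lambda}{r_1} \cdot P_p$

• $P_r = \frac{r_4}{r_3} \cdot P_p$

• $P_0 = \frac{\lambda + r_4}{r_2} \cdot P_p$

• $\text{Downtime}_{w\,r}(L) = (P_f + P_r) \cdot L$

• $\text{Cost}_{w\,r}(L) = (P_f \cdot c_f + P_r \cdot c_r) \cdot L$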
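The steady-state quantities for the model with rejuvenation can be sketched in a few lines of Python. This is a minimal sketch, not the author's code: it assumes the four-state model (S0, SP, SF, SR) with rates λ, r1, r2, r3, r4, and the closed forms it uses are spelled out in the comments. The function names and the sample values in the usage lines are illustrative assumptions.

```python
def steady_state(lam, r1, r2, r3, r4):
    """Steady-state probabilities of the four-state model with rejuvenation.

    Assumed closed forms:
      Pp = 1 / (1 + lam/r1 + r4/r3 + (lam + r4)/r2)
      Pf = (lam/r1) * Pp,  Pr = (r4/r3) * Pp,  P0 = ((lam + r4)/r2) * Pp
    """
    p_p = 1.0 / (1.0 + lam / r1 + r4 / r3 + (lam + r4) / r2)
    return {
        "P0": (lam + r4) / r2 * p_p,
        "Pp": p_p,
        "Pf": lam / r1 * p_p,
        "Pr": r4 / r3 * p_p,
    }


def downtime_and_cost(lam, r1, r2, r3, r4, c_f, c_r, L):
    """Expected downtime and downtime cost over an interval of length L."""
    p = steady_state(lam, r1, r2, r3, r4)
    downtime = (p["Pf"] + p["Pr"]) * L
    cost = (p["Pf"] * c_f + p["Pr"] * c_r) * L
    return downtime, cost


# Illustrative (hypothetical) rates per hour; the four probabilities sum to 1.
probs = steady_state(lam=0.001, r1=2.0, r2=0.01, r3=3.0, r4=0.005)
print(probs, sum(probs.values()))
```

Setting `r4 = 0` makes `Pr` vanish and reduces `Pf` to the no-rejuvenation expression 1 / (1 + r1/λ + r1/r2), so the same function covers both models.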

Thresholds - Goal

Goal is to stay in S0 for the longest amount of time

Thresholds cont.

• To see how r4 affects downtime and cost, let's differentiate the previous equations with respect to r4

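Thresholds cont.

• Downtime: $\frac{d}{dr_4}\,\text{Downtime}_{w\,r}(L) = \frac{L\,\lambda}{r_1 r_2 r_3 \left(1 + \frac{\lambda}{r_1} + \frac{r_4}{r_3} + \frac{\lambda + r_4}{r_2}\right)^2}\left[r_1\left(1 + \frac{r_2}{\lambda}\right) - r_3\right]$

• If r3 is dominant, the derivative becomes negative and downtime decreases as r4 increases, so rejuvenate at state SP

• If r3 is small (slow recovery from SR), downtime increases as r4 increases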

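Thresholds cont.

• Cost: $\frac{d}{dr_4}\,\text{Cost}_{w\,r}(L) = \frac{L}{r_3 \left(1 + \frac{\lambda}{r_1} + \frac{r_4}{r_3} + \frac{\lambda + r_4}{r_2}\right)^2}\left[c_r\left(1 + \frac{\lambda}{r_1} + \frac{\lambda}{r_2}\right) - c_f\,\frac{\lambda\,(r_2 + r_3)}{r_1 r_2}\right]$

• When cr is dominant, cost increases as r4 increases, which implies no benefit from rejuvenation

• When cr is small, cost decreases as r4 increases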

Thresholds cont.

• Overall, costs need to be calculated for individual programs

• For best results: perform rejuvenation at state SP (r4 = ∞) or don’t perform rejuvenation (r4 = 0)
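Because the sign of each derivative does not depend on r4, the all-or-nothing choice can be checked directly from the other parameters. The sketch below is an assumption-laden illustration: the two bracket expressions are the sign-determining factors obtained by differentiating (Pf + Pr) * L and (Pf * cf + Pr * cr) * L under the four-state model, and the function names are hypothetical.

```python
def downtime_slope_sign(lam, r1, r2, r3):
    """Sign factor of d(Downtime)/d(r4): r1 * (1 + r2/lam) - r3.

    Negative means downtime falls as r4 grows (fast recovery from SR).
    """
    return r1 * (1.0 + r2 / lam) - r3


def cost_slope_sign(lam, r1, r2, r3, c_f, c_r):
    """Sign factor of d(Cost)/d(r4):
    c_r * (1 + lam/r1 + lam/r2) - c_f * lam * (r2 + r3) / (r1 * r2).

    Negative means cost falls as r4 grows (cheap rejuvenation).
    """
    return c_r * (1.0 + lam / r1 + lam / r2) - c_f * lam * (r2 + r3) / (r1 * r2)


def recommend_by_cost(lam, r1, r2, r3, c_f, c_r):
    """Pick one of the two extremes based on the cost slope only."""
    if cost_slope_sign(lam, r1, r2, r3, c_f, c_r) < 0:
        return "rejuvenate at SP (r4 -> infinity)"
    return "no rejuvenation (r4 = 0)"


# Rates per hour taken from Example 1 and Example 2 of this deck.
print(recommend_by_cost(1 / (12 * 30 * 24), 2.0, 1 / (7 * 24), 3.0, 1000.0, 40.0))
print(recommend_by_cost(1 / (3 * 30 * 24), 2.0, 1 / (3 * 24), 6.0, 5000.0, 5.0))
```

A decision based on downtime alone would use `downtime_slope_sign` in the same way; the two criteria can disagree, which is why costs have to be worked out per program.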

Example 1

• MTBF = 12 months; λ = 1/(12*30*24)
• Takes 30 min to recover from unexpected error; r1 = 2
• Base longevity is seven days; r2 = 1/(7*24)
• If rejuvenation is performed, mean repair time after rejuvenation is 20 minutes; r3 = 3
• Ave. cost of unscheduled downtime due to failure, cf, is $1,000/hour
• Ave. cost of scheduled downtime during rejuvenation, cr, is $40/hour

Software Rejuvenation    No rejuvenation    Once every three weeks    Once every two weeks
                         (r4 = 0)           (r4 = 1/(2*7*24))         (r4 = 1/(1*7*24))
Hours of Downtime        0.490              5.965                     8.727
Cost of Downtime ($)     490                554                       586
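The Example 1 figures can be recomputed as a check. This sketch assumes the four-state steady-state model and an observation interval of L = 12*30*24 hours; the interval is not stated on the slide and is chosen here because it brings the computed values within rounding of the table. The function name is hypothetical.

```python
L_HOURS = 12 * 30 * 24  # assumption: one 360-day year, not stated in the deck


def downtime_and_cost(r4, lam, r1, r2, r3, c_f, c_r, L):
    """Downtime (hours) and cost ($) over L hours for rejuvenation rate r4."""
    p_p = 1.0 / (1.0 + lam / r1 + r4 / r3 + (lam + r4) / r2)
    p_f = lam / r1 * p_p  # probability of being in the failed state
    p_r = r4 / r3 * p_p   # probability of being in the rejuvenation state
    return (p_f + p_r) * L, (p_f * c_f + p_r * c_r) * L


# Example 1 parameters, all rates per hour.
example1 = dict(lam=1 / (12 * 30 * 24), r1=2.0, r2=1 / (7 * 24), r3=3.0,
                c_f=1000.0, c_r=40.0, L=L_HOURS)

schedules = [
    ("No rejuvenation", 0.0),
    ("Once every three weeks", 1 / (2 * 7 * 24)),
    ("Once every two weeks", 1 / (1 * 7 * 24)),
]

for label, r4 in schedules:
    hours, dollars = downtime_and_cost(r4, **example1)
    print(f"{label:<24} {hours:7.3f} h   ${dollars:8.2f}")
```

With cr at $40/hour, both downtime and cost rise as r4 increases, so rejuvenation does not pay off for this parameter set.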

Example 3 results (parameters listed below, after Example 2)

Software Rejuvenation    No rejuvenation    Once every month      Once every two weeks
                         (r4 = 0)           (r4 = 1/(20*24))      (r4 = 1/(4*24))
Hours of Downtime        7.19               6.83                  6.36
Cost of Downtime ($)     3.6k               2.48k                 1.11k

Example 2

• MTBF = 3 months; λ = 1/(3*30*24)
• Takes 30 min to recover from unexpected error; r1 = 2
• Base longevity is three days; r2 = 1/(3*24)
• If rejuvenation is performed, mean repair time after rejuvenation is 10 minutes; r3 = 6
• Ave. cost of unscheduled downtime due to failure, cf, is $5,000/hour
• Ave. cost of scheduled downtime during rejuvenation, cr, is $5/hour

Software Rejuvenation    No rejuvenation    Once every two weeks    Once every week
                         (r4 = 0)           (r4 = 1/(11*24))        (r4 = 1/(4*24))
Hours of Downtime        1.94               5.70                    9.52
Cost of Downtime ($)     9675.25            7672.43                 5643.31

Example 3

• MTBF = 3 months; λ = 1/(3*30*24)
• Takes 2 hours to recover from unexpected error; r1 = 0.5
• Base longevity is 10 days; r2 = 1/(10*24)
• If rejuvenation is performed, mean repair time after rejuvenation is 10 minutes; r3 = 6
• Ave. cost of unscheduled downtime due to failure, cf, is $5,000/hour
• Ave. cost of scheduled downtime during rejuvenation, cr, is $5/hour

Implementation

• Implementation of software rejuvenation is fairly easy.
• Cron jobs can be set to restart the application at various intervals
• watchd can be used to detect if applications have failed and restart them
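The two mechanisms above, a timed restart and a restart on failure, can be combined in one small supervisor loop. The sketch below is a Python stand-in for illustration only; it does not use cron or watchd, and the function name, the interval, and the dummy worker command are all assumptions.

```python
import subprocess
import sys


def run_with_rejuvenation(cmd, interval_s, max_cycles):
    """Run cmd; restart it every interval_s seconds or whenever it exits.

    Returns one (reason, returncode) tuple per cycle.
    """
    history = []
    for _ in range(max_cycles):
        proc = subprocess.Popen(cmd)
        try:
            # Returning before the timeout means the process exited on its own.
            returncode = proc.wait(timeout=interval_s)
            history.append(("exited", returncode))
        except subprocess.TimeoutExpired:
            # Scheduled rejuvenation: request a graceful stop, force it if ignored.
            proc.terminate()
            try:
                proc.wait(timeout=5)
            except subprocess.TimeoutExpired:
                proc.kill()
                proc.wait()
            history.append(("rejuvenated", proc.returncode))
    return history


if __name__ == "__main__":
    # Dummy long-running worker; a real deployment would name the server command.
    worker = [sys.executable, "-c", "import time; time.sleep(60)"]
    print(run_with_rejuvenation(worker, interval_s=0.5, max_cycles=2))
```

A real rejuvenation interval would be hours or days rather than half a second, chosen from the failure rates and costs discussed earlier.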

Real World Examples

• BILL-DATS II Collector

– Billing collection system used by AT&T long-distance network
– Set to rejuvenate after 1 week
– Hasn’t prematurely failed after several years

• “S” Scientific Speech synthesis system

– Long-running scientific application
– Used to process several hundred sentences over the course of many days
– Found to fail after 100 sentences
– Rejuvenates after every 15 sentences

Conclusions:

• The decision to use software rejuvenation depends on predetermined failure rates and associated costs.

• r4 = 0: no rejuvenation

• r4 = ∞: rejuvenation

Questions???
