Cognizant Technology Solutions

Using Queuing Theory to Estimate the FTE in Production Support Projects

Mahua Seth

Arunabha Sengupta

Cognizant Technology Solutions, Kolkata

Abstract:

Maintenance projects remain the mainstay of the software industry, and production
support and bug fixing make up a major share of the work in them. Production support
projects vary in type and complexity, but a large number of them consist of addressing
customer calls and applying immediate fixes to the reported problems. In stable, steady
projects, reported problems tend to arrive with consistent frequency and complexity over
long periods. Even so, it is difficult to balance the conflicting needs of defined Service
Level Agreements, resource utilization, budget, and system stability. The difficulty is
becoming more relevant as more projects move towards a fixed-bid business model.

This paper demonstrates the use of Queuing Theory to estimate the number of resources
needed in the support team, so that the project is optimized in terms of resource cost,
service and waiting time for incoming problem reports, the number of unaddressed problem
reports, and the idle time of resources.


1.0 Introduction

The outsourcing business has been growing rapidly and has evolved to include more diverse
and complex projects. However, the bulk of its revenue still comes from maintenance
projects in which support is the primary service. These projects were traditionally priced
on a time-and-material model. The recent shift is towards total-ownership outsourcing, for
which the Fixed Bid model is usually the preferred pricing model.

Given this change in the industry, it becomes difficult for companies to estimate
person-power requirements, especially in a fixed-bid scenario, such that the necessary
margins are maintained. The estimate therefore needs to be more structured, accurate and
scientific.

Problem Statement

A large number of maintenance projects in today’s software industry consist of routine
production support jobs. These involve predictable fixes of bugs that are common in nature
and belong to a known range of problems. Over a steady period, such bugs attain a more or
less uniform and predictable rate of occurrence.


A common problem in projects of this type is to estimate the optimal number of Full Time
Employees (FTE) to allocate. The problem stems from the scarcity of robust estimation
methodologies for maintenance projects. Function Points, Use Case Points and Feature
Points are techniques better suited to estimating size and effort for development
projects.

The constraints for this allocation can be manifold as per the project requirements.

The typical constraints are Service Level Management, Cost / Resource Utilization, and

Profitability Optimization.

From the service point of view, FTE allocation can be transformed into a problem of server
estimation, keeping in mind the top-line and bottom-line business needs of the project.
This paper applies the principles of Queuing Theory to optimal FTE allocation.

2.0 Applying Queuing Theory

In the steady-state phase of a maintenance project, bugs are raised by the customer and
reported to the project team at a given rate. A number of engineers are allocated to the
project to respond to the reported bugs. When a bug arrives, if one or more FTEs are free,
the bug is handled at once. If all the FTEs are busy, the bug waits its turn until one of
the FTEs becomes free.


During the steady state of the service, when the bugs are of a regular and predictable
nature, it can be assumed that over a long period the solution time for the bugs becomes
more or less constant.

The problem can be stated more precisely: estimate the number of FTEs to assign so that
the project is optimized in terms of resource loading and utilization, resource cost, time
to respond to bugs, waiting time for a bug, and so on.

This can be formulated as a single queue multi server Queuing Theory problem

where the arriving bugs can be considered as customers in a queue and the engineers

allocated as FTEs can be considered to be the servers.

Queuing Theory Formulae

For a single queue multi server system we have the following:

If

λ = rate of arrival of customers,

µ = 1/(Time to service a customer) and

ρ = λ/µ

Steady-state solutions exist only when the number of servers n satisfies ρ/n < 1;
otherwise the number of customers in the queue grows without bound.

For such cases, we have the following formulae.

$$p_0 = \left\{ 1 + \frac{\rho}{1!} + \frac{\rho^2}{2!} + \dots + \frac{\rho^n}{n!} + \frac{\rho^{n+1}}{n \cdot n!\,(1 - \rho/n)} \right\}^{-1}$$

$$p_k = \frac{\rho^k}{k!}\, p_0 \qquad (1 \le k \le n)$$

$$p_{n+r} = \frac{\rho^{n+r}}{n^r\, n!}\, p_0 \qquad (r \ge 1)$$

where

$p_0$ is the probability that all the servers are idle,

$p_k$ is the probability that k servers are busy,

$p_{n+r}$ is the probability that all servers are busy and r customers are in queue.

Given these, we have the following equations for the system characteristics:

Average number of customers in queue : $\bar{r} = \frac{\rho}{n}\, p_n \left(1 - \frac{\rho}{n}\right)^{-2}$

Average number of customers in system : $\bar{z} = \bar{r} + \rho$

Average waiting time in system : $\bar{t}_w = \bar{z} / \lambda$

Average waiting time in queue : $\bar{t}_q = \bar{r} / \lambda$
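The formulae above can be evaluated directly. The following is a minimal Python sketch, not the authors' implementation; the function name `queue_metrics`, its dictionary keys and the sample figures are assumptions made for illustration.

```python
from math import factorial


def queue_metrics(lam, mu, n):
    """Evaluate the single-queue, n-server formulae for arrival rate lam
    and per-server service rate mu (both in the same time unit)."""
    rho = lam / mu
    if rho / n >= 1:
        raise ValueError("no steady state: rho/n must be below 1")
    # p0: probability that all n servers are idle.
    series = sum(rho**k / factorial(k) for k in range(n + 1))
    tail = rho ** (n + 1) / (n * factorial(n)) / (1 - rho / n)
    p0 = 1.0 / (series + tail)
    # pn: probability that all n servers are busy and nobody is waiting.
    pn = rho**n / factorial(n) * p0
    r_bar = (rho / n) * pn / (1 - rho / n) ** 2  # mean number in queue
    z_bar = r_bar + rho                          # mean number in system
    return {"p0": p0, "r_bar": r_bar, "z_bar": z_bar,
            "t_w": z_bar / lam, "t_q": r_bar / lam}


# Illustrative figures only: 1 arrival per hour, 2 served per hour, 1 server.
m = queue_metrics(lam=1.0, mu=2.0, n=1)
print(m)
```

With a single server the result reduces to the familiar one-server case, which gives a quick sanity check: the idle probability equals 1 - ρ and the time in system equals 1/(µ - λ).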

Translating these equations to the production support scenario, we have:

λ = Arrival rate of bugs (per hour)

µ = 1/(time taken to solve a bug) = Bugs solved per hour by one FTE

$p_0$ is the probability that all the FTEs are idle,

$p_k$ is the probability that k FTEs are busy,

$p_{n+r}$ is the probability that all FTEs are busy and r bugs are in queue.

Given these, we have the following equations:

$\bar{r} = \frac{\rho}{n}\, p_n \left(1 - \frac{\rho}{n}\right)^{-2}$ = Average number of bugs in queue

$\bar{z} = \bar{r} + \rho$ = Average number of bugs in system (in queue and in the process of being solved)

$\bar{t}_w = \bar{z} / \lambda$ = Average waiting time of bugs in system (in queue and in the process of being solved)

$\bar{t}_q = \bar{r} / \lambda$ = Average waiting time of bugs in queue

3.0 Solving the FTE Allocation Problem

The Queuing Theory Principles can be used to optimally estimate the number of

FTEs required for a particular Production Support project based on the arrival rate of

bugs and solution time, depending upon the various typical project requirements:

a) Service Level Agreement (for response)

b) Service Level Agreement (for solution)

c) Permissible number of bugs in queue (in the process of solution and waiting for

response)

d) Optimizing for Project cost (FTE cost)

e) Minimizing idle time of FTEs


3.1 Case Study 1

System Behavior

Bug Arrival Rate : 20 per day

Average Solution Time : 3 hours per bug

Project Requirements

Service Level Agreement for the time to respond to a bug : <= 30 minutes

Service Level Agreement for solution : <= 4 hours 30 minutes

In this problem, the different system characteristics can be observed by

varying the number of FTEs to arrive at the optimal solution for the Service Level

Agreements.


So, from this table, it is observed:

To ensure the SLA for solution (<= 4.5 hours), that is, an average waiting time for a bug
in the system of at most 4.5 hours, the minimum number of FTEs needed is 9.

To ensure the SLA for response (<= 0.5 hour), that is, an average waiting time for a bug
in the queue of at most 0.5 hour, the minimum number of FTEs needed is 10.

Hence the optimal number of FTEs to meet both Service Level Agreements is 10.
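The figures for this case study can be recomputed from the formulae of Section 2.0. The sketch below is illustrative only: the 8-hour working day used to convert "per day" into "per hour" is an assumption (the case study does not state it), and the helper name `waits` is hypothetical.

```python
from math import factorial

HOURS_PER_DAY = 8  # assumption: the case study does not state the working day


def waits(lam, service_hours, n):
    """Return (mean wait in queue, mean time in system) in hours for n FTEs."""
    rho = lam * service_hours
    a = rho / n
    if a >= 1:  # no steady state: the queue grows without bound
        return float("inf"), float("inf")
    p0 = 1 / (sum(rho**k / factorial(k) for k in range(n + 1))
              + rho ** (n + 1) / (n * factorial(n)) / (1 - a))
    pn = rho**n / factorial(n) * p0
    r_bar = a * pn / (1 - a) ** 2
    return r_bar / lam, (r_bar + rho) / lam


lam = 20 / HOURS_PER_DAY  # bugs per hour
results = {n: waits(lam, 3.0, n) for n in range(1, 13)}

# Smallest team meeting each SLA on the average waiting times.
min_for_solution = min(n for n, (tq, tw) in results.items() if tw <= 4.5)
min_for_response = min(n for n, (tq, tw) in results.items() if tq <= 0.5)

for n in range(8, 12):
    tq, tw = results[n]
    print(f"{n} FTEs: wait in queue {tq:.2f} h, time in system {tw:.2f} h")
print(min_for_solution, min_for_response)
```

Under the 8-hour assumption the two minima come out as 9 and 10 FTEs, in line with the observations above.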


3.2 Case Study 2

System Behavior

Bug Arrival Rate : 8.75 per day

Average Solution Time : 0.9 hour per bug

Project Requirements

Service Level Agreement for time to solution : <= 1.2 hours

High resource utilization : The probability that all resources are idle must not exceed 37%

In this problem, the different system characteristics can be observed by

varying the number of FTEs to arrive at the optimal solution for the Service Level

Agreement and Resource Utilization.


So, from this table, it is observed:

To ensure the SLA for solution (≤ 1.2 hours), that is, an average waiting time for a bug
in the system of at most 1.2 hours, the number of FTEs needs to be greater than 2.

However, to meet the utilization requirement, the number of FTEs should be limited to 3.

Hence the solution of 3 FTEs suits both constraints.
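The utilization bound can be checked by evaluating $p_0$ for increasing team sizes. This is a minimal sketch under the assumption of an 8-hour working day (not stated in the case study); the function name `all_idle_probability` is hypothetical.

```python
from math import factorial

HOURS_PER_DAY = 8  # assumption: the case study does not state the working day


def all_idle_probability(rho, n):
    """p0 for n FTEs and offered load rho (requires rho/n < 1)."""
    series = sum(rho**k / factorial(k) for k in range(n + 1))
    tail = rho ** (n + 1) / (n * factorial(n)) / (1 - rho / n)
    return 1.0 / (series + tail)


rho = (8.75 / HOURS_PER_DAY) * 0.9  # arrival rate per hour times hours per bug
idle = {n: all_idle_probability(rho, n) for n in range(1, 7)}

# Largest team for which all FTEs are idle at most 37% of the time.
max_ftes = max(n for n, p in idle.items() if p <= 0.37)

for n, p in idle.items():
    print(f"{n} FTEs: all idle {p:.2%} of the time")
print(max_ftes)
```

Under this assumption the idle probability crosses the 37% bound between 3 and 4 FTEs, which agrees with the limit of 3 FTEs stated above.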


3.3 Case Study 3

System Behavior

Bug Arrival Rate : 15 per day

Average Solution Time : 1 hour per bug

Project Requirements

Bugs should be solved ASAP : Minimize waiting time for solution

Resource utilization needs to be high : The probability that all FTEs are idle must not exceed 15%

In this problem, the different system characteristics and the probability of all
resources being free, p0, can be observed by varying the number of FTEs to arrive at the
optimal solution for the waiting time and resource utilization.


So, from this table, it is observed:

The average waiting time in the system for a bug is 8.25 hours with 2 FTEs, 1.34 hours
with 3 FTEs and 1.0681 hours with 4 FTEs, and it takes progressively lower values until it
converges to 1 hour (the solution time) when the number of FTEs exceeds 8.

However, from 5 FTEs onwards, the percentage of time that all the FTEs are free becomes
more than 15%.

So, in this case, the optimal solution is arrived at when the number of FTEs is equal to 4.
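The trade-off between waiting time and idle time in this case study can be reproduced numerically. The sketch below assumes an 8-hour working day, which the case study does not state but which yields about 8.26 hours for 2 FTEs, close to the figure quoted above; the helper name `idle_and_time_in_system` is hypothetical.

```python
from math import factorial

HOURS_PER_DAY = 8  # assumption: the case study does not state the working day


def idle_and_time_in_system(lam, service_hours, n):
    """Return (p0, mean time in system in hours) for n FTEs; needs rho/n < 1."""
    rho = lam * service_hours
    a = rho / n
    p0 = 1 / (sum(rho**k / factorial(k) for k in range(n + 1))
              + rho ** (n + 1) / (n * factorial(n)) / (1 - a))
    pn = rho**n / factorial(n) * p0
    r_bar = a * pn / (1 - a) ** 2
    return p0, (r_bar + rho) / lam


lam = 15 / HOURS_PER_DAY  # bugs per hour
rows = {n: idle_and_time_in_system(lam, 1.0, n) for n in range(2, 10)}

# Among team sizes that keep the all-idle probability within 15%,
# pick the one with the lowest average time in system.
feasible = [n for n, (p0, tw) in rows.items() if p0 <= 0.15]
best = min(feasible, key=lambda n: rows[n][1])

for n, (p0, tw) in rows.items():
    print(f"{n} FTEs: all idle {p0:.1%}, time in system {tw:.4f} h")
print(best)
```

The selection rule (minimize waiting time subject to the idle bound) is one reading of the two requirements; under it, the sketch selects 4 FTEs.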


3.4 Case Study 4

System Behavior

Bug Arrival Rate : 12 per day

Average Solution Time : ½ hour per bug

Project Requirements

Minimize Resource Cost : Minimize number of FTEs

There should not be more than 2 bugs unattended at any given point of time : Number of bugs in queue ≤3

In this problem, the different system characteristics and the probability of bugs in
queue, pn+r, can be observed by varying the number of FTEs to arrive at the optimal
solution for resource cost and queue length.


So, from this table, it is observed:

The probability of there being 3 bugs in the queue converges to 0 when the

number of FTEs is 4 or more.

Hence, in this case, the optimal solution is 4 FTEs.
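The queue-length probability used in this case study follows from the formula for $p_{n+r}$ with r = 3. The sketch below is illustrative: the 8-hour working day and the choice of three decimal places as the threshold for "converges to 0" are both assumptions, and the function name `prob_queue_length` is hypothetical.

```python
from math import factorial

HOURS_PER_DAY = 8  # assumption: the case study does not state the working day
DECIMALS = 3       # assumption: precision at which a probability counts as 0


def prob_queue_length(rho, n, r):
    """Probability that all n FTEs are busy and exactly r bugs are waiting."""
    p0 = 1 / (sum(rho**k / factorial(k) for k in range(n + 1))
              + rho ** (n + 1) / (n * factorial(n)) / (1 - rho / n))
    return rho ** (n + r) / (n**r * factorial(n)) * p0


rho = (12 / HOURS_PER_DAY) * 0.5  # arrival rate per hour times hours per bug
probs = {n: prob_queue_length(rho, n, 3) for n in range(1, 7)}

# Smallest team for which the probability rounds to zero.
first_zero = min(n for n, p in probs.items() if round(p, DECIMALS) == 0)

for n, p in probs.items():
    print(f"{n} FTEs: P(3 bugs in queue) = {p:.5f}")
print(first_zero)
```

The result depends on the rounding precision chosen; at three or four decimal places the probability first rounds to zero at 4 FTEs.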


3.5 Case Study 5

System Behavior

Bug Arrival Rate : 15 per day

Average Solution Time : 2 hours per bug

Project Requirements

Client agrees to pay for 4 FTEs : The project team needs to agree to an SLA for response and solution

Desirable SLA for client : < 4 hours

In this problem, the different system characteristics can be observed by

varying the number of FTEs to arrive at the optimal solution for the Service Level

Agreement and Resource Utilization.


So, from this table, it is observed:

For 4 FTEs, the average waiting time for response to bugs is 6.9 hours and the average
time to solve a bug is 8.9 hours.

Hence, the SLA can ideally be defined as 7 hours for response and 9 hours for solution.

However, it can be suggested to the client that increasing the number of FTEs to 5 will
bring down the SLA for response to 1 hour and the SLA for solution to 3 hours, which is
within the client's desired limit of 4 hours.
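The SLA figures for 4 and 5 FTEs can be recomputed as follows. This is a sketch under the assumption of an 8-hour working day (not stated in the case study), with a hypothetical helper `waits`; the SLA values quoted above are these averages rounded up to whole hours.

```python
from math import factorial

HOURS_PER_DAY = 8  # assumption: the case study does not state the working day


def waits(lam, service_hours, n):
    """Return (mean wait in queue, mean time in system) in hours; rho/n < 1."""
    rho = lam * service_hours
    a = rho / n
    p0 = 1 / (sum(rho**k / factorial(k) for k in range(n + 1))
              + rho ** (n + 1) / (n * factorial(n)) / (1 - a))
    pn = rho**n / factorial(n) * p0
    r_bar = a * pn / (1 - a) ** 2
    return r_bar / lam, (r_bar + rho) / lam


lam = 15 / HOURS_PER_DAY  # bugs per hour
sla = {n: waits(lam, 2.0, n) for n in (4, 5)}

for n, (tq, tw) in sla.items():
    print(f"{n} FTEs: response {tq:.2f} h, solution {tw:.2f} h")
```

Comparing the two rows shows how sharply the waiting time falls once the team moves away from the ρ/n = 1 boundary: with 4 FTEs the load per FTE is about 94%, with 5 FTEs it is 75%.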


4.0 Conclusion

The case studies above show that Queuing Theory is suitable for solving FTE allocation
problems under a variety of project constraints. They cover only a small number of
scenarios, however, and many other decisions can be addressed by judicious use of the same
concepts. Queuing Theory can therefore be used to solve optimization problems across
Production Support projects.
