
White Paper

The costs of poor network or application performance include both lost productivity and the time spent fixing problems that could otherwise be spent on proactive work. Quantifying that cost is much harder, but today more than ever before IT teams need to make a clear business case for the improvements needed to optimise performance of their enterprise network and applications and hence of the business as a whole.

Making a business case for network and application performance improvement

COUNTING THE COST OF NETWORK EFFICIENCY

Fluke Networks www.flukenetworks.com

This White Paper looks at the financial benefits of improving network and application performance in terms of the impact on the business

as well as the capital and operating expenditure, and how IT teams can make a business case for the tools they need to identify problems,

resolve them quickly and optimise performance.

Planning ahead: why IT budgets need to allow for performance problems

IT budgets are built around the cost of infrastructure, applications and the personnel to manage them – a combination of CAPEX (capital expenditure) and OPEX (operating expenditure). They do not traditionally take account of equipment failure and downtime. Businesses like to assume that their network will work as expected and deliver the SLAs agreed with the IT department.

However, in the real world, performance issues of varying magnitude are bound to occur from time to time and will require the IT team's

attention. Tracking down a problem in today’s increasingly complex enterprise networks is difficult. A performance

problem could be located anywhere across the WAN, LAN, WLAN or at a remote site, and could be in the network, an application or

a service provided by a third party, e.g. a cloud service provider. Identifying and rectifying such a problem requires time, expertise and

equipment, resources that could otherwise be spent optimising performance and working on new initiatives that will add value and help

the business increase its competitive edge.

Quantifying the cost of improved network performance to the business so that solutions can be included and justified in the IT budget

is not straightforward. How do you calculate the costs associated with optimising an aspect of the network in order to solve a problem

that has not occurred yet, or work out the ROI of something that helps you solve it more quickly? Should you take into account wasted

employee time waiting for an application to respond, and what about lost revenue due to customers abandoning a process that is taking

too long?

These questions become more urgent as organisations become increasingly dependent on critical business applications. When drawing

up their IT budgets, they need to consider the impact of investment in IT (or lack of it) on their overall revenue. This could be positive,

through for example increased transactions or improved customer satisfaction through a faster response to enquiries, or alternatively

negative, due to slow response times or even downtime.

The challenge for IT teams is to identify what resources they need to optimise performance, and then to justify those costs in a way that

can be understood and bought into by their commercial colleagues, addressing CAPEX, OPEX and business performance when considering TCO.

Making the business case for the network

The first step in addressing this problem is for the business to understand that the money spent on setting up, operating and managing a

network is not a ‘sunk’ cost – despite what many executives may think. Instead, they need to understand that the network is a core part

of what their organisation does, and plays a key role in achieving business goals. No board meeting should swallow such costs, however

creative their P&L accounts team are in grouping network costs into ‘unrecovered spend’! For this reason alone IT efficiency should be a

priority item for any corporate agenda.

Very few organisations could operate without a network – not because of the intrinsic value of the network hardware and software, but

because they and their staff rely on the applications that the network supports. Users depend on the network to power their applications,

and if the network is running slowly, their applications will slow down too. So there is a direct correlation between network performance

and application performance.

Why business units should care about network performance

• An alarming 40% of customers will abandon a website after one or two bad experiences.

• In a dealing room in New York, London or Hong Kong, 1 ms of network latency can cause a $1 million difference in each transaction.

• A manufacturing company might lose $250,000 as a result of having to shut down a production line because their just-in-time

(JIT) inventory systems go offline.


Spending on the network can therefore be justified if it is clearly linked to improving application performance and hence improving

productivity. This means a change in approach for the network team. No longer is their role simply about CPU utilisation and network

packets, but about ensuring that the network provides the best transportation and support for business applications. Their challenge is

twofold:

1. To persuade executives that their organisation’s network is a key business asset

2. To identify budget priorities: where changes in the network are needed to avoid problems arising or improve application performance

and hence productivity.

Executive dashboards

Business focused dashboards are needed to provide a snapshot of IT efficiency across the enterprise. Green identifies services that are

performing as required, while yellow indicates degraded services and red flags services that are having critical performance issues. A closer look

shows how many sites, servers and applications are affected and the period of time for which they have been affected.
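The traffic-light scheme described above can be sketched in a few lines of Python. This is only an illustration, not how any particular product works: the `Service` fields, the example services and the thresholds (yellow above 80% of the SLA target, red once the target is exceeded) are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    response_ms: float  # measured response time (hypothetical figure)
    sla_ms: float       # agreed SLA target (hypothetical figure)


def status(service: Service, warn_ratio: float = 0.8) -> str:
    """Map a service to green, yellow or red relative to its SLA target.

    warn_ratio is an assumed threshold: a service turns yellow once it
    uses more than 80% of its SLA budget and red once it exceeds it.
    """
    ratio = service.response_ms / service.sla_ms
    if ratio > 1.0:
        return "red"
    if ratio > warn_ratio:
        return "yellow"
    return "green"


def summarise(services: list[Service]) -> dict[str, list[str]]:
    """Group service names by status for a dashboard snapshot."""
    summary: dict[str, list[str]] = {"green": [], "yellow": [], "red": []}
    for s in services:
        summary[status(s)].append(s.name)
    return summary


# Hypothetical services, for illustration only.
services = [
    Service("email", 120, 500),
    Service("order entry", 450, 500),
    Service("CRM", 900, 500),
]
snapshot = summarise(services)
```

A real dashboard would also record how long each service has been in its current state, which is what allows the closer look by site, server and time period mentioned above.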

Informed decisions need to be evidence based

This does not mean that organisations should simply throw money at their network. They need to find a way to optimise network

performance while reducing operational cost. This means having a clear understanding of what is going on across the network and with the

applications it supports. It is important to avoid making hasty decisions that are not evidence based. Every change should have a specific

purpose, whether solving a current problem, optimising performance or preparing the organisation and its network for the future. Any

action needs to be justified in terms of cost – ideally quantitatively, but failing that qualitatively.

Organisations need solutions and tools which will provide performance information and help them identify the causes of problems, or areas

where improvements can be made. This enables them to avoid making a reactive decision, perhaps driven by emotion or business pressure,

and instead make decisions based on performance data. The cost of these tools and solutions also has to be justified; the IT team will have

to demonstrate their value through speeding up problem-solving or ensuring that changes made as a result of the information provided will

increase productivity.

“Price is what you pay. Value is what you get.”

Warren Buffett, American business magnate, investor and philanthropist


Three stages of improving performance

Network improvements can be categorised into three broad areas: reducing downtime by identifying and solving problems more quickly;

optimising the performance of existing infrastructure (i.e. doing more with what you have); and identifying priority areas where

improvements are needed (making the best use of your budget). We will look at each of these in turn.

In each case, the IT team needs to find answers to questions which get to the heart of the problem, such as:

• Which applications are consuming the most bandwidth and is this justified for business reasons?

• Are less important applications mistakenly prioritised over those which are most critical to the business?

• Is routine firefighting to maintain ‘business as usual’ preventing engineers working on strategic projects which will create business

value and result in a step change in performance?

1. Performance problems are expensive and often remain unsolved

Every network will have performance issues from time to time. However, the scale of the problems many companies simply put up with –

and their users have to tolerate – is shocking. According to Forrester1, 31% of performance issues take more than a month to resolve or

are never resolved at all.

To avoid being part of these statistics, the IT team needs a system or process to provide a timely alert that a problem has occurred. The

worst case scenario is to find out through a call from a user, in which case they are already on the back foot. However, this occurrence

is all too common, according to Gartner1, which says that 70% of the time IT organisations learn about performance problems from end

users. Although organisations may have one or more network management tools in place, the alerts in many tools have to be manually

configured by setting the system to ping or discover all the devices in each broadcast domain. This gives the network team a lot of

different systems to watch, as well as coping with equipment that is not monitored.

This system also has to show when incidents have been handled by network redundancy – for example, when traffic is rerouted through a

back-up route because a link goes down. While the network may be able to cope with the problem, if the team are not alerted they will not

know to get it fixed, leaving them vulnerable going forward.

Once they are aware that something is not performing as it should, the IT team needs a way to identify root cause and resolve the problem

as quickly as possible, before it has a significant impact on the business. Reducing mean time to resolution (MTTR) will save the organisation

money as well as improving end-user productivity by reducing the staff time needed to work on the problem. According to Cisco2, 49% of the

total cost of ownership (TCO) of operating a network is labour costs, so reducing the labour needed to solve problems is clearly something that

can contribute to budget justification.

One key aspect of reducing MTTR is identifying whether the problem is with the network or an application. The increasing interdependence

of network and applications makes this more and more difficult, as point tools only show what is happening in an individual system, not the

interdependencies. Experience shows that the network team are usually the first to be blamed, even if the problem is subsequently found to be

elsewhere.

Network management systems (NMS) and packet analysis will show whether a problem is in the network, but cannot see and analyse live traffic

at the transactional level so cannot help in troubleshooting application issues. Application Performance Management Systems typically support

auto-discovery of all the applications in the network, but if an application is running slowly they find it difficult to identify if the problem is

application or network based. So time can be wasted passing problems between network and application teams, rather than getting quickly to

root cause, while users continue to experience performance problems.

To solve problems quickly, IT teams need timely alerts and detailed information to help them get to root cause more quickly. They need an

accurate view of everything that can compromise application performance, from poorly executed application code to an overloaded server or

load balancer.

The average number of ‘limited outages’ is 11 every two years and the estimated cost of data centre downtime across industries is over $5,000 per minute.

Sources: Ponemon Institute© Research Report: 2013 Report on Data Center Outages and eWeek® article: Unplanned IT Downtime
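As a rough worked example of the figures quoted above, the sketch below estimates an annual downtime cost. The outage frequency (11 in two years) and the cost per minute (over $5,000) come from the sources cited; the 30-minute average outage duration is purely an assumption for illustration and should be replaced with your own measurements.

```python
def annual_downtime_cost(outages_per_year: float,
                         avg_minutes_per_outage: float,
                         cost_per_minute: float) -> float:
    """Estimated yearly downtime cost, in the currency of cost_per_minute."""
    return outages_per_year * avg_minutes_per_outage * cost_per_minute


OUTAGES_PER_YEAR = 11 / 2   # 11 limited outages every two years (quoted above)
COST_PER_MINUTE = 5_000     # dollars per minute (quoted above as "over $5,000")
ASSUMED_MINUTES = 30        # hypothetical average duration, not from the source

cost = annual_downtime_cost(OUTAGES_PER_YEAR, ASSUMED_MINUTES, COST_PER_MINUTE)
print(f"Estimated annual downtime cost: ${cost:,.0f}")
```

Under that assumed duration the estimate comes to $825,000 per year. The point is less the exact number than the fact that each input can be measured and challenged.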


Application Owners

The Application Performance dashboard shows the slowest applications, sites and servers. A quick glance identifies if that degradation is

caused by the network, server think-time or amount of data transferred between clients and servers.
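The idea of attributing slow response to the network, the server or the volume of data can be illustrated as follows. This is a simplified sketch that assumes the three components have already been measured separately and simply add up; the function name and the example timings are hypothetical.

```python
def dominant_component(network_ms: float,
                       server_ms: float,
                       transfer_ms: float) -> tuple[str, float]:
    """Return the largest contributor to response time and its share of the total."""
    parts = {
        "network": network_ms,
        "server think-time": server_ms,
        "data transfer": transfer_ms,
    }
    total = sum(parts.values())
    if total <= 0:
        raise ValueError("total response time must be positive")
    name = max(parts, key=lambda k: parts[k])
    return name, parts[name] / total


# Hypothetical measurements for one slow transaction.
name, share = dominant_component(network_ms=40, server_ms=1500, transfer_ms=260)
print(f"{name} accounts for {share:.0%} of response time")
```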

Network Engineers

The Site Performance dashboard shows the busiest sites and network interfaces along with a display of the most utilised applications. Clicking

on a graph will provide a drilldown to more detail.

In order to justify the cost of appropriate systems to the business, they need to consider:

- the cost to the business of downtime

- the time required to identify and solve problems, in terms of labour cost

- time wasted passing problems between network and application teams, both labour cost and delay in solving the problem.
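The three considerations above can be combined into a simple per-incident estimate. This is a sketch under stated assumptions: the function name, the hourly rates and the example incident are hypothetical and only illustrate how the terms add up.

```python
def incident_cost(downtime_hours: float,
                  business_cost_per_hour: float,
                  engineer_hours: float,
                  handoff_hours: float,
                  labour_rate_per_hour: float) -> dict[str, float]:
    """Break one incident's cost into the three components listed above.

    handoff_hours is engineer time lost passing the problem between
    network and application teams. The extra delay this causes is
    assumed to be already included in downtime_hours.
    """
    downtime = downtime_hours * business_cost_per_hour
    labour = engineer_hours * labour_rate_per_hour
    handoff = handoff_hours * labour_rate_per_hour
    return {
        "downtime": downtime,
        "labour": labour,
        "handoff": handoff,
        "total": downtime + labour + handoff,
    }


# Hypothetical incident: 2 hours of degraded service, 6 engineer-hours of
# troubleshooting and 3 further engineer-hours lost in hand-offs.
costs = incident_cost(downtime_hours=2, business_cost_per_hour=10_000,
                      engineer_hours=6, handoff_hours=3,
                      labour_rate_per_hour=75)
```

Keeping the components separate makes it easier to show which of them a given tool or process change is expected to reduce.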

Although these costs cannot always be quantified, considering them will shine a light on the potential benefits. The following example shows the scale of the problem one company faced in identifying a root cause, and how it addressed the situation.


The cost of a two second delay

Netherlands cooperative FloraHolland sells more than 20,000 different types of plants and

flowers at auction and processes 12.5 billion items and $4.4 billion each year. It uses an

auction clock environment, in which plants and flowers are brought into a room and bids

are made by in-person and online buyers. A database pushes information to the auction clock, which displays product, quantity, price, and the

time left at auction. Bidders push a button to buy flowers based on the price displayed on the clock at that precise moment. As soon as the

product is sold, a new product is displayed. The company relies on optimal network response times to its auction applications and databases

to present, track and sell plants and flowers as quickly as possible.

FloraHolland experienced a response time delay between its database and the auction clock. This slowed the clock display by up to two seconds

between each auction - unacceptable when the company is auctioning more than 50 million items each day. The IT team needed a way to

pinpoint the issue quickly and determine if the root cause lay with the database, the application or the network.

FloraHolland evaluated multiple technologies before implementing a Visual TruView network and application performance monitoring and

troubleshooting solution from Fluke Networks. This monitored performance from the application data query across the network to the database

and back and immediately identified the database as the root cause, taking an extra second or two to process the query and respond.

FloraHolland’s IT team optimised the database and fixed the problem without wasted effort scrutinising the entire network and application,

as the issue was isolated immediately. The result saves FloraHolland two hours of accumulated delays each year, equaling $200,000 in annual

savings, along with an improved brand image for its operations.

2. Optimising existing infrastructure: prevention is better than cure

In many respects successful IT solutions are based on anticipation and staying one step ahead to avoid problems before they happen or at the

very least mitigate the potential impact of something not going according to plan. Problem solving is only one aspect of managing a network

- IT teams need to find ways to do more with less i.e. to optimise their existing infrastructure and get the best performance from everything

from routers to servers to wireless bandwidth. For example, they need to manage the trade-off between bandwidth and performance, which

requires a clear understanding of the location of and reasons for congestion. It is not just over-utilised links that may cause a problem; under-

utilised links can drain resources by using up budget that could be allocated to other, over-utilised links. There is more information on this

topic in our White Paper on capacity planning. http://www.flukenetworks.com/Optimising-bandwidth-whitepaper

Today’s networks and applications are increasingly virtualised, which adds a layer of abstraction and makes performance management more

complex. It becomes difficult to provide a single view of application usage from data centre to desktop, because a single physical server can

power multiple machines. With database servers, application servers, email servers, print servers and file servers all potentially sharing the

same piece of hardware, tracking network and application performance in order to optimise it becomes much more difficult as there is usually

less physical evidence available than in the traditional environment in which servers and applications are tightly coupled.

The continued growth of cloud services adds to the lack of visibility. Both in-house teams and cloud service providers will have agreed SLAs,

but in a complex network they need systems which can identify when and why performance is falling short and whose responsibility it is in

order to achieve the required performance standards.

There is a second aspect of optimising existing infrastructure which also has cost implications, but may be less frequently considered.

Optimisation is not just about identifying where changes are needed – the IT team also needs to prioritise those improvements. Gartner3

advise that, because poor network and application performance impact infrastructure costs as well as productivity, organisations need to focus

on the user experience and capture data that enables them to fix the “right” problem first. For example, if two routers are performing badly –

one at a remote office and one supporting a critical business application – engineers need to fix the one that has the biggest business impact

(i.e. cost) first.
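The two-router example can be expressed as a simple ranking by estimated business impact. This is a minimal sketch: the `Issue` fields, the user counts and the cost per user-hour are hypothetical, and a real ranking would draw on measured user-experience data.

```python
from dataclasses import dataclass


@dataclass
class Issue:
    device: str
    users_affected: int
    cost_per_user_hour: float  # assumed business cost of one affected user-hour

    @property
    def impact_per_hour(self) -> float:
        """Estimated business cost per hour while the issue persists."""
        return self.users_affected * self.cost_per_user_hour


def prioritise(issues: list[Issue]) -> list[Issue]:
    """Order issues so the one with the largest business impact comes first."""
    return sorted(issues, key=lambda i: i.impact_per_hour, reverse=True)


# Hypothetical figures for the two routers in the example above.
issues = [
    Issue("remote-office router", users_affected=15, cost_per_user_hour=40.0),
    Issue("order-processing router", users_affected=200, cost_per_user_hour=120.0),
]
queue = prioritise(issues)
```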

Thirdly, many organisations use automation to speed up changes and reduce human error. However, this can create technical errors if not

implemented correctly; a patch may not be applied, or a table in a database may be deleted by mistake. Effective network optimisation

includes both automating processes to reduce labour costs and improve accuracy and repeatability, and then ensuring that the automation has

been applied correctly.


Automated discovery and infrastructure diagramming

An automated performance map enables engineers to view the entire enterprise network and to ‘see’ what is going on in the network, who is using

it, where they are connected and what the path is from ‘here’ to ‘there’.

Thus in order to optimise performance of their existing infrastructure, IT teams need systems which will:

• monitor the performance of all equipment across the network

• provide instant real-time bandwidth usage reports on top applications, conversations and hosts, so they can identify links which

may need additional bandwidth and those which can be reduced

• monitor cloud service delivery and ensure providers are meeting their SLAs through monitoring the link to the cloud and providing

QoS/CoS reporting

• identify poor performance e.g. where the paths of applications or servers are running slowly, so that the slowest and most critical

paths can be addressed

• automate processes, verify that changes and upgrades have worked and ensure that they have not had a negative impact on

performance elsewhere.

They have to justify the cost of these systems in terms of the performance improvements that will result.
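As an illustration of the bandwidth point in the list above, the sketch below sorts links into candidates for more bandwidth and candidates for reduction. The 20% and 80% thresholds, the link names and the utilisation figures are assumptions for the example; appropriate thresholds depend on the traffic profile of each link.

```python
def classify_links(utilisation: dict[str, float],
                   low: float = 0.20,
                   high: float = 0.80) -> dict[str, list[str]]:
    """Split links into upgrade and downsize candidates by peak utilisation.

    utilisation maps a link name to its peak utilisation as a fraction
    of capacity (0.0 to 1.0). The default thresholds are assumptions.
    """
    result: dict[str, list[str]] = {"upgrade": [], "downsize": [], "ok": []}
    for link, used in sorted(utilisation.items()):
        if used >= high:
            result["upgrade"].append(link)
        elif used <= low:
            result["downsize"].append(link)
        else:
            result["ok"].append(link)
    return result


# Hypothetical peak utilisation per link.
peak = {"HQ-WAN": 0.93, "branch-A": 0.12, "branch-B": 0.55}
plan = classify_links(peak)
```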

Here is how one organisation tests the rollout of new systems and supports its capacity planning.

Ensuring upgrades deliver the desired results

GIAL was created by the city of Brussels to provide and manage their IT services, and also provides IT solutions for other municipal

organisations, public administration offices, hospitals and more, throughout the region. The telecom team at GIAL lacked good visibility into

the communication flow of the network and the performance of applications. With no insight into how applications impacted the network,

GIAL could not effectively test the rollout of new applications or accurately conduct capacity planning.

GIAL worked closely with the Belgacom Group to select Fluke Networks’ Visual TruView. It delivers a comprehensive application-aware network

performance monitoring solution that gives the team complete insight into performance, allowing them to detect and resolve network and

application issues quickly, before they impact end users. The team will also use TruView for capacity planning and when rolling out new

applications. With customer requirements for bandwidth and connectivity constantly evolving, the team at GIAL is always working to upgrade

the network. When new applications are developed or deployed, the team is responsible for understanding and planning for the impact.


“Adding new links and other hardware is a costly expense that we have to validate to our leadership team,” said Kris Vanbiervliet, team

leader for telecom and security at GIAL. “Before TruView, this was a tedious process. But now, with the visibility we have and the reporting

capabilities of the system, we monitor utilisation and quickly and effectively make a case for upgrades,” Vanbiervliet said. “The numbers don’t

lie, and with a report in hand, we’re able to streamline the capacity planning process.”

GIAL customers are also constantly developing custom applications or deploying new applications. “The big questions are: How will this

application perform on the network and will it impact the network? Before TruView it was trial and error, which caused a lot of mischief. Now,

with TruView, we can test before we fully deploy and make changes if needed before it impacts users.”

3. Spending budget where it will make the most difference

There will always be a need for strategic developments to a network, whether to improve performance beyond what can be done with existing

hardware and software, to support new applications or to enable the organisation to move into new areas. For example, an organisation may

take a strategic decision to support BYOD, requiring increased wireless bandwidth and perhaps application virtualisation to keep data in the

data centre rather than on the mobile device.

Organisations are under increasing pressure to do more with less, and so they need to know where to spend their limited budget for maximum

impact on performance. They will need to make a business case for the expenditure and explain the potential return on investment, whether

qualitatively or quantitatively. This requires appropriate performance data from their network to show the current situation and project the

future state.

For example, if they decide additional bandwidth is needed they have to demonstrate where and why, which requires an in-depth

understanding of capacity use and planning. They need to show where applications or servers are running slowly to make a case for projects

such as server upgrades. After changes such as virtualisation, WAN optimisation or data centre consolidation have been implemented, they

need to demonstrate that the projected improvements have been delivered. Organisations need to minimise the risk of making changes, and a

solution which can show why a change will be beneficial will assist in mitigating that risk.


Network Investments

Analyse the content and makeup of network usage to determine whether or not to expand existing bandwidth or invest in technologies such as

WAN acceleration. Validate real results in pilot schemes prior to making a financial commitment.

In order to justify network improvements, then, IT teams need to be able to state:

• the network impact of planned strategic changes and the improvements that will be required

• the potential ROI on planned improvements.
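A potential ROI of this kind is usually stated as net gain divided by total cost. The sketch below uses a simple undiscounted calculation over a fixed period, plus a payback estimate; all of the figures are hypothetical, and a finance team may prefer a discounted method.

```python
def simple_roi(upfront_cost: float, annual_cost: float,
               annual_benefit: float, years: int) -> float:
    """Undiscounted ROI over a period: (total benefit - total cost) / total cost."""
    total_cost = upfront_cost + annual_cost * years
    if total_cost <= 0:
        raise ValueError("total cost must be positive")
    total_benefit = annual_benefit * years
    return (total_benefit - total_cost) / total_cost


def payback_years(upfront_cost: float, annual_cost: float,
                  annual_benefit: float) -> float:
    """Years needed for net annual benefit to cover the upfront cost."""
    net_annual = annual_benefit - annual_cost
    if net_annual <= 0:
        raise ValueError("annual benefit must exceed annual cost")
    return upfront_cost / net_annual


# Hypothetical improvement: $100,000 upfront, $20,000 a year to run,
# $200,000 a year in avoided downtime and labour.
roi = simple_roi(100_000, 20_000, 200_000, years=3)
payback = payback_years(100_000, 20_000, 200_000)
print(f"3-year ROI: {roi:.0%}, payback: {payback:.1f} years")
```

The annual benefit is the hardest input to defend, which is why it needs to come from measured performance data rather than estimates.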

In other words, they need detailed information in order to make informed projections and ultimately budget decisions. As one organisation

found, identifying a potential problem in a new application at an early stage saved considerable time and effort in getting the new

functionality up and running.

Getting it right first time

A large transportation and logistics company had developed a new application which it was about to roll out across the business. However, at

the final testing stage they discovered a problem. They believed that this was in the application, and were about to carry out a major redesign.

At this point Fluke Networks provided its Visual TruView solution for a proof of concept trial. This immediately showed that the problem was

not in fact the application but in the network. The organisation was therefore able to address the problem at source, avoiding the time and

expense of redesigning, recoding and retesting the application and rolling it out for a second time.

Score ‘As’ by knowing what is happening in your network

This could be summed up with the three A’s: anticipation, awareness and alert. The key to addressing all three aspects of network performance

is actionable intelligence: knowing exactly what is happening in the network and applications, end to end, across layers 1-7. This information

must also be usable. It is not enough to know what is happening at the point of measurement; truly understanding the situation requires

historic information to enable real-time and retrospective data analysis, and the ability to

drill down to obtain detail, for example to examine a transaction that may be introducing

latency to all upstream transactions.

It requires a combination of data points from both application and network performance

methodologies. This will enable network and application teams to work together to ensure

optimal performance of applications and network, whether that is through faster problem

solving, optimisation of existing infrastructure or making a case for improvements.

A newly emerging solution for this problem is AANPM – Application Aware Network

Performance Management. This takes an application-centric view of everything happening

across the network, providing end-to-end visibility of the network and applications and their

interdependencies, and enabling engineers to monitor and optimise the end user experience.

It does not look at applications from a coding perspective, but in terms of how they are

deployed and how they are performing.

One of the advantages of an AANPM system is that it will replace multiple separate network

and application performance management systems which provide separate point solutions,

each of which has to be purchased and maintained. A study from Enterprise Management

Associates4 found that network teams use an enormous suite of disparate tools for

monitoring and troubleshooting. The majority use 4-10 tools, with some relying on as many

as 25 different solutions. This can be extremely costly in terms of purchasing, learning and

maintenance, as well as the lack of integration. Choosing an AANPM solution which uses

open standards and can manage equipment from multiple vendors both saves money and

ensures that organisations are not locked into a single vendor when purchasing network

equipment, which can be an important cost-saver.


Measure performance to provide actionable analytics

When looking at an AANPM system, IT teams have to consider its cost against the potential benefits delivered. This may be in terms of faster

problem-solving, reducing downtime, increasing productivity and saving time spent on troubleshooting which can instead be spent on network

optimisation. It may be by helping the organisation do more with its existing resources, thus reducing the need to buy additional equipment

or bandwidth, or by ensuring that budget is spent in areas that will have maximum impact on performance.

An AANPM system will help organisations deploy both resources and people more effectively, and understand the costs associated with each

potential decision. It enables the organisation to create a dashboard for the business which will show performance against SLAs, and help

non-technical executives to understand the impact of IT on overall performance and revenue.

In today’s world where IT budgets have to ensure maximum ROI, AANPM systems such as Visual TruView from Fluke Networks provide the

visibility to speed up problem-solving, identify areas for improvement and provide the data to justify new investment where it is most

needed. TruView is also simple to implement and easy to use with minimal training, providing a faster time to value than many alternatives –

something else which has to be taken into account when looking at overall cost.

References

1. Article: “15 reasons why you need APM in 2014”, http://apmdigest.com/15-reasons-why-you-need-apm-in-2014-1

2. Cisco infographic, http://www.flickr.com/photos/cisco_pics/6231020843/sizes/o/in/photostream/

3. “How to Significantly Reduce IT Infrastructure and Operations Costs”, Jay Pultz, Gartner, published 24th April 2013.

4. EMA Impact Briefs: “Taking End-to-End AANPM to a New Level” and “Visual TruView Unifies Network & Application Performance Management”, Jim Frey, Enterprise Management Associates.


Fluke Networks operates in more than 50 countries

worldwide. To find your local office contact details,

go to www.flukenetworks.com/contact

Corporate Office:
Fluke Networks
P.O. Box 777
Everett, WA USA 98206-0777
1-800-283-5853
e-mail: [email protected]

European Office:
Fluke Networks
P.O. Box 1550, 5602 BN Eindhoven
Germany 0049-(0)682 2222 0223
France 0033-(0)1780 0023
UK 0044-(0)207 942 0721
e-mail: [email protected]

©2014 Fluke Corporation All rights Reserved. 6003718A

Solutions from Fluke Networks

To help network managers streamline capacity planning and obtain complete visibility into bandwidth utilisation, Fluke Networks has developed its Visual TruView monitoring appliance and OptiView XG network engineer's tablet. Visual TruView and the OptiView XG have configurations that provide both 1Gbps and 10Gbps connectivity.

Visual TruView™ Appliance

TruView provides the ability to track, baseline, trend and monitor the application performance experienced by every end user, enterprise-wide, through a highly customizable dashboard. It also provides high volume packet archival at 10Gbps line rate and comprehensive VoIP/Video monitoring and troubleshooting.

Visual TruView is:

• Simple – it can produce actionable data in less than 30 minutes after installation, and has built-in auto-discovery and configuration

and an intuitive web interface

• Intelligent, with self-learning performance baselines, time correlated views and guided workflows

• Complete, providing monitoring and troubleshooting capabilities in a single appliance and enterprise scalability through a distributed

architecture. It offers five tools in one solution: response time monitoring, retrospective packet analysis, network traffic analysis, device

performance monitoring and VoIP performance monitoring.

More information at www.flukenetworks.com/truview

OptiView XG® – Automated network and application analysis

The OptiView XG is the first tablet specifically designed for the Network Engineer. It automates root-cause analysis of network and application problems, allowing the user to spend less time on troubleshooting and more time on other initiatives. It is designed to support deployment of new technologies, including unified communications, virtualization, wireless and 10 Gbps Ethernet. The result is that new initiatives get up and running faster and networks stay productive even in these days of smaller teams.

More information at www.flukenetworks.com/xg

About Fluke Networks

Fluke Networks is the world-leading provider of network test and monitoring solutions to speed the deployment and improve the

performance of networks and applications. Leading enterprises and service providers trust Fluke Networks’ products and expertise to help

solve today’s toughest issues and emerging challenges in WLAN security, mobility, unified communications and data centres. Based in

Everett, Washington, the company distributes products in more than 50 countries.

For more information on our network and application performance solutions, visit www.FlukeNetworks.com/instantvisibility