The State of Analytics in IT Operations White Paper IT Operations Management


Introduction

If you lived through the AI (artificial intelligence) hype of the 1990s or earlier, you might be skeptical about seeing the term in such frequent use these days. AI can mean many different things, and always has.

In IT operations, the practice has also changed names a few times, from IT Operations Analytics (ITOA) to Algorithmic IT Ops, AIOps, and Cognitive Operations. But over the past two years, with influential analyst firms like Gartner and Forrester getting on board with the term, AI is getting more respect, and it’s getting more practical.

According to a recent Forrester survey, ITOA is the number one application of AI technology that businesses are considering. Also high on Forrester’s list are business insight and security, both of which are related to ITOA at a fundamental level.

IT Ops analytics starts with data collection, or monitoring data. Analytics depends on data, and lots of it, often called Big Data. Data is the food that fuels analytics; without it, analytics has nothing to examine for the patterns or anomalies that provide insight.

IT Operations Analytics holds considerable promise for making day-to-day IT Ops work easier. But what does this mean for IT Ops specialists who aren’t trained in analytics? Do they now need to take classes in data science and machine learning, and learn to write the algorithms that lie at the heart of analytics capabilities?

No. But it does mean that IT Ops specialists should be at least familiar with the kinds of analytics being used, increasingly, in their industry. They should take advantage of whatever analytic capabilities are embedded in their tools, and they should know when to seek guidance from other teams in the organization (security, big data, or business intelligence teams, for example) when they have questions or want to improve their analytics skills.

There’s a lot to consider. Here is an overview of what’s happening today in the ITOA space, along with some expert advice.

IT Ops Teams: Don’t Panic Over Analytics

Compared to specialists in security or big data, where analytics is a core part of the job description, the analytic skills within an IT Ops organization tend to be relatively low, which is to be expected. The technology and the field itself are fairly new within IT Ops. Besides, “analytics within IT Ops isn’t usually something that demands a data scientist,” says Michele Goetz, principal analyst with Forrester Research, who specializes in business insights, artificial intelligence, information management, architecture, and strategy.


“IT Ops teams in mid to large size orgs often tap into the analytic know-how within other teams, if possible,” says Goetz. This might include a security, or business intelligence, or big data team that can provide basic help or training for those just getting their feet wet with analytics.

“But what I see is that IT Ops specialists tend to rely on the analytics capabilities within the platforms and solutions acquired for the IT Ops organization,” says Goetz. “These are not the same capabilities you find in security analytics tools, which represent some of the most sophisticated capabilities on the market.” Instead, these are tools to improve performance and to monitor and predict spending, what Goetz calls “the block-and-tackle job of running and maintaining the platform, keeping the lights on, being agile to support business needs. These are the things that require operational analytics.”

Some teams are running models that give them a better understanding of cost to performance. They’re managing resources with tools that can give them more detail than the higher-level performance metrics they might have used in the past. “This is not big, sophisticated predictive and prescriptive modelling that you might see in other parts of the business. The best IT Ops teams are looking to mine a little deeper into the system data that comes from their infrastructure or learning about the types of queries they can run against the system and figure out better styles of workload management,” Goetz says.

So, don’t panic if you don’t have the skills to be more sophisticated with analytics. But you may want to begin exploring machine learning, “at least take a look at what your own solutions offer for embedded machine learning in the operations,” says Goetz. “This will bring you up a level over the coming year.”

Controlling The Spend: The Number One Target For IT Ops Analytics

Analytics, in addition to all its other uses, has a focus on the spending side of IT Ops. You’re trying to lower your cost to performance. What is the total cost of ownership for your technologies? How do you right-size your resources, and what’s happening with your outsourcing and contracting? The goal is to get smarter about the resources you need and the investments you need to make.

CIOs are constantly under pressure to contain their budgets, which means justifying expenditures against the business value. And much of the analytics that can help with this comes out of the box for many tools.


“Take automated data warehouses,” Goetz continues. “Based on years of understanding how data centers run in the cloud, what those workloads are, all that understanding is built into the tools.”

Rather than requiring users to figure out their particular environment, the machine smarts can guide you via patterns that come preconfigured. Meanwhile, “vendors of IT Ops technology continue to learn how different types of workloads and administrative tasks are informing how you’re managing and optimizing those environments as well as managing it toward your cost to performance models,” Goetz says. “This guidance is not as targeted, of course, as if a Google or Facebook data scientist were opening the box and tweaking the model. But this is going to be much more of the norm than needing a data scientist in-house. The pretrained environments are usually sufficient.”

IT Service Management: Analytics Drives Issue Resolution

An efficient, authoritative IT service desk can be a business’s best line of defense when customers call with critical software problems. It can put the customer at ease with faster ticket resolution, and resolution is faster still when the problem involves a known issue that can be resolved via smart self-service based on analytics. “Good analytics leverages information across a number of different sources to help a worker opening up a ticket,” says Jeff Jamieson, CEO of Whitlock Infrastructure Solutions. “A problem is described, and the analytics engine underneath can tell you, essentially, ‘wait... we just had 15 other people log this same problem.’ We’re seeing our customers adopting machine learning as a way to drive down the time and cost of tickets.” (More on machine learning below.)
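As a rough illustration of the "we just had 15 other people log this same problem" idea, the sketch below compares a new ticket description against recent ones using word overlap (Jaccard similarity). This is an assumed stand-in technique, not the method any particular service desk product uses; the function names, sample tickets, and the 0.5 threshold are all hypothetical.

```python
import re

def tokens(text):
    """Lowercase a ticket description and split it into a set of words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def similar_tickets(new_ticket, recent_tickets, min_similarity=0.5):
    """Return recent tickets whose wording overlaps strongly with the new one."""
    new_tokens = tokens(new_ticket)
    return [t for t in recent_tickets
            if jaccard(new_tokens, tokens(t)) >= min_similarity]

# Hypothetical ticket descriptions, for illustration only.
recent = [
    "Cannot log in to email portal",
    "Email portal login fails with error",
    "Printer on floor 3 is out of toner",
]
matches = similar_tickets("Cannot log in to email portal after update", recent)
print(f"{len(matches)} similar ticket(s) already logged: {matches}")
```

Word overlap misses paraphrases (the second sample ticket describes the same problem in different words and is not matched), which is one reason production tools tend to rely on learned models rather than a fixed rule like this.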

As ITSM teams monitor business services they provide to customers, capabilities like smart search, smart ticket, virtual agents for 24x7 support, and social collaboration, all based on machine learning and analytics, help meet related service level agreements (SLAs). These issues can cover a broad range of areas and may have to do with business processes like order-to-cash or infrastructure services like email.

But the biggest ITSM payoff for analytics may lie in understanding how long it takes to resolve tickets. Identifying root causes, and providing solutions, before they become widely reported problems improves your business’s reputation, reduces labor costs, and leads to better services.


Anomaly Detection and Resolution via ChatOps

As IT Ops teams use tools to define baselines for normal operations, they’re setting the stage for anomaly detection: the ability to find what’s out of spec or overloaded, or any condition indicating that something is out of bounds. “When there’s an outage or a failure, there’s a common reaction on the business side: ‘Hey, we pay all this money for monitoring tools; why didn’t you catch this problem?’” says Jamieson.

“Typically, you can only catch things that you anticipate. The things that drive our customers crazy are events that they can’t even imagine—events, for example, based on a piece of infrastructure that no one has a clue was there.

“But the beauty of analytics-driven anomaly detection is that you don’t have to know everything that might go wrong. While there are millions of log files that have captured what’s going on in your environment, analytics can point you to 3, 4, or 6 areas that seem to be most relevant to your problem, based on data. This is a new type of opportunity.”

With analytics built into performance monitoring tools, IT Ops teams may have the ability to review timelines for performance on specific servers and see where and when performance took a hit. The next step is to find out why, which is root cause analysis.
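To make the baseline idea concrete, here is a minimal sketch that flags samples falling far outside a baseline of normal readings. It assumes a simple standard-deviation (z-score) rule, which is only one of many possible approaches and not necessarily what any given monitoring tool does; the function name, the threshold of 3, and the CPU figures are hypothetical.

```python
from statistics import mean, stdev

def find_anomalies(baseline, samples, threshold=3.0):
    """Return (index, value) pairs from `samples` that lie more than
    `threshold` standard deviations away from the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # A perfectly flat baseline: treat any deviation as anomalous.
        return [(i, x) for i, x in enumerate(samples) if x != mu]
    return [(i, x) for i, x in enumerate(samples)
            if abs(x - mu) / sigma > threshold]

# Hypothetical CPU utilization (%) readings, for illustration only.
baseline_cpu = [41, 39, 40, 42, 38, 40, 41, 39, 40, 40]
recent_cpu = [40, 43, 95, 39]
print(find_anomalies(baseline_cpu, recent_cpu))  # [(2, 95)]
```

The returned index tells you when performance took a hit; working out why is still the root cause analysis step.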

ChatOps can speed up this process since automation can alert IT Ops teams about an issue in play. “ChatOps runs the gamut from service management, where an agent taking customer calls can use bots to improve efficiency,” says Jamieson, “or in core monitoring systems where ChatOps helps you quickly pull together experts and the right folks to explore a problem, and review suggestions made by the chatbots at work within the integration.”

Plus, by leveraging an autonomous agent or a bot to source up data right away, you can discover quickly if your business systems are aligned to resolve the issue, says Jamieson.

“Auto resolution of events based on specific criteria is the goal. It’s expensive to operate a help desk, to have Level 2, Level 3 engineers distracted by having to solve gobs of problems, to have war rooms of people trying to solve problems instead of doing their regular jobs. All of that is a huge cost to IT.”

When AI is paired with runbooks, automated remediation becomes reality.


Being able to distill millions of log file data points down to a few key anomalies still requires human intelligence to examine the output. But automation and analytics working together can drive down costs compared to older methods.
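One simple way to picture this kind of log reduction is to collapse messages into templates (here, by masking numbers) and surface only the templates that occur rarely. This is an illustrative assumption about how such a filter could work, not a description of any vendor's log analytics; the function names and sample log lines are hypothetical.

```python
import re
from collections import Counter

def template(message):
    """Mask every run of digits so similar messages share one template."""
    return re.sub(r"\d+", "<N>", message)

def rare_messages(messages, max_count=1):
    """Return the messages whose template appears at most `max_count` times."""
    counts = Counter(template(m) for m in messages)
    return [m for m in messages if counts[template(m)] <= max_count]

# Hypothetical log lines, for illustration only.
logs = [
    "Connection from 10.0.0.5 accepted",
    "Connection from 10.0.0.9 accepted",
    "Connection from 10.0.0.7 accepted",
    "Disk /dev/sda1 I/O error at sector 88211",
]
print(rare_messages(logs))
```

The output is a short list for a person to review, which matches the point above: the filter narrows the search, and a human still judges what the remaining messages mean.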

Machine Learning 101

As Michele Goetz noted above, the analytics that come built in with popular IT Ops tools for monitoring, load balancing, etc. will generally offer what teams need for operations analysis, at least enough to get started. This capability is usually supported by machine learning: the use of preconfigured algorithms that, over time, allow a system to alert on, and often respond to, conditions set by the user.

Figure 1. This analytics dashboard shows anomaly detection in the upper right and log analytics in the lower right, where 2.9M log messages were processed to find 20 significant ones. Courtesy: Micro Focus


Although creating and manipulating complex algorithms typically requires advanced training in data science, putting them to use is less complicated.

“What you’re trying to accomplish is fairly straightforward,” says Torrey Jones, principal analyst at Greenlight Group. “You take a set of information (the more the better) and split that into ‘good’ or desired information, and ‘bad’ or undesired information. You will most likely have some unknown information left over.”

Once you have identified these subsets in the data, you feed them into the machine, which is a mathematical equation that operates on the unknown subset of information. This is the process of “training,” providing the machine a base level of understanding for what you want and don’t want in your data. Over time, as you feed the machine unknown information (i.e., metric and log data from your IT infrastructure and systems), the machine is able to discern good or bad data on its own. “The decisions are based upon the original subsets you gave it, but the mathematical calculation is self-teaching. The more information you give it, the better the machine gets at determining if the data reveals a desired or undesired state” (that is, conditions are normal, or conditions are anomalous and need attention).

“Of course, it’s still up to you, the human, to tell the machine that something it processed as good is actually bad, or vice versa,” says Jones. Over time, with more human-based corrections and more data, the machine gets very good at predicting anomalies in the data. “Note that an anomaly can be good or bad. In IT Ops, we are typically only concerned with the bad anomalies, things that indicate a failure condition may be occurring or has occurred.”
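The train, predict, and correct loop described in this section can be sketched with a very small classifier. The example below uses a nearest-neighbor rule, chosen here only because it is easy to read; it is an assumption for illustration and not the algorithm built into any specific IT Ops tool. The class name, the two features (CPU % and errors per minute), and all sample values are hypothetical.

```python
from math import dist

class NearestNeighborClassifier:
    """Labels a new data point with the label of the closest known example."""

    def __init__(self):
        self.examples = []  # list of (features, label) pairs

    def train(self, features, label):
        """Add one labeled example ('good' or 'bad') to what the machine knows."""
        self.examples.append((features, label))

    def predict(self, features):
        """Return the label of the stored example nearest to `features`."""
        nearest = min(self.examples, key=lambda ex: dist(ex[0], features))
        return nearest[1]

model = NearestNeighborClassifier()
# Initial training: hypothetical (cpu_percent, errors_per_minute) readings.
for point in [(35, 1), (40, 2), (45, 1)]:
    model.train(point, "good")
for point in [(90, 30), (95, 40)]:
    model.train(point, "bad")

print(model.predict((60, 10)))   # closest known example is a 'good' one

# Human correction: an operator decides this state was actually bad.
model.train((60, 10), "bad")
print(model.predict((62, 11)))   # similar states are now classified 'bad'
```

Each correction becomes another labeled example, which is the sense in which more data and more human feedback improve the machine's decisions.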

IT Ops Analytics and the Cloud

“As more applications move to the cloud, the CIO tends to have more sleepless nights,” says Stefan Bergstein, chief software architect for hybrid cloud at Micro Focus. Executives must ensure that the entire infrastructure, which is audited and indirectly managed by the lines of business, is safe and secure. “Whether your infrastructure is on-prem or in the cloud, you want to prevent information leakage, you don’t want open ports, etc. It means that enforcing best practices and compliance is key and any kind of analytics and machine learning that can identify configuration settings or patterns of usage that suggest an anomaly or a breach... all of that is critical.”

As important as anomaly detection and root cause analysis are, the ability to offer procurement guidance to users can be just as important a use case. Bergstein explains: “Say I’m a user requesting a service on an instance of the cloud. I want to know the best size and location for the machine. If I’m located in Germany and I want to deploy something, the system should know that I will be best served by information I get from Frankfurt, because the server there meets the compliance requirements governing my request. The data should not be acquired, in this case, across national borders. Or perhaps the data should simply not leave the country.”

Based on either best practices or hard-coded rules, as well as history that the system learns over time, a system should suggest the right settings or workloads, the image type, and machine type. All of this makes it faster and easier for the user to get the right configuration in place.
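The hard-coded-rule half of this guidance can be sketched as a small placement function: filter regions by a data residency rule, then pick the best remaining candidate. Everything here is a hypothetical illustration of Bergstein's Germany example, including the `Region` type, the latency figures, and the choice of latency as the tie-breaker; a real system would also draw on learned history.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Region:
    name: str
    country: str       # ISO country code where the data center sits
    latency_ms: float  # assumed round-trip latency from the requesting user

def suggest_region(user_country, regions, data_must_stay_in_country):
    """Suggest the lowest-latency region that satisfies the residency rule.

    Returns None when no region complies, so the request can be escalated
    rather than silently placed somewhere non-compliant.
    """
    if data_must_stay_in_country:
        candidates = [r for r in regions if r.country == user_country]
    else:
        candidates = list(regions)
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.latency_ms)

# Hypothetical region catalog, for illustration only.
catalog = [
    Region("Frankfurt", "DE", 12.0),
    Region("Amsterdam", "NL", 9.0),
    Region("Dublin", "IE", 25.0),
]
print(suggest_region("DE", catalog, data_must_stay_in_country=True).name)
print(suggest_region("DE", catalog, data_must_stay_in_country=False).name)
```

With the residency rule on, the German user is steered to Frankfurt even though another region is nominally faster, which is the compliance-first behavior the quote describes.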

“It’s also important to know how long a task is taking,” says Bergstein. “If I’m deploying a configuration or a patch to my networked devices, I want to predict how long the change is going to take. Or if I’m requesting services on the cloud, and I know the process will take only a few minutes, I can wait for the completion in front of my monitor. But if it will take longer, then I should probably schedule that for a later time.”

The Overlap of Security and IT Ops

As analytics capabilities continue to improve across the full spectrum of IT tools, the boundaries between security, operations, business intelligence, and ITOA are getting blurry. Take, for example, network traffic analytics. Tools like Cisco’s NetFlow have worked for many years in the on-prem environment to monitor IP network traffic going in or out of a system. Now cloud providers such as AWS allow you to access network flow data that can be used to detect anomalies for security purposes.

Network management tools should be able to analyze that data as well. “Only a few years ago this sort of information was not available for cloud analytics,” says Bergstein. “The benefit is showing correlations between traffic: Do I have too much or too little traffic between specific machines?” This is a key use case for analytics in cloud management. More on networking analytics below.

While security teams are certainly using a variety of analytics tools to keep data, applications, and systems safe by looking for threats, it’s often the case that IT Ops teams using analytics have a wider purview, says Jeff Jamieson. “They see anomalies of all types, affecting both performance and economics. IT Ops might work with security teams to feed what they have done into a much broader set of logs and data feeds.”


For example, if a security team is using a SIEM (security information and event management) system, IT Ops teams can leverage those systems as a log-capture facility. “We see our customers ingesting data from Splunk, ArcSight, and Logstash as a source for log file information,” says Jamieson. “All of that can be rolled into a single system for anomaly detection and other analytics purposes.”

In an ideal world, the practice of security isn’t just about protecting perimeters and discovering anomalies, but also tying back into the business, understanding how it operates, and how IT Ops fits into that model.

“Operations should be federated between all operational units with shared responsibilities,” says Michele Goetz. “What I’m beginning to see is that security, privacy, regulatory, legal, and compliance are all becoming intertwined within the CISO tool suites. Security concerns aren’t limited to the security specialists in a company. They need to broaden, and take into consideration business operations holistically. Besides, you have to realize that the security breaches actually occur in the IT operations space.”

Analytics for the Essentials: Networking and Backup/Recovery

There are at least two other areas of IT Operations where analytics is playing an increasing role: networking, whether traditional or virtual, and backup and recovery methods.

Analytics in Networking

Unexpected network traffic, which can slow performance considerably, is often caused by unauthorized network device configuration changes to physical, SDN, and virtual controllers. (Gartner reports that 40% of mission-critical service outages are caused by configuration-related issues.) The analytics should, ideally, alert networking staff not only to the drop in performance but also to any configuration change that may be its root cause. This will focus operations staff on a configuration check first, which is much more efficient than an events-only model, or a log-file based triage model.

“The correlation between out-of-compliance device configuration and a network performance hit can be achieved with a general analytics tool,” says Frank Bonifazi, product marketing manager, network operations management at Micro Focus. “But it requires mining data from multiple sources, and significant network domain knowledge. For example, knowing that her company’s network is well-designed, a network professional might suspect that CPU over-utilization is being caused by configuration changes.” Figure 2 shows an out-of-the-box correlated view superimposing configuration events over a specific device performance graph.
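A time-window join is one straightforward way to line up configuration events with a performance anomaly. The sketch below is an assumption about how such a correlation could be approximated, not how the product view mentioned above is implemented; the function name, the 30-minute window, the device name, and the timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def changes_before(anomaly_time, config_changes, window=timedelta(minutes=30)):
    """Return config changes made within `window` before the anomaly,
    most recent first, as candidates for a configuration check."""
    recent = [
        change for change in config_changes
        if timedelta(0) <= anomaly_time - change["time"] <= window
    ]
    return sorted(recent, key=lambda change: change["time"], reverse=True)

# Hypothetical change log and anomaly timestamp, for illustration only.
config_changes = [
    {"time": datetime(2018, 8, 1, 9, 0), "device": "core-sw-1", "approved": True},
    {"time": datetime(2018, 8, 1, 13, 50), "device": "core-sw-1", "approved": False},
]
cpu_anomaly_at = datetime(2018, 8, 1, 14, 5)

for change in changes_before(cpu_anomaly_at, config_changes):
    status = "approved" if change["approved"] else "UNAPPROVED"
    print(change["device"], change["time"].isoformat(), status)
```

Proximity in time only suggests where to look first; confirming that a change actually caused the performance hit still takes the network domain knowledge Bonifazi mentions.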


Analytics in Backup and Recovery

Even backup and recovery processes are benefiting from analytics. As the traditional function of the data center expands to include the cloud, data resides in multiple locations, gets accessed by local and remote users, and is often spread across the organization in different versions, formats, and media.

The essential question is whether the IT teams that manage the backup environment are equipped to identify issues such as unbalanced use of backup resources, inability to meet the target service-level agreements (SLAs) for mission-critical applications, or resolve future resource conflicts or other system issues before they lead to outages and data loss.

Figure 2. This graph shows two configuration changes (vertical blue lines) time-correlated with out-of-normal CPU utilization (purple curve). In this case, the first change was done without approval, and the second configuration change was to return it to the approved company compliance policy. Courtesy: Micro Focus


Another goal in backup/recovery analytics is reducing capital expenditure (CAPEX) and operating expenditure (OPEX) through high utilization of the infrastructure. This can keep administrators from resorting to reactionary approaches to problem resolution that often lead to complicated future challenges.

Key use cases for analytics in backup/recovery include:

■ Real-time predictive analytics that provide insight into daily use of the backup process, as well as future performance and capacity gaps regarding data sets and infrastructure.

■ “What-if” scenario evaluations that help teams understand whether or not SLAs are achievable, and suggest best ways to balance the demands of new data sets within the existing infrastructure.

Figure 3. This dashboard for a backup and recovery system shows users at a glance how many backup sessions failed, when, and on which media they failed.


■ Storage capacity planning for monitoring ongoing data growth and how the available storage media is being filled. For example, if data continues to grow at the current rate, how much new storage will be needed before a storage shortage occurs?

■ Identifying potential resource conflicts and systemic issues before they cascade into outages and data loss.
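The storage capacity planning question in the list above ("if data continues to grow at the current rate...") reduces to simple arithmetic once a growth rate is estimated. The sketch below assumes linear growth measured from the first and last daily readings, which is a deliberate simplification; the function name and the terabyte figures are hypothetical.

```python
def days_until_full(daily_used_tb, capacity_tb):
    """Estimate days until storage is full, assuming usage keeps growing
    at the average daily rate seen in `daily_used_tb` (one reading per day).

    Returns None if usage is flat or shrinking.
    """
    if len(daily_used_tb) < 2:
        raise ValueError("need at least two daily readings to estimate growth")
    growth_per_day = (daily_used_tb[-1] - daily_used_tb[0]) / (len(daily_used_tb) - 1)
    if growth_per_day <= 0:
        return None
    return (capacity_tb - daily_used_tb[-1]) / growth_per_day

# Hypothetical usage readings in TB over five days, against 100 TB of capacity.
print(days_until_full([70, 71, 72, 73, 74], capacity_tb=100))  # 26.0
```

A predictive analytics tool would typically fit a richer model (seasonality, per-data-set trends), but the output answers the same question: how long until more storage must be in place.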

Use IT Ops Analytics for Better Business Alignment

If you’re able to see and understand what’s running through your IT Ops environment and your architecture, and use analytics to map these things to what your stakeholders care most about, your value as an IT Ops specialist will become more obvious to the business.

Goetz poses three key questions that IT management cares about: “Where are they bottlenecked with limited resources? Where has the technology failed to meet their needs? What friction is there?”

“The better enterprise architecture teams are using analytics to see what’s happening on their landscapes and managing that back to the requests coming in,” she says, “and looking at the utilization by the business teams themselves so that they can make decisions like impact analysis or reuse within their environments. They’re coming at it in a smarter way, rather than just taking requests like in a deli and trying to get those done.”

If you can put analytics on top of your own operational practices, the business wins: You can reduce costs, and you get IT Ops into better alignment with business goals “rather than building a bunch of platforms that will just start collecting dust through low adoption,” Goetz notes.

Ultimately, you will understand where resources can be deployed and reused, and you’ll help IT leadership make better investment decisions.

Learn More At
www.microfocus.com/opsbridge
www.microfocus.com/SMA
www.microfocus.com/NOM
www.microfocus.com/dataprotector


Additional contact information and office locations: www.microfocus.com

162-000172-001 | M | 08/18 | © 2018 Micro Focus or one of its affiliates. Micro Focus and the Micro Focus logo, among others, are trademarks or registered trademarks of Micro Focus or its subsidiaries or affiliated companies in the United Kingdom, United States and other countries. All other marks are the property of their respective owners.