Online IT Ticket Automation Recommendation Using ...Online IT Ticket Automation Recommendation Using Hierarchical Multi-armed Bandit Algorithms Qing Wang Tao Li y S. S. Iyengarz Larisa

Online IT Ticket Automation Recommendation Using HierarchicalMulti-armed Bandit Algorithms

Qing Wang∗ Tao Li † S. S. Iyengar‡ Larisa Shwartz§ Genady Ya. Grabarnik¶

Abstract

The increasing complexity of IT environments urgent-ly requires the use of analytical approaches and auto-mated problem resolution for more efficient delivery ofIT services. In this paper, we model the automationrecommendation procedure of IT automation servicesas a contextual bandit problem with dependent arms,where the arms are in the form of hierarchies. Intu-itively, different automations in IT automation services,designed to automatically solve the corresponding ticketproblems, can be organized into a hierarchy by domainexperts according to the types of ticket problems. Weintroduce a novel hierarchical multi-armed bandit algo-rithms leveraging the hierarchies, which can match thecoarse-to-fine feature space of arms. Empirical experi-ments on a real large-scale ticket dataset have demon-strated substantial improvements over the conventionalbandit algorithms. In addition, a case study of dealingwith the cold-start problem is conducted to clearly showthe merits of our proposed algorithms.

1 Introduction

1.1 Background Driven by the rapid changes in theeconomic environment, business enterprises constantlyevaluate their competitive position in the market andattempt to come up with innovative ways to gain com-petitive advantage. Value-creating activities cannot beaccomplished without solid and continuous delivery ofIT services. The growing complexity of IT environmentsdictates an extensive use of analytic combined with au-tomation. Incident management is one of the most crit-ical processes in IT service management (ITSM) as itresolves incidents and restores provisioned services.

A typical workflow of ITSM is illustrated in Fig-

∗School of Computing and Information Sciences, Florida In-ternational University. [email protected]†School of Computing and Information Sciences, Florida In-

ternational University. [email protected], deceased December 2017‡School of Computing and Information Sciences, Florida In-

ternational University. [email protected]§IBM T.J. Watson Research Center. [email protected]¶Dept. Math & Computer Science, St. John’s University.

[email protected]

Applications

Customer Sever

Monitor

Alerts

Event Management System

Tic

kets

manual

tickets

IPC System

ticket

resolution

(1) (2)

(3)

(6)(4) IT Automaton Services

escalated

tickets

monitoring

tickets

Problem Server

Execute the automation

Feedback (success/failure)

+

(5) Enrichment Engine

hierarchies

+

Automation Engine

Ontology Modeling

Relations

Knowledge

Base

Classes

Human Engineers

Recommend

an automation Feedback

Figure 1: The overview of ITSM workflow.

ure 1. It usually includes six steps: (1) As an anoma-ly detected, an event is generated and the monitoringemits the event if it persists beyond a predefined du-ration. (2) Events from an entire IT environment areconsolidated in an enterprise event management sys-tem, which upon results of quick analysis, determineswhether to create an alert and subsequently an incidentticket. (3) Tickets are collected by an IPC (Incident,Problem, and Change) system [22]. (4) A monitoringticket, identified by IT automation services for poten-tial automation (i.e., scripted resolution) based on theticket description. In case the issue could not be com-pletely resolved, this ticket is then escalated to humanengineers. (5) In order to improve the performance of ITautomation services and reduce human efforts for esca-lated tickets, the workflow incorporates an enrichmentengine that uses data mining techniques (e.g., classifi-cation and clustering) for continuous enhancement ofIT automation services. Additionally, the informationis added to a knowledge base, which is used by the ITautomation services as well as in resolution recommen-dation for tickets escalated to a human. (6) Manuallycreated and escalated tickets are forwarded to humanengineers for problem determination, diagnosis, and res-olution, which is a very labor-intensive process.

1.2 Motivation In today’s economic climate, IT ser-vice provider is expected to focus on innovation and

Copyright c© by SIAM

Unauthorized reproduction of this article is prohibited

assisting customers in their core business areas. Timethe experts spend on fixing operational issues has to beminimized. Enterprise IT Automation Services [2] havebeen introduced into ITSM as an engine for automatedcorrective actions (i.e., scripted resolutions) and closureof incident records.

Figure 2 shows an example of an IT service manage-ment ticket that was automatically generated by a moni-toring system, and successfully fixed by the automationengine. The summary and monitoring class (i.e., analert key) of the ticket provides an initial symptom de-scription, which is used for automation service to identi-fy existing automation or lack thereof. If the problem isresolved by the recommended automation, the value of“AUTORESOVLED” will be marked non-zero. To im-prove the efficiency of the recommending strategies ofthe automation engine, it is essential to understand howthe symptoms could be mapped to the correspondingscripted resolutions. This is the initial motivation forour study. Based on preliminary studies [22, 19, 28, 18],we have identified three key challenges in virtual engi-neering technology.

ALERT_KEY cpc_cpuutil_gntw_win_v3

CLIENT_ID HOSTNAMEORIGINAL

SEVERITYOSTYPE COMPONET

SUBCOMP

OMET

136LEXSBWS01

VH2 WIN WINDOWS CPU

TICKET

SUMMARY

AUTOMATON_NAME CPC:WIN:GEN:R:W:System Load Handler

AUTO

RESOVLED

CPU Workload High. CPU 1,

busy 99% time.

1

TICKET

RESOLUTION

The CPU Utilization was quite reduced,

hence closing the ticket.

OPEN_DTTM

2016-04-30

12:43:07

Figure 2: A sample ticket in ITSM and its correspondingautomaton.

Challenge 1. How do we appropriately solve the well-known cold-start problem in IT automation services?Most recommender systems suffer from a cold-startproblem. This problem is critical since every systemcould encounter a significant number of users/items thatare completely new to the system with no historicalrecords at all. The cold-start problem makes recom-mender systems ineffective unless additional informa-tion about users/items is collected [13, 6], which is acrucial problem for automation engine as well, since itcannot make any effective recommendation that trans-lates into significant human efforts. Multi-armed ban-dit algorithm can address the cold-start problem, whichbalances the tradeoff between exploration and exploita-tion, hence, maximizing the opportunity for fixing thetickets, while gathering new information for improvingthe goodness of the ticket and automation matching.Challenge 2. How do we utilize the interactive feed-back to adaptively optimize the recommending strategiesof the enterprise automation engine to enable a quickproblem determination by IT automation services?The automation engine (see Figure 1) automaticallytakes action based on the contextual information of the

ticket and observes the execution feedback (e.g., suc-cess or failure) from the problem server. The currentstrategies of the automation engine do not take advan-tage of these interactive information for continuous im-provement. Based on the aforementioned discussion,we present an online learning problem of recommendingan appropriate automation and constantly adapting theup-to-date feedback given the context of the incomingticket. This can be naturally modeled as a contextu-al multi-armed bandit problem, which has been widelyapplied into various interactive recommender system-s [10, 27, 25]. To the best of authors’ knowledge, itis the first study to formulate the strategies of the au-tomation recommendation in IT automation services asa contextual bandit problem.Challenge 3. How do we efficiently improve the per-formance of recommendation using the automation hi-erarchies of IT automation services?Domain experts usually define the taxonomy (i.e., hi-erarchy) of the IT problems explicitly (see Figure 3).Correspondingly, the scripted resolutions (i.e., automa-tions) also contain the underlying hierarchical problems’structure.

For example, a ticket is generated due to a failureof the DB2 database. The root cause may be databasedeadlock, high usage or other issues. Intuitively, if theproblem was initially categorized as a database problem,the automated ticket resolutions have a much higherprobability to fix this problem, than if it hasn’t beencategorized as such, and all other categories (e.g., filesystem and networking) are now taken into considera-tion. We formulate this as a contextual bandit problemwith dependent arms organized hierarchically, whichcan match the feature spaces from a coarse level first,and then be refined to the next lower level of taxonomy.The existing bandit algorithms can only explore the flatfeature spaces by assuming the arms are independent.

All

File System Database Networking

HDFS NAS DB2 Oracle Cable NIC

Figure 3: An example of taxonomy in IT tickets.

1.3 Contribution To the best of our knowledge, thispaper is the first work to formulate the automationrecommendation in IT automation services as a multi-armed bandit problem by considering the dependenciesamong arms in the form of hierarchies. We demonstratethis approach on the automation recommendation for ITservice management. The contribution mainly focus-es on proposing hierarchical multi-armed bandit algo-rithms to overcome the aforementioned three key chal-lenges. The key features of our contribution include:



• A new online learning approach, designed to (1)solve the cold-start problem, and (2) continuouslyrecommend an appropriate automation for the in-coming ticket and adapt based on the feedback toimprove the goodness of match between the prob-lem and automation in IT automation services.

• Utilization of the hierarchies, integrated into banditalgorithms to model the dependencies among arms.

The effectiveness and efficiency of our proposed methodsare verified on a large dataset of tickets from IBM GlobalServices.

The remainder of this paper is organized as follows.The related work is presented in Section 2. In Section 3,we give the mathematical formalization of the problem.The solution to the problem is provided in Section 4.Section 5 describes comparative experiments and an em-pirical case study conducted over the real ticket data,which demonstrate the efficacy of the proposed algo-rithms. Finally, Section 6 summarizes and concludesthe paper.

2 Related Work

In this section, we provide a short survey of the liter-ature related to the automated IT service managemen-t and multi-armed bandit algorithms with dependentarms.

The automation of IT service management is large-ly achieved through service-providing facilities in com-bination with automation of subroutine procedures suchas problem detection, problem determination, and tick-et resolution recommendation for the service infrastruc-ture. Automatic problem detection is typically real-ized by system monitoring software, such as IBM TivoliMonitoring [3] and HP OpenView [1]. Tang et al. [17]proposed an integrated framework to minimize the falsepositive and maximize the coverage for system faultdetection to optimize this procedure. For problemdetermination, a hierarchical multi-label classificationmethod [22, 26] was proposed to classify the problemtypes in the monitoring IT tickets. In order to deter-mine the root cause, authors [23, 21, 24, 25] analyzedthe historical events to reveal the underlying temporalcausal relationship between these sequential data. Au-tomated ticket resolution recommendation [16] is a bigchallenge in IT service management since it requires vastdomain knowledge about the target infrastructure.

Traditional recommendation technologies in ITSMfocus on recommending the proper resolutions to a tick-et reported by the system’s user. Recently, Wang etal. [19] proposed a cognitive framework that enablesan automation improvement through resolution recom-mendation utilizing the ontology modeling technique.

A deep neural network ranking model [28] was utilizedfor recommendation of the best top-n matching resolu-tions by quantifying the qualify of each historical reso-lution. However, previous literature tried to resolve aticket reported by the system’s user. To the best of ourknowledge, none of the existing studies has attempted toaddress the aforementioned challenges for those ticketsautomatically generated by a monitoring system, andit is the first time to leverage the multi-armed banditmodel to optimize the online automation procedure viafeedback in IT automation services.

Most recently, an interactive recommender systemhas become increasingly significant due to the abun-dance of online services. For an individual user, theycontinuously refine the recommendation results usingup-to-date feedback [27]. The tradeoff between explo-ration and exploitation inherent in learning from inter-active feedback has been well dealt with using contex-tual multi-armed bandit algorithms [14, 7, 10, 15, 25].However, most prior work (e.g., Upper ConfidenceBound (UCB) [5] and Thompson sampling [7]) assumesindependent arms, which rarely holds true in reality.Since the real-world items tend to be correlated witheach other, a delicate framework [12] is developed to s-tudy the bandit problem with dependent arms. In lightof the topic modeling techiques, Wang et al. [18] comeup with a new generative model to explicitly formulatethe item dependencies as the clusters on arms. Pandeyet al. [11] used the taxonomy structure to exploit de-pendencies among the arm in the context-free banditsetting. CoFineUCB approach [20] utilized a coarse-to-fine feature hierarchy to reduce the cost of exploration,where the hierarchy was estimated by a small number ofexisting user profiles. In contrast, we study the contex-tual bandit problem with a given hierarchical structureof arms, where the hierarchy is constructed by domainexperts based on the features of items.

3 Problem Formulation

In this section, we provide a mathematical formulationof the problem and describe the new contextual multi-armed bandit algorithms, which can utilize a taxonomydefined by domain experts explicitly depicting the de-pendencies among arms. A glossary of notations men-tioned in this paper is summarized in Table 1.

3.1 Basic Concepts and Terminologies Let A ={a(1), ..., a(K)} denote a set of automations (i.e., scriptedresolutions) feasible in IT automation system, where Kis the number of the automations. Every time a ticketis reported, the online IT automation recommendationprocess selects an automation a(i) ∈ A using contextualinformation (i.e., the symptom description in the ticket)

Copyright c© by SIAMUnauthorized reproduction of this article is prohibited

Table 1: Important Notations

Notation Description

a(i) the i-th arm.

A the set of arms, A = {a(1), ..., a(K)}.H the hierarchy (taxonomy) defined by domain experts.X d -dimensional context feature space.xt the context at time t.

rk,t the reward (payoff) of pulling the arm a(k) at time t.

r̂k,t the predicted reward (payoff) for the arm a(k) at time t.π the policy for pulling arm sequentially.Rπ the cumulative reward of the policy π.Sπ,t the sequence of (xi, π(xi), rπ(xi)) observed until time

t = 1, ..., T .

θk the coefficients predicting reward of the arm a(k).

σ2k the reward prediction variance for arm a(k).α, β the parameters of the distribution of σ2

k.µθ,Σθ the parameters of the distribution of θ.

and recommends it as a possible resolution for the ticket.Specifically, the contextual information for the reportedticket at time t is represented as a feature vector xt ∈ X ,where X denotes the d-dimensional feature space. Afterrecommending an IT automation a(k) at time t, itscorresponding feedback is received, indicating whetherthe ticket has been successfully resolved or not.

We formalize the online IT automation recommen-dation process as a contextual multi-armed bandit prob-lem where automations are constantly recommendedand the underlying recommendation model is instant-ly updated based on the feedback collected over time.In general, a contextual multi-armed problem involvesa series of decisions over a finite but possibly unknowntime horizon T . In our formalization, each automationcorresponds to an arm. Pulling an arm indicates its cor-responding automation is being recommended, and thefeedback (e.g., success or failure) received after pullingthe corresponding arm is used to compute the reward.

In the contextual multi-armed bandit setting, ateach time t = [1, T ], a policy π makes a decision forselecting an automation π(xt) ∈ A to perform an actionaccording to the contextual vector xt of the currentticket. Let rk,t denote the reward for recommending anautomation a(k) at time t, whose value is drawn froman unknown distribution determined by the context xtpresented to automation a(k). The total reward receivedby the policy π after T iterations is

(3.1) Rπ =

T∑t=1

rπ(xt).

The optimal policy π∗ is defined as the one with maxi-mum accumulated expected reward after T iterations,

(3.2) π∗ = arg maxπ

E(Rπ) = arg maxπ

T∑t=1

E(rπ(xt)|t).

Our goal is to identify a good policy for maximizing thetotal reward. Herein we use reward instead of regret

to express the objective function, since maximization ofthe cumulative reward is equivalent to minimization ofregret during the T iterations [27].

Before selecting the optimal automation at time t,a policy π is updated to refine a model for reward pre-diction of each automation according to the historicalobservations Sπ,t−1 = {(xi, π(xi), rπ(xi))|i = [1, t − 1]}.The reward prediction helps to ensure that the policyπ includes decisions to increase the total reward. Thereward rk,t is typically modeled as a linear combinationof the feature vector xt given as follows:

(3.3) rk,t = xTt θk + ξk,

where θk is a d-dimensional coefficient vector, and ξkdenotes an observation noise, a zero-mean Gaussiannoise with variance σ2

k, i.e., ξk ∼ N (0, σ2k). Then,

(3.4) rk,t ∼ N (xTt θk, σ2k),

and our objective function in Equation 3.2 can bereformulated as:

(3.5) π∗ = arg maxπ

T∑t=1

Eθπ(xt)(xTt θπ(xt)|t).

To address the aforementioned problem, contextualmulti-armed bandit algorithms have been proposed tobalance the tradeoff between exploration and exploita-tion for arm selection, including ε-greedy, Thompsonsampling, LinUCB, etc.

Thompson sampling is one of the earliest heuris-tic methods to address the contextual bandit problems,belonging to the probability matching family [4]. Themain idea is to allocate the pulling chance accordingto the probability that an arm produces the maximumexpected reward given the context xt at time t. Particu-larly, Thompson sampling method learns and maintainsthe posterior distribution of the parameters in the re-ward prediction model for each arm. At every time t,Thompson sampling first samples the model parame-ters from its posterior distribution learnt at time t− 1.The sampled parameters together with the contextualinformation xt are used for reward prediction. The ar-m with maximum predicted reward is then selected topull. Based on the feedback after pulling at time t, theposterior distribution of the model parameters for theselected arm at time t is updated and ready for armselection at time t+ 1.

LinUCB [10], an extension of the UCB algorith-m [5], is another contextual bandit algorithm. It pullsthe arm with the largest score computed by combin-ing both reward expectation and deviation, which arecomputed in light of the reward prediction model.



Although different multi-armed bandit algorithmshave been proposed and extensively adopted in diversereal applications, most of them do not take the depen-dencies between arms into account. In the IT environ-ment, the automations (i.e., arms) are organized withits taxonomy, i.e., a hierarchical structure. The follow-ing section will introduce our approach to make use ofthe arm dependencies in the bandit settings for IT au-tomation recommendation optimization.

3.2 Automation Hierarchy In IT automation ser-vices, the automations can be classified with a pre-defined taxonomy. It allows us to reformulate the prob-lem as a bandit model with the arm dependencies de-scribed by a tree-structured hierarchy.

Let H denote the taxonomy, which contains a setof nodes (i.e., arms) organized in a tree-structuredhierarchy. Given a node a(i) ∈ H, pa(a(i)) and ch(a(i))are used to represent the parent and children sets,respectively. Accordingly, we have Property 3.1.

Property 3.1. If pa(a(i)) = ∅, node a(i) is assumedto be the root node. If ch(a(i)) = ∅, then a(i) is a leafnode, which represents an automation. Otherwise, a(i)

is a category node when ch(a(i)) 6= ∅.

Since the goal is to recommend an automation forticket resolving and only a leaf node of H representsan automation, the recommendation process cannot becompleted until a leaf node is selected at each timet. Therefore, the multi-armed bandit problem for ITautomation recommendation is reduced to select a pathofH from root to a leaf node, where multiple arms alongthe path are sequentially selected with respect to thecontextual vector xt at time t.

Let pth(a(i)) be a set of nodes, consisting of allthe nodes along the path from root node to a(i) in H.Further, assume πH(xt|t) to be the path selected bypolicy π in light of the contextual information xt attime t. Hence, we can have Property 3.2 for every armselection policy π.

Property 3.2. Given the contextual information xt attime t, if a policy π selects a node a(i) in the hierarchy Hand receives positive feedback (i.e., success), the policyπ receives positive feedback as well by selecting the nodesin pth(a(i)).

Let rπH(xt|t) denote the reward obtained by thepolicy π after selecting the multiple arms along the pathπH(xt|t) at time t. The reward is computed as follows,

(3.6) rπH(xt|t) =∑

a(i)∈πH(xt|t),ch(a(i))6=∅

rπ(xt|ch(a(i))),

where π(xt|ch(a(i))) represents the arm selected fromthe children of a(i), given the contextual informationxt.

Therefore, after T iterations, the total reward re-ceived by the policy π is computed as below,

(3.7) RπH =

T∑t=1

rπH(xt|t).

The optimal policy π∗ with respect to H is determinedby(3.8)

π∗ = arg maxπ

E(RπH) = arg maxπ

T∑t=1

E(rπH(xt)|t).

The reward prediction for each arm is conducted byEquation 3.4, and then the optimal policy can beequivalently determined by

π∗ =

arg maxπ

T∑t=1

(∑

a(i)∈πH(xt|t),ch(a(i))6=∅

Eθπ(xt|ch(a(i)))

(xTt θπ(xt|ch(a(i)))|t))

(3.9)

Both Thompson sampling and LinUCB mentionedabove will be incorporated into our new learning modelsthat leverage the hierarchies defined by domain expert-s. In such settings, bandit algorithms can achieve fasterconvergence by exploring feature space hierarchically.

4 Solution & Algorithm

In this section, we propose the HMAB (HierarchicalMulti-Armed Bandit) algorithms for exploiting the de-pendencies among arms organized hierarchically.

As presented in Equation 3.4, the reward rk,t de-pends on random variable xt, θk and σ2

k. We assumeθk and σ2

k follow a conjugate prior distribution, NormalInverse Gamma (abbr., NIG) distribution. σ2

k is drawnfrom the Inverse Gamma (abbr., IG) distribution shownin Equation 4.10.

(4.10) p(σ2k|αk, βk) ∼ IG(αk, βk),

where αk and βk are the predefined hyper parametersfor the IG distribution. Given σ2

k, the coefficient vectorθk is generated by a Gaussian prior distribution withthe hyper parameter µθk and Σθk :

(4.11) p(θk|µθk ,Σθk , σ2k) ∼ N (µθk , σ

2kΣθk),

At each time t, a policy π will select a pathπH(xt|t) from H according to the context xt. As-suming a(p) ∈ πH(xt|t) is the leaf node (i.e., an au-tomaton), then we have pth(a(p)) = πH(xt|t). Af-ter recommending the automation a(p), a reward rp,t



is obtained. Since the reward rp,t is shared by al-l the arms along the path pth(a(p)), a set of triplesF = {(xt, a(k), rk,t)|a(k) ∈ pth(a(k)), rk,t = rp,t} areacquired. A new sequence Sπ,t is generated by incor-porating the triple set F into Sπ,t−1. The posterior dis-tribution for every a(k) ∈ pth(a(k)) needs to be updatedwith the new feedback sequence Sπ,t. The posterior dis-tribution of θk and σ2

k given Sπ,t is a NIG distributionwith the hyper parameter µθk , Σθk , αk and βk. Thesehyper parameters at time t are updated based on theirvalues at time t− 1:(4.12)

Σθk,t = (Σ−1θk,t−1+ xtx

Tt )−1

µθk,t = Σθk,t(Σ−1θk,t−1

µθk,t−1+ xtrk,t)

αk,t = αk,t−1 +1

2βk,t = βk,t−1+

1

2[r2k,t + µTθk,t−1

Σ−1θk,t−1µθk,t−1

− µTθk,tΣ−1θk,t

µθk,t ]

Algorithm 1 The algorithms for HMAB

1: procedure main(H, π, λ) . main entry, π is the policy

2: for t← 1, T do3: Initialize parameters of a(m) ∈ H to αm, βm,

Σθm = Id, µθm = 0d×1

4: Get contextual vector xt ∈ X5: for each path P of H do

6: Compute the reward of P using Equation 3.6, by

calling EVAL(xt, a(k), π) for each arm a(k) ∈ P7: end for

8: Choose the path P ∗ with maximum reward

9: Recommend the automation a(∗) (leaf node of P ∗)10: Receive reward r∗,t by pulling arm a(∗)

11: UPDATE(xt, P ∗, r∗,t, π)12: end for

13: end procedure

14:15: procedure eval(xt, a(k), π) . get a score for a(k), given xt16: if π is TS then

17: Sample σ2k,t according to Equation 4.10

18: Sample θk,t according to Equation 4.11

19: return r̂k,t = xTt θk,t20: end if21: if π is LinUCB then

22: return r̂k,t = xTt µθk,t−1+ λσk,t−1

√xTt Σ−1

θk,t−1xt

23: end if

24: end procedure25:26: procedure update(xt, P, rt, π) . update the inference. . P

is the path in H, rt is the reward.

27: for each arm a(k) ∈ P do28: Update αk,t, βk,t, Σθk,t , µθk,t using 4.12

29: end for30: end procedure

Note that the posterior distribution of θk and σ2k

at time t is considered as the prior distribution of time

t+1. On the basis of the aforementioned inference of theleaf node a(k), we propose HMAB algorithms presentedin Algorithm 1 developing different strategies includingHMAB-TS(H, α, β) and HMAB-LinUCB(H, λ).

Online inference of our hierarchical bandit problemstarts with MAIN procedure. As a ticket xt arrivesat time t, the EVAL procedure computes a score foreach arm of different levels. In each level, the armwith the maximum score is selected to be pulled. Afterreceiving reward by pulling an arm, the new feedback isused to update the HMAB algorithms by the UPDATEprocedure.

5 Experiment Setup

To demonstrate the efficiency of our proposed algo-rithms, we conduct a large scale experimental studyover a real ticket dataset from IBM Global Services.First, we outline the general implementation of the base-line algorithms for comparison. Second, we describe thedataset and evaluation method. Finally, we discuss thecomparative experimental results of the proposed andbaseline algorithms, and present a case study to demon-strate the effectiveness of HMAB algorithms.

5.1 Baseline Algorithms In the experiments, wedemonstrate the performance of our methods by com-paring the following baseline algorithms:

1. Random: a random item recommended to the tar-geted user without considering the contextual in-formation.

2. EpsGreedy(ε): a random arm with probability εselected, as well as the arm of the largest predictedreward r̂k,t with probability 1 − ε, where ε is apredefined parameter.

3. TS(α, β): Thompson sampling described in Sec-tion 3.1, randomly draws the coefficients from theposterior distribution, and selects the item with thelargest estimated payoff according to Equation 3.4.Both α and β are hyper parameters. We initial αand β with the same value.

4. LinUCB(λ): the parameter λ is used to calculate thescore, a linear combination of the expectation anddeviation of the reward. The arm with the largestscore is selected. When λ = 0, it is equivalent tothe Exploit policy.

Our methods proposed in this paper include:

1. HAMB-EpsGreedy(H, ε): a random arm with prob-ability ε is selected, and the arm of the highestestimated reward r̂k,t with probability 1 − ε withrespect to the hierarchy H, which is a predefinedparameter as well as ε.



2. HMAB-TS(H, α, β): it denotes our proposed hi-erarchical multi-armed bandit with Thompson

sampling outlined in Algorithm 1. H is the taxon-omy defined by domain experts. α and β are hyperparameters.

3. HMAB-LinUCB(H, λ): it represents our proposed al-gorithm based on LinUCB presented in Algorithm 1.Similarly, H is the hierarchy depicting the depen-dencies among arms. And the parameter λ is givenwith the same use in LinUCB.

5.2 Dataset Description Experimental tickets arecollected by IBM Tivoli Monitoring system [3]. Thisdataset covers from July 2016 to March 2017 with thesize of |D| = 116, 429. Statistically, it contains 62 au-tomations (e.g., NFS Automation, Process CPU SpikeAutomation, and Database Inactive Automation) rec-ommended by the automation engine to fix the corre-sponding problems. The execution feedback includingsuccess, failure and escalation, indicates whether theproblem has been resolved or needs to be escalated tohuman engineers. These collected feedback can be uti-lized to improve the accuracy of recommended results.Thereby, the problem of automation recommendationcan be regarded as an instance of the contextual banditproblem. As we mentioned above, an arm is an au-tomation, a pull is to recommend an automation for anincoming ticket, the context is the information vectorof ticket’s description, and the reward is the feedbackon the result of the execution of recommended automa-tion on the problem server. An automation hierarchy Hshown in Figure 4 with three layers constructed by do-main experts is introduced to present the dependenciesamong automations. Moreover, each record is stampedwith the open time of the ticket.

#All Automations=62

#Application=6 #Database=30 #Unix=17

ntp restart automation

jvm healthcheck automation

db2 database instance down

automation

tablespace automation

hostdown automation

process cpu spike

automation

Others

e.g., escalation automation

...

... ... ...

Figure 4: An automation hierarchy defined by domainexperts.

We now discuss how to construct ticket features forthe experiments. To reduce the noise of the data, thedomain experts only selected the categorical attributes(e.g., ALERT KEY, CLIENT ID, SEVERITY andOSTYPE) with high representative information of tick-ets. These categorical information of a ticket is encod-ed as a binary vector [9]. In addition, we augmented aconstant feature with value 1 for all vectors. Therefore,each ticket is represented as a binary feature vector xof dimension 1,182.

5.2.1 Evaluation Method We apply the replayermethod to evaluate our proposed algorithms on theaforementioned dataset. The replayer method is firstintroduced in [9], which provides an unbiased offlineevaluation via the historical logs. The main idea ofreplayer is to replay each user visit to the algorithmunder evaluation. If the recommended item by thetesting algorithm is identical to the one in the historicallog, this visit is considered as an impression of this itemto the user. The ratio between the number of userclicks and the number of impressions is referred to asClick-through rate (CTR). The work in [9] shows thatthe CTR estimated by the replayer method approachesthe real CTR of the deployed online system. In thisproblem, a user is a ticket, and an item is an automation.A user visit means a ticket comes into IT automationservices, and a user click indicates the ticket has beensuccessfully solved by the recommended automation.

5.2.2 Relative Success Rate Optimization forOnline Automation Recommendation In this sec-tion, we discuss the performance evaluation for eachproposed algorithm on the dataset. The averaged re-ward (i.e., the overall success rate of the correspondingautomations) is considered as the metric in the exper-iments. We define it as the total reward divided bythe total number of times a given automation has beenrecommended (i.e., 1

n

∑nt=1 rt). It is obvious that the

higher the success rate, the better the performance ofthe algorithm. To avoid the leakage of business-sensitiveinformation, the relative success rate is reported, whichis the overall success rate of an algorithm divided by theoverall success rate of random selection.

In contrast to the baseline algorithms outlined inSection 5.1, the corresponding HMABs configured withdifferent parameter settings achieve much better perfor-mance on the dataset shown in Figure 5, Figure 6 andFigure 7, respectively. To be clarified, we set the pa-rameter λ > 1 of LinUCB and HMAB-LinUCB in theexperiments deliberately to reveal the merits of HMAB-LinUCB because their performance are almost equalwith λ < 1. By observing the results, we find thatHMAB-LinUCB has the best performance comparedwith other algorithms. Through these substantial ex-periments, we conclude that the proposed algorithmsoutperformed the strong baselines with the assumptionthat arms are independent.

5.2.3 A Comparative Case Study In order tobetter illustrate the merits of the proposed algorithms,we present a case study on the recommendation for anescalated ticket in IT automation services.

As mentioned above, the recommendation for an es-



0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18Bucket Index

0.5

1.0

1.5

2.0

2.5

3.0

3.5R

ela

tive S

ucc

ess

Rate

Relative Success Rate on different bucket (bucket size = 5,000)

EpsGreedy_0.1

EpsGreedy_0.3

EpsGreedy_0.5

EpsGreedy_1.0

HMAB-EpsGreedy_0.1

HMAB-EpsGreedy_0.3

HMAB-EpsGreedy_0.5

HMAB-EpsGreedy_1.0

Figure 5: The Relative Success Rate of EpsGreedy andHMAB-EpsGreedy on the dataset is given along each timebucket with diverse parameter settings.

calated ticket can be regarded as a cold-start problemdue to the lack of the corresponding automations. Inother words, there is no historical records for resolvingthis ticket. Note that both our proposed HMABs andconventional MABs are able to deal with the cold-startproblem by exploration. To compare their performance,we calculate the distribution of the recommended au-tomations over different categories (e.g., database, unix,and application). Figure 9 presents an escalated ticket,which records a database problem. Such a problem hasbeen repeatedly reported over time in the dataset. S-ince this ticket reports a database problem, intuitivelythe automations in the database category should havea high chance of being recommended. The categorydistributions of our proposed HMABs and conventionalMABs are provided in Figure 8, as well as the base-line category distribution, which is the prior categorydistribution obtained from all the automations of thehierarchy. From Figure 8, we observe that 1) com-pared with TS, HMAB-TS explores more automationsfrom the database category; and 2) in HMAB-TS thedatabase category has the highest percentage among allth automation categories. This shows that our proposedHMABs can achieve better performance by making useof the predefined hierarchy.

To further illustrate the effectiveness of HMABs,we provide the detailed results of recommended au-tomations for the escalated tickets. As shown in Fig-ure 9, automations from the database category (e.g.,database instance down automation, db2 database in-active automation) are frequently recommended accord-ing to the context of the ticket, which clearly indicatethe issue is due to the inactive database. By checkingthe recommended results, domain experts figure out thedatabase instance down automation, one of the toprecommended automations, can successfully fix such acold-start ticket problem, which clearly demonstrate theeffectiveness of our proposed algorithms.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19Bucket Index

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

Rela

tive S

ucc

ess

Rate


TS_0.01

TS_0.05

TS_0.1

TS_1.0

HMAB-TS_0.01

HMAB-TS_0.05

HMAB-TS_0.1

HMAB-TS_1.0

Figure 6: The Relative Success Rate of TS and HMAB-TSon the dataset is given along each time bucket with diverseparameter settings.

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19Bucket Index

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

Rela

tive S

ucc

ess

Rate


LinUCB_0.5

LinUCB_3.0

LinUCB_3.5

LinUCB_4.0

hierarchyUCB_0.5

hierarchyUCB_3.0

hierarchyUCB_3.5

hierarchyUCB_4.0

Figure 7: The Relative Success Rate of LinUCB and HMAB-LinUCB on the dataset is given along each time bucket withdiverse parameter settings.

Figure 8: The comparison of category distribution on therecommended automations.

ALERT_KEY ac2_dbinact_grzc_std

TICKET

SUMMARY

AUTOMATON

NAMEEscalation Handler

Database fin91dmo

status is inactive.

TICKET

RESOLUTION

The database is down. It has

been restarted, hence

closing the ticket.

RECOMMENDED

CATEGORYRECOMMENDED AUTOMATON

DATABASE

UNIX

APPLICATION

OTHERS

(%)

57.13

20.12

(1) database instance down automation; (2) db2 database

inactive automation; (3) mysql database offline automation.

(1) asm space check diskgroup dbautomation; (2) hostdown

automation; (3) certification expiration automation.

17.71

5.04

(1) ntp restart automation; (2) mq manager down automation.

(1) system load automation; (2) others.

Figure 9: The exploration by HMAB-TS of a cold-start ticketcase.



6 Conclusion

This paper is the first work to model automation rec-ommendation in IT automation services as a contextualmulti-armed bandit problem, where the arms are or-ganized in the form of a taxonomy. We first providethe mathematical formulation of this problem. Next wesuggest algorithms for the solution to the problem. Toshow the effectiveness of our proposed solutions, empir-ical experiments are conducted on a real ticket datasetcompared with conventional bandit algorithms, whichassume that the arms are independent. In a case studyof solving a cold-start problem, our proposed algorithmsshow a better performance due to usage of the hierarchy.

There are several directions for future research. Weplan to use multi-document summarization technologiesbased on the problem inference to create a scriptedresolution for a “new” ticket, and automatically createthe automation in IT automaton services. Second, we’regoing to apply deep learning technology [8] for theticket-feature representation to effectively process bothcategorical and non-categorical attributes of IT tickets.

7 Acknowledgements

The work was supported in part by the National Sci-ence Foundation under Grant Nos. IIS-1213026, CNS-1126619, and CNS-1461926.

References[1] HP OpenView : Network and Systems Manage-

ment Products. http://www8.hp.com/us/en/software/

enterprise-software.html.

[2] IBM Enterprise IT Automation Services. www.redbooks.

ibm.com/redpapers/pdfs/redp5363.pdf.

[3] IBM Tivoli : Integrated Service Management. http://ibm.

com/software/tivoli/.[4] S. Agrawal and N. Goyal. Thompson sampling for contex-

tual bandits with linear payoffs. In ICML, pages 127–135,

2013.[5] P. Auer. Using confidence bounds for exploitation-

exploration trade-offs. Journal of Machine Learning Re-search, 3(Nov):397–422, 2002.

[6] S. Chang, J. Zhou, P. Chubak, J. Hu, and T. S. Huang. Aspace alignment method for cold-start tv show recommenda-

tions. In Proceedings of the 24th International Conferenceon Artificial Intelligence, pages 3373–3379. 2015.

[7] O. Chapelle and L. Li. An empirical evaluation of thompsonsampling. In Advances in neural information processing

systems, pages 2249–2257, 2011.[8] A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet

classification with deep convolutional neural networks. In

Advances in neural information processing systems, pages

1097–1105, 2012.[9] L. Li, W. Chu, J. Langford, T. Moon, and X. Wang. An

unbiased offline evaluation of contextual bandit algorithmswith generalized linear models. JMLR, 26:19–36, 2012.

[10] L. Li, W. Chu, J. Langford, and R. E. Schapire. A

contextual-bandit approach to personalized news articlerecommendation. In WWW, pages 661–670. ACM, 2010.

[11] S. Pandey, D. Agarwal, D. Chakrabarti, and V. Josifovski.

Bandits for taxonomies: A model-based approach. In SDM,

pages 216–227. SIAM, 2007.[12] S. Pandey, D. Chakrabarti, and D. Agarwal. Multi-armed

bandit problems with dependent arms. In ICML, pages721–728. ACM, 2007.

[13] A. I. Schein, A. Popescul, L. H. Ungar, and D. M. Pennock.

Methods and metrics for cold-start recommendations. InSIGIR, pages 253–260. ACM, 2002.

[14] S. Shalev-Shwartz. Online learning and online convex op-

timization. Foundations and Trends in Machine Learning,4(2):107–194, 2012.

[15] L. Tang, Y. Jiang, L. Li, C. Zeng, and T. Li. Personalized

recommendation via parameter-free contextual bandits. InSIGIR, pages 323–332. ACM, 2015.

[16] L. Tang, T. Li, L. Shwartz, and G. Y. Grabarnik. Recom-

mending resolutions for problems identified by monitoring.In IFIP/IEEE IM, pages 134–142. IEEE, 2013.

[17] L. Tang, T. Li, L. Shwartz, F. Pinel, and G. Y. Grabarnik.An integrated framework for optimizing automatic monitor-

ing systems in large it infrastructures. In SIGKDD, pages

1249–1257. ACM, 2013.[18] Q. Wang, C. Zeng, W. Zhou, T. Li, L. Shwartz, and G. Y.

Grabarnik. Online interactive collaborative filtering using

multi-armed bandit with dependent arms. arXiv preprintarXiv:1708.03058, 2017.

[19] Q. Wang, W. Zhou, C. Zeng, T. Li, L. Shwartz, and G. Y.

Grabarnik. Constructing the knowledge base for cognitive itservice management. In Services Computing (SCC), IEEE

International Conference on, pages 410–417. 2017.

[20] Y. Yue, S. A. Hong, and C. Guestrin. Hierarchical explo-ration for accelerating contextual bandits. arXiv preprint

arXiv:1206.6454, 2012.

[21] C. Zeng and T. Li. Event pattern mining. Event Mining:Algorithms and Applications, pages 71–121, 2015.

[22] C. Zeng, T. Li, L. Shwartz, and G. Y. Grabarnik. Hierar-chical multi-label classification over ticket data using con-

textual loss. In 2014 IEEE NOMS, pages 1–8. IEEE, 2014.

[23] C. Zeng, L. Tang, T. Li, L. Shwartz, and G. Y. Grabarnik.Mining temporal lag from fluctuating events for correlation

and root cause analysis. In 10th International Conference

on Network and Service Management (CNSM), pages 19–27. IEEE, 2014.

[24] C. Zeng, L. Tang, W. Zhou, T. Li, L. Shwartz,

G. Grabarnik, et al. An integrated framework for miningtemporal logs from fluctuating events. IEEE Transactionson Services Computing, 2017.

[25] C. Zeng, Q. Wang, S. Mokhtari, and T. Li. Onlinecontext-aware recommendation with time varying multi-

armed bandit. In SIGKDD, pages 2025–2034. 2016.[26] C. Zeng, W. Zhou, T. Li, L. Shwartz, and G. Y. Grabarnik.

Knowledge guided hierarchical multi-label classificationover ticket data. IEEE Transactions on Network and Ser-vice Management, 2017.

[27] X. Zhao, W. Zhang, and J. Wang. Interactive collaborative

filtering. In CIKM, pages 1411–1420. 2013.[28] W. Zhou, W. Xue, R. Baral, Q. Wang, C. Zeng, T. Li, J. Xu,

Z. Liu, L. Shwartz, and G. Ya Grabarnik. Star: A systemfor ticket analysis and resolution. In SIGKDD, pages 2181–2190. 2017.

[29] S. Wang, J. Tang, Y. Wang, and H. Liu. Exploring

Implicit Hierarchical Structures for Recommender Systems.In IJCAI, pages 1813–1819. 2015.

Copyright c© by SIAMUnauthorized reproduction of this article is prohibited

Documents

Online IT Ticket Automation Recommendation Using ...Online IT Ticket Automation Recommendation Using Hierarchical Multi-armed Bandit Algorithms Qing Wang Tao Li y S. S. Iyengarz Larisa