A Uniﬁed Conversational Assistant Framework for Business

A Unified Conversational Assistant Frameworkfor Business Process Automation

Yara Rizk, Abhishek Bhandwalder, Scott Boag, Tathagata Chakraborti,Vatche Isahagian, Yasaman Khazaeni, Falk Pollock, Merve Unuvar

IBM Research AI75 Binney Street

Cambridge, Massachusetts [email protected], [email protected], [email protected], [email protected],

[email protected], [email protected], [email protected], [email protected]

Abstract

Business process automation is a booming multi-billion-dollar industry that promises to remove menial tasks fromworkers’ plates – through the introduction of autonomousagents – and free up their time and brain power for morecreative and engaging tasks. However, an essential compo-nent to the successful deployment of such autonomous agentsis the ability of business users to monitor their performanceand customize their execution. A simple and user-friendlyinterface with a low learning curve is necessary to increasethe adoption of such agents in banking, insurance, retail andother domains. As a result, proactive chatbots will play a cru-cial role in the business automation space. Not only can theyrespond to users’ queries and perform actions on their be-half but also initiate communication with the users to informthem of the system’s behavior. This will provide businessusers a natural language interface to interact with, monitorand control autonomous agents. In this work, we present amulti-agent orchestration framework to develop such proac-tive chatbots by discussing the types of skills that can be com-posed into agents and how to orchestrate these agents. Twouse cases on a travel preapproval business process and a loanapplication business process are adopted to qualitatively an-alyze the proposed framework based on four criteria: perfor-mance, coding overhead, scalability, and agent overlap.

IntroductionAutomation made its way into manufacturing plants in theearly 1900s, however comparatively, its integration into theservice industry has been slow. The service industry firsttook advantage of automation through customer care chat-bots which routed incoming customer calls to the appro-priate human operators. Gradually, these robotic answeringmachines improved and began to replace services providedby some human operators by walking customers throughsimple steps to resolve problems and escalating to humansonly when necessary. As enabling technologies improved,so have these customer care chatbots which can now carryout more elaborate conversations with the customers.

Beyond chatbots, little automation was incorporated inthe services industry despite the plethora of automation op-portunities. The emergence of Robotic Process Automation(RPA) led to the next breakthrough. RPA developed robotic

software agents capable of automating simple tasks within abusiness process such as form filling and data mining. RPAsreduced the cost of deploying autonomous agents but cov-ered a small portion of automatable tasks, namely simplerepetitive tasks. This led to the emergence of Business Pro-cess Automation (BPA) which supersedes RPA’s scope. Inaddition to task automation, BPA looked to automate de-cision making, data management, content digitization, andworkflow improvement. BPA relies on various artificial in-telligence technologies including optical character recogni-tion (OCR) systems and decision making models to auto-mate business process tasks. As state-of-the-art algorithmsbecome more accurate and reliable - deep learning OCR hasachieved super-human accuracy levels - automation in ser-vice industry will become more cost-effective.

Nevertheless, wide-spread deployment of autonomoussystems has still lagged behind expectations. Trustworthi-ness, coding overhead, scalability, algorithm life cycle, tech-nical expertise and user experience have been a few chal-lenges that have stood in the way of more successful in-tegration into business process solutions. In this work, weclaim that a proactive multi-agent conversational frameworkcan be that one stop shop that resolves many of these chal-lenges. However, multiple research questions at the intersec-tion of automation and business processes must be addressfirst. A conversational interface reduces the necessary tech-nical expertise to interact, monitor and customize the frame-work, but many algorithms do not have a conversational in-terface. How can we effectively integrate such algorithmsinto a conversational framework with minimal coding over-head? How can such independently developed (and poten-tially incompatible) functions coexist and cooperate withinthe same ecosystem? How could the user access all of theautomation options in the framework with minimal interfaceand protocol hoping? The goal is not to create a natural lan-guage interaction for every part of the process but a singleinteractive interface. If it is accessing different documents,running OCR on a new application, checking our databaseor setting up alerts on specific events in our system, onewould prefer to interact with a single interface which is con-veniently conversational in this scenario.

We believe an orchestrator and agent composition ap-

arX

iv:2

001.

0354

3v1

[cs

.AI]

7 J

an 2

020

proach with a specific pipeline would provide solutionsto these questions. We present a multi-agent orchestrationframework that consists of heterogeneous agents such asdialog, informational retrieval, task execution, and alertingagents and an orchestration layer that consists of three com-ponents (scorer, selector and sequencer) that would controlthe execution of the agents in the framework. As a proof ofconcept, we create two instances of this assistant for a travelpreapproval business process and a loan application businessprocess – deployed in a sandbox environment – and discusshow this framework can address these challenges.

Related WorkBusiness Process AutomationTask automation has driven the digital transformation in en-terprises with the emergence of RPA as a light-weight ap-proach for automating repetitive tasks. RPA is a paradigmthat seeks to automate the “mouse-click" in user interfacesto relieve human resources of repetitive tasks. RPA adoptsan outside-in approach that requires minimal changes tolegacy software (van der Aalst, Bichler, and Heinzl 2018);they have been applied to various sectors including account-ing (Moffitt, Rozario, and Vasarhelyi 2018), auditing (Fer-nandez and Aman 2018), human resources (Papageorgiou2018), banking (Stople et al. 2017), public administration(Houy, Hamberg, and Fettke 2019) and energy sectors (Lac-ity, Willcocks, and Craig 2015).

Multiple approaches have been adopted in the develop-ment of RPAs. (Gao et al. 2019) proposed an RPA solutionfor document flow automation in a debt collector businessprocess. They used deep learning OCR and classificationto control the path of the document through the businessprocess. User behavior was captured to train a self-learningagent capable of identifying relationships between tasks anddeploying RPAs based on the learned rules (Wróblewska etal. 2018). This approach, known as form-to-rule, relied onfirst-order logic to deduce the rules. Another branch of lit-erature focused on identifying RPA-eligible tasks automat-ically. Leopold et al. relied on natural language processingof business process descriptions (Leopold, van der Aa, andReijers 2018). They adopted supervised machine learning toclassify instances into one of three classes.

As the scope of RPA increases to include more complextasks, RPA agents incorporate other technological advance-ments such as OCR for document digitization, and data min-ing for content collection and management to successfullyautomate complex tasks. These advancements have achievedthe best performance when adopting deep learning models(Schmidhuber 2015). However, these models are computa-tionally expensive to train and have seen slow adoption de-spite their super-human accuracy due to their lack of trans-parency and explainability.

Beyond RPA, BPA also seeks for automated decisionmaking within business processes. The field of automatedplanning has been considered a prime candidate to enablemore advanced levels of automation in business processes(Marrella 2017). Planning algorithms can be used to auto-matically generate business process models, adapt the busi-

ness process’ behavior based on unforeseen circumstances,and autonomously perform conformance checking.

End-to-end BPA has yet to find success in real-world de-ployments despite the ubiquity of computing devices andthe advancements in software technology. Trust and reli-ability have been two of the main challenges preventingwide-spread adoption of automation approaches. With busi-ness stakeholders lacking the technical skills to monitor au-tonomous agents and intervene when they do not operate asexpected, they have been reluctant to adopt this technology.Providing a natural language interface to allow such users tointeract with autonomous agents may be the key to increasethe adoption of automation in service enterprises.

Conversational AgentsNatural language is the main communication modality inbusiness enterprises. From conversations to written docu-ments, forms and emails, understanding and generating nat-ural language can significantly facilitate the integration ofvarious autonomous agents into domains where users donot speak the language of computers. Conversational agentshave been developed for a wide range of applications usinga plethora of machine learning techniques that have resultedin agents with increasing degrees of sophistication. In thissection, we will focus on chatbots developed for enterpriseapplications and specifically those used in conjunction withRPA. Two approaches to chatbot development have beenmainly adopted in the literature: 1) building domain specificchatbots by relying on natural language understanding andgenerative machine learning models, and 2) training deeplearning models on data samples (Galitsky 2019). While theformer requires more overhead to develop, it has been themore widely adopted approach for enterprise chatbots dueto its predictable and controllable behavior.

In enterprises, chatbots were first adopted for customersupport which involved answering customers’ questionsabout the services, products, location etc. of businesses (Gal-itsky 2019). Beyond question answering bots, a shopping botwas developed to increase the sales of shops by answeringquestions from customers and even providing custom pricesfor products (Heo, Lee, and others 2018). A delivery bot wasalso developed for food delivery services that reduced the ef-fort required from customers to order pizza (Heo, Lee, andothers 2018). While only automating 30% of the processeswith these bots, the bot developers have recently begun in-vestigating the incorporation of RPA to increase the per-centage of automation. Matthies et al. developed a chatbotfor agile software development teams (Matthies, Dobrigkeit,and Hesse 2019) which analyzed commits in version con-trol systems to provide insights into the teams’ performance.Kalia et al. proposed an approach to generate a conversa-tional agent based on dialog trees given a business processflow (Kalia et al. 2017). Even though they provide a sys-tematic way to develop chatbot that were sound, significantoverhead and domain knowledge were necessary.

Furthermore, these approaches have focused on design-ing a single conversational agent capable of answering fre-quently asked questions and performing tasks in a processsuch as determining the price of products in the shopping

Figure 1: Multi-agent Orchestration Overview

bot. Expanding the scope of their operation would requiresignificant coding overhead by developers who understandthe existing system. Also, reusing existing agents may notbe possible due to compatibility reasons. In this work, wepropose a conversation agent framework based on the multi-agent system paradigm as a potential solution to these issues.Instead of designing a single powerful agent, multiple spe-cialized agents with narrower scopes can be independentlydesigned and then integrated into a multi-agent system to actas a single conversational agent from the user’s perspective.

MethodologyOverall FrameworkThe framework for the proactive conversational assistant forBPA consists of three main components: skills, agents, andan orchestrator, as shown in Fig. 1. Skills perform well de-fined tasks; they require a set of inputs to produce a set ofoutputs. Agents are composed of skills that fall into threemain categories: understand skills, act skills, and respondskills. An execution pattern for the skills must be provided;it can be as simple as a static sequential script or as complexas a full execution graph. The orchestrator coordinates be-tween agents and it determines which agent or agents mustexecute to successfully respond to a user. When a user en-ters an utterance, this natural language sentence is forwardedto all the agents in the assistant by the orchestrator after po-tentially preprocessing the input. The orchestrator requests apreview of each agent’s response and their confidence score.This score depicts how confident an agent is in its abilityto respond to the user. Depending on the adopted orchestra-tion pattern, the orchestrator selects one or more agents toexecute and return their responses to the user.

SkillsSkills are atomic functions, the building blocks of agents.They are categorized according to their role within an agentand their effect on the state of the world. Skills can haveone of three roles: understand, act and respond. Furthermore,skills can either be world changing or non-world changing.

Understand skills Serving as the entry point to an agent,understand skills identify the user’s intent. In the conversa-tional setting, understand skills are generally intent and en-tity recognition skills but can also be as simple as a key-phrase detection. They may take a user’s natural language

utterance and classify it into one of many predefined intentsthat the agent can handle or check if a specific phrase des-ignated to trigger the agent is there. They may also annotateany entities they recognize within the input. In a non-dialogsetting, understand skills are event triggered skills that watchfor specific events and state changes in the system. Theseevents could be a user logging in or a change in database.

Act skills Act skills process the user input to produce anoutput that can be used to generate a response. We distin-guish two main types of act skills: world changing and non-world changing. World changing skills perform actions thatmodify the state of the world or have side-effects outside theconfines of the orchestrator world. Examples of such skillsinclude credit score checking skills, email sending skills, etc.Non-world changing skills do not have any side effects. Ex-amples include email reading skills, weather checking skills,etc. Act skills in the business process world can perform var-ious tasks or activities. One task is gathering informationby querying databases or making API calls to informationsources. Decision points are a set of activities that move thebusiness process forward. Another task that moves the busi-ness process forward is submitting applications or forward-ing gathered information to the next activity.

Respond skills Respond skills create the response thatwill be returned to the user by the conversational assistant.The skills take the output of act skills and process them toform natural language responses to the user; these could befull flesh natural language generation to template responsesgiven different outputs of the other skills. Other modalitiesof responses can also be considered such as visualization ofthe output in the form of plots or images.

AgentsIndividually, the skills may not be able to perform complextasks required by the activities within a business process.Instead of increasing the scope of individual skills by imple-menting more complex functionality within them, compos-ing them into agents capable of completing these activities ismore effective. This modular approach simplifies debuggingof skills and agents and increases the reusability of skills bymaking them as domain agnostic as possible.

We define a contract for the agents in order to facilitatetheir orchestration within a single assistant. The agents ex-pect to receive an input message or utterance and the cur-rent context which represents the state of the system. Theagent is capable of producing a preview response that doesnot cause any world-changing actions to be invoked, an ex-ecute response that results from fully executing its pipelineand may cause the world to change, an updated context oncethe agent has executed, and a confidence in its response.

Multiple approaches can be adopted to compose skillsinto agents. Creating the execution pipeline can be done stat-ically where a software developer writes a script that explic-itly defines the order in which skills must be executed anddirects the flow of data by routing inputs and outputs. Ingoal-oriented non-deterministic planning based composition(Muise et al. 2019; Botea et al. 2019), software developersdeclaratively define the inputs and outputs of skills and the

end goal of the agent. Then, a planning engine composes anexecution tree using the subset of available skills that willlead to the goal. While this approach allows independent de-velopers to include their skills in a catalog, composing skillsinto agents in an effective way requires a minimum level ofcompatibility between the skills, including the adoption of aunified vocabulary for input/output variables.

Agents are composed of an understand skill, a set of actskills, and a respond skill. Depending on the functionality ofthe agent, none, one or multiple act skills may be included inthe composition. Next, we discuss four main types of agentsbased on their functionality. While many more agents canbe created, we consider the ones that would be most relevantand commonly used in BPA.

Dialog Agents Users may just need to inquire about cer-tain topics and do not require any action to be taken by theassistant. In such scenarios, a dialog agent, composed of anunderstand skill for intent and entity recognition and a re-spond skill to answer the user’s query. Examples of suchagents include chit-chat agents, FAQ or “help" agents, etc.They can be implemented in dialog tree type services. Theseagents do not change the world; their preview mode and ex-ecute mode are the same.

Information Retrieval Agents Unlike dialog agents, in-formation retrieval agents must connect to an external ser-vice (such as making an API call or querying a database) inorder to produce their answer to a user’s query as opposed tohaving the answering within the agent implementation. Asa result, such agent compositions consist of three types ofskills: understand skill that determine the intent of the userand annotate any entities that may be useful, one or moreact skills that queries the appropriate service to produce itsanswer, and a respond skill that appropriately constructs theresponse returned to the user. In general, most agents withinthis category do not contain skills that change the world.However, their preview and execute modes may vary forcomputational/latency reasons. In some situations, queryingdoes change the world; for example, retrieving the creditscore of a loan applicant is considered as world changingas it will have a negative impact on the applicant if querieda large number of times. In other cases, simply incurring acost, for example monetary or in the form or using up APIcalls that are limited, to query a database can be modeled asworld changing to minimize the cost incurred by the system.

Task Execution Agents Beyond querying for informa-tion, the business user may want the assistant to performtasks within the business process. Task execution agents arecapable of moving the business process forward by submit-ting application, making decisions at decision points andsuch. These agents will change the world and therefore mayrequire a preview and execute mode based on the type oforchestrator adopted. These agents are composed of an un-derstand skill, a set of act skills and a respond skill.

Alerting Agents Setting up alerts to receive notificationstriggered by the occurrence of specific events allows busi-ness users to monitor the business process in an asyn-chronous way. Like the other agents, these agents are also

composed of an understand skill, a set of act skills and arespond skill. However, their understand skill does not nec-essarily have to be a natural language understanding skill.However, setting up custom alerts would require an intentand entity recognition understand skill. These agents con-sist of a watcher skill on the system that may trigger thealert, a notification generation skill and a communicationchannel skill that sends the notification over the appropriatecommunication channel (e.g. email, SMS, slack...). A natu-ral language understanding skill can be added to allow theuser to customize the trigger and settings through dialog. Ifthat is the case, a Natural Language to Query (NLQ) skillwould be necessary to convert natural language trigger def-initions to structured, well-defined queries on the databasethat may trigger the alert and is watched by the daemon. Ifan NLQ skill is not available, hard-coded conversions mustbe present. However, this approach restricts the number ofalerts that can be created. NLQ provides a broader range oftriggers that can be set. It also allows the creation of domainagnostic alert agents which would be more portable and re-quire minimal coding overhead to deploy in new domains,a desirable feature in the BPA field. In this paper, we dis-tinguish between alerts and notifications. The former is theoccurrence of an event that causes the conditions of the trig-ger to be satisfied. The latter is the actual message sent tothe user when an alert happens.

OrchestrationCombining these diverse agents into a single assistant re-quires a domain agnostic orchestration layer capable of de-termining which subset of agents are best suited to respondto the user’s utterance. Various orchestration frameworkswith diverse characteristics can be adopted to achieve thisgoal. Stateless orchestrators do not keep track of the state ina centralized location. Instead the context variables (state)are passed between the orchestrator and agents. Stateful or-chestrators on the other hand, maintain the state of the sys-tem in a centralized location. The computational cost of datatransfer in the former approach increases with the numberof context variables maintained in the state. As a result, sys-tem designers tend to minimize the number of variables intheir state resulting a decrease in the amount of informationagents have access to. Stateful orchestrators, however, re-quire a central device to maintain the state which could becostly from a deployment and scalability perspective.

Apriori orchestrators decide which agents should executeby considering the user input and what they know aboutagent capabilities. Posterior orchestrators pass the inputsto the agents requesting their responses, then decide whichagents to select. In this case, preview and execute modesof agents must be available for those agents that performworld changing actions. Orchestrators also expect agents toprovide a confidence in their responses which are used as afeature in their decision making algorithm.

Unlike dialog managers, orchestrators do not requireknowledge of the dialog to select agents. Dialog managersare assumed to be part of those agents that need it. For ex-ample, dialog agents would have managers that keep track ofwhich node of a tree the conversation is in or perform dialog

Figure 2: 3S Orchestration Pipeline

disambiguation and co-reference resolution.The 3S orchestrator is an example of a posterior orches-

trator that relies on the agent’s confidence, among other met-rics, to determine which agents should respond. Its pipelineconsists of three main steps: scoring, selecting, and sequenc-ing agents (Yurochkin et al. 2019; Upadhyay et al. 2019).

Scorer When the orchestrator receives the user’s utter-ance, it forwards this input to all agents and requests a pre-view of their response. Once the agents return their previewresponses and their confidence score, the scorer performs ad-ditional computations to obtain the final scores per agent. Anidentity function scorer simply forwards the scores as is tothe selector. A scorer may normalize scores, especially sincethey could have been produced by very diverse agents thatcompute their scores differently. A more sophisticated scorercould enhance the agents’ confidences with other measuressuch as conversation stickiness, dialog depth (within a dialogtree) and agent engagement. A Bayesian model that adaptsto user feedback could also be adopted.

Selector Once the scorer computes the final scores peragent, it forwards them to the selector that must select oneor more agents that will respond to the user. It can be assimple as a Top 1 (or Top K) score selector that picks theagent (or K agents) with the highest score. More complexselection criteria could be adopted such as heuristic rules,learned selection models from labeled user data, online re-inforcement learning models trained on user feedback, or acombination of multiple approaches. Once the selector de-termines the subset of agents that must execute, the orches-trator requests their execution responses. Various machinelearning algorithms can be adopted to improve the robust-ness of the selector as the noise in the score increases. In asupervised learning domain, labels can represent agents thatare the output of classifiers and the input to the classifierwould be a feature vector consisting of the agents’ scoresand/or user utterances. In a reinforcement learning domain,actions can represent agents and the environment producesa reward when the correct agent is selected. For both ap-proaches, deep learning models can be trained if enough datais provided to further improve the selection model.

Figure 3: Travel Preapproval Process

Conversation Responding agentManager: Hello Chit-ChatTravelbot: Hi there

Manager: Retrieve the number of acceptedpapers authored by John Smith Publication Query

Travelbot: The number of accepted papersby John Smith is 7

Manager: Approve John Smith’s request Business ProcessTravelbot: John Smith’s application hasbeen approved

Task Execution

Table 1: Conversation samples

Sequencer After receiving the agents’ execution re-sponses, a sequencer determines the order in which theseresponses will be presented to the user. If a Top 1 score se-lector was adopted in the previous stage, then no sequenc-ing is necessary. The response is directly outputted to theuser through the conversational interface. However, whena subset of agents has responded, a sequencer could adoptdifferent approaches to order these responses: e.g. descend-ing score, heuristic rules (e.g. greeting agent must alwaysrespond before other agents), supervised or reinforcementlearning models trained on labeled data and user feedback.

Use Case – 1: Travel Preapproval ApplicationWe will now detail the implementation of a conversationalassistant (in Slack) for two personas (employees, and man-agers/directors) in a simplified travel preapproval applica-tion process using the framework described so far.

Travel Preapproval Business ProcessWe adopt a simplified business process revolving aroundtravel preapprovals generally used across many corpora-tions. We consider the scenario of researchers/employees re-questing travel funds to attend research conferences. Thetravel preapproval process, shown in Fig. 3, was imple-mented in a business process management software.

Employees submit a travel preapproval request to attenda conference by filling in specific information into a form,shown in Fig. 4(a). Once submitted, the application is for-warded to the employee’s manager who can approve/reject

the application (Fig. 4(b)) or request additional informa-tion/changes to the application (Fig. 4(c)). If approved bythe manager, the application is forwarded to the director whomakes the final decision on the travel request.

Agents in TravelbotTravelbot contains multiple agents that enable it to respondto a broad range of user requests. The 3S orchestrator coordi-nates between the composed agents to respond to the user orexecute any actions requested by the user. Table 1 includesa sample interaction between Travelbot and a manager.

Chit-Chat Agent A simple dialog agent, Chit-Chat is im-plemented as a question answering dialog tree that is capa-ble of answering utterances related to greetings (e.g. “hello”,“how are you”, “goodbye”, etc.). It can also introduce theassistant answering questions such as “who are you” and“how can you help me”. Additional intents and dialog treesare also included to make the assistant more human-like andimprove the user experience – e.g. “what are you doing”,“tell me a joke”, etc. This agent does not have any act skills;it does not cause any side effects when executed and hencehas identical preview and execute modes.

Publication Query Agent This information retrievalagent queries a database containing the publications of em-ployees and conference information. In the adopted use case,this agent can be used to auto-fill the information in theapplication form related to travel justification and confer-ence location and dates. It can also be used by managersand directors to obtain information necessary for makingdecisions. For example, travel requests would be approvedfor employees presenting their work at the conference asfirst authors. This agent makes this information accessible tomanagers through a conversational interface and eliminatesthe need for them to switch to a different interface. The staticcomposition of this agent consists of a Natural LanguageUnderstanding (NLU) understand skill, an act skill capableof retrieving employee information from an employee direc-tory that would be used by the publication query skill (sec-ond act skill) to retrieve publication/conference information.Finally, the respond skill generates the final response thatwill be returned to the user.

Business Process Data Query Agent Another informa-tional retrieval agent, the business process data query agentallows managers and directors to query the existing travelrequest applications. Managers and directors can ask ques-tions like “How many pending travel requests are in myqueue?” or “How many applications have been submitted byemployee X?”. Such queries factor into the decision makingprocess and allow decision makers to make more informeddecisions. This agent is composed of three skills: an NLUunderstand skill, a data query act skill, and a respond skill.

Business Process Task Execution Agent This agent al-lows users to move the business process forward by request-ing the assistant to submit an application on their behalf(e.g. “Submit an application to AAAI 2020 for me.”) orapprove/reject/send back an application (e.g. “Reject Jack’sapplication to ICML 2019.”). The agent composition is more

Figure 4: Employee Travel Preapproval User Interfaces: (a)Employee Application Form, (b) Manager/Director ReviewForm, (c) Employee/Manager/Director Revision Form

complex in this case and consists of four act skills that mayor may not be invoked depending on the recognized intent.Like the other agents, this agent also has an intent and entityrecognition understand skill and a respond skill. Its act skillsare employee directory query, paper directory skill, travelpricing skill and business process task execute act skill. Thepricing skill is used auto-fill the “requested amount” field inFig. 4(a).

Travel Estimation Agent This agent can respond to in-quiries about travel such as – “What is the cheapest flightfrom BOS to SFO leaving on 2019/12/01 and returning on2019/12/07?”. This information retrieval agent consists of anintent and entity recognition understand skill, a travel pric-ing skill which uses the Amadeus API to estimate the costof flights and hotels, and a respond skill that formulates theanswer returned to the user. Managers or directors can usethis agent to verify the “requested amount” field submittedby the employee.

Visualization Agent Beyond natural language responses,the visualization agent provides the assistant with the abilityto return visual responses. This is useful in situations whereinformation retrieval agents return data that is easier to di-gest in a plot form. Therefore, the user can ask the assistantto plot the results. This agent is composed of an intent andentity recognition understand skill to if and which plot theuser wants and what variables to use if multiple options ex-ist. A plotter skill converts the plottable data to an image ofa bar graph or doughnut chart or other visualization formatthat is then forwarded to the user using the respond skill. Dueto the stateless implementation of the orchestrator, the datais persisted for a finite number of iterations in the contextthat is passed between agents the orchestrator. If the visu-alization agent is invoked by a user utterance, it will lookfor plottable data in the context. If this data does not exist,then a default response of “There is no data to be plotted.” isreturned to the user.

Alerting using Natural Language Query TravelBot al-lows the user to create and customize alerts based on user’sdata-store. The system accepts user input and if it is deter-mined that intent is the creation of alert, the sentence is thenpassed through an NLQ module (Sen et al. 2019), (Sen etal. 2018), (Jammi et al. 2018). The NLQ module can ex-press any natural language sentence into a data query whichcan run on top of many different types of data-stores (Elas-ticSearch, SQL, etc). The query from NLQ module is thenstored and a daemon runs in the background monitoringthese queries. If there is any change in the data store (Ad-dition of new row, changes in values of an existing row) andthe query output changes, an alert is generated and sent tothe user on the communication channel. The system deter-mines the communication channel for a specific user throughconversation. Alerts could be configured to send notifica-tions via Slack, email or SMS using the Twilio API.

Use Case – 2: Loan ApplicationUsing the same multi-agent framework, we also develop aloan officer chatbot for a loan application business process.We include: 1) a data query agent capable of answeringquestions related to existing applications, 2) a business rulesagent that can provide insights into whether an applicationwill be accepted or rejected, 3) a visualization agent capableof representing data in different formats, 4) an alerts agentthat sends notifications when changes to loan applicationsoccur, and 5) a content analyzer that can process loan docu-ments and extract information from them. Sample conversa-tions with this chatbot are included in Table 2 and Fig. 5.

Figure 5: Multi-modal responses from the LoanBot.

ChallengesMultiple challenges have hindered the deployment of au-tonomous conversational agents in BPA domains. We focus

Conversation Respondingagent

Loan Officer: What is the total loan amount forborrowers with credit score more than 500? Data query

Loanbot: The sum value is 137368000.0

Loan Officer: Who are the top 3 borrowers withaverage amount more than 10000 Data query

Loanbot: These are the value: 1). average ofJ. Smith is 584917$, 2). average of V. Doe is575692$, 3). average of Y. Doe is 557615$

Loan Officer: List all borrowers with yearly in-come more than 50000 but credit score less than150

Data query

Loanbot: Total records found are 82. Here is thelink: <url>

Loan Officer: Plot the bar chart per yearly in-come Visualization

Loanbot: <image> (Fig. 5)

Loan Officer: Find the top 5 borrowers in termsof total amount of loans Data query

Loanbot: The result for your query is: <csv>(Fig. 5)

Loan Officer: Could you process an applicationrequesting a loan of 3000$? Business rules

Loanbot: What is the credit score?

Loan Officer: 400 Business rulesLoanbot: What is the annual salary (in USD)

Loan Officer: 5000 Business rulesLoanbot: In how many months will the loan bepaid back?

Loan Officer: 12 Business rulesLoanbot: High risk loan. This loan requestshould not be approved

Table 2: Sample conversation with LoanBot

on a few of these challenges and discuss how the proposedframework can overcome them.

Coding Overhead The enterprises that need chatbots of-ten do not have the technical expertise or resources to main-tain and improve the framework. Thus, it is important thatthe system requires minimal coding overhead to deployin new domains, expand the scope of the system and im-prove its performance by adding new agents during its life-time. Our framework assumes that independent developers,whether domain experts or software developers, have cre-ated the autonomous agents but require that they abide bythe adopted input-output contracts. This reduces the codingoverhead to integrate the agents. Furthermore, the 3S orches-trator is robust enough to handle the noise in the agent confi-dence which is a byproduct of independent agent developers.

Scalability Expanding the scope and capabilities of a con-versational agent requires significant developer effort andmay result in performance degradation. In the proposedframework, integration is trivial but naively adding agentsmay result in performance issues if the orchestrator incor-

rectly selects agents. Adopting an orchestration model thatlearns from previous interactions and changes in the systemwould improve the overall performance of the assistant.

Agent Overlap As the number of agents in the frame-work increases, some agent functionality and knowledgemay overlap. This may cause conflicts among agents whenattempting to respond to users and degrade overall perfor-mance. As a result, it is important to model or quantify agentoverlap and incorporate it within the orchestrator model toinsure that the orchestrator will select the best agent to com-plete the task of responding to the user. Adopting a learn-ing selector in the orchestrator pipeline can lead to betterselection models that can handle agent overlap and preventperformance degradation.

Multi-user Support Giving users the ability to composetheir personalized assistants requires the framework to sup-port multiple users. While this can be resolved by creating asession per user, it provides an opportunity to increase theavailable data from which the orchestrator can learn bet-ter execution models. Therefore, the proposed frameworkshould consider approaches to learn from multiple users’data while maintaining privacy and security of the data.

Access Control With different personas in business pro-cesses, some agents may not be accessible to specific users.For example, with the Travelbot, employees should not haveaccess to the "approve/reject" functionality. Our frameworkrequires that these restrictions are implemented inside theagent. When the user submits a request, the orchestrator for-wards this request to all agents and expects them to declineexecution if the user does not have the required access.

ConclusionIn this paper, we presented a multi-agent framework to de-velop a conversational assistant supporting multiple capa-bilities such as querying data through natural language, au-tonomously executing tasks in a business process, alertingusers of changes in the business process and visualizing datain various forms based on the user’s requests. An orches-trator capable of combining independently developed skillswith minimal overhead is the key to effectively deploy morepowerful conversational assistants. While many challengesremain such as scalability and agent overlap, preliminary re-sults motivate follow on research.

References[Botea et al. 2019] Botea, A.; Muise, C.; Agarwal, S.; Alkan,O.; Bajgar, O.; Daly, E.; Kishimoto, A.; Lastras, L.; Mari-nescu, R.; Ondrej, J.; Pedemonte, P.; and Vodolan, M. 2019.Generating dialogue agents via automated planning. arXivpreprint arXiv:1902.00771.

[Fernandez and Aman 2018] Fernandez, D., and Aman, A.2018. Impacts of robotic process automation on globalaccounting services. Asian Journal of Accounting andGovernance 9:123–132.

[Galitsky 2019] Galitsky, B. 2019. Developing EnterpriseChatbots. Springer.

[Gao et al. 2019] Gao, J.; van Zelst, S. J.; Lu, X.; and van derAalst, W. M. 2019. Automated robotic process automa-tion: A self-learning approach. In OTM ConfederatedInternational Conferences" On the Move to MeaningfulInternet Systems", 95–112. Springer.

[Heo, Lee, and others 2018] Heo, M.; Lee, K. J.; et al. 2018.Chatbot as a new business communication tool: The caseof naver talktalk. Business Communication Research andPractice 1(1):41–45.

[Houy, Hamberg, and Fettke 2019] Houy, C.; Hamberg, M.;and Fettke, P. 2019. Robotic process automation in publicadministrations. Digitalisierung von Staat und Verwaltung.

[Jammi et al. 2018] Jammi, M.; Sen, J.; Mittal, A.; Verma,S.; Pahuja, V.; Ananthanarayanan, R.; Lohia, P.; Karanam,H.; Saha, D.; and Sankaranarayanan, K. 2018. Toolingframework for instantiating natural language querying sys-tem. Proceedings of the VLDB Endowment 11(12):2014–2017.

[Kalia et al. 2017] Kalia, A. K.; Telang, P. R.; Xiao, J.; andVukovic, M. 2017. Quark: a methodology to trans-form people-driven processes to chatbot services. InInternational Conference on Service-Oriented Computing,53–61. Springer.

[Lacity, Willcocks, and Craig 2015] Lacity, M.; Willcocks,L. P.; and Craig, A. 2015. Robotic process automation:mature capabilities in the energy sector. Technical Report.

[Leopold, van der Aa, and Reijers 2018] Leopold, H.;van der Aa, H.; and Reijers, H. A. 2018. Identifyingcandidate tasks for robotic process automation in textualprocess descriptions. In Enterprise, Business-Process andInformation Systems Modeling. Springer. 67–81.

[Marrella 2017] Marrella, A. 2017. What automatedplanning can do for business process management. InInternational Conference on Business Process Management,7–19. Springer.

[Matthies, Dobrigkeit, and Hesse 2019] Matthies, C.; Do-brigkeit, F.; and Hesse, G. 2019. An additional set of(automated) eyes: chatbots for agile retrospectives. InProceedings of the 1st International Workshop on Bots inSoftware Engineering, 34–37. IEEE Press.

[Moffitt, Rozario, and Vasarhelyi 2018] Moffitt, K. C.;Rozario, A. M.; and Vasarhelyi, M. A. 2018. Roboticprocess automation for auditing. Journal of EmergingTechnologies in Accounting 15(1):1–10.

[Muise et al. 2019] Muise, C.; Chakraborti, T.; Agarwal, S.;Bajgar, O.; Chaudhary, A.; Lastras-Montano, L. A.; On-drej, J.; Vodolan, M.; and Wiecha, C. 2019. Plan-ning for goal-oriented dialogue systems. arXiv preprintarXiv:1910.08137.

[Papageorgiou 2018] Papageorgiou, D. 2018. Transformingthe hr function through robotic process automation. BenefitsQuarterly 34(2):27–30.

[Schmidhuber 2015] Schmidhuber, J. 2015. Deep learning inneural networks: An overview. Neural networks 61:85–117.

[Sen et al. 2018] Sen, J.; Mittal, A. R.; Saha, D.; andSankaranarayanan, K. 2018. Functional partitioning of on-

tologies for natural language query completion in questionanswering systems. In IJCAI, 4331–4337.

[Sen et al. 2019] Sen, J.; Ozcan, F.; Quamar, A.; Stager, G.;Mittal, A.; Jammi, M.; Lei, C.; Saha, D.; and Sankara-narayanan, K. 2019. Natural language querying of complexbusiness intelligence queries. In Proceedings of the 2019International Conference on Management of Data, 1997–2000. ACM.

[Stople et al. 2017] Stople, A.; Steinsund, H.; Iden, J.; andBygstad, B. 2017. Lightweight it and the it function: experi-ences from robotic process automation in a norwegian bank.Bibsys Open Journal Systems 25(1).

[Upadhyay et al. 2019] Upadhyay, S.; Agarwal, M.; Boun-neffouf, D.; and Khazaeni, Y. 2019. A bandit approachto posterior dialog orchestration under a budget. NeurIPSConversational AI Workshop.

[van der Aalst, Bichler, and Heinzl 2018] van der Aalst,W. M.; Bichler, M.; and Heinzl, A. 2018. Robotic processautomation.

[Wróblewska et al. 2018] Wróblewska, A.; Stanisławek, T.;Prus-Zajaczkowski, B.; and Garncarek, Ł. 2018. Roboticprocess automation of unstructured data with machine learn-ing. Annals of Computer Science and Information Systems16:9–16.

[Yurochkin et al. 2019] Yurochkin, M.; Upadhyay, S.; Boun-effouf, D.; Agarwal, M.; and Khazaeni, Y. 2019. On-line semi-supervised learning with bandit feedback. LLDWorkshop at ICLR.

Documents

A Uniﬁed Conversational Assistant Framework for Business