

    Turning Event Logs into Process Movies: Animating What Has Really Happened

    Massimiliano de Leoni, Suriadi Suriadi, Arthur H. M. ter Hofstede, Wil M. P. van der Aalst

    Abstract: Today's information systems log vast amounts of data that contain information about the actual execution of business processes. The analysis of this data can provide a solid starting point for business process improvement. This is the realm of process mining, an area which has provided a repertoire of many analysis techniques. Despite the impressive capabilities of existing process mining algorithms, dealing with the abundance of data recorded by contemporary systems and devices remains a challenge. Of particular importance is the capability to guide the meaningful interpretation of this "ocean" of data by process analysts. To this end, insights from the field of visual analytics can be leveraged. An approach is proposed where process states are reconstructed from event logs and visualised in succession, leading to an animated history of a process. This approach is customisable in how a process state, partially defined through a collection of activity instances, is visualised: one can select a map and specify a projection of activity instances on this map based on their properties. This paper describes an implementation of the proposal for the open-source process-mining framework ProM and reports on an evaluation with one of Australia's largest insurance companies: Suncorp.

    Index Terms: Business Process Mining, Visual Analytics, Event-log Animation, Process Visualisation

    1 INTRODUCTION

    As a result of increased automation and storage capacity, more and more data is recorded by today's software systems and devices. The McKinsey Global Institute (MGI) estimated that enterprises globally stored more than 7 exabytes of new data on disk drives in 2010, while consumers stored more than 6 exabytes of new data on devices such as PCs and notebooks [1]. The amount of data recorded in various domains has been growing exponentially, thereby following Moore's law. While the availability of large amounts of data is an enabler for various forms of analysis, the sheer quantity and diversity of this data creates new challenges [2].

    In the field of Business Process Management (BPM), so-called process-aware information systems record information about the execution of business processes in event logs. Analysing such event logs has been the driver of the area of process mining (see e.g. [3]), which emerged a little over a decade ago. In this relatively short timespan, this discipline has proven to be capable of providing deep insight into process-related problems that contemporary enterprises face. Through the application of process mining, organisations can discover the processes as they are conducted in reality, check whether certain practices and regulations were really followed and gain insight into bottlenecks, resource utilisation, and other performance-related aspects of processes.

    Although the field of process mining has shown itself to be a valuable addition to the BPM landscape, dealing with large collections of data remains a challenge. In fact, although automatic techniques are certainly needed, process analysts need to be guided with regard to where to focus their attention in this "ocean of data", which automatic techniques to choose for further analysis and how to fine-tune these techniques. To achieve this, one can draw on the field of visual analytics, a term coined by Jim Thomas in [4], which "combines automated analysis techniques with interactive visualizations for an effective understanding, reasoning and decision making on the basis of very large and complex data sets" [5].

    M. de Leoni and W. M. P. van der Aalst are with Eindhoven University of Technology. M. de Leoni is also with University of Padua.

    S. Suriadi and A. H. M. ter Hofstede are with Queensland University of Technology.

    A starting point of this paper is the belief that the application of techniques from the field of visual analytics can play a significant role in overcoming the challenges related to the analysis of large collections of (process) data.

    In [6] a map metaphor was used to aid people in the selection of activities to perform. A "map" could, e.g., be a geographical map, a timeline, or an organisational chart, and activity instances are positioned on this map according to their properties. In addition, the colour of a dot representing an activity instance is determined by its status or distance, e.g., how close the activity is to its deadline or how long it has been executing. The approach focussed on showing the current state of the information system at run-time. This approach can easily be extended to a-posteriori analysis: using the information stored in event logs, it is possible to replay the history and build the states the system went through. Hence, for each "map", a sequence of different "photographs" can be built, showing how activities were projected on the map in each of these states. If, for each map, the constructed sequence of photographs is played in succession, one obtains a different "process movie". These movies or animations (one per map) provide analysts and domain experts with a helicopter view of the past execution history seen from different perspectives.

    [Figure 1 annotations: play button; timeline and time slider; a clock showing the current timestamp of the movie; slider to adjust movie speed; tabs (animation movie chooser); aggregated dot, whose size is positively correlated with the number of activity instances aggregated; distribution of variables per dot (mouse over); list of activity instances that are not to be positioned on the map; list of activity instances that cannot be placed on the map; colour legend]

    Fig. 1. A screenshot of a process movie referring to the log data of a process enacted in an Australian insurance company to deal with claims. Activity instances are projected onto the Australian state of the claimants.

    An example of such a movie is shown in Figure 1, where the event log refers to the execution of process instances to handle insurance claims of an insurance company in Australia. The process map is of a geographic type: each activity instance is projected onto the map as a dot in the time frame when it was being executed as recorded in the log. The dot is positioned on the Australian state of the claimant. At any given point in time, the size of the dots on the map represents the number of activity instances which are projected onto that position. The instances can have different characteristics (e.g., in the figure, relative to the claim type). Therefore, the areas of dots are also sliced according to the percentage of instances with given characteristics, and different colours are assigned to the slices. Completed activity instances are no longer shown on the map. Thus, Figure 1 is the "photograph" of the activity instances at a particular point in time as shown in the timestamp box (i.e. 23 Dec 2011 14:18:05). The time slider at the bottom of the figure allows users to go back or forth to a certain point in time to view the corresponding photograph. Alternatively, the photographs can be played in a continuous sequence by activating the play button.

    In [7] an initial framework was proposed that exploits the map metaphor to show a summary of an event log in the form of an animation. The corresponding implementation was realised as a plug-in for the ProM open-source process mining framework (www.processmining.org). A set of experiments was conducted with some stakeholders from a Dutch municipality to obtain some quick initial feedback [8]. As a result of the feedback, the framework was adjusted to cope with the problems that were identified. In fact, significant extensions and modifications of the initial framework and of the corresponding implementation were needed. This paper reports on the resulting framework, its corresponding ProM plug-in, and a new validation effort, this time with subjects from Suncorp, one of Australia's largest insurance companies. This evaluation is concerned with determining whether the map metaphor is understood and can help to provide meaningful insights.

    The paper is organised as follows. Section 2 positions our work with respect to the literature, highlighting the limitations that exist in relevant state-of-the-art work. Section 3 discusses the adjusted framework. Then, Section 4 provides some details of the implementation of this framework realised as a plug-in of the ProM environment. Section 5 describes a case study illustrating the approach in the context of home insurance claims processing at Suncorp. Section 6 starts by reporting the feedback from stakeholders of a Dutch municipality, followed by the changes that were required to cope with the problems pointed out. Then, the section describes the evaluation of the adapted framework done in collaboration with Suncorp. Finally, Section 7 concludes the paper by summarising the results obtained and identifying future directions of work.

    2 RELATED WORK

    The approach presented in this paper builds on two emerging disciplines: process mining [3] and visual analytics [5], [4]. As a general comment, one can observe that while many techniques have been developed for process mining, data mining and statistical analysis, they often do not, or insufficiently, take visualisation aspects into account. Conversely, one can observe that the research community in the area of information visualisation (see e.g. [9] or [10]) has not focussed on process-related aspects.

    There exists a number of research works (e.g., [11], [12], [13]) on aspects related to visualisation in the field of business process management. These works employ different metaphors and also mash-up approaches to represent the state of a process at run-time. The approach described in [6] elaborates on these ideas and captures a process state as a map with a colouring scheme used for the activities representing their status or their characteristics. This approach though is aimed at providing run-time support for activity selection and not at providing support for the analysis of the history of a process.

    The term visual analytics was coined in [4]. This reference reviews the early work in this field. A comprehensive, more recent reference is provided in [5]. Examples of recent significant research in the area of visual analytics can be found in document analysis [14], financial analysis [15], [16] and geo-spatial object analysis [17]. In [18] pioneering work is reported where visual analytics is applied to the field of data mining. Another interesting work is [19], which concerns displaying multiple time series without aggregation. The above body of work uses static representations for capturing time dependencies, i.e. images that summarise the analysis of time-oriented data. When larger data sets come into play, static representations show their limits as they require large screens to represent the time axis. Conversely, dynamic representations (i.e., using the physical time) show their power when the focal point is to analyse the data over time. A detailed up-to-date survey is reported in [20], where it is clear that there is no significant research work that uses dynamic representations: "A frequent goal is to integrate data from multiple time steps in a single image. Often, time abstraction is used for this purpose." Unfortunately, to visualise process-related data, temporal aspects are of crucial importance as they are related to concurrency and causality of the activities being performed. Therefore, abstracting time would relegate the time dimension to second-class status.

    Most of the work in the visual analytics area is tailored to a particular application or to a particular visualisation. This is also confirmed by [21]: "To our knowledge, there exists no visualisation framework that [...] provides a broader selection of possible representations. We think that an open framework fed with pluggable visual and analytical components for analyzing time-oriented data is useful. Such a framework will be able to support multiple analysis tasks and data characteristics, which is a goal of Visual Analytics." In fact, the configuration of different maps allows one to plug in different representations, thus supporting multiple views.

    To our knowledge, only the approach of the so-called Fuzzy Animations [22] can be considered to be process-aware. It focuses on providing a graphical user interface where the past states extracted from the event log are projected onto a process model which is automatically derived from the events in the log. It is a valuable approach though quite specific: it only focuses on control-flow aspects and visualisation is limited to process graphs.

    3 THE FRAMEWORK

    In this section, we define a formal framework that captures the mapping of event logs to movies. Through the resulting movies the history of a process can be visualised in different contexts, thus facilitating its analysis. The framework is derived from the framework presented in [7], where the main difference is that a richer notion of state is adopted. The notion of state as presented here allows one to exploit temporal information when defining the positioning of activity instances.

    The events in an event log can be sorted chronologically and subsequently be replayed in that order. The occurrence of an event makes the system enter a particular state. Hence, by replaying all events in the event log in chronological order, it is possible to rebuild a process history, i.e. the sequence of states the system went through. Each state can be represented as a configuration of activity instances on a map of choice. In order to define how states can be represented on a map, we have to choose an image for that map and to define the positioning of activity instances as dots on that image. Such an annotated map can be seen as a photograph and, thus, a process history can be visualised as a sequence of such photographs, that together form a movie. To convey more information, each dot can be filled with a colour, where this colour depends on the value of a characteristic of choice of the corresponding activity instance.

    An activity instance is the execution of a certain activity in a certain case and it can thus be represented as a pair $(at, cid)$, with $at \in A$ the activity type ($A$ is the universe of activity types) and $cid \in C$ the case ($C$ is the universe of case identifiers). Processes also access and modify data. Let $V$ be the universe of variable names, then a process data variable is a pair $(vn, cid)$ where $vn \in V$ and $cid \in C$. These variables can take on different values in different cases and also within the same case as time progresses.

    Our framework only provides visualisations for activity instances that have been created but are not yet concluded. Such activity instances can be in a number of states: they can be scheduled when they have been created but not yet been assigned to a resource, they can be assigned when they have been assigned to a resource but have not yet started execution, executing when work on them has commenced, and suspended when work on them has been temporarily halted. We will use the set $Z$ to capture the various states which an activity instance can be in, $Z = \{Scheduled, Assigned, Executing, Suspended, Concluded\}$. An activity instance is referred to as active if it is in a state that is an element of $Z$ except for the state $Concluded$.

    Definition 1 (Event): Let $U$ be the universe of values that variables can take. An event $e$ is a tuple $(at, cid, t, z, P)$ where:
      • $at \in A$ is an activity type;
      • $cid \in C$ is a case identifier;
      • $t$ is the timestamp when event $e$ occurred;
      • $z \in Z$ is the state to which the corresponding activity instance moves;
      • $P : V \nrightarrow U$ is an assignment of values to variables.

    Function $P$ is partial since not every event has an associated value for all process variables.

    We use the following functions to access the constituent elements of an event $e = (at, cid, t, z, P)$: $activity(e) = at$, $case(e) = cid$, $timestamp(e) = t$, $state(e) = z$ and $properties(e) = P$. The latter function will be overloaded and $properties(e, vn) = P(vn)$. Moreover, given a function $f$, $dom(f)$ represents the domain of $f$.
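Definition 1 can be mirrored almost one-to-one in code. The following is a minimal sketch, assuming Python dataclasses; the class names, the `value_of` helper, and the sample activity and variable names are illustrative choices, not part of the paper or of ProM.

```python
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Any, Dict


class ActivityState(Enum):
    """The set Z of states an activity instance can be in."""
    SCHEDULED = "Scheduled"
    ASSIGNED = "Assigned"
    EXECUTING = "Executing"
    SUSPENDED = "Suspended"
    CONCLUDED = "Concluded"


@dataclass
class Event:
    """An event e = (at, cid, t, z, P) in the sense of Definition 1."""
    activity: str            # at: activity type
    case: str                # cid: case identifier
    timestamp: datetime      # t: when the event occurred
    state: ActivityState     # z: state the activity instance moves to
    # P: partial assignment, so only variables in dom(P) appear as keys
    properties: Dict[str, Any] = field(default_factory=dict)

    def value_of(self, vn: str) -> Any:
        """properties(e, vn); raises KeyError when vn is not in dom(P)."""
        return self.properties[vn]


# Hypothetical event: activity and variable names are made up for illustration.
e = Event("Assess claim", "case-17", datetime(2011, 12, 23, 14, 18, 5),
          ActivityState.EXECUTING, {"claimant_state": "QLD"})
print(e.activity, e.case, e.state.value, e.value_of("claimant_state"))
```

Representing the partial function $P$ as a dictionary keeps "no value recorded" distinct from any actual value, which matters later when deciding whether an activity instance can be positioned on a map.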

    Definition 2 (System State): Let $\mathcal{T}$ be the universe of possible timestamps. A system state $S = (\sigma, \nu, s)$ consists of:
      • a function $\sigma : (A \times C) \nrightarrow Z$ where $\sigma(at, cid) = z$ denotes that activity instance $(at, cid)$ is in state $z \in Z$;
      • a function $\nu : (V \times C) \nrightarrow U$ where $\nu(vn, cid) = v_{value}$ denotes that variable $(vn, cid)$ has value $v_{value}$;
      • a function $s : (A \times C) \nrightarrow \mathcal{T}$ where $s(at, cid) = t$ denotes that activity instance $(at, cid)$ started at timestamp $t$.

    Due to our definition of activity instance, as a pair consisting of activity name and case identifier, it is not possible to distinguish between different instances of the same activity within the same case. Such instances may arise from the occurrence of loops in a process model.
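A system state can be sketched as three dictionaries, one per partial function. This is an illustrative sketch only; the field names are assumptions, and plain strings stand in for the states in $Z$. It also makes the limitation just mentioned concrete: a second execution of the same activity in the same case reuses the same key.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

ActivityInstance = Tuple[str, str]   # (at, cid)
Variable = Tuple[str, str]           # (vn, cid)


@dataclass
class SystemState:
    """A system state made of three partial functions, each held as a dict."""
    instance_state: Dict[ActivityInstance, str] = field(default_factory=dict)
    variables: Dict[Variable, Any] = field(default_factory=dict)
    start_time: Dict[ActivityInstance, float] = field(default_factory=dict)

    def active_instances(self) -> List[ActivityInstance]:
        """Activity instances in any state other than Concluded."""
        return [ai for ai, z in self.instance_state.items() if z != "Concluded"]


state = SystemState()
state.instance_state[("Review", "c1")] = "Concluded"   # first pass through a loop
state.instance_state[("Review", "c1")] = "Executing"   # second pass reuses the key
state.instance_state[("Archive", "c2")] = "Concluded"

print(len(state.instance_state))   # both Review executions share one entry
print(state.active_instances())
```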

    3.1 Creation of the Sequence of States

    Similar to existing algorithms for conformance checking [3], this framework is based on the principle of replay. Events in the log are replayed to determine, a posteriori, the sequence of states that the system has gone through. In order to formalise this notion for our framework, let us first define the overriding operator $\oplus$. Let $f$ be a function; a function $f' = f \oplus (\bar{x}, y)$ is defined by $f'(\bar{x}) = y$ and $f'(x) = f(x)$ for all $x \in dom(f) \setminus \{\bar{x}\}$.

    The definition of $\oplus$ can be extended to tuple sets, by iteratively applying the definition to all tuples in the set (noting that the order in which the elements are chosen is not important).
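With functions held as dictionaries, the overriding operator amounts to a copy followed by an update. The sketch below is one possible rendering; the function names `override` and `override_all` are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Tuple, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


def override(f: Dict[K, V], x: K, y: V) -> Dict[K, V]:
    """Return a new function that maps x to y and agrees with f elsewhere."""
    g = dict(f)      # copy, so the original function f is left untouched
    g[x] = y
    return g


def override_all(f: Dict[K, V], pairs: Iterable[Tuple[K, V]]) -> Dict[K, V]:
    """Extension to a set of tuples: apply the override to each tuple in turn."""
    g = dict(f)
    for x, y in pairs:
        g[x] = y
    return g


f = {"a": 1, "b": 2}
g = override(f, "b", 20)
h = override_all(f, {("a", 10), ("c", 30)})
print(f, g, h)
```

The order of application is irrelevant here because the tuples in the set have distinct first components; with repeated first components the result would depend on the order chosen.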

    Definition 3 (Replaying of events): Let $S_i = (\sigma_i, \nu_i, s_i)$ be the current state during replay and $e$ be the next event to replay. Replaying $e$ causes the current state $S_i$ to change to state $S_{i+1} = (\sigma_{i+1}, \nu_{i+1}, s_{i+1})$. This change is denoted as $S_i \xrightarrow{e} S_{i+1}$, where

      $\sigma_{i+1} = \sigma_i \oplus ((activity(e), case(e)), state(e))$

      $\nu_{i+1} = \nu_i \oplus \{((v, case(e)), properties(e, v)) \mid v \in dom(properties(e))\}$

    and if $state(e) \neq Executing$ then $s_{i+1} = s_i$, otherwise

      $s_{i+1} = s_i \oplus ((activity(e), case(e)), timestamp(e))$

    The initial state from which replaying starts is $S_0 = (\sigma_0, \nu_0, s_0)$ where $dom(\sigma_0) = dom(\nu_0) = dom(s_0) = \emptyset$.
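The replay step of Definition 3 can be sketched as a pure function from a state and an event to the next state. In this sketch, events are plain tuples `(at, cid, t, z, P)`, a state is a triple of dictionaries, and timestamps are plain numbers; all of these representations are assumptions made for brevity.

```python
from typing import Any, Dict, Tuple

State = Tuple[Dict, Dict, Dict]                      # (sigma, nu, s)
Event = Tuple[str, str, float, str, Dict[str, Any]]  # (at, cid, t, z, P)


def replay(state: State, event: Event) -> State:
    """Compute the successor state reached by replaying one event."""
    sigma, nu, start = state
    at, cid, t, z, props = event
    # The activity instance moves to the state carried by the event.
    new_sigma = {**sigma, (at, cid): z}
    # Every variable in dom(P) is overridden for this case.
    new_nu = {**nu, **{(vn, cid): value for vn, value in props.items()}}
    # The start time is only overridden by events leading to Executing.
    new_start = {**start, (at, cid): t} if z == "Executing" else start
    return new_sigma, new_nu, new_start


S0: State = ({}, {}, {})
S1 = replay(S0, ("Assess", "c1", 10.0, "Scheduled", {"income": 75000}))
S2 = replay(S1, ("Assess", "c1", 20.0, "Executing", {}))
print(S2)
```

Read literally, the definition also overrides the start time when a suspended instance resumes, because resumption is again an event with state Executing; the sketch follows that reading.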

    Replaying is used to reconstruct the execution history.

    Definition 4 (Execution History): Let $e_1, \ldots, e_n$ be the sequence of events in an execution log ordered by timestamp, i.e. for every $1 \le i < j \le n$, $timestamp(e_j) \ge timestamp(e_i)$. Let $S_0 \xrightarrow{e_1} S_1 \xrightarrow{e_2} \ldots \xrightarrow{e_n} S_n$ be the sequence of states visited when replaying the event log. An execution history is a sequence of pairs $H = \langle (S_1, t_1), \ldots, (S_n, t_n) \rangle$ where $(S_i, t_i)$ denotes that the system entered state $S_i$ at time $t_i = timestamp(e_i)$.
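An execution history can then be built by sorting the log and folding the replay step over it. The sketch below inlines the replay step so that it is self-contained; the tuple layout `(at, cid, t, z, P)` and the numeric timestamps are again assumptions.

```python
from typing import Any, Dict, List, Tuple

Event = Tuple[str, str, float, str, Dict[str, Any]]   # (at, cid, t, z, P)
State = Tuple[Dict, Dict, Dict]                       # (sigma, nu, s)


def execution_history(log: List[Event]) -> List[Tuple[State, float]]:
    """Replay the log in timestamp order and record (S_i, t_i) for each event."""
    sigma: Dict = {}
    nu: Dict = {}
    start: Dict = {}
    history = []
    # sorted() is stable, so events with equal timestamps keep their log order.
    for at, cid, t, z, props in sorted(log, key=lambda e: e[2]):
        sigma = {**sigma, (at, cid): z}
        nu = {**nu, **{(vn, cid): v for vn, v in props.items()}}
        if z == "Executing":
            start = {**start, (at, cid): t}
        history.append(((sigma, nu, start), t))
    return history


log = [
    ("Assess", "c1", 30.0, "Concluded", {}),
    ("Assess", "c1", 10.0, "Scheduled", {"loan": 2500}),
    ("Assess", "c1", 20.0, "Executing", {}),
]
H = execution_history(log)
for (sigma, _, _), t in H:
    print(t, sigma)
```

Each state is a fresh set of dictionaries, so earlier entries of the history are not affected by later events.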

    3.2 Mapping States onto Maps

    Activity instances are visualised as dots on a map. By not fixing the type of map, but allowing this choice to be configurable, different types of relationships can be shown, thus providing a deeper insight into the context of the work that was performed. Many types of maps can be thought of: geographical maps (e.g., the map of a university's campus), process schemas, organisational diagrams, Gantt charts, etc. Naturally, one can also make highly specialised maps to suit a particular purpose. The positioning of an activity instance may vary across different maps. When the use of a certain map is envisaged, the location of activity instances at runtime on this map should be captured through a formal expression specified at design time.

    Definition 5 (Position function): Let $M$ be the set of maps of interest. For each available map $m \in M$, there exists a partial function that returns a pair of expressions for each activity type:

      $position_m : A \nrightarrow Expr(V \cup \{T, t\}) \times Expr(V \cup \{T, t\})$

    where $Expr(X)$ is the domain of all expressions that use some of the variables in $X$. For each activity instance $ai = (at, cid)$ and each map $m$, $position_m$ returns a pair of expressions. The evaluation of these expressions, over a state $S$, returns a pair of coordinates $(x, y)$ which is the position of $ai$ on map $m$ at state $S$.

    More specifically, variables $T$ and $t$ are used to incorporate references in time: they are used to represent the starting time of activity instances and the current time of replay, respectively. Let $\rho : X \nrightarrow U$ be a value assignment of a subset of the variable names in $X$. We define $eval$ as a function which, given an expression $f \in Expr(X)$ and a value assignment $\rho$, yields an integer number:

      $eval[\![f]\!](\rho) = c$, where $c \in \mathbb{Z}$.

    Given a map $m \in M$, a state $S_i = (\sigma_i, \nu_i, s_i)$ and an activity instance $ai = (at, cid) \in dom(\sigma_i)$, let $position_m(at) = (f^x_{m,at}, f^y_{m,at})$. The coordinates of $ai$ on map $m$ for state $S_i$ at a given timestamp $t$ are:

      $coord_m(ai)|_{S_i} = \left( eval[\![f^x_{m,at}]\!](\rho_{ai}),\ eval[\![f^y_{m,at}]\!](\rho_{ai}) \right)$

    where $\rho_{ai}(vn) = \nu_i(vn, cid)$, $\rho_{ai}(T) = s_i(ai)$ and $\rho_{ai}(t) = t$.

    For example, consider a loan request process where each instance corresponds to a different request. The applicant's monthly income and the requested loan amount are stored in variables $income$ and $loan$. One can define a Cartesian map $c$ where every activity instance is associated with a distinct dot whose x and y coordinates are determined by the values of these variables. Assume that the maximum values of $income$ and $loan$, as seen in the log, are 150000 and 10000 respectively. Also assume that the maximum x and y coordinate values at which a dot can still be properly displayed on map $c$ are 800 and 600 respectively. To ensure that an activity instance with maximum $income$ and $loan$ values can be properly displayed, we can define a position function such that $position_c(at) = \left( income \cdot \frac{800}{150000},\ loan \cdot \frac{600}{10000} \right)$.
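The loan example can be sketched in code as follows. Expressions are modelled here as Python callables over a value assignment rather than as parsed expression strings, which is a simplification; the activity type name "Assess request" and the helper `coord` are hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple

Assignment = Dict[str, Any]
Expression = Callable[[Assignment], float]

# position_c for the Cartesian map c: one pair of expressions per activity type.
position_c: Dict[str, Tuple[Expression, Expression]] = {
    "Assess request": (
        lambda rho: rho["income"] * 800 / 150000,
        lambda rho: rho["loan"] * 600 / 10000,
    ),
}


def coord(position, at: str, cid: str, variables: Dict[Tuple[str, str], Any],
          start_time: Dict[Tuple[str, str], Any], now: Any) -> Optional[Tuple[int, int]]:
    """Evaluate both expressions for activity instance (at, cid)."""
    if at not in position:        # position is partial: no expressions for this type
        return None
    fx, fy = position[at]
    # Value assignment: the variables of this case, plus T (start) and t (now).
    rho = {vn: v for (vn, c), v in variables.items() if c == cid}
    rho["T"] = start_time.get((at, cid))
    rho["t"] = now
    return int(fx(rho)), int(fy(rho))   # eval yields integer coordinates


variables = {("income", "r1"): 75000, ("loan", "r1"): 2500,
             ("income", "r2"): 150000, ("loan", "r2"): 10000}
print(coord(position_c, "Assess request", "r1", variables, {}, now=0))  # (400, 150)
print(coord(position_c, "Assess request", "r2", variables, {}, now=0))  # (800, 600)
```

Because "T" and "t" share a namespace with the process variables in this sketch, a log variable with one of those names would be shadowed; a real implementation would need to keep them apart.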

    The projection of a state $S_i = (\sigma_i, \nu_i, s_i)$ onto a map $m$ is the projection of activity instances $ai \in dom(\sigma_i)$ onto $m$ at position $coord_m(ai)|_{S_i}$. As can be seen from the definition, the function $position_m$ is partial and hence not all activity instances in the state are mapped. This may be because it simply is not meaningful. However, there may also be activity instances in $dom(\sigma_i)$ that are not mapped as some of the variables in the position function do not have a value. Another reason for an activity instance not to be projected onto a given map is that its coordinates are invalid, i.e. falling outside that map (for example because the x or the y coordinate is negative). Given a map, activity instances that have invalid coordinates or none at all need to be visualised differently from instances with valid coordinates. To this end, each map is associated with two lists of activity instances, one enumerating the activity instances that do not have coordinates and one enumerating those which have invalid ones.
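The three outcomes described above (placed on the map, no coordinates, invalid coordinates) can be sketched as a single classification pass. The function name `project`, the treatment of the map as a `width` by `height` rectangle, and the sample data are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

Instance = Tuple[str, str]   # (at, cid)
Expr = Callable[[Dict[str, Any]], float]


def project(active: List[Instance], position: Dict[str, Tuple[Expr, Expr]],
            variables: Dict[Tuple[str, str], Any], width: int, height: int):
    """Split active instances into placed dots and the two side lists."""
    placed: Dict[Instance, Tuple[int, int]] = {}
    no_coordinates: List[Instance] = []
    invalid: List[Instance] = []
    for at, cid in active:
        rho = {vn: v for (vn, c), v in variables.items() if c == cid}
        try:
            fx, fy = position[at]              # KeyError: position undefined for at
            x, y = int(fx(rho)), int(fy(rho))  # KeyError: a variable has no value
        except KeyError:
            no_coordinates.append((at, cid))
            continue
        if 0 <= x <= width and 0 <= y <= height:
            placed[(at, cid)] = (x, y)
        else:
            invalid.append((at, cid))          # coordinates fall outside the map
    return placed, no_coordinates, invalid


position = {"Assess": (lambda r: r["income"] * 800 / 150000,
                       lambda r: r["loan"] * 600 / 10000)}
variables = {("income", "c1"): 75000, ("loan", "c1"): 2500,
             ("loan", "c2"): 4000,                        # income missing for c2
             ("income", "c3"): -1500, ("loan", "c3"): 100}
active = [("Assess", "c1"), ("Assess", "c2"), ("Assess", "c3"), ("Archive", "c1")]
placed, no_coordinates, invalid = project(active, position, variables, 800, 600)
print(placed, no_coordinates, invalid)
```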

    As mentioned before, the dots representing activity instances can be filled with colour. This allows for richer visualisation as one can take the value of a variable of choice into account. Three colouring schemes are currently proposed, which are based (i) on the state of the activity instances, (ii) on the characteristics of the case of the activity instance or (iii) on the age of the activity instances.

    Based on the state of the activity instance. When this scheme is used, an activity instance $ai$ is coloured according to $\sigma_i(ai)$ where $S_i = (\sigma_i, \nu_i, s_i)$ is the current state. In particular, for activity instances that are in the scheduled, assigned, executing or suspended state, we fill the corresponding dot with white, cyan, green or black, respectively. Of course, activity instances that are concluded or not scheduled are not represented on a map.

    Based on the characteristics of the case of the activity instance. When this scheme is used, the end user chooses one of the variables $vn \in V$ present in the event log. The value of the selected variable determines which colour is used to fill the dots of the activity instances in the case. Let $S_n = (\sigma_n, \nu_n, s_n)$ be the last state in the execution history. The dot corresponding to an activity instance $ai = (at, cid) \in dom(\sigma_i)$ is coloured according to $\nu_n(vn, cid)$, i.e. the last value assigned to variable $vn$ for the case $cid$. In particular, the 15 most commonly occurring variable values are associated with the 15 non-white colours of the 16-colour EGA palette (footnote 1). The white colour is excluded as it is used to represent all other variable values. The colour of dots for a particular case should not change during an animation. Otherwise, one can easily lose track of dots. Therefore, the visualisation approach chosen is to ignore any changes to a variable and to just use its final value. This then means that one should not choose a variable for visualisation purposes whose value can change during the execution of a case. In addition, the choice of variable should also be informed by the ability to use it as a meaningful classifier of cases.
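A sketch of the case-based scheme follows. It assumes that frequency is counted over the final value of each case and that colours are handed out in palette order; neither detail is stated in the text. The colour names follow the standard 16-colour EGA palette, and the function name is hypothetical.

```python
from collections import Counter
from typing import Any, Dict

# The 15 non-white colours of the 16-colour EGA palette.
EGA_NON_WHITE = [
    "black", "blue", "green", "cyan", "red", "magenta", "brown", "light gray",
    "dark gray", "bright blue", "bright green", "bright cyan", "bright red",
    "bright magenta", "yellow",
]


def colour_by_case_attribute(final_values: Dict[str, Any]) -> Dict[str, str]:
    """Map each case id to a colour based on the final value of the chosen variable."""
    counts = Counter(final_values.values())
    ranked = [value for value, _ in counts.most_common(len(EGA_NON_WHITE))]
    by_value = dict(zip(ranked, EGA_NON_WHITE))
    # Values outside the 15 most common ones all share white.
    return {cid: by_value.get(value, "white") for cid, value in final_values.items()}


# Hypothetical data: 15 values that occur twice each, and two rare values.
final_values = {}
for k in range(15):
    final_values[f"a{k}"] = f"type-{k}"
    final_values[f"b{k}"] = f"type-{k}"
final_values["r1"] = "rare-1"
final_values["r2"] = "rare-2"

colours = colour_by_case_attribute(final_values)
print(colours["a0"], colours["b0"], colours["r1"])
```

Since the colour is derived from the final value only, every dot of a case keeps the same colour for the whole movie, which is the property the text asks for.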

    Based on the age of the activity instance. In this approach, the dots are coloured according to the age of the activity instance, i.e. the amount of time that has elapsed since the instance was started. The colour white is associated with activity instances that just started. As time progresses, the colour of instances that have not completed becomes closer and closer to red. Let $S_i = (\sigma_i, \nu_i, s_i)$ be the state at time $t$, then the age of an activity instance $ai = (at, cid) \in dom(\sigma_i)$ is computed as follows:

      $age(ai) = \exp\left( -\frac{\ln(2) \cdot (t - s_i(ai))}{MET(at)} \right)$

    where $MET(at)$ is the average of the time that was taken to complete instances of activity type $at$. For each activity instance $ai$, $age(ai)$ is always between 0 and 1. If $t = s_i(ai)$, i.e. activity instance $ai$ was just started, $age(ai) = 1$. Value $age(ai)$ decreases exponentially as $ai$ ages. When $t - s_i(ai) = MET(at)$, $age(ai) = 0.5$. An established approach is used to map $age(ai)$ to a colour: the Fire Colour Palette [10]. The colour ranges in intensity from a bright white ($age(ai) = 1$) through yellow, orange ($age(ai) \approx 0.5$), brown, and then to black (as $age(ai) \to 0$).

    1. http://en.wikipedia.org/wiki/Enhanced_Graphics_Adapter
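The age formula is a plain exponential decay whose half-life is the mean execution time of the activity type. A minimal sketch, assuming timestamps and durations are numbers in the same unit (the function and parameter names are illustrative):

```python
import math


def age(now: float, started: float, mean_execution_time: float) -> float:
    """Age value in (0, 1]: 1 at start, 0.5 after one mean execution time."""
    elapsed = now - started
    return math.exp(-math.log(2) * elapsed / mean_execution_time)


# An instance started at t=100 of an activity type with MET = 50 time units.
print(age(100, 100, 50))   # just started
print(age(150, 100, 50))   # one MET later
print(age(200, 100, 50))   # two METs later
```

Each further elapsed mean execution time halves the value again, so instances that run far longer than average drift towards 0 without ever reaching it.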

    In order to deal with dots that may overlap on a certain map, they are represented transparently. The colour of areas of overlap is determined by the colours of the individual dots involved. Unfortunately, this may lead to confusion in some cases as it may be hard to correctly interpret the resulting colour, but the advantage of this approach is that dots whose area is completely covered by one or more other dots still remain visible. If the centres of dots coincide, then the dots involved are merged to form bigger dots in order to avoid that dots whose sizes and centre positions are identical can no longer be visually distinguished from each other. The diameter of such dots grows according to the number $n$ of activity instances involved. Li et al. [23] conducted an analysis with a number of subjects where they found that quantities represented as circles are most intuitively perceived when the circle grows as a power of 0.4. Applying this observation to our case, i.e. when joining $n$ dots of different activity instances, the diameter of the resulting dot is computed as $n^{0.4}$. The amalgamated dots are also divided into as many slices as there are constituting dots, and each slice is filled with the colour of the dot which it corresponds to.
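Merging coinciding dots can be sketched as grouping by position and scaling the diameter by $n^{0.4}$. The `base_diameter` parameter (the diameter of a single, unmerged dot) and the output layout are assumptions; the text only fixes the growth exponent.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Position = Tuple[int, int]


def merge_dots(dots: List[Tuple[Position, str]], base_diameter: float = 10.0
               ) -> Dict[Position, dict]:
    """Merge dots with identical centres; one slice per constituent dot."""
    groups: Dict[Position, List[str]] = defaultdict(list)
    for position, colour in dots:
        groups[position].append(colour)
    return {
        position: {
            "diameter": base_diameter * len(colours) ** 0.4,  # grows as n^0.4
            "slices": colours,                                # one colour per dot
        }
        for position, colours in groups.items()
    }


dots = [((400, 150), "green")] + [((120, 80), "cyan")] * 20 + [((120, 80), "red")] * 12
merged = merge_dots(dots)
print(merged[(400, 150)]["diameter"])        # a single dot keeps the base diameter
print(round(merged[(120, 80)]["diameter"]))  # 32 dots: 10 * 32**0.4 is about 40
```

With an exponent of 0.4 the diameter grows much more slowly than the count: 32 coinciding instances give a dot only about four times as wide as a single one.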

    Another important feature is concerned with handling activity instances when they are going to disappear. If a dot suddenly disappears between two consecutive photographs, end users would not notice that the corresponding activity instance is completed, especially when many dots are visualised at the same time on the maps. Therefore, we have introduced a fading effect: if a dot is going to disappear in x photographs, it starts fading out (i.e., becomes transparent). Value x can be customised by an end user on the fly while playing the movie. As the number of photographs in which a dot is going to disappear becomes smaller, the fading effect becomes more pronounced, till the dot completely vanishes.
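    One way to realise such a fading effect is sketched below. The linear ramp is an assumption made for illustration; the text only requires that the dot becomes more transparent as fewer photographs remain. The names `fade_opacity` and `frames_left` are hypothetical.

```python
def fade_opacity(frames_left: int, x: int) -> float:
    """Opacity in [0, 1] of a dot that disappears in frames_left photographs.

    x is the user-chosen number of photographs over which fading happens.
    Assumption: opacity decreases linearly once fewer than x photographs remain.
    """
    if frames_left <= 0:
        return 0.0  # the dot has vanished
    if x <= 0 or frames_left >= x:
        return 1.0  # fading disabled or not started yet
    return frames_left / x


opacities = [fade_opacity(k, 4) for k in (6, 4, 3, 2, 1, 0)]
```

    Any monotonically decreasing curve would satisfy the description equally well; a linear ramp is simply the easiest to recompute when the user changes x during playback.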

    4 IMPLEMENTATION OF THE FRAMEWORK

    Figure 2 shows the architecture of the implementation. The yellow component is implemented as a stand-alone Java application whereas the red components are implemented as plug-ins of ProM, an open-source pluggable framework for the implementation of process mining tools in a standardised environment (http://www.promtools.org). Plug-ins require a number of input objects and produce one or more output objects. These input objects could, for example, be event logs or output objects of other plug-ins. In this way, one can define a chain of plug-in invocations.

    [Figure 2 components: Event Log, Map Designer, Map Specification, Automatic Map Generator, Process Model (optional), Log On Map Replayer, Movies]

    Fig. 2. The architecture of the implementation.

    The core software is the Log-On-Map Replayer plug-in which takes a map-specification file and an event log as input. Each map-specification file consists of a set of available maps (i.e. the map name and the URL where the map image can be retrieved) with corresponding position functions (Definition 5), one definition for each map. The plug-in employs the framework defined in Section 3 and generates for each available map a sequence of photographs. Playing such a sequence of photographs in succession yields a movie. As previously mentioned, each photograph captures the state of the process as it existed at a certain point in time. The graphical user interface provides controls to select one of the available movies and put it in focus, play/stop that movie, or go to a specific moment in time in that movie.

    Map specifications can be drawn through a stand-alone Java application, the Map Designer. It allows process analysts to load images (e.g., PNG or JPG) to use as maps and to define how to project activity instances onto those maps. To do so, analysts can simply drag and drop an activity type onto the map and place it at the position of interest. This way, the activity type's position is statically defined and applies to all instances of that type. Alternatively, analysts can define a position as dynamic. The position of an activity instance is then defined in terms of the state of the process instance involved. As far as the implementation is concerned, the state is encoded as an XML document and the position function is defined as an XQuery over this document. Section 5 illustrates the interface of the Map Designer application and the Log On Map Replayer plug-in through the case study.

    There exist a number of potential maps that can be applied to a wide range of scenarios. For example, a process model can serve as a map where activity instances are projected onto the icons representing the corresponding activity types. Hence, as long as the positions of the activity icons are known, this type of map can be used in different scenarios and the corresponding position function can be automatically generated. This concept of facilitating the generation of maps with little effort required by users has led to the implementation of a second plug-in of ProM, the Automatic-Map-Generator. It takes an event log as input and optionally a process model and produces a map, intended as the background image, and the corresponding position function. Currently, three types of maps can be generated automatically: Cartesian, Deadline/Timeline, and Process-Model. When generating a map of these types, users need to define the size of the image by specifying the width w and the height h.

    Cartesian Map. This type of map takes inspiration from the Cartesian coordinate system. The projection of activity instances is determined by choosing two numerical variables, $v_x$ and $v_y$, to derive the values of the x and y coordinates from. Let $x_{max}$ ($x_{min}$) and $y_{max}$ ($y_{min}$) be the maximum (minimum) values for $v_x$ and $v_y$ as present in the event log. The position function for a Cartesian map $c$ for instances of an activity type $a_t \in A$ is defined as $position_c(a_t) = (x_{a_t}, y_{a_t})$ where

    $$x_{a_t} = \frac{v_x - x_{min}}{x_{max} - x_{min}} \cdot w \quad \text{and} \quad y_{a_t} = \frac{v_y - y_{min}}{y_{max} - y_{min}} \cdot h.$$
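    The Cartesian position function amounts to a min-max rescaling of the two chosen variables onto the image size. The following Python sketch shows the computation; the function name, the tuple-based range arguments, and the sample values are illustrative assumptions, not part of the plug-in.

```python
from typing import Tuple


def cartesian_position(vx: float, vy: float,
                       x_range: Tuple[float, float],
                       y_range: Tuple[float, float],
                       w: float, h: float) -> Tuple[float, float]:
    """Rescale (vx, vy) from their observed ranges onto a w-by-h image."""
    x_min, x_max = x_range
    y_min, y_max = y_range
    if x_max == x_min or y_max == y_min:
        # assumption: a constant variable cannot be used as an axis
        raise ValueError("variable range must not be empty")
    x = (vx - x_min) / (x_max - x_min) * w
    y = (vy - y_min) / (y_max - y_min) * h
    return x, y


# Illustrative values: both variables sit halfway through their ranges.
pos = cartesian_position(50, 5500, (0, 100), (1000, 10000), 800, 600)
```

    Values halfway through their ranges land in the centre of the image; values outside the ranges observed in the log would fall outside the map boundary.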

    For example, suppose that the end user chooses variables payout and amount as $v_x$ and $v_y$. Based on these variables, the plug-in determines the corresponding minimum and maximum values as seen in the log, e.g. $x_{min} = 0$, $x_{max} = 100$, $y_{min} = 1000$, $y_{max} = 10000$. For a map of size, for example, 800 × 600 pixels (i.e., $w = 800$ and $h = 600$), the plug-in thus specifies $x_{a_t} = ((payout - 0)/(100 - 0)) \cdot 800$ and $y_{a_t} = ((amount - 1000)/(10000 - 1000)) \cdot 600$.

    Deadline/Timeline Map. Activity instances are positioned along the x-axis according to the time that they have been active. When an activity instance has just become active, its x-coordinate is equal to $w$. Its y-coordinate is obtained by choosing a numerical variable $v_y$, extracting its current value, and using the same computation as for the Cartesian map. As time progresses, the x-coordinate changes (becomes less and less), but the y-coordinate remains constant. The position function for a time map $l$ for an activity instance $a_i = (a_t, cid)$ is defined as $position_l(a_t) = (x_{a_t}, y_{a_t})$ where

    $$x_{a_t} = \left(1 - \frac{t - T}{d_{a_t}}\right) \cdot w \quad \text{and} \quad y_{a_t} = \frac{v_y - y_{min}}{y_{max} - y_{min}} \cdot h$$

    where $d_{a_t}$ is a constant which defines the maximum valid duration for instances of an activity type $a_t$. If a certain activity instance is active for more than $d_{a_t}$, $x_{a_t}$ will become negative and the instance is then enumerated in the list of activity instances with invalid coordinates.
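    The horizontal movement on a Deadline/Timeline map can be sketched in Python as below. The sketch assumes that T denotes the time at which the activity instance became active (this excerpt does not define T explicitly, but the assumption matches the statement that the x-coordinate equals w for a newly active instance). All names are hypothetical.

```python
def timeline_x(t: float, start: float, d: float, w: float) -> float:
    """x-coordinate of an instance on a Deadline/Timeline map.

    t     -- current time
    start -- time the instance became active (assumed meaning of T)
    d     -- maximum valid duration for the activity type
    w     -- image width in pixels
    """
    if d <= 0:
        raise ValueError("d must be positive")
    return (1 - (t - start) / d) * w


def has_invalid_x(x: float) -> bool:
    """An instance active for longer than d gets a negative x-coordinate."""
    return x < 0


just_started = timeline_x(t=0, start=0, d=10, w=800)
halfway = timeline_x(t=5, start=0, d=10, w=800)
overdue = timeline_x(t=12, start=0, d=10, w=800)
```

    The dot starts at the right edge, reaches x = 0 exactly when the maximum valid duration has elapsed, and is reported as invalid afterwards.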

    Process-Model Map. As mentioned before, for a process-model map activity instances are projected onto the corresponding icon in the model. In order to enable the automatic generation of maps of this type, end users need to provide a process model which also encodes the actual coordinates of the icons that represent the model's activities. Currently, we only support the Petri net formalism, though we believe that it is relatively easy to extend the implementation to support other process modelling formalisms. Petri nets are stored in files in the standard PNML format, which also encodes the coordinates of the positions of all activities (i.e. Petri net transitions) of the model.

    5 A CASE STUDY WITH AN AUSTRALIAN INSURANCE COMPANY

    We applied the visualisation framework in a case study that we conducted with one of the largest insurance companies in Australia, namely Suncorp, during the second half of 2012. Through regular meetings (almost weekly) with the stakeholders from Suncorp, we identified the need to communicate the current landscape of Suncorp's claims processing performance to higher managers within the company. To this end, we believed that the visualisation framework proposed in this paper could be used to generate a number of movies summarising Suncorp's claims processing trends and performance.

    Suncorp provided us with data related to the processing of claims that were finalised within a 6-month period (regardless of the starting time of the claims). The data consists of over one million events for 34 activity types, which together describe the processing of over 32,000 claims from multiple departments within Suncorp.

    For the purpose of evaluating the visualisation framework and the usefulness of the resulting movies, a subset of the data (from one department only) was used. This subset of data was selected because it contains rich attribute information, including: loss type (i.e. the cause of a loss that triggered an insurance claim, such as fire, theft, or burglary), payout amount (i.e. the amount of money paid to a customer as a result of an awarded insurance claim), team (i.e. the team within Suncorp which processed the claims), and many others. As will be detailed later, the richness of attribute information in this subset of the log allowed us to produce interesting maps and, consequently, movies.

    5.1 Overview of the Maps (and Movies) Created for the Case Study

    Using this subset of data, four movies were produced using four different maps. A thorough explanation of the interpretation of these maps and the various interactive configuration options available to users (while a movie is being played) is provided in the remainder of this section. Some screencasts of the tool showing these four movies are available at http://www.processmining.org/online/logonmaps.

    5.1.1 Australia Map

    The first movie was produced using a geographical map of Australia. In this movie, a dot is projected onto the map at the position that corresponds to the state or territory where the claim was lodged. Thus, the goal of this map is to display the distribution of insurance claims across all Australian states and territories at any given point in time, and how the distribution evolves over a period of time. Figure 1 shows a snapshot of the movie.

    The bottom part of Figure 1 shows the widgets to control the playback of the movie. From left to right, this area contains the play button to start/stop the movie, the timeline and time slider box which shows the relative progression of the movie and the slider that can be dragged by users to reach a particular snapshot, the clock showing the current timestamp of the movie being played as obtained from the timestamp information in the event log, and the slider to adjust the playback speed on the fly. As also stated in [24], it is important to play the movie at the right speed. The optimal speed may be hard to predict: animations played too slowly may become boring, whereas the opposite can cause relevant information to be missed. The timeline box also shows a wave in which the x-axis represents time and the y-axis the number of active activity instances. The purpose of the two buttons above the timeline box (labelled as Pos. Trend and Colour Trend) will be explained later in this section.

    At the top of Figure 1, we see the Maps panel which is made up of a number of tabs, each associated with a different movie. By selecting a tab, the corresponding movie is brought to the front to show the state of the process at a specific point in time, in terms of the activity instances that are active, their state, and their positions. While playing the animation, users can change the animation they are watching in case they wish to consider a different type of movie. In Figure 1, two animation movies were executed simultaneously: the first one is the Australian map animation (called the Loss Cause/the State plot on the tab) and the other one is the Incurred Amount/Claim Duration movie (not shown in Figure 1, but users can switch between the two movies at any time).

    There are two boxes on the panel on the left-hand side of the screenshot. The top box enumerates the activity instances for which there are no associated positions (i.e. they do not belong to the domain of the corresponding position function). The bottom box enumerates those activity instances whose positions are invalid (i.e. the corresponding position function returns coordinates which either fall outside the map boundary or could not be evaluated). In the screenshot shown in Figure 1, there was no activity instance that was projected as invalid at that particular point in time.

    The centre of Figure 1 shows a snapshot of the movie being played. The configuration file used to produce this movie projects activity instances onto the corresponding Australian state/territory in which the claim was lodged. The size of the dots is positively correlated with the number of events that are simultaneously present on the map at exactly the same position. As mentioned previously, dots projected at the same coordinates are merged to form bigger dots; moreover, the colouring scheme of the dots is configurable. In Figure 1, the colour of the dot was configured so that each dot is coloured according to the loss cause of the insurance claims. The association of colours with values of loss cause is explained in the legend shown on the right-hand side panel. Many dots are projected at the same coordinates and, hence, they are merged in clusters. As mentioned previously, each bigger dot is divided in as many slices as the number of different colours associated with merged dots. The size of the slice is determined proportionally with the percentage of dots of a given colour: the number of activity instances with a particular colour (i.e. a particular loss cause) in a dot over the total number of activity instances represented by the dot. These percentages can be viewed by rolling the mouse over the dots (as shown in Figure 1).
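    The slice sizes of a merged dot can be computed as simple proportions. In this Python sketch each merged activity instance is represented only by its colour label (here, a loss cause); this list-of-labels representation and the sample values are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def slice_shares(colours: List[str]) -> Dict[str, float]:
    """Share of a merged dot taken by each colour.

    colours holds one entry per activity instance merged into the dot.
    """
    if not colours:
        raise ValueError("a merged dot represents at least one activity instance")
    total = len(colours)
    return {colour: count / total for colour, count in Counter(colours).items()}


# Hypothetical dot merging four instances with three different loss causes.
shares = slice_shares(["fire", "fire", "theft", "storm"])
```

    Multiplying each share by 360 gives the angle of the corresponding slice, and multiplying by 100 gives the percentage shown when hovering over the dot.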

    On the right-hand side of Figure 1, there are two tabs. The top one is labelled as Parameters and the bottom one is labelled as Legend (the Legend tab is not shown in Figure 1 because the tab was being selected in the screenshot, causing the Legend tab itself to disappear). As explained above, the purpose of the Legend tab is to describe the meaning of each colour in the dot. The Parameters tab, on the other hand, is used to configure the colouring scheme and the manner in which the dots are to be displayed in the animation movie.

    Figure 3 (top part) shows the expanded parameter tab. The first check-box allows a user to merge multiple dots into one dot even if the dots are partially overlapping. If this option is ticked, dots whose areas partly overlap are merged. Otherwise, if it is unticked, dots are merged only if they are positioned at exactly the same coordinates. The second check-box enables the fading out of the dots when the corresponding activity instances will soon no longer be active. The number of frames required for the dots to begin to fade out can be customised using the slider on the panel. When dots are merged, by default, the bigger dots are annotated with the number of dots that are merged. Sometimes the process owner does not want to disclose this aggregate information for privacy and/or confidentiality reasons. For instance, this happened for the Suncorp case study. The last check-box allows users to display or remove the exact number of merged dots that is shown in the middle of each dot: if this option is unticked, the number will not be displayed in the movie (the legend will also show an N.A. status in place of the exact number).

    The second half of the parameter panel allows users to customise the colouring scheme of the dots in the movie according to one of the schemes described in Section 3. The three options will respectively enable the colouring of dots based on the state, the age, or the values of a particular variable of the corresponding activity instances. Furthermore, for the third option, the variable type to be used is selected from a drop-down box, as shown in Figure 3.

    The Pos. Trend panel, shown in Figure 3, is used to present to viewers the evolution of the number of activity instances that are correctly projected onto the map (blue-lined graph) and not projected onto the map (green-lined graph) over the time span of the movie. The red-lined graph in this figure shows those activity instances that could not be projected onto the map correctly (i.e. those activity instances with invalid projection); however, we do not see the red-lined graph in Figure 3 because there were no invalidly projected activity instances throughout the whole animation. The thick vertical red line represents the current time of the movie being played. The Colour Trend panel serves a similar function as the Pos. Trend panel, except that this time it displays the evolution of the number of activity instances per dot colour over the time span of the movie.

    Fig. 3. Screenshots of the interactive parameter panel and the position trend graph

    Typical insights that can be gained from this map include an understanding of the distribution and the characteristics of claims across Australian states and, more importantly, the differences in the characteristics of the claims. For example, while playing the movie, we noticed that the state of Tasmania had a higher proportion of claims from damages of rental properties as compared to other states, while natural hazard seems to be one of the most dominant causes for insurance claims across all states.

    Based on the description of our framework in Section 3, the LogOnMap plug-in is used to project individual activity instances onto the map. However, there are situations when stakeholders are more interested in the analysis of the overall distribution of cases and their characteristics, rather than of the single activity instances. This is precisely the situation that we encountered in our case study with Suncorp. In fact, two out of the four movies produced (i.e. the Australian map and the Quadrant map) are concerned with case-level performance. To address this situation, we inserted one dummy activity instance for each case (i.e. trace) in the event log. These dummy activity instances start before any other activity instances in the case are active and complete when all other activity instances are no longer active. Then, in the map configuration file, we project the dummy activity instances onto the map to generate movies that show the performance and trends of cases. The other activity instances are not projected (i.e. the position function is only defined for the dummy activity instances).

    Fig. 4. A screenshot of the second animation produced using the quadrant map
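    The insertion of one dummy activity instance per case can be sketched as a small preprocessing step. The log representation used here (a dictionary from case identifier to a list of `(activity, start, end)` tuples), the dummy name `CASE`, and the sample activities are assumptions for illustration; the dummy instance is given the earliest start and the latest completion time found in the case.

```python
from typing import Dict, List, Tuple

Instance = Tuple[str, float, float]  # (activity type, start time, completion time)


def add_dummy_instances(log: Dict[str, List[Instance]],
                        name: str = "CASE") -> Dict[str, List[Instance]]:
    """Return a copy of the log with one case-spanning dummy instance per case."""
    result: Dict[str, List[Instance]] = {}
    for case_id, instances in log.items():
        if not instances:
            result[case_id] = []
            continue
        first_start = min(start for _, start, _ in instances)
        last_end = max(end for _, _, end in instances)
        # The dummy instance is active for as long as any instance of the case is.
        result[case_id] = [(name, first_start, last_end)] + list(instances)
    return result


log = {"claim-1": [("Register", 1.0, 4.0), ("Review", 3.0, 9.0)]}
with_dummies = add_dummy_instances(log)
```

    A position function defined only for the dummy activity type then yields exactly one dot per case, while the original instances stay in the log but are never drawn.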

    5.1.2 Quadrant Map

    The second movie was produced based on a Cartesian map whereby the x-axis represents the amount of insurance claim payouts, and the y-axis represents the number of days taken to process the claims. We call this map a quadrant map. Note that this second movie is also about viewing cases, thus the insertion of dummy activity instances into the log was applied. A screenshot of the generated animation is shown in Figure 4.

    In this screenshot, dots are coloured based on the age of the activity instances (see Section 3.2). Dots in the bottom-left quadrant represent claims with low payout values and relatively quick processing times (expected), dots in the top-right quadrant represent claims with high payout values and relatively long processing times (expected), dots in the bottom-right quadrant represent claims with high payout values and quick processing times, and finally, dots in the top-left quadrant represent claims with low payout values and long processing times (under-performing claims).

    In other words, this movie allows us to gain insights into the performance of Suncorp's claims process over a period of time. In our case study, this movie proved to be useful in conveying to business analysts and managers the performance of their claims process.

    5.1.3 Deadline Map

    The third movie was produced based on a deadline map. This movie shows those activity instances which were completed on time (before the deadline) and those which were not (see Figure 5). The position of a dot on the x-axis tells us the time remaining before the corresponding activity instance reaches its deadline, while the position of that dot on the y-axis reveals the type of that instance. In this figure, dots are coloured according to the teams that performed the corresponding activity instances.

    Fig. 5. A screenshot of the third animation movie produced using a deadline map

    When an activity instance becomes active, a dot representing the activity instance appears on the map. The y-axis position of the dot is determined by the type of the activity instance, while the x-axis position is initially determined by the amount of time the activity instance has before the deadline is reached (note that the deadline for every activity instance is provided as an event attribute in the log). Thus, the later the deadline of an activity instance is relative to the time when the instance becomes active, the further to the right the starting position of the dot is.

    After an activity instance becomes active, the dot representing that activity instance moves from right to left as time progresses. The amount of time until the deadline expires is represented through the distance to the thick black line. Consequently, at any given point in time, those activity instances which did not complete by the deadline are captured by the dots to the left of the thick black line.

    In our case study, this movie allowed us to identify a number of activities that often ran overtime, such as the "Follow Up Requested" activity, the "Conduct File Review" activity, and the "Incoming Correspondence" activity. Other activities, such as the "New Claim (IPI)" activity, mostly completed around the deadline.

    5.1.4 Process model map

    The fourth movie was produced based on a process model map (see Figure 6). The process model used in this movie is the Fuzzy model [22] that we discovered using the Disco tool (http://www.fluxicon.com/disco). As explained in Section 4, dots are projected onto the map according to the position of the icon representing the activity captured by the dots. In this screenshot, dots are coloured according to the age of the activity instances.

    Fig. 6. A screenshot of the fourth animation movie produced using a process model map

    A typical insight that can be gained from using this type of map is the identification of activities in a process that can potentially be a bottleneck. For example, while playing the movie, the appearance of a large dot on a particular activity icon over an extended period of time may indicate the piling up of work items (i.e. activity instances) related to that activity. As can be seen from Figure 6, many activity instances were piling up for two activity types, namely, "Follow Up Requested" and "Conduct File Review".

    5.2 Map Designer

    As stated in Section 4, a Map Designer tool has been implemented. Figure 7 shows two examples of how we have used the map designer to generate the configuration files for the four movies used in the case study.

    The top part of Figure 7 shows how we have used the map designer tool to automatically generate the configuration file for the fourth movie (the process model movie). Here, we can see how a user can simply drag an activity name (from the Task List window) to the desired position on the map. By doing so, the map designer tool automatically generates a map configuration file which specifies, for each activity whose instances need to be projected onto the map, the static position of those instances.

    The bottom part of Figure 7 shows how we used the map designer to help us generate the map configuration for the second movie (the quadrant movie). This movie requires a dynamic positioning of dots based on the variable values of the activity instances to be projected. Thus, to enable such a dynamic positioning of dots, we inserted the desired XQuery statements into the corresponding pop-up window.

    [Figure 7 annotations. Static Projection: drag activities to be projected from the 'Task List' window to the map. Dynamic Projection: manually insert XQuery statements to specify the dynamic position of activities.]

    Fig. 7. Screenshots of the Map Designer showing static (top) and dynamic (bottom) activity projections

    Overall, we found the map designer tool to be quite useful in enabling us to quickly define, and adapt, our map configuration files to suit the type of visualisation that we would like to see.

    6 EVALUATION WITH END USERS

    The validity, the usefulness, and the intuitiveness of the approach and of the resulting implementation have been thoroughly assessed through engagement with end users. Specifically, the approach's validation has been conducted in two phases.

    A first version of the tool was released in the second half of 2012 and reported in [8]. This version was evaluated with three subjects of a Dutch municipality: a process management specialist, a communication and marketing specialist, and a business advisor for customer contacts. In this case, we used a real-life event log concerning the process to handle the applications for house-building permits submitted by Dutch residents. In particular, we defined four maps, as reported in [8].

    Through this evaluation process, we discovered a number of usability issues and missing features in our tool which contributed to unnecessary complication in the interpretation of the results. A summary of the identified issues is provided below (see [8] for details):

    • In the first version of the tool, activity instances left no trace when they disappeared. The subjects found this particularly confusing as moving dots can disappear at different positions on the map.
    • The subjects interviewed remarked that it was sometimes unclear how long activity instances were active.
    • Activity instances could not be related to characteristics of the case, e.g., the type of permit requested.

    After addressing the issues above (and other minor issues), we released a second version of this tool which is the version discussed in this article. For instance, the fading effect was introduced to make it more evident when activity instances are about to disappear (see end of Section 3). The different colour schemes discussed in Section 3.2 were introduced to relate activity instances to case characteristics or to draw attention to the age of activity instances.

    After releasing the second version of the tool, we performed a more extensive session of experiments where users personally interacted with the tool. The participants of this second experiment session were Suncorp employees and the movies used in the experiment were the four movies already explained in Section 5. The use of a different case study from another continent allowed us to assess the framework in different settings and with subjects with a different cultural and work background.

    The evaluation was conducted using an established methodology, in addition to a number of interviews with a relatively large number of subjects. Section 6.1 details the evaluation methodology, the background of the participants, and the experiment procedure. Section 6.2 reports the result of the evaluation and lessons learned, along with directions for future development.

    6.1 Methodology for the Evaluation

    The evaluation of the second version of our tool was conducted using the Co-operative Evaluation methodology, which is a mature, fully-documented methodology in the field of human-computer interaction [25]. This is a cost-effective technique for identifying usability problems in prototype products and processes. The technique encourages design teams and users to collaborate in order to identify usability issues and their solutions.

    6.1.1 Nature and Number of Participants

    The participants of this experiment consisted of Suncorp employees of various roles: five team leaders, one manager, two business analysts, and one claims officer. They had different levels of knowledge of the insurance claim process.

    In terms of the participants' familiarity with process mining techniques and business process management technology, three out of the nine participants were aware of the existence of Business Process Management systems, while the rest were not aware at all, or only had a very limited awareness, of such systems. Furthermore, only one participant had experience with process analysis, while the rest had none, or limited, experience with process analysis.

    It is worth highlighting that, because of the user-intensive nature of this method, it is difficult to run this experiment with a large number of users. Nevertheless, as documented in [25], past applications of such a method have shown that a carefully chosen set of experiment subjects, even if relatively small, can minimise the problem of obtaining subjective results.

    6.1.2 Procedure to Conduct Experiments

    The experiment of our visualisation tool was conducted using the four movies generated using the LogOnMapReplay plug-in (detailed in Section 5). The experiment was conducted with each participant individually, one after another. Before the experiment started, we gave a brief introduction of the framework and of the four movies, after which we let the participant play with the tool on their own. Each participant was given roughly 10 minutes to interact with the tool. In accordance with the methodology, no further comments were given, thus letting the participants draw their own conclusions. Without our interference, we could thus evaluate the level of understandability of the map metaphor and the usefulness of the approach when extracting knowledge from event logs.

    While performing such tasks, participants had to explain what they were doing by thinking aloud. During the experiment, notes were taken on the behaviours exhibited by the participants in order to measure the degree of efficiency and effectiveness observed. In particular, they were asked to communicate any meaningful conclusions that they managed to draw by observing the animations and interacting with the tools.

    At the end of the experiment, each subject was given the opportunity to fill out a semi-structured questionnaire with questions designed to measure the subject's impressions and expectations with regards to the tool. To ensure the anonymity of responses, the filled-out questionnaire was inserted into a ballot box by the participant him/herself.

    In summary, our experiment methodology provides a valuable means to not only verify the effectiveness and efficiency of the visualisation framework, but also to elicit further possible improvement opportunities (see Section 6.3). This method is, therefore, an eminently formative evaluation method, rather than a summative one. It is useful for identifying those usability bugs that can affect the effectiveness of the system being evaluated.

    6.2 Evaluation Results

    The results of our experiment will be detailed based on the questionnaire results (both quantitative and qualitative data were collected) and our observations of the participants' responses to the tool during the experiment. The first two questions in the questionnaire (Q1 and Q2) were used to gauge participants' familiarity with BPM systems and process analysis (the result of which has been summarised in Section 6.1.1). The rest of the questions (Q3–Q8) were used to gauge participants' satisfaction with the tool.


    6.2.1 Questionnaire results

    The questionnaire used consisted of both closed and open questions. The third question (Q3) asked participants the type of insights that they expected to obtain by using the visualisation tool. Participants' expectations of the insights they wished to obtain by using the tool varied, although they are roughly consistent with what can be expected from visual analytics. A baseline expectation from all participants is to understand the trend and volume of their claims processing in terms of performance, and to understand why certain trends occurred. Some participants also expected the tool to guide them in identifying problems with their processes and how to resolve them. Finally, a few participants also expected the tool to help them identify opportunities in their processes with a focus on planning the assignment of resources.

    The remaining five questions (i.e. Q4–Q8) consisted of both closed and open-ended questions. For each of these questions, the user selects one out of a number of pre-determined satisfaction ratings (e.g. "very much", "much", "not so much", "not at all", and "I don't know"). Furthermore, to gain more insights, they were allowed to provide reasons for the satisfaction rating they selected.

    The results of the closed questions are displayed in Figure 8. With respect to Q4, most participants (8 out of 9) found that the tool allowed them to gain the expected process-related insights (expressed in their answers to Q3). This is confirmed by the related comments, e.g. “Having a visual representation makes it clearer [. . .] the flow on impacts when there is a problem in one area and how it then relates to another area.”, “Was good to see key area for the business to improve on [. . .]”, “You could clearly identify bottlenecks, claims with long duration but low value etc.”, “Give graphical representation of claims incidents and work loading. Give insights to claims costs compared to time of year.”

    However, there was one participant who did not find the tool to help him/her much with addressing the question he/she listed earlier. Upon reading the comment, it turned out that the participant expected a feature in our tool which was simply not built: the ability to drill down into activity instances that appear in the movies to learn more about them (e.g. obtaining the details of the customer related to a particular activity instance).

    Another recurrent comment concerned the intuitiveness of the deadline map, in which dots move towards the left as time progresses. Participants would have expected the dots to move towards the right. This is an interesting point: in the first version of the tool, the dots did move towards the right. We changed this because the subjects from the Dutch municipality found movement towards the right not very intuitive. This makes us suspect that intuitiveness may be subjective and may depend on cultural background.

    In terms of the maturity of the tool (Q5), it was quite surprising to note that one third of the participants did not provide any responses; however, by looking at the related comments, it turned out that these participants did not respond because they either did not understand the question or found that they had not spent enough time with the tool to properly answer Q5. Nevertheless, those who responded to this question all agreed that the tool is mature. The related comments also confirmed this observation, e.g. “...it captures a lot of relevant information and provides plenty of options to navigate to specific areas you might want to examine...”, “...it has all the activities for 6 months and I can go back in time if I wanted to. Obviously as we used it we would discover more things we may want to see but for the moment, I think there is enough for me to play with.” At the same time, comments related to Q5 also suggested a number of possible tool improvements, including the use of plain English in the description and labelling of various configuration options.

    Fig. 8. Closed questions results. Panels: (Q4) Did the tool assist you in gaining insights you listed in Question 3? (Q5) Do you think the tool is mature? (Q6) Do you find the behaviour of the tool intuitive? In other words, does the tool behave in a way that you expect? (Q7) Did the tool run without interruptions and without crashes during the experiment? (Q8) Do you think there are any essential features that are missing?

    Q6 evaluates the intuitiveness of the tool. As shown in Fig. 8, the responses we obtained were quite similar, with 8 out of 9 participants finding the tool quite intuitive. From those who found the tool to be intuitive, we received comments such as “It acts exactly to what you would expect, according to the selections you choose.”, “Visual interpretation easy to accept.”, and “[The] tool is easy to use.” The one participant who did not find the tool to be intuitive commented on the need for the tool to have “clear labels, filters, and help boxes so that we can easily navigate around it.”


    Another feature of the tool that we wanted to evaluate was its operational stability. Therefore, Q7 asked participants to rate whether the tool ran smoothly during the experiment. The responses to this question were almost evenly divided, with 5 participants stating that the tool ran smoothly during the experiment and the other 4 stating the opposite. One participant commented that it was frustrating that he/she managed to crash the tool, and other participants commented on the low-quality graphics during the experiment, which were due to the projector used.

    Finally, we also asked participants if there were any features that they would like to have but that were not currently available in the tool (Q8). Apart from one participant who did not respond, the responses to this question were evenly split, with 4 participants stating that there were missing features and another 4 participants stating the opposite. Those who responded “Yes” to this question suggested a number of features that should be added. These suggested features, except for those that have already been stated in response to Q6, will be detailed in Section 6.3.

    6.2.2 Additional Observations

    As explained in Section 6.1, we took notes to try and gauge the degree of efficiency and effectiveness as experienced by the involved subjects during the empirical tests. In particular, we were interested to learn whether the subjects could draw interesting conclusions, which were either unexpected or confirmed previous intuitions.

    The notes taken during the experiment show that participants managed to derive important conclusions as suggested by the movies. For example, one participant gained insights with regards to the distribution of claim types across different Australian states and raised the possibility of adjusting insurance premiums accordingly. A number of other participants gained insights into the absurdity of having small-value claims which took a very long time to complete. One participant clearly acknowledged the usefulness of the deadline map to compare performance levels across different teams. Another participant noted a peak time period in terms of the number of claims. Overall, the insights gained by participants from using this tool are consistent with what we expected the tool to be able to provide.

    The rest of the observation notes taken during the experiment corroborated the results of the questionnaire, thus further validating the results of our experiment.

    6.3 Evaluation Conclusion and Future Tool Improvement

    The analysis of the questionnaire answers and of the users’ interaction with the tool lets us make the following observations:

    1) In most cases, the tool behaved as expected.

    2) A number of participants highlighted that the language used in the tool was not plain English, which could lead to misinterpretation of the options or features.

    3) The stability of the tool needed to be improved as it crashed rather frequently during the experiment.

    4) A recurrent request from many participants was to provide filtering capabilities such that the movie can be configured to display only information of interest, e.g. only display activity instances which were executed by a certain group of resources or which occurred within a specific time period. Note that ProM already has sophisticated log filtering capabilities; nevertheless, it is still worthwhile to integrate them directly into this plug-in to allow users to do the filtering on the fly while playing the movie.

    5) One participant expressed his/her interest in being able to drill down into the activity instances associated with certain colours or having certain characteristics, and to extract the respective details.

    6) For dots representing more than one activity instance, participants showed an interest in being able to quickly learn the percentage of slices of each represented colour (e.g. 30% of slices are coloured red, 10% blue, etc.). In the version evaluated, we simply showed the number of dots that were amalgamated, without detailing this statistic per colour. Since then, this issue has been addressed. Moreover, more detailed bug testing has been conducted and, as a result, many bugs have been fixed.
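    Observations 4 and 6 both describe simple computations over activity instances, which can be sketched as follows. This is only an illustrative sketch and not the plug-in's code: the `ActivityInstance` record, its field names, and the functions `filter_instances` and `colour_shares` are hypothetical names introduced here.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ActivityInstance:
    """Hypothetical record; the field names are assumptions for illustration."""
    case_id: str
    activity: str
    resource: str
    start: datetime
    colour: str


def filter_instances(instances, resources=None, window=None):
    """Keep instances executed by one of `resources` and started inside `window`.

    `resources` is a set of resource names (None = no resource filter).
    `window` is a (from, to) pair of datetimes, inclusive (None = no time filter).
    """
    kept = []
    for inst in instances:
        if resources is not None and inst.resource not in resources:
            continue
        if window is not None and not (window[0] <= inst.start <= window[1]):
            continue
        kept.append(inst)
    return kept


def colour_shares(instances):
    """Percentage of instances per colour for one amalgamated dot."""
    counts = Counter(inst.colour for inst in instances)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {colour: 100.0 * n / total for colour, n in counts.items()}
```

    Applying `filter_instances` before each frame is drawn would correspond to filtering on the fly, while `colour_shares` gives the per-colour breakdown that participants asked for when several instances are merged into a single dot.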

    Despite the issues listed above, we believe that the evaluation has clearly assessed the validity of the tool. The concept and the design of the tool are desirable. The metaphor of maps and movies is clearly understood and serves the purpose of gaining insights into past executions to understand process trends and performance.

    Overall, subjects were enthusiastic about the ability to generate movies based on event data. Several participants expressed a desire to start using the plug-in in their own analyses. In fact, the tool has helped some end users find interesting patterns in various situations, some of which were quite surprising to them. Therefore, we conclude that the concept of our visualisation framework is promising and that the current tool already illustrates its potential.

    7 CONCLUSION

    Process models can be viewed as geographic maps. However, unlike real maps the quality of process models often leaves much to be desired. Man-made process models tend to be subjective and disconnected from real process executions. Process mining techniques can be used to improve the quality of process maps, e.g., process discovery techniques can be used to automatically derive process models from event data and conformance checking techniques can be used to pinpoint and quantify deviations between model and reality. However, this is not sufficient as process maps are static and do not show the flow of work. Therefore, we developed an approach to visualise process histories in a generic manner. Different maps can be used as long as activity instances can be given coordinates on such a map, e.g., an activity instance may be mapped onto a Gantt chart, an organisation chart, a process model, etc. By showing a sequence of “photographs” of the process (i.e., a movie), one can see concept drift, compliance problems, bottlenecks, etc.

    This paper describes an implementation of these ideas in the ProM framework. We also developed a map designer that allows end users to define the maps of interest and to position activities on them. Many maps are widely applicable (e.g., a time-line or a process model map) and can be used in many different settings with few or no changes. Therefore, we also developed a ProM plug-in that semi-automates the definition of certain types of maps, by self-generating the map picture and the projection of activity instances. The implementation is generic and any collection of maps can be used as long as it is possible to map instances onto map coordinates. Moreover, dots on the map can be coloured using various schemas according to the properties of the corresponding activity or process instance.
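    To make the notions of projection and colouring schema concrete, the following minimal sketch projects an activity instance onto a time-line map and picks a dot colour from one of its properties. It is an assumption-laden illustration rather than the ProM plug-in's implementation: the function names, the lane-per-resource layout, the pixel dimensions, and the value bands are all hypothetical.

```python
from datetime import datetime


def timeline_position(start, axis_start, axis_end, lane, lanes,
                      width=800, height=400):
    """Project an instance onto (x, y) pixel coordinates of a time-line map.

    x grows with the start time between `axis_start` and `axis_end`;
    y is the vertical centre of the horizontal lane named `lane`.
    All parameters are illustrative assumptions.
    """
    span = (axis_end - axis_start).total_seconds()
    if span <= 0:
        raise ValueError("axis_end must be later than axis_start")
    fraction = (start - axis_start).total_seconds() / span
    fraction = min(max(fraction, 0.0), 1.0)  # clamp to the visible axis
    x = round(fraction * width)
    lane_height = height / len(lanes)
    y = round((lanes.index(lane) + 0.5) * lane_height)
    return x, y


def colour_for(value, bands, default="red"):
    """Return the colour of the first band whose upper bound exceeds `value`.

    `bands` is a list of (upper_bound, colour) pairs in ascending order.
    """
    for upper, colour in bands:
        if value < upper:
            return colour
    return default
```

    For instance, with hypothetical bands `[(1000, "green"), (10000, "orange")]`, a claim of value 500 would be drawn green, one of value 5000 orange, and anything larger red. Swapping `timeline_position` for another projection function is all that is needed to place the same instances on a different kind of map.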

    Interested readers can try the ProM implementation with a sample event log. The sample event log, the corresponding maps, the configuration files, and the related instructions are available from http://www.processmining.org/online/logonmaps.

    The approach was initially evaluated using a case study in the context of a Dutch municipality. The outcome triggered a series of changes in the framework and the reference implementation. After the adjustments, we performed a second, more extensive experiment where participants actually worked with the tool. To minimise the issue of obtaining subjective results, this experiment was conducted based on a well-founded methodology, i.e. the Co-operative Evaluation methodology [25]. This second experiment was conducted with the employees of Suncorp (one of the largest insurance organisations in Australia).

    The results in this paper show the value of combining process mining and visual analytics. Process mining results are often perceived to be rather abstract and static. Visual analytics approaches tend to be data-centric rather than process-centric. The combination of both fields may yield innovative process-centric visualisations such as the process movies proposed in this paper.

    REFERENCES

    [1] J. Manyika, M. Chui, B. Brown, J. Bughin, R. Dobbs, C. Roxburgh, and A. H. Byers, “Big data: The next frontier for innovation, competition, and productivity,” McKinsey Global Institute (MGI), Tech. Rep., May 2011.

    [2] D. Laney, A. Bitterer, R. Sallam, and L. Kart. (2012, December) Predicts 2013: Information innovation. Gartner.

    [3] W. M. P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes. Berlin Heidelberg: Springer-Verlag, 2011.

    [4] J. J. Thomas and K. A. Cook, Eds., Illuminating the Path: The Research and Development Agenda for Visual Analytics. IEEE CS Press, 2005.

    [5] D. Keim, J. Kohlhammer, G. Ellis, and F. Mansmann, Eds., Mastering the Information Age: Solving Problems with Visual Analytics. VisMaster, http://www.vismaster.eu/book/, 2010.

    [6] M. de Leoni, M. Adams, W. M. P. van der Aalst, and A. H. M. ter Hofstede, “Visual support for work assignment in process-aware information systems: Framework formalisation and implementation,” Decision Support Systems, vol. 54, no. 1, pp. 345–361, 2012.

    [7] W. M. P. van der Aalst, M. de Leoni, and A. H. M. ter Hofstede, Computational Intelligence. Nova Publisher, 2012, ch. 8: Process Mining and Visual Analytics: Breathing Life into Business Process Models, pp. 107–138.

    [8] M. de Leoni, J. Buijs, W. M. P. van der Aalst, and A. H. M. ter Hofstede, “Facilitating process analysis through visualising process history: Experiences with a Dutch municipality,” Eindhoven University of Technology, Tech. Rep. BPM-12-24, 2012.

    [9] C. Chen, Information Visualization: Beyond the Horizon. Springer-Verlag, New York, Inc., 2006.

    [10] R. Spence, Information Visualization: Design for Interaction, 2nd ed. Harlow, England: Pearson Education Limited, 2006.

    [11] G. Alonso and C. Hagen, “Geo-Opera: Workflow Concepts for Spatial Processes,” in SSD ’97: Proceedings of the 5th International Symposium on Advances in Spatial Databases, ser. Lecture Notes in Computer Science, vol. 1262. Springer Verlag, 1997, pp. 238–258.

    [12] D. Kaster, C. Bauzer-Medeiros, and H. V. da Rocha, “Supporting Modeling and Problem Solving from Precedent Experiences: The Role of Workflows and Case-based Reasoning,” Environmental Modelling and Software, vol. 20, no. 6, pp. 689–704, 2005.

    [13] B. Schonhage and A. Eliens, “Management Through Vision: A Case Study Towards Requirements of BizViz,” in AVI 2000: International Conference of Information Visualisation. IEEE Computer Society, 2000, pp. 387–392.

    [14] D. Oelke, C. Ming, C. Rohrdantz, D. Keim, U. Dayal, H. Lars-Erik, and H. Janetzko, “Visual Opinion Analysis of Customer Feedback Data,” in Proceedings of the IEEE Symposium on Visual Analytics Science and Technology (IEEE VAST 2009). IEEE, 2009, pp. 187–194.

    [15] D. Keim, T. Nietzschmann, N. Schelwies, J. Schneidewind, T. Schreck, and H. Ziegler, “A Spectral Visualization System for Analyzing Financial Time Series Data,” in Proceedings of the Eurographics/IEEE-VGTC Symposium on Visualization (EuroVis 2006). Eurographics Association, 2006, pp. 195–200.

    [16] H. Ziegler, T. Nietzschmann, and D. Keim, “Relevance Driven Visualization of Financial Performance Measures,” in Proceedings of the Eurographics/IEEE-VGTC Symposium on Visualization (EuroVis 2007). Eurographics Association, 2007, pp. 19–26.

    [17] P. Bak, F. Mansmann, H. Janetzko, and D. Keim, “Spatio-temporal Analysis of Sensor Logs using Growth Ring Maps,” IEEE Transactions on Visualization and Computer Graphics, vol. 15, pp. 913–920, 2009.

    [18] D. Keim, “Visual Exploration of Large Data Sets,” Communications of the ACM, vol. 44, pp. 38–44, 2001.

    [19] M. Krstajic, E. Bertini, and D. A. Keim, “Cloudlines: Compact display of event episodes in multiple time-series,” IEEE Transactions on Visualization and Computer Graphics, vol. 17, no. 12, pp. 2432–2439, December 2011.

    [20] J. Kehrer and H. Hauser, “Visualization and Visual Analysis of Multifaceted Scientific Data: A Survey,” IEEE Transactions on Visualization and Computer Graphics, vol. 19, no. 3, pp. 495–513, 2013.

    [21] W. Aigner, S. Miksch, W. Müller, H. Schumann, and C. Tominski, “Visualizing time-oriented data – a systematic view,” Journal on Computers and Graphics, vol. 31, no. 3, pp. 401–409, 2007.

    [22] C. W. Günther and W. M. P. van der Aalst, “Fuzzy Mining: Adaptive Process Simplification Based on Multi-perspective Metrics,” in International Conference on Business Process Management (BPM 2007), ser. LNCS, vol. 4714. Springer-Verlag, 2007, pp. 328–343.

    [23] J. Li, J.-B. Martens, and J. J. van Wijk, “A model of symbol size discrimination in scatterplots,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’10). ACM, 2010, pp. 2553–2562.

    [24] J. Heer and G. Robertson, “Animated transitions in statistical data graphics,” IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1240–1247, November 2007.

    [25] A. Dix, J. E. Finlay, G. D. Abowd, and R. Beale, Human-Computer Interaction, 3rd ed. Prentice Hall, 2003.