Using Workflow Data to Explore the Structure of an Organizational Routine

7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

1/25

Using workflow data to explore the structure

of an organizational routine

Brian Pentland1, Thorvald Haerem

2and Derek Hillison

1

1Michigan State University,

2Norwegian School of Management

Prepared for the 3rd International Conference on Organizational Routines:

Empirical research and conceptual foundations25-26 May 2007, Strasbourg, France

Abstract

Empirical studies of organizational routines are often limited because performances are

distributed in time and space, and therefore difficult to observe. Workflow management

systems provide an opportunity to collect data about a large number of complete performances at

relatively low cost. In this paper, we analyze data from a workflow management system for

processing invoices to demonstrate how workflow event logs can be used to analyze

organizational routines. The sample includes all invoices processed over an eleven month period

(N=2072). We use the data to compute a variety of measures which can be used to identify and

compare the structures of routines.

Acknowledgements

We thank Compello Software, Visma Services, Bo Hjort Christensen and Norwegian Research

Council for their support. We also appreciate the help from Alf Slettemoen in collecting and

preparing the data for the analyses.


2/25

Using workflow data Page 2 of 25

Introduction

Empirical study of organizational routines poses many difficulties for the researcher.

Routines are typically distributed in time, space, and throughout an organizations structure.

Short of stapling yourself to the paperwork, it is difficult to observe even a single performance

of an organizational routine from beginning to end. In addition, the natural variability in

performances makes it difficult to identify a representative pattern that encompasses the range of

possible performances. Time, money and patience often limit us to observations of a few

performances, or parts of performances, or interviews with a subset of participants.

Workflow systems provide an unprecedented opportunity to gather data about the

patterns of action generated by routines (van der Aalst et al, 2003;). With the proliferation of

computer network technology, more and more organizations have adopted workflow systems to

support their routines (Basu and Kumar, 2002). These systems typically involve a mixture of

human and automatic processingthey are like the glue that holds together other, more

common applications for accounting, email, and so on (Becker, Muehlen and Gille, 2002).

Typical workflow systems generate event logs that include time-stamped records of each event

or action that occurs in the system, making it possible to collect large numbers of performances

at very low cost. Computer scientists have made tremendous progress in analyzing event logs for

a range of purposes, such as the recovery of formal process models (van der Aalst et al, 2003;

van der Aalst and Weijters, 2004). Organizational scholars, however, have not paid as much

attention to this increasingly ubiquitous source of data about organizational processes and

routines.

In this paper, we demonstrate how workflow event logs can be used to analyze

organizational routines. We begin by discussing the challenges involved in analyzing routines


3/25


as systems that generate patterns of events that can change over time. We provide a survey of

techniques for analyzing and representing the structure of organizational routines, and illustrate

these techniques using over 2000 performances of an invoice approval routine, as captured in a

workflow management system. While workflow event logs have several important limitations, it

seems clear that they provide a valuable source of data about organizational routines.

Theory

To have an empirical science of organizational routines, we need to be able to identify

particular routines and compare them to other routines. These analytical moves identification

and comparison are fundamental to any empirical science. In disciplines where individuals or

organizations are the unit of analysis, such as psychology or economics, identification and

comparison are difficult. The additional complexity of studying patterns of behaviors by groups

of individuals that take place in the context of an organization compounds this difficulty.

Identification and comparison are particularly challenging if we conceptualize routines as

generative systems (Feldman and Pentland, 2003; Pentland and Feldman, 2005). We are able to

observe surface structure (the performative aspects of the routine), but we would like to say

something about the underlying structures which generate the performances. The grammatical

metaphor of surface vs. deep structure has limitations when applied to social systems

(Pentland, 1995), but the basic analogy is useful when trying to understand human perceptions of

complex tasks and systems (Chi, Feltovich and Glaser, 1981; Haerem and Rau, 2007): we are

trying to understand the features of a system that can generate a potentially infinite variety of

performances from a finite sample of performances.


4/25


When we say that organizational routines are generative systems, it means that there is

some underlying mechanism that generates the interdependent patterns of action that we

recognize as an organizational routine. Figure 1 illustrates this perspective.

Figure 1: Routine as generative structure

Figure 1 assumes that there is one routine generating the performances we are seeing.

But in field research, the situation is more like Figure 2: we see a collection of performances, and

we would like to know something about the underlying routine. Is there one routine, or many?

Which performances represent standard practice, and which are exceptions? More broadly, how

can we generalize about a routine from observing a sample of its performances?

Figure 2: Generalization of structure from specific performances

ABCD

ACD

CD

(etc)

?

Performances Indicate Routine?

ABCD

ACD

C

D(etc)

Ostensive patternABCD

PerformancesGeneratesRoutine


5/25


We can see this problem in two ways: measurement or induction. As a measurement

problem, we are using the performances to compute various properties of the routine, such as the

average cycle time, or the number of steps. Because each performance has a beginning and an

end, we can compute the cycle time. The variability of the performances is easily dealt with by

familiar procedures such as averaging, or other measures of central tendency. We can use these

properties to compare routines (which routine is faster/better/cheaper?)

While this familiar procedure works well for properties, our current theory of

organizational routines suggests that they are not just things with variable properties. Rather,

they are complex generative systems that produce recognizable, repetitive, patterns of

interdependent actions carried out by multiple actors. If we are interested in the pattern of

actions the actual performances of the routine -- then familiar statistical techniques like

averaging are not applicable, because the data are not scalar.

Generalizing about the patterns involves induction: generalizing about a class from

observing instances of that class. For example, one might observe several black crows, and on

the basis of these observations, one might conclude that all crows are black. If a routine

generated identical performances, the induction problem would be just that easy. Unfortunately,

we know that organizational routines tend to generate a great variety of patterns. In a typical

situation, we are attempting to generalize from a small set of observations to a large class that is

filled with variety. Furthermore, the granularity of observations can affect our impression of

how much variety is present. From a distance, every performance looks the same; up close, they

are all different.

One practical approach to this problem is to simply ignore the variety and describe

typical performances. In process mapping, for example, we interview participants to create a


6/25


flow chart (or some other kind of diagram) that summarizes their ostensive understanding of how

routine typically works. This method avoids the problem of induction because it bypasses the

primary evidence the actual performances of the routine. While flowcharts and other process

maps can be useful, this seems a rather weak basis for an empirical science of routines,

especially when we know that flow charts are often a poor representation of actual activities

(Suchman, 1997).

Properties and patterns over time

Part of what makes organizational routines different is their intrinsically event-based

nature. To study routines up close, and understand them, we must account for the role of time.

Within a single performance, time is embodied in the pattern of events. For some routines, like

the invoice routine, the events occur over a few hours or days. For other routines, like hiring or

technology roadmapping (Howard-Grenville, 2005), events may take weeks or months. Even in a

very fast routine, like a sports team driving for the goal, or landing an airplane, timing is critical.

Nevertheless, when we write about routines, these patterns are generally conceptualized as

synchronic occurring at a single moment in time.

When we consider changes in the pattern over time, of course, we need a longitudinal

(more macro-scale) time (Barley, 1990). Figure 1 shows the different ways that time can factor

into empirical studies of routines.

Figure 1: Approaches to identification and comparison

Properties Patterns

Cross-sectionalSynchroniccomparison

Identification

LongitudinalDiachroniccomparison

Evolution


7/25


Our familiar statistical tools are well suited for working with properties, such as cycle

time. Cross-sectional data takes a snap-shot of a routine during a particular time window, which

is sufficient for a synchronic description or comparison (Barley, 1990). With longitudinal data,

we can do diachronic comparisons and track how the attributes of the system change over time.

When working with patterns, the empirical operations are less familiar. Typical social

science methods work on scalar (or columnar) data like questionnaire items that can be fed

into correlation-based analyses. Network models provide a prominent example of a more

sophisticated approach, and we will illustrate a network-based approach later in this paper.

Other techniques, such as classification and clustering algorithms, generally depend on

computing a distance between patterns. In other words, we need to characterize the space of

possible patterns and locate each observed pattern within that space.

The ability to classify patterns directly would be an important aid in an empirical science

of organizational routines. For example, we theorize that routines are composed of sub-routines,

and that these sub-routines can be re-combined in various ways to create new routines. But how

can we identify particular subroutines, given the enormous variability in a typical set of

performances? The answer is not obvious, but the ability to identify and compare the structure

of routines and map their change over time would be fundamental to empirical tests of our most

influential theories about stability, change, and evolution in organizations. Nelson and Winter

(1982) hypothesize that organizations change by mutation, and by recombining routines in

various ways. Genetic models and metaphors continue to be an influential approach to

theorizing about routines (Becker, 2004), but empirical evidence from real organizations has

been difficult to obtain, and methods for analyzing changes in patterns over time have been

lacking. The use of workflow data may afford the possibility of testing evolutionary concepts. In


8/25


the following sections, we offer some modest examples of how workflow data might be used to

explore the structure of an invoice processing routine.

Methodology

For this paper, we collected workflow data from an invoice-management system

(Compello Software) for one company over an 11-month period. The data include 2072

performances of the invoice processing routine, consisting of 62756 individual actions taken

from a lexicon of 31 action types.

Figure 2 shows a flow chart for the invoice process. Invoices can enter the system on

paper (which is most common), or via an electronic portal. If the invoice conforms to a known

Form type, it is scanned, registered in the system, and distributed for approval. Some invoices

can be immediately sent to the financial system for payment; others require multiple approvals.

Once these are complete, the invoice is paid.


9/25


Figure 2: Flow chart of the invoice processing routine

Table 2 shows one performance of the invoice process routine as it appears in the

workflow event log. The first column in the table indicates that the routine consists of two main

phases: entry and approval. The first portion of this routine, referred to as Entry, is almost

entirely automated. When Person is -1, that means that a computer is taking the action

automatically, according to a rule. In the approval phase, some actions are automatic, but

others are manual. People are back in the picture. In the analysis that follows, we will

sometimes refer to the routine as a whole, but at times it will be useful and interesting to

compare the automated, initial entry section of the routine (Entry) to the more manual,

approval phase (Approval). After the last approval, the invoice data is exported to a separate

accounts payable system that is outside the scope of the workflow system. This is an example of

Financial

system

Scanning OCR OCR Form def.

Registration/Distribution

AuthorizationStatus Fully posted

Form?

Electronic invoice


10/25


a limitation to workflow event logs their scope is limited by the design and architecture of the

underlying systems.

Table 2: Example event log from workflow system1

PHASE ACTIONTYPE OFACTION DATA DATE PERSON DATA CAPTURE

Entry Enter document type Data entry CI 05-Apr-06 -1 OCR/XML

Entry Enter invoicno. Data entry 27475 05-Apr-06 -1 OCR/XML

Entry Enter invoicedate Data entry 3/31/2006 05-Apr-06 -1 OCR/XML

Entry Enter duedate Data entry 4/20/2006 05-Apr-06 -1 OCR/XML

Entry Enter period Data entry 33 05-Apr-06 -1 OCR/XML

Entry Enter vendor account Data entry some text 05-Apr-06 -1 OCR/XML

Entry Enter amount Data entry 1250 05-Apr-06 -1 OCR/XML

Entry Enter currency Data entry 2 05-Apr-06 -1 System

Entry Enter text Data entry some text 05-Apr-06 6 Manual

Entry Approve Approve 5 05-Apr-06 -1 System

Entry Enter amount Data entry 1250 05-Apr-06 6 Manual

Approval Approve Approve 5 05-Apr-06 19 Manual

Approval Enter period Data entry 33 05-Apr-06 -1 OCR/XML

Approval Distribute to approver Workflow 2 05-Apr-06 6 Manual

Approval Distribute to approver Workflow 2 05-Apr-06 6 Manual

Approval Enter currency Data entry 2 05-Apr-06 -1 System

Approval Enter account Data entry Acct 05-Apr-06 19 Manual

Approval Enter Tax-code Data entry 16 05-Apr-06 19 System/manual

Approval Notify Workflow 4/5/2006 05-Apr-06 -1 System

Approval Enter value dim. 1 (DP) Data entry 131 05-Apr-06 19 Manual

Approval Approve Approve 5 06-Apr-06 22 Manual

Approval Enter text Data entry some text 18-Apr-06 44 Manual

In some respects, workflow data is extremely convenient, because it is computer

generated. Events are unitized, categorized and time-stamped. This largely eliminates the need

for the arduous and time-consuming task of coding that would be required in a typical

observational or archival field study. In other respects, depending on how the workflow event

log is stored, it may require some rather sophisticated computer procedures (e.g., queries written

in SQL) to retrieve it. Many software programs databases differ from installation to installation.

Such differences introduce more complexity in making the data comparable across organizations.

1 Some columns have been removed or values changed for confidentiality.


11/25


To minimize such complications we chose a process and software which allowed a variation

within a standardized framework. But once data has been retrieved, it is more or less ready for

analysis.

Describing the data: Lexical variety

The most basic property of a routine is the list of events it generates. Events are fundamental

because we need the events to describe and recognize the patterns. By analogy to human

language, we refer to the number of different events generated by a routine as its lexical variety

orlexical size. Table 2 provides some measures of the lexical variety each phase in the routine.

These measures alone might suggest that entry and approval are basically the same they

have almost the same number of actions in their lexicon (15 vs. 17); they have a similar

average length; and a similar number of different lexical items appear in each performance.

Table 2: Lexical variety measures

Entry Approval

Average Length 11.31532 12.2525

Lexicon Size 15 17

Average Lexicon Items 10.15916 9.634741

While the numberof events in each phase is similar, the events themselves are not.

Figure 3 shows which events are found in each phases of the routine. Some of events are shared

between phases, yet other events are unique to a specific phase. Thus, one can easily

discriminate between the phases based on the presence (or absence) of particular events. In other

words, these two phases can be easily identifiedor recognized. In cases where there is more

overlap, more elaborate classification (or pattern recognition) techniques can be used to


12/25


discriminate between phases (e.g., markov and n-grams are two widely used techniques for

classifying sequence data).

Figure 3: Frequency of action types in the entry and approval phases

Frequency of Actions in Entry and Approval Phase

0 500 1000 1500 2000 2500

Enter currency

Enter period

Enter text

ApproveEnter amount

Enter duedate

Enter invoicedate

Enter invoicno.

Enter vendor account

Enter document type

Distribute to approver

Enter account

Enter Tax-code

Enter value dim. 1 (DP)

Notify

Enter value dim. 3 (SP)Enter value dim. 2 (PR)

Enter Last Comment

Remove approver

Reminder (last if more than one)

Send back to accountant

Enter value dim. 4 (EM)

Approved by backup

Forward

New scanning

Entry Phase Approval Phase

Listing the sequences

Lexical variety provides a foundation, but we are ultimately more interested in the

performances, which are temporally orderedpatterns of events. To describe how performances

of a routine vary, one could simply list all the sequences one observes. If the routine is simple

enough, with fixed performances, this might be a good way to describe what the routine does.


13/25


Table 3 shows the ten most frequent sequences for each phase of the routine. These sequences

were produced by assigning an integer code to each event type in the lexicon (taken from the

second column of table 2).

Table 3: Ten most frequent sequences for each phase

Entry Phase Approval Phase

# % of total Action Sequence (119 unique, n=1998) # % of total Action Sequence (902 unique, n=2297)

904 45.25% 9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7 138 6.01% 2, 14, 5, 8, 16, 6, 15, 20

419 20.97% 9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 16 92 4.01% 14, 5, 5, 8, 25, 19, 2, 16, 6, 15, 17, 2

147 7.36% 9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7, 13 51 2.22% 2, 14, 5, 5, 8, 25, 19, 2, 16, 6, 15, 17

121 6.06% 9, 12, 11, 10, 14, 21, 7, 8, 2, 16, 7 44 1.92% 2, 14, 5, 5, 8, 25, 19, 2, 6, 15, 17, 16

37 1.85% 9, 12, 11, 10, 21, 7, 8, 16, 2, 7, 14 44 1.92% 2, 14, 5, 5, 8, 25, 2, 16, 6, 15, 17

36 1.80% 9, 12, 11, 10, 14, 8, 21, 7, 16, 2, 7 37 1.61% 2, 2, 14, 5, 5, 8, 16, 6, 15, 25, 17, 19

30 1.50% 9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 16, 13 33 1.44% 2, 14, 5, 5, 8, 25, 2, 6, 15, 17, 16

25 1.25% 9, 12, 11, 14, 21, 7, 8, 16, 2, 7, 10 32 1.39% 2, 2, 14, 5, 5, 8, 6, 15, 25, 17, 19, 16

24 1.20% 9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 13, 16 27 1.18% 2, 14, 5, 8, 16, 6, 15, 17

17 0.85% 9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7, 7 25 1.09% 2, 14, 5, 5, 8, 16, 6, 15, 25, 17, 19, 2

In the invoice routine, raw enumeration is apparently not a useful way to summarize the

routine: there is too much variety. There are 119 distinct sequences in the entry phase, and 902

distinct sequences in the approval phase. For the entry phase, the ten most common sequences

account for 88% of the observed sequences. For the approval phase, the ten most common

account for just under 23%.

One implication of Table 3 is that, even for automated workflows such as the Entry

phase, there can be quite a lot of variability in the performances. For the less automated

Approval phase, variability is even higher. For this routine, collecting a small sample of

performances is not likely to result in a good description. Even for the entry phase, which is


14/25


almost entirely automated, there are over 100 distinct sequences. In the context of the invoice

processing routine, these variations may or may not be particularly meaningful. In a more

complex routine, there may be more than two phases (or subroutines), and they may be

combined and recombined in many different ways. Clearly, we need tools to identify and

compare such variations and combinations.

Structure of routines

To assess the structure of the routine, and possible changes over time, we need to analyze

the performances. We would like to be able to compute properties of the routine that are

sensitive to the patterns, so that we can perform cross-sectional and longitudinal comparisons.

Better yet, we would like to be able to compare the patterns directly to support identification of

routines and testing of evolutionary models.

Sequential Variety

To create a meaningful comparison that takes into account the pattern of events, we need

to use methods that use information about the sequences. Sequential variety (Pentland, 2003)

uses all of the data in a set of observed sequences (not just pairs of events) to create a single,

scalar measure of the variability in the sample. It provides an index of the variability in the

performances that can be used for either cross sectional or longitudinal comparisons. Sequential

variety is closely correlated to the complexity of the observed sequences, as measured by the

Lempel-Ziv complexity.2

2Algorithmic complexity is a well-established approach to detecting and measuring spatio-temporal patterns in

sequence data (Kaspar and Schuster, 1987). The Kolgomorov-Chaitin Complexity (often called the Kolgomorov

complexity) refers to the length of the shortest self-terminating program required to produce a given sequence.

While the Kolgomorov-Chaitin measure has great theoretical value, it is not readily computable (Kolgomorov,1965). To address this problem, Lempel and Ziv (1974) devised an algorithm that provides an index of the

Kolgomorov complexity. The Lempel-Ziv measure has similar properties to the Kolgomorov-Chaitin measure, and


15/25


Sequential variety is computed by comparing each sequence in a sample to every other

sequence (Pentland, 2003). The distance between sequences is computed using optimal string

matching (Sankoff and Kruskal, 1983; Abbott, 1995). This distance (also called a Levenshtein

distance) can be used for subsequent clustering and pattern classification. Sequential variety is

simply the average distance between sequences in a sample. Like the Lempel-Ziv complexity,

this method uses entire sequences (not just pairs of actions). Figure 4 shows the sequential

variety of the two phases of the invoice process over the 11 month period in our data set.

Sequential variety over time

0

1

2

3

4

5

6

7

8

1 2 3 4 5 6 7 8 9 10 11

Month

Initial Phase Approval Phase

Figure 4: Entry phase is less variable than the approval phase

Figure 4 shows the sequential variety of each phase in the invoice routine. The numbers

in Figure 4 are computed month by month. This is, of course, an artificial way of partitioning

thus provides a practical measure of algorithmic complexity on empirical data (Kaspar and Schuster, 1987). For

example, Butts (2001) uses the Lempel-Ziv measure to explore the effects of complexity in social networks.


16/25


the data. But sequential variety depends on comparing sequences within a sample, such as a

months worth of invoices. Figure 4 demonstrates that the entry phase is consistently less

variable than the approval phase. This makes sense, because the entry phase is more highly

automated and (as shown earlier) the sequences are more similar.

Network Graphs and Measures

The simplest way to summarize a large number of patterns is to break each sequence of

action into pairs of actions. In this way, workflow data can be processed and displayed as a

valued, directed network graph, as shown in figure 5. Each node in the graph represents a type

of action, while the arcs represent the sequential relationship between actions. Formally, these

graphs are first-order Markov models, except that the nodes are events rather than states.

Pentland (1999) calls this kind of graph an action network because it indicates which actions

follow which other actions. The action network perspective allows us to use methodologies that

had previously been used for the study of social networks to ask new questions about the nature

of the routine in the organization.

Figure 5: Action network for entire routine (all 31 actions)

If we graph the relationships between actions in each phase, we can visually compare

how these graphs may be similar or different. Figure 5 shows the network of actions found in


17/25


the initial entry phase, while Figure 6 refers to the approval phase. The thickness of the arcs is

an indication of how often a particular sequence occurs, thicker lines indication more frequent

connections between those actions.

From first glance, it is easy to see differences between them. The approval phase is much

more interconnected and varied than the entry phase. This is consistent with the comparison of

sequential variety in figure 4. More numerous (denser) connections should be correlated with

higher sequential variety.

Figure 5 : Action Network for Entry Phase

2

5

7

8

9

10

11

12

13

14

1621

24

25

29


18/25


Figure 6: Action Network for Approval Phase

2

3

5

6

8

14

15

16

1718

19

20

23

25

28

29

31

Once the routine has been represented as a network, a large number of analytical tools

and techniques become available (Wasserman and Faust, 1994). In addition to the overall

density, we can compute centrality of particular events, block models and structural equivalence.

On a practical level, we can detect potential bottlenecks and most common paths.

Dynamic networks

Because we have a sequences sampled over a long period of time, we can show how the

pattern of action within the routine changes over time using a software application called SoNIA

(Moody, McFarland and Bender-deMoll, 2005). SoNIA creates dynamic network movies that

allow visualization of changes in a network over time. On paper, we cannot display the movies,

but figure 7 shows three time slices from the entry phase, based on consecutive samples of 25

invoices each. From these network graphs, the human eye can determine some common

structures and gain further insight into the patterns of action over time. When the full set of

invoices is animated, these patterns are even more apparent.


19/25


2

4

7

8

9

10

11

12

13

14 16

21

24

30

2

4

7

8

9

10

11

12

13

14 16

21

24

30

2

4

7

8

9

10

11

12

13

14 16

21

24

30

Time 1 Time 2 Time 3

Figure 8: Network of action changes over time (entry phase)

Dynamic network graphs give an exciting moving picture that allows the human brain to

use its abstraction and induction capabilities to discover shapes and regularities in the movement

of nodes and arcs. Time is vital to our understanding of the dynamics and flexibility found in an

organizational routine, and the moving network graphs give us this.

Discussion

The ability to identify and compare organizational routines based on large samples of

performances creates some exciting new possibilities that we have only begun to explore here.

In the final section of the paper, we offer some thoughts on the limits and possibilities of this

approach as we currently understand it.

Data, data everywhere...

The most exciting aspect of workflow data is the large and growing number of different

organizations and processes for which it may be available. Workflow systems are built into

enterprise software, such as SAP and Peoplesoft. In fact, basic workflow services are built into

every copy of Microsoft Windows (MS Sharepoint is a workflow manager). So it is safe to say

that workflow data is ubiquitous, but there may be difficulties preventing its universal use in

research.


20/25


The first challenge, of course, is getting it. In many operational systems, logging is

turned off because it would rapidly consume too much disk space. If so, then recording

workflow data would require that the organization agree to activate this functionality.

Depending on how the event data are stored, it may contain data that are considered confidential

or sensitive. If so, it may be difficult to get an organization to invest the resources required to

extract sequences of events (or other abstract representations) that do not compromise the

confidentiality of the organization or its members. Organizations are understandably reluctant

to share this kind of data, and protracted negotiations and non-disclosure agreements may be

required. Thus, while the marginal cost of a thousand extra performances may be very low, the

cost of getting any data at all may still be quite high. Unless the event log is stored in a single

file or database table, considerable technical skill may be required to extract the data.

The second challenge is insuring that it is comparable between organizations. Workflow

software can be configured differently by each organization using it. The utility of workflow

event logs depends to a great extent on what kinds of events have been recorded. The event log

records events from a particular point of view, and only to the extent that programmers (and

system configuration) allows. Different events may be monitored in the event log. Different

processing rules may be used to automate different steps in the process. These differences are

interesting and potentially researchable in their own right, but they could diminish or destroy the

comparability of the data in the event logs.

The third challenge in working with event log data is the novelty of the methods. We

have found that we are making rather extensive use of SQL and Visual Basic to create

customized data analysis procedures. Some analytical tools exist (e.g., for string matching), and

others are being developed. One of the most promising developments is the availability of an


21/25


XML-based standard for representing workflow data (Van Dongen & van der Aalst, 2005) and

an open-source software platform for analyzing these data structures(Van Dongen & van der

Aalst, 2005) . This platform, called ProM, supports the use of plug-ins for analysis. While the

availability of such tools lowers the barrier to entry somewhat, analyzing workflow data requires

skills that are not part of a typical social science PhD program.

Identifying structure is important

While workflow data presents some challenges, the potential payoff is significant. For

example, it can provide a better understanding of the underlying structure of routines. For a

simple case like invoice processing, detailed analysis of the workflow might be perceived as

overkill. But even in our simple example, some interesting issue arose. For example, is invoice

processing one routine or two? How many routines are generating the patterns we observe, and

how similar are they?

Based on the lexicon, the entry and approval phase seem to be very similar. They have

almost the same size lexicon, and nearly the same number of steps per performance. However,

the actions in each phase are different and there is little overlap between the actions.

Furthermore, the phases are very different in sequential variety, and in the actual patterns of

action. Based on this analysis, invoice processing might be better characterized as two

routines, and additional subroutines of these may be discovered with additional analysis.

Combinatorics of organizational evolution

This minor difference in description is important for a variety of reasons. First,

organizations are hypothesized to adapt and evolve by combining and recombining routines.

Routines are like genetic material, or perhaps building blocks, whose combinations can create

meaningful differences in the overall organization. But what is the level of granularity at which


22/25


these combinations can occur? Is it high-level processes, such as order-to-cash? Or is it the

nominal process we see in the process map (e.g., invoice processing)? Or is it at an even lower

level, such as the entry and approval phases identified here? By identifying distinct routines,

we begin to set some boundaries on the combinatorics of organizational adaptation at the level of

routines.

While workflow event logs can help identify meaningful sub-routines, they may not be

very reliable as a tool for observing novel combinations and re-combinations. This is because

the event log is generated by a technical artifact (a software system) that could be swapped in (or

out) of the picture at any time. It would only be possible to observe re-combinations at a lower

level of granularity and a shorter time-scale than the life-span of the system itself.

Sequential variety may indicate evolutionary potential

Such smaller scale variations could be conceptualized as evolution within a routine, as

well. Feldman and Pentland (2003) suggest the possibility of variation and selective retention

within routines, as exception handling and improvisation create variations that may be

recognized and retained as a part of the routine. Variability in routines is important because it

creates opportunity for change. As new ways of doing things are tried out (accidentally or

intentionally) and incorporated into the organizations repertoire of action.

To observe this evolutionary process, we need detailed data on the patterns of action in

each performance, over time. Of course, workflow data offers just such an opportunity. We

expect that routines with higher sequential variety would have a greater possibility of displaying

evidence of variation and selective retention. For example, the approval phase has much more

variability than the entry phase, as shown in figure 4. If the organization is able to select

desirable patterns, and retain them, they should be able to achieve improved outcomes over


23/25


time. For organizations that wish to direct their evolution intelligently, the identification of

patterns, causes and outcomes is necessary to selectively retain the best patterns for execution in

specific circumstances.

Antecedents and consequences

Workflow patterns can also be linked to outcome or performance variables. Outcome

measures, such as cycle time, are often available from workflow systems, using the same data

collection techniques as described earlier. By utilizing such measures we could answer questions

about the consequence of sequential variety and particular workflow patterns. Potentially, one

could investigate the criteria that seem to drive selection of routines (cost, quality, cycle time,

etc.).

It is also possible to study the antecedents of the structure of routines. Do different

environments and organizations tend to produce the same patterns, or are there systematic

differences? Do different organizations, given similar environments, produce similar patterns?

Are there characteristics of the persons or team responsible for the workflow system that may

predict variation in patterns of actions? In other words, are routines shaped more by the external

environment or by internal features of the organization? While answers to these questions seem

a long way off at the moment, the ability to analyze workflow data lets us contemplate them in

ways that were never possible before.


24/25


References

Abbott, A. (1995). Sequence Analysis: New Methods for Old Ideas. Annual Review of Sociology,21, 93-113.

Barley, S. R. (1990). Images of Imaging: Notes on Doing Longitudinal Field Work.Organization Science, 1(3), 220-247.

Basu, A., & Kumar, A. (2002). Research Commentary: Workflow Management Issues in e-Business.Information Systems Research, 13(1), 1-14.

Becker, J., zur Muehlen, M., & Gille, M. (2002). Workflow Application Architectures:

Classification and Characteristics of workflow-based Information Systems. Workflow

Handbook, 39-50.

Becker, M. C. (2004). Organizational routines: a review of the literature. Industrial and

Corporate Change, 13(4), 643-678.

Butts, C. T. (2001). The complexity of social networks: theoretical and empirical findings. Social

Networks, 23 31-71.

Chi, M. T. H., Feltovich, P. J., & Glaser, R. Categorization and Representation Physics Problems

by Experts and N0vices. cOGNITIVE SCIENCE, 5, 121-152.

Feldman, M. S., & Pentland, B. T. (2003). Reconceptualizing Organizational Routines as a

Source of Flexibility and Change.Administrative Science Quarterly, 48(1), 94-118.

Haerem, T., & Rau, D. (In Press). The Influence Of Degree Of Expertise And Objective TaskComplexity On Perceived Task Complexity And Performance.Journal of Applied

Psychology.

Howard-Grenville, J. A. (2005). The Persistence of Flexible Organizational Routines: The Role

of Agency and Organizational Context. Organization Science, 16(6), 618.

Kaspar, F., & Schuster, H. G. (1987). Easily Calculable Measure for the Complexity of

Spatiotemporal Patterns. Physical Review A 36(2), 842-848.

Lempel, A., & Ziv., J. (1976). On the Complexity of Finite Sequences.IEEE Transactions onInformation Theory, 22.

Moody, J., McFarland, D., & Bender-deMoll, S. (2005). Dynamic Network Visualization.American Journal of Sociology, 4, 1206-1241.

Nelson, S. G., & Winter, R. R. (1982).An Evolutionary Theory of Economic Change.Cambridge, MA: Harvard University Press.


25/25


Pentland, B. T. (1995). Grammatical Models of Organizational Processes. Organization Science,6(5), 541-556.

Pentland, B. T. (1999). Organizations as Networks of Action. In J. A. C. Baum & W. McKelvey

(Eds.), Variations in Organization Science: Essays in Honor of Donald T. Campbell .Thousand Oaks, CA: Sage.

Pentland, B. T., & Feldman, M. S. (2005). Organizational Routines as a Unit of Analysis.

Industrial and Corporate Change, 14(5), 793-815.

Sankoff, D., & Kruskal, J. B. (1983). Time Warps, Strings Edits, and Macromolecules: The

Theory and Practice of Sequence Comparison. Reading, MA: Addison-Wesley.

van der Aalst, W. (2003). Business Process Management: Past, Present and Future.

van der Aalst, W., & Weijters, A. J. M. M. (2004). Process Mining: A Research Agenda.

Computers in Industry, 53(3), 231-244.

Van Dongen, B. F., Medeiros, D. J., Verbeek, H. M. W., Weijters, A., & van der Aalst, W. M. P.

(2005). The ProM framework: A new era in process mining tool support. Paperpresented at the 26th International Conference on Applications and Theory of Petri Nets

(ICATPN 2005), Heidelberg.

Van Dongen, B. F., & van der Aalst, W. M. P. (2005).A Meta Model for Process Mining Data.

Paper presented at the CAiSE05 WORKSHOPS.

Wasserman, S., & Faust, K. (1994). Social Network Analysis: methods and applications:

Cambridge University Press.

Documents

Using Workflow Data to Explore the Structure of an Organizational Routine