Upload
bogdan-comsa
View
214
Download
0
Embed Size (px)
Citation preview
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
1/25
Using workflow data to explore the structure
of an organizational routine
Brian Pentland1, Thorvald Haerem
2and Derek Hillison
1
1Michigan State University,
2Norwegian School of Management
Prepared for the 3rd International Conference on Organizational Routines:
Empirical research and conceptual foundations25-26 May 2007, Strasbourg, France
Abstract
Empirical studies of organizational routines are often limited because performances are
distributed in time and space, and therefore difficult to observe. Workflow management
systems provide an opportunity to collect data about a large number of complete performances at
relatively low cost. In this paper, we analyze data from a workflow management system for
processing invoices to demonstrate how workflow event logs can be used to analyze
organizational routines. The sample includes all invoices processed over an eleven month period
(N=2072). We use the data to compute a variety of measures which can be used to identify and
compare the structures of routines.
Acknowledgements
We thank Compello Software, Visma Services, Bo Hjort Christensen and Norwegian Research
Council for their support. We also appreciate the help from Alf Slettemoen in collecting and
preparing the data for the analyses.
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
2/25
Using workflow data Page 2 of 25
Introduction
Empirical study of organizational routines poses many difficulties for the researcher.
Routines are typically distributed in time, space, and throughout an organizations structure.
Short of stapling yourself to the paperwork, it is difficult to observe even a single performance
of an organizational routine from beginning to end. In addition, the natural variability in
performances makes it difficult to identify a representative pattern that encompasses the range of
possible performances. Time, money and patience often limit us to observations of a few
performances, or parts of performances, or interviews with a subset of participants.
Workflow systems provide an unprecedented opportunity to gather data about the
patterns of action generated by routines (van der Aalst et al, 2003;). With the proliferation of
computer network technology, more and more organizations have adopted workflow systems to
support their routines (Basu and Kumar, 2002). These systems typically involve a mixture of
human and automatic processingthey are like the glue that holds together other, more
common applications for accounting, email, and so on (Becker, Muehlen and Gille, 2002).
Typical workflow systems generate event logs that include time-stamped records of each event
or action that occurs in the system, making it possible to collect large numbers of performances
at very low cost. Computer scientists have made tremendous progress in analyzing event logs for
a range of purposes, such as the recovery of formal process models (van der Aalst et al, 2003;
van der Aalst and Weijters, 2004). Organizational scholars, however, have not paid as much
attention to this increasingly ubiquitous source of data about organizational processes and
routines.
In this paper, we demonstrate how workflow event logs can be used to analyze
organizational routines. We begin by discussing the challenges involved in analyzing routines
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
3/25
Using workflow data Page 3 of 25
as systems that generate patterns of events that can change over time. We provide a survey of
techniques for analyzing and representing the structure of organizational routines, and illustrate
these techniques using over 2000 performances of an invoice approval routine, as captured in a
workflow management system. While workflow event logs have several important limitations, it
seems clear that they provide a valuable source of data about organizational routines.
Theory
To have an empirical science of organizational routines, we need to be able to identify
particular routines and compare them to other routines. These analytical moves identification
and comparison are fundamental to any empirical science. In disciplines where individuals or
organizations are the unit of analysis, such as psychology or economics, identification and
comparison are difficult. The additional complexity of studying patterns of behaviors by groups
of individuals that take place in the context of an organization compounds this difficulty.
Identification and comparison are particularly challenging if we conceptualize routines as
generative systems (Feldman and Pentland, 2003; Pentland and Feldman, 2005). We are able to
observe surface structure (the performative aspects of the routine), but we would like to say
something about the underlying structures which generate the performances. The grammatical
metaphor of surface vs. deep structure has limitations when applied to social systems
(Pentland, 1995), but the basic analogy is useful when trying to understand human perceptions of
complex tasks and systems (Chi, Feltovich and Glaser, 1981; Haerem and Rau, 2007): we are
trying to understand the features of a system that can generate a potentially infinite variety of
performances from a finite sample of performances.
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
4/25
Using workflow data Page 4 of 25
When we say that organizational routines are generative systems, it means that there is
some underlying mechanism that generates the interdependent patterns of action that we
recognize as an organizational routine. Figure 1 illustrates this perspective.
Figure 1: Routine as generative structure
Figure 1 assumes that there is one routine generating the performances we are seeing.
But in field research, the situation is more like Figure 2: we see a collection of performances, and
we would like to know something about the underlying routine. Is there one routine, or many?
Which performances represent standard practice, and which are exceptions? More broadly, how
can we generalize about a routine from observing a sample of its performances?
Figure 2: Generalization of structure from specific performances
ABCD
ACD
CD
(etc)
?
Performances Indicate Routine?
ABCD
ACD
C
D(etc)
Ostensive patternABCD
PerformancesGeneratesRoutine
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
5/25
Using workflow data Page 5 of 25
We can see this problem in two ways: measurement or induction. As a measurement
problem, we are using the performances to compute various properties of the routine, such as the
average cycle time, or the number of steps. Because each performance has a beginning and an
end, we can compute the cycle time. The variability of the performances is easily dealt with by
familiar procedures such as averaging, or other measures of central tendency. We can use these
properties to compare routines (which routine is faster/better/cheaper?)
While this familiar procedure works well for properties, our current theory of
organizational routines suggests that they are not just things with variable properties. Rather,
they are complex generative systems that produce recognizable, repetitive, patterns of
interdependent actions carried out by multiple actors. If we are interested in the pattern of
actions the actual performances of the routine -- then familiar statistical techniques like
averaging are not applicable, because the data are not scalar.
Generalizing about the patterns involves induction: generalizing about a class from
observing instances of that class. For example, one might observe several black crows, and on
the basis of these observations, one might conclude that all crows are black. If a routine
generated identical performances, the induction problem would be just that easy. Unfortunately,
we know that organizational routines tend to generate a great variety of patterns. In a typical
situation, we are attempting to generalize from a small set of observations to a large class that is
filled with variety. Furthermore, the granularity of observations can affect our impression of
how much variety is present. From a distance, every performance looks the same; up close, they
are all different.
One practical approach to this problem is to simply ignore the variety and describe
typical performances. In process mapping, for example, we interview participants to create a
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
6/25
Using workflow data Page 6 of 25
flow chart (or some other kind of diagram) that summarizes their ostensive understanding of how
routine typically works. This method avoids the problem of induction because it bypasses the
primary evidence the actual performances of the routine. While flowcharts and other process
maps can be useful, this seems a rather weak basis for an empirical science of routines,
especially when we know that flow charts are often a poor representation of actual activities
(Suchman, 1997).
Properties and patterns over time
Part of what makes organizational routines different is their intrinsically event-based
nature. To study routines up close, and understand them, we must account for the role of time.
Within a single performance, time is embodied in the pattern of events. For some routines, like
the invoice routine, the events occur over a few hours or days. For other routines, like hiring or
technology roadmapping (Howard-Grenville, 2005), events may take weeks or months. Even in a
very fast routine, like a sports team driving for the goal, or landing an airplane, timing is critical.
Nevertheless, when we write about routines, these patterns are generally conceptualized as
synchronic occurring at a single moment in time.
When we consider changes in the pattern over time, of course, we need a longitudinal
(more macro-scale) time (Barley, 1990). Figure 1 shows the different ways that time can factor
into empirical studies of routines.
Figure 1: Approaches to identification and comparison
Properties Patterns
Cross-sectionalSynchroniccomparison
Identification
LongitudinalDiachroniccomparison
Evolution
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
7/25
Using workflow data Page 7 of 25
Our familiar statistical tools are well suited for working with properties, such as cycle
time. Cross-sectional data takes a snap-shot of a routine during a particular time window, which
is sufficient for a synchronic description or comparison (Barley, 1990). With longitudinal data,
we can do diachronic comparisons and track how the attributes of the system change over time.
When working with patterns, the empirical operations are less familiar. Typical social
science methods work on scalar (or columnar) data like questionnaire items that can be fed
into correlation-based analyses. Network models provide a prominent example of a more
sophisticated approach, and we will illustrate a network-based approach later in this paper.
Other techniques, such as classification and clustering algorithms, generally depend on
computing a distance between patterns. In other words, we need to characterize the space of
possible patterns and locate each observed pattern within that space.
The ability to classify patterns directly would be an important aid in an empirical science
of organizational routines. For example, we theorize that routines are composed of sub-routines,
and that these sub-routines can be re-combined in various ways to create new routines. But how
can we identify particular subroutines, given the enormous variability in a typical set of
performances? The answer is not obvious, but the ability to identify and compare the structure
of routines and map their change over time would be fundamental to empirical tests of our most
influential theories about stability, change, and evolution in organizations. Nelson and Winter
(1982) hypothesize that organizations change by mutation, and by recombining routines in
various ways. Genetic models and metaphors continue to be an influential approach to
theorizing about routines (Becker, 2004), but empirical evidence from real organizations has
been difficult to obtain, and methods for analyzing changes in patterns over time have been
lacking. The use of workflow data may afford the possibility of testing evolutionary concepts. In
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
8/25
Using workflow data Page 8 of 25
the following sections, we offer some modest examples of how workflow data might be used to
explore the structure of an invoice processing routine.
Methodology
For this paper, we collected workflow data from an invoice-management system
(Compello Software) for one company over an 11-month period. The data include 2072
performances of the invoice processing routine, consisting of 62756 individual actions taken
from a lexicon of 31 action types.
Figure 2 shows a flow chart for the invoice process. Invoices can enter the system on
paper (which is most common), or via an electronic portal. If the invoice conforms to a known
Form type, it is scanned, registered in the system, and distributed for approval. Some invoices
can be immediately sent to the financial system for payment; others require multiple approvals.
Once these are complete, the invoice is paid.
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
9/25
Using workflow data Page 9 of 25
Figure 2: Flow chart of the invoice processing routine
Table 2 shows one performance of the invoice process routine as it appears in the
workflow event log. The first column in the table indicates that the routine consists of two main
phases: entry and approval. The first portion of this routine, referred to as Entry, is almost
entirely automated. When Person is -1, that means that a computer is taking the action
automatically, according to a rule. In the approval phase, some actions are automatic, but
others are manual. People are back in the picture. In the analysis that follows, we will
sometimes refer to the routine as a whole, but at times it will be useful and interesting to
compare the automated, initial entry section of the routine (Entry) to the more manual,
approval phase (Approval). After the last approval, the invoice data is exported to a separate
accounts payable system that is outside the scope of the workflow system. This is an example of
Financial
system
Scanning OCR OCR Form def.
Registration/Distribution
AuthorizationStatus Fully posted
Form?
Electronic invoice
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
10/25
Using workflow data Page 10 of 25
a limitation to workflow event logs their scope is limited by the design and architecture of the
underlying systems.
Table 2: Example event log from workflow system1
PHASE ACTIONTYPE OFACTION DATA DATE PERSON DATA CAPTURE
Entry Enter document type Data entry CI 05-Apr-06 -1 OCR/XML
Entry Enter invoicno. Data entry 27475 05-Apr-06 -1 OCR/XML
Entry Enter invoicedate Data entry 3/31/2006 05-Apr-06 -1 OCR/XML
Entry Enter duedate Data entry 4/20/2006 05-Apr-06 -1 OCR/XML
Entry Enter period Data entry 33 05-Apr-06 -1 OCR/XML
Entry Enter vendor account Data entry some text 05-Apr-06 -1 OCR/XML
Entry Enter amount Data entry 1250 05-Apr-06 -1 OCR/XML
Entry Enter currency Data entry 2 05-Apr-06 -1 System
Entry Enter text Data entry some text 05-Apr-06 6 Manual
Entry Approve Approve 5 05-Apr-06 -1 System
Entry Enter amount Data entry 1250 05-Apr-06 6 Manual
Approval Approve Approve 5 05-Apr-06 19 Manual
Approval Enter period Data entry 33 05-Apr-06 -1 OCR/XML
Approval Distribute to approver Workflow 2 05-Apr-06 6 Manual
Approval Distribute to approver Workflow 2 05-Apr-06 6 Manual
Approval Enter currency Data entry 2 05-Apr-06 -1 System
Approval Enter account Data entry Acct 05-Apr-06 19 Manual
Approval Enter Tax-code Data entry 16 05-Apr-06 19 System/manual
Approval Notify Workflow 4/5/2006 05-Apr-06 -1 System
Approval Enter value dim. 1 (DP) Data entry 131 05-Apr-06 19 Manual
Approval Approve Approve 5 06-Apr-06 22 Manual
Approval Enter text Data entry some text 18-Apr-06 44 Manual
In some respects, workflow data is extremely convenient, because it is computer
generated. Events are unitized, categorized and time-stamped. This largely eliminates the need
for the arduous and time-consuming task of coding that would be required in a typical
observational or archival field study. In other respects, depending on how the workflow event
log is stored, it may require some rather sophisticated computer procedures (e.g., queries written
in SQL) to retrieve it. Many software programs databases differ from installation to installation.
Such differences introduce more complexity in making the data comparable across organizations.
1 Some columns have been removed or values changed for confidentiality.
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
11/25
Using workflow data Page 11 of 25
To minimize such complications we chose a process and software which allowed a variation
within a standardized framework. But once data has been retrieved, it is more or less ready for
analysis.
Describing the data: Lexical variety
The most basic property of a routine is the list of events it generates. Events are fundamental
because we need the events to describe and recognize the patterns. By analogy to human
language, we refer to the number of different events generated by a routine as its lexical variety
orlexical size. Table 2 provides some measures of the lexical variety each phase in the routine.
These measures alone might suggest that entry and approval are basically the same they
have almost the same number of actions in their lexicon (15 vs. 17); they have a similar
average length; and a similar number of different lexical items appear in each performance.
Table 2: Lexical variety measures
Entry Approval
Average Length 11.31532 12.2525
Lexicon Size 15 17
Average Lexicon Items 10.15916 9.634741
While the numberof events in each phase is similar, the events themselves are not.
Figure 3 shows which events are found in each phases of the routine. Some of events are shared
between phases, yet other events are unique to a specific phase. Thus, one can easily
discriminate between the phases based on the presence (or absence) of particular events. In other
words, these two phases can be easily identifiedor recognized. In cases where there is more
overlap, more elaborate classification (or pattern recognition) techniques can be used to
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
12/25
Using workflow data Page 12 of 25
discriminate between phases (e.g., markov and n-grams are two widely used techniques for
classifying sequence data).
Figure 3: Frequency of action types in the entry and approval phases
Frequency of Actions in Entry and Approval Phase
0 500 1000 1500 2000 2500
Enter currency
Enter period
Enter text
ApproveEnter amount
Enter duedate
Enter invoicedate
Enter invoicno.
Enter vendor account
Enter document type
Distribute to approver
Enter account
Enter Tax-code
Enter value dim. 1 (DP)
Notify
Enter value dim. 3 (SP)Enter value dim. 2 (PR)
Enter Last Comment
Remove approver
Reminder (last if more than one)
Send back to accountant
Enter value dim. 4 (EM)
Approved by backup
Forward
New scanning
Entry Phase Approval Phase
Listing the sequences
Lexical variety provides a foundation, but we are ultimately more interested in the
performances, which are temporally orderedpatterns of events. To describe how performances
of a routine vary, one could simply list all the sequences one observes. If the routine is simple
enough, with fixed performances, this might be a good way to describe what the routine does.
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
13/25
Using workflow data Page 13 of 25
Table 3 shows the ten most frequent sequences for each phase of the routine. These sequences
were produced by assigning an integer code to each event type in the lexicon (taken from the
second column of table 2).
Table 3: Ten most frequent sequences for each phase
Entry Phase Approval Phase
# % of total Action Sequence (119 unique, n=1998) # % of total Action Sequence (902 unique, n=2297)
904 45.25% 9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7 138 6.01% 2, 14, 5, 8, 16, 6, 15, 20
419 20.97% 9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 16 92 4.01% 14, 5, 5, 8, 25, 19, 2, 16, 6, 15, 17, 2
147 7.36% 9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7, 13 51 2.22% 2, 14, 5, 5, 8, 25, 19, 2, 16, 6, 15, 17
121 6.06% 9, 12, 11, 10, 14, 21, 7, 8, 2, 16, 7 44 1.92% 2, 14, 5, 5, 8, 25, 19, 2, 6, 15, 17, 16
37 1.85% 9, 12, 11, 10, 21, 7, 8, 16, 2, 7, 14 44 1.92% 2, 14, 5, 5, 8, 25, 2, 16, 6, 15, 17
36 1.80% 9, 12, 11, 10, 14, 8, 21, 7, 16, 2, 7 37 1.61% 2, 2, 14, 5, 5, 8, 16, 6, 15, 25, 17, 19
30 1.50% 9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 16, 13 33 1.44% 2, 14, 5, 5, 8, 25, 2, 6, 15, 17, 16
25 1.25% 9, 12, 11, 14, 21, 7, 8, 16, 2, 7, 10 32 1.39% 2, 2, 14, 5, 5, 8, 6, 15, 25, 17, 19, 16
24 1.20% 9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 13, 16 27 1.18% 2, 14, 5, 8, 16, 6, 15, 17
17 0.85% 9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7, 7 25 1.09% 2, 14, 5, 5, 8, 16, 6, 15, 25, 17, 19, 2
In the invoice routine, raw enumeration is apparently not a useful way to summarize the
routine: there is too much variety. There are 119 distinct sequences in the entry phase, and 902
distinct sequences in the approval phase. For the entry phase, the ten most common sequences
account for 88% of the observed sequences. For the approval phase, the ten most common
account for just under 23%.
One implication of Table 3 is that, even for automated workflows such as the Entry
phase, there can be quite a lot of variability in the performances. For the less automated
Approval phase, variability is even higher. For this routine, collecting a small sample of
performances is not likely to result in a good description. Even for the entry phase, which is
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
14/25
Using workflow data Page 14 of 25
almost entirely automated, there are over 100 distinct sequences. In the context of the invoice
processing routine, these variations may or may not be particularly meaningful. In a more
complex routine, there may be more than two phases (or subroutines), and they may be
combined and recombined in many different ways. Clearly, we need tools to identify and
compare such variations and combinations.
Structure of routines
To assess the structure of the routine, and possible changes over time, we need to analyze
the performances. We would like to be able to compute properties of the routine that are
sensitive to the patterns, so that we can perform cross-sectional and longitudinal comparisons.
Better yet, we would like to be able to compare the patterns directly to support identification of
routines and testing of evolutionary models.
Sequential Variety
To create a meaningful comparison that takes into account the pattern of events, we need
to use methods that use information about the sequences. Sequential variety (Pentland, 2003)
uses all of the data in a set of observed sequences (not just pairs of events) to create a single,
scalar measure of the variability in the sample. It provides an index of the variability in the
performances that can be used for either cross sectional or longitudinal comparisons. Sequential
variety is closely correlated to the complexity of the observed sequences, as measured by the
Lempel-Ziv complexity.2
2Algorithmic complexity is a well-established approach to detecting and measuring spatio-temporal patterns in
sequence data (Kaspar and Schuster, 1987). The Kolgomorov-Chaitin Complexity (often called the Kolgomorov
complexity) refers to the length of the shortest self-terminating program required to produce a given sequence.
While the Kolgomorov-Chaitin measure has great theoretical value, it is not readily computable (Kolgomorov,1965). To address this problem, Lempel and Ziv (1974) devised an algorithm that provides an index of the
Kolgomorov complexity. The Lempel-Ziv measure has similar properties to the Kolgomorov-Chaitin measure, and
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
15/25
Using workflow data Page 15 of 25
Sequential variety is computed by comparing each sequence in a sample to every other
sequence (Pentland, 2003). The distance between sequences is computed using optimal string
matching (Sankoff and Kruskal, 1983; Abbott, 1995). This distance (also called a Levenshtein
distance) can be used for subsequent clustering and pattern classification. Sequential variety is
simply the average distance between sequences in a sample. Like the Lempel-Ziv complexity,
this method uses entire sequences (not just pairs of actions). Figure 4 shows the sequential
variety of the two phases of the invoice process over the 11 month period in our data set.
Sequential variety over time
0
1
2
3
4
5
6
7
8
1 2 3 4 5 6 7 8 9 10 11
Month
Initial Phase Approval Phase
Figure 4: Entry phase is less variable than the approval phase
Figure 4 shows the sequential variety of each phase in the invoice routine. The numbers
in Figure 4 are computed month by month. This is, of course, an artificial way of partitioning
thus provides a practical measure of algorithmic complexity on empirical data (Kaspar and Schuster, 1987). For
example, Butts (2001) uses the Lempel-Ziv measure to explore the effects of complexity in social networks.
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
16/25
Using workflow data Page 16 of 25
the data. But sequential variety depends on comparing sequences within a sample, such as a
months worth of invoices. Figure 4 demonstrates that the entry phase is consistently less
variable than the approval phase. This makes sense, because the entry phase is more highly
automated and (as shown earlier) the sequences are more similar.
Network Graphs and Measures
The simplest way to summarize a large number of patterns is to break each sequence of
action into pairs of actions. In this way, workflow data can be processed and displayed as a
valued, directed network graph, as shown in figure 5. Each node in the graph represents a type
of action, while the arcs represent the sequential relationship between actions. Formally, these
graphs are first-order Markov models, except that the nodes are events rather than states.
Pentland (1999) calls this kind of graph an action network because it indicates which actions
follow which other actions. The action network perspective allows us to use methodologies that
had previously been used for the study of social networks to ask new questions about the nature
of the routine in the organization.
Figure 5: Action network for entire routine (all 31 actions)
If we graph the relationships between actions in each phase, we can visually compare
how these graphs may be similar or different. Figure 5 shows the network of actions found in
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
17/25
Using workflow data Page 17 of 25
the initial entry phase, while Figure 6 refers to the approval phase. The thickness of the arcs is
an indication of how often a particular sequence occurs, thicker lines indication more frequent
connections between those actions.
From first glance, it is easy to see differences between them. The approval phase is much
more interconnected and varied than the entry phase. This is consistent with the comparison of
sequential variety in figure 4. More numerous (denser) connections should be correlated with
higher sequential variety.
Figure 5 : Action Network for Entry Phase
2
5
7
8
9
10
11
12
13
14
1621
24
25
29
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
18/25
Using workflow data Page 18 of 25
Figure 6: Action Network for Approval Phase
2
3
5
6
8
14
15
16
1718
19
20
23
25
28
29
31
Once the routine has been represented as a network, a large number of analytical tools
and techniques become available (Wasserman and Faust, 1994). In addition to the overall
density, we can compute centrality of particular events, block models and structural equivalence.
On a practical level, we can detect potential bottlenecks and most common paths.
Dynamic networks
Because we have a sequences sampled over a long period of time, we can show how the
pattern of action within the routine changes over time using a software application called SoNIA
(Moody, McFarland and Bender-deMoll, 2005). SoNIA creates dynamic network movies that
allow visualization of changes in a network over time. On paper, we cannot display the movies,
but figure 7 shows three time slices from the entry phase, based on consecutive samples of 25
invoices each. From these network graphs, the human eye can determine some common
structures and gain further insight into the patterns of action over time. When the full set of
invoices is animated, these patterns are even more apparent.
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
19/25
Using workflow data Page 19 of 25
2
4
7
8
9
10
11
12
13
14 16
21
24
30
2
4
7
8
9
10
11
12
13
14 16
21
24
30
2
4
7
8
9
10
11
12
13
14 16
21
24
30
Time 1 Time 2 Time 3
Figure 8: Network of action changes over time (entry phase)
Dynamic network graphs give an exciting moving picture that allows the human brain to
use its abstraction and induction capabilities to discover shapes and regularities in the movement
of nodes and arcs. Time is vital to our understanding of the dynamics and flexibility found in an
organizational routine, and the moving network graphs give us this.
Discussion
The ability to identify and compare organizational routines based on large samples of
performances creates some exciting new possibilities that we have only begun to explore here.
In the final section of the paper, we offer some thoughts on the limits and possibilities of this
approach as we currently understand it.
Data, data everywhere...
The most exciting aspect of workflow data is the large and growing number of different
organizations and processes for which it may be available. Workflow systems are built into
enterprise software, such as SAP and Peoplesoft. In fact, basic workflow services are built into
every copy of Microsoft Windows (MS Sharepoint is a workflow manager). So it is safe to say
that workflow data is ubiquitous, but there may be difficulties preventing its universal use in
research.
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
20/25
Using workflow data Page 20 of 25
The first challenge, of course, is getting it. In many operational systems, logging is
turned off because it would rapidly consume too much disk space. If so, then recording
workflow data would require that the organization agree to activate this functionality.
Depending on how the event data are stored, it may contain data that are considered confidential
or sensitive. If so, it may be difficult to get an organization to invest the resources required to
extract sequences of events (or other abstract representations) that do not compromise the
confidentiality of the organization or its members. Organizations are understandably reluctant
to share this kind of data, and protracted negotiations and non-disclosure agreements may be
required. Thus, while the marginal cost of a thousand extra performances may be very low, the
cost of getting any data at all may still be quite high. Unless the event log is stored in a single
file or database table, considerable technical skill may be required to extract the data.
The second challenge is insuring that it is comparable between organizations. Workflow
software can be configured differently by each organization using it. The utility of workflow
event logs depends to a great extent on what kinds of events have been recorded. The event log
records events from a particular point of view, and only to the extent that programmers (and
system configuration) allows. Different events may be monitored in the event log. Different
processing rules may be used to automate different steps in the process. These differences are
interesting and potentially researchable in their own right, but they could diminish or destroy the
comparability of the data in the event logs.
The third challenge in working with event log data is the novelty of the methods. We
have found that we are making rather extensive use of SQL and Visual Basic to create
customized data analysis procedures. Some analytical tools exist (e.g., for string matching), and
others are being developed. One of the most promising developments is the availability of an
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
21/25
Using workflow data Page 21 of 25
XML-based standard for representing workflow data (Van Dongen & van der Aalst, 2005) and
an open-source software platform for analyzing these data structures(Van Dongen & van der
Aalst, 2005) . This platform, called ProM, supports the use of plug-ins for analysis. While the
availability of such tools lowers the barrier to entry somewhat, analyzing workflow data requires
skills that are not part of a typical social science PhD program.
Identifying structure is important
While workflow data presents some challenges, the potential payoff is significant. For
example, it can provide a better understanding of the underlying structure of routines. For a
simple case like invoice processing, detailed analysis of the workflow might be perceived as
overkill. But even in our simple example, some interesting issue arose. For example, is invoice
processing one routine or two? How many routines are generating the patterns we observe, and
how similar are they?
Based on the lexicon, the entry and approval phase seem to be very similar. They have
almost the same size lexicon, and nearly the same number of steps per performance. However,
the actions in each phase are different and there is little overlap between the actions.
Furthermore, the phases are very different in sequential variety, and in the actual patterns of
action. Based on this analysis, invoice processing might be better characterized as two
routines, and additional subroutines of these may be discovered with additional analysis.
Combinatorics of organizational evolution
This minor difference in description is important for a variety of reasons. First,
organizations are hypothesized to adapt and evolve by combining and recombining routines.
Routines are like genetic material, or perhaps building blocks, whose combinations can create
meaningful differences in the overall organization. But what is the level of granularity at which
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
22/25
Using workflow data Page 22 of 25
these combinations can occur? Is it high-level processes, such as order-to-cash? Or is it the
nominal process we see in the process map (e.g., invoice processing)? Or is it at an even lower
level, such as the entry and approval phases identified here? By identifying distinct routines,
we begin to set some boundaries on the combinatorics of organizational adaptation at the level of
routines.
While workflow event logs can help identify meaningful sub-routines, they may not be
very reliable as a tool for observing novel combinations and re-combinations. This is because
the event log is generated by a technical artifact (a software system) that could be swapped in (or
out) of the picture at any time. It would only be possible to observe re-combinations at a lower
level of granularity and a shorter time-scale than the life-span of the system itself.
Sequential variety may indicate evolutionary potential
Such smaller scale variations could be conceptualized as evolution within a routine, as
well. Feldman and Pentland (2003) suggest the possibility of variation and selective retention
within routines, as exception handling and improvisation create variations that may be
recognized and retained as a part of the routine. Variability in routines is important because it
creates opportunity for change. As new ways of doing things are tried out (accidentally or
intentionally) and incorporated into the organizations repertoire of action.
To observe this evolutionary process, we need detailed data on the patterns of action in
each performance, over time. Of course, workflow data offers just such an opportunity. We
expect that routines with higher sequential variety would have a greater possibility of displaying
evidence of variation and selective retention. For example, the approval phase has much more
variability than the entry phase, as shown in figure 4. If the organization is able to select
desirable patterns, and retain them, they should be able to achieve improved outcomes over
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
23/25
Using workflow data Page 23 of 25
time. For organizations that wish to direct their evolution intelligently, the identification of
patterns, causes and outcomes is necessary to selectively retain the best patterns for execution in
specific circumstances.
Antecedents and consequences
Workflow patterns can also be linked to outcome or performance variables. Outcome
measures, such as cycle time, are often available from workflow systems, using the same data
collection techniques as described earlier. By utilizing such measures we could answer questions
about the consequence of sequential variety and particular workflow patterns. Potentially, one
could investigate the criteria that seem to drive selection of routines (cost, quality, cycle time,
etc.).
It is also possible to study the antecedents of the structure of routines. Do different
environments and organizations tend to produce the same patterns, or are there systematic
differences? Do different organizations, given similar environments, produce similar patterns?
Are there characteristics of the persons or team responsible for the workflow system that may
predict variation in patterns of actions? In other words, are routines shaped more by the external
environment or by internal features of the organization? While answers to these questions seem
a long way off at the moment, the ability to analyze workflow data lets us contemplate them in
ways that were never possible before.
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
24/25
Using workflow data Page 24 of 25
References
Abbott, A. (1995). Sequence Analysis: New Methods for Old Ideas. Annual Review of Sociology,21, 93-113.
Barley, S. R. (1990). Images of Imaging: Notes on Doing Longitudinal Field Work.Organization Science, 1(3), 220-247.
Basu, A., & Kumar, A. (2002). Research Commentary: Workflow Management Issues in e-Business.Information Systems Research, 13(1), 1-14.
Becker, J., zur Muehlen, M., & Gille, M. (2002). Workflow Application Architectures:
Classification and Characteristics of workflow-based Information Systems. Workflow
Handbook, 39-50.
Becker, M. C. (2004). Organizational routines: a review of the literature. Industrial and
Corporate Change, 13(4), 643-678.
Butts, C. T. (2001). The complexity of social networks: theoretical and empirical findings. Social
Networks, 23 31-71.
Chi, M. T. H., Feltovich, P. J., & Glaser, R. Categorization and Representation Physics Problems
by Experts and N0vices. cOGNITIVE SCIENCE, 5, 121-152.
Feldman, M. S., & Pentland, B. T. (2003). Reconceptualizing Organizational Routines as a
Source of Flexibility and Change.Administrative Science Quarterly, 48(1), 94-118.
Haerem, T., & Rau, D. (In Press). The Influence Of Degree Of Expertise And Objective TaskComplexity On Perceived Task Complexity And Performance.Journal of Applied
Psychology.
Howard-Grenville, J. A. (2005). The Persistence of Flexible Organizational Routines: The Role
of Agency and Organizational Context. Organization Science, 16(6), 618.
Kaspar, F., & Schuster, H. G. (1987). Easily Calculable Measure for the Complexity of
Spatiotemporal Patterns. Physical Review A 36(2), 842-848.
Lempel, A., & Ziv., J. (1976). On the Complexity of Finite Sequences.IEEE Transactions onInformation Theory, 22.
Moody, J., McFarland, D., & Bender-deMoll, S. (2005). Dynamic Network Visualization.American Journal of Sociology, 4, 1206-1241.
Nelson, S. G., & Winter, R. R. (1982).An Evolutionary Theory of Economic Change.Cambridge, MA: Harvard University Press.
7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine
25/25
Using workflow data Page 25 of 25
Pentland, B. T. (1995). Grammatical Models of Organizational Processes. Organization Science,6(5), 541-556.
Pentland, B. T. (1999). Organizations as Networks of Action. In J. A. C. Baum & W. McKelvey
(Eds.), Variations in Organization Science: Essays in Honor of Donald T. Campbell .Thousand Oaks, CA: Sage.
Pentland, B. T., & Feldman, M. S. (2005). Organizational Routines as a Unit of Analysis.
Industrial and Corporate Change, 14(5), 793-815.
Sankoff, D., & Kruskal, J. B. (1983). Time Warps, Strings Edits, and Macromolecules: The
Theory and Practice of Sequence Comparison. Reading, MA: Addison-Wesley.
van der Aalst, W. (2003). Business Process Management: Past, Present and Future.
van der Aalst, W., & Weijters, A. J. M. M. (2004). Process Mining: A Research Agenda.
Computers in Industry, 53(3), 231-244.
Van Dongen, B. F., Medeiros, D. J., Verbeek, H. M. W., Weijters, A., & van der Aalst, W. M. P.
(2005). The ProM framework: A new era in process mining tool support. Paperpresented at the 26th International Conference on Applications and Theory of Petri Nets
(ICATPN 2005), Heidelberg.
Van Dongen, B. F., & van der Aalst, W. M. P. (2005).A Meta Model for Process Mining Data.
Paper presented at the CAiSE05 WORKSHOPS.
Wasserman, S., & Faust, K. (1994). Social Network Analysis: methods and applications:
Cambridge University Press.