Using Workflow Data to Explore the Structure of an Organizational Routine

Embed Size (px)

Citation preview

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    1/25

    Using workflow data to explore the structure

    of an organizational routine

    Brian Pentland1, Thorvald Haerem

    2and Derek Hillison

    1

    1Michigan State University,

    2Norwegian School of Management

    Prepared for the 3rd International Conference on Organizational Routines:

    Empirical research and conceptual foundations25-26 May 2007, Strasbourg, France

    Abstract

    Empirical studies of organizational routines are often limited because performances are

    distributed in time and space, and therefore difficult to observe. Workflow management

    systems provide an opportunity to collect data about a large number of complete performances at

    relatively low cost. In this paper, we analyze data from a workflow management system for

    processing invoices to demonstrate how workflow event logs can be used to analyze

    organizational routines. The sample includes all invoices processed over an eleven month period

    (N=2072). We use the data to compute a variety of measures which can be used to identify and

    compare the structures of routines.

    Acknowledgements

    We thank Compello Software, Visma Services, Bo Hjort Christensen and Norwegian Research

    Council for their support. We also appreciate the help from Alf Slettemoen in collecting and

    preparing the data for the analyses.

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    2/25

    Using workflow data Page 2 of 25

    Introduction

    Empirical study of organizational routines poses many difficulties for the researcher.

    Routines are typically distributed in time, space, and throughout an organizations structure.

    Short of stapling yourself to the paperwork, it is difficult to observe even a single performance

    of an organizational routine from beginning to end. In addition, the natural variability in

    performances makes it difficult to identify a representative pattern that encompasses the range of

    possible performances. Time, money and patience often limit us to observations of a few

    performances, or parts of performances, or interviews with a subset of participants.

    Workflow systems provide an unprecedented opportunity to gather data about the

    patterns of action generated by routines (van der Aalst et al, 2003;). With the proliferation of

    computer network technology, more and more organizations have adopted workflow systems to

    support their routines (Basu and Kumar, 2002). These systems typically involve a mixture of

    human and automatic processingthey are like the glue that holds together other, more

    common applications for accounting, email, and so on (Becker, Muehlen and Gille, 2002).

    Typical workflow systems generate event logs that include time-stamped records of each event

    or action that occurs in the system, making it possible to collect large numbers of performances

    at very low cost. Computer scientists have made tremendous progress in analyzing event logs for

    a range of purposes, such as the recovery of formal process models (van der Aalst et al, 2003;

    van der Aalst and Weijters, 2004). Organizational scholars, however, have not paid as much

    attention to this increasingly ubiquitous source of data about organizational processes and

    routines.

    In this paper, we demonstrate how workflow event logs can be used to analyze

    organizational routines. We begin by discussing the challenges involved in analyzing routines

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    3/25

    Using workflow data Page 3 of 25

    as systems that generate patterns of events that can change over time. We provide a survey of

    techniques for analyzing and representing the structure of organizational routines, and illustrate

    these techniques using over 2000 performances of an invoice approval routine, as captured in a

    workflow management system. While workflow event logs have several important limitations, it

    seems clear that they provide a valuable source of data about organizational routines.

    Theory

    To have an empirical science of organizational routines, we need to be able to identify

    particular routines and compare them to other routines. These analytical moves identification

    and comparison are fundamental to any empirical science. In disciplines where individuals or

    organizations are the unit of analysis, such as psychology or economics, identification and

    comparison are difficult. The additional complexity of studying patterns of behaviors by groups

    of individuals that take place in the context of an organization compounds this difficulty.

    Identification and comparison are particularly challenging if we conceptualize routines as

    generative systems (Feldman and Pentland, 2003; Pentland and Feldman, 2005). We are able to

    observe surface structure (the performative aspects of the routine), but we would like to say

    something about the underlying structures which generate the performances. The grammatical

    metaphor of surface vs. deep structure has limitations when applied to social systems

    (Pentland, 1995), but the basic analogy is useful when trying to understand human perceptions of

    complex tasks and systems (Chi, Feltovich and Glaser, 1981; Haerem and Rau, 2007): we are

    trying to understand the features of a system that can generate a potentially infinite variety of

    performances from a finite sample of performances.

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    4/25

    Using workflow data Page 4 of 25

    When we say that organizational routines are generative systems, it means that there is

    some underlying mechanism that generates the interdependent patterns of action that we

    recognize as an organizational routine. Figure 1 illustrates this perspective.

    Figure 1: Routine as generative structure

    Figure 1 assumes that there is one routine generating the performances we are seeing.

    But in field research, the situation is more like Figure 2: we see a collection of performances, and

    we would like to know something about the underlying routine. Is there one routine, or many?

    Which performances represent standard practice, and which are exceptions? More broadly, how

    can we generalize about a routine from observing a sample of its performances?

    Figure 2: Generalization of structure from specific performances

    ABCD

    ACD

    CD

    (etc)

    ?

    Performances Indicate Routine?

    ABCD

    ACD

    C

    D(etc)

    Ostensive patternABCD

    PerformancesGeneratesRoutine

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    5/25

    Using workflow data Page 5 of 25

    We can see this problem in two ways: measurement or induction. As a measurement

    problem, we are using the performances to compute various properties of the routine, such as the

    average cycle time, or the number of steps. Because each performance has a beginning and an

    end, we can compute the cycle time. The variability of the performances is easily dealt with by

    familiar procedures such as averaging, or other measures of central tendency. We can use these

    properties to compare routines (which routine is faster/better/cheaper?)

    While this familiar procedure works well for properties, our current theory of

    organizational routines suggests that they are not just things with variable properties. Rather,

    they are complex generative systems that produce recognizable, repetitive, patterns of

    interdependent actions carried out by multiple actors. If we are interested in the pattern of

    actions the actual performances of the routine -- then familiar statistical techniques like

    averaging are not applicable, because the data are not scalar.

    Generalizing about the patterns involves induction: generalizing about a class from

    observing instances of that class. For example, one might observe several black crows, and on

    the basis of these observations, one might conclude that all crows are black. If a routine

    generated identical performances, the induction problem would be just that easy. Unfortunately,

    we know that organizational routines tend to generate a great variety of patterns. In a typical

    situation, we are attempting to generalize from a small set of observations to a large class that is

    filled with variety. Furthermore, the granularity of observations can affect our impression of

    how much variety is present. From a distance, every performance looks the same; up close, they

    are all different.

    One practical approach to this problem is to simply ignore the variety and describe

    typical performances. In process mapping, for example, we interview participants to create a

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    6/25

    Using workflow data Page 6 of 25

    flow chart (or some other kind of diagram) that summarizes their ostensive understanding of how

    routine typically works. This method avoids the problem of induction because it bypasses the

    primary evidence the actual performances of the routine. While flowcharts and other process

    maps can be useful, this seems a rather weak basis for an empirical science of routines,

    especially when we know that flow charts are often a poor representation of actual activities

    (Suchman, 1997).

    Properties and patterns over time

    Part of what makes organizational routines different is their intrinsically event-based

    nature. To study routines up close, and understand them, we must account for the role of time.

    Within a single performance, time is embodied in the pattern of events. For some routines, like

    the invoice routine, the events occur over a few hours or days. For other routines, like hiring or

    technology roadmapping (Howard-Grenville, 2005), events may take weeks or months. Even in a

    very fast routine, like a sports team driving for the goal, or landing an airplane, timing is critical.

    Nevertheless, when we write about routines, these patterns are generally conceptualized as

    synchronic occurring at a single moment in time.

    When we consider changes in the pattern over time, of course, we need a longitudinal

    (more macro-scale) time (Barley, 1990). Figure 1 shows the different ways that time can factor

    into empirical studies of routines.

    Figure 1: Approaches to identification and comparison

    Properties Patterns

    Cross-sectionalSynchroniccomparison

    Identification

    LongitudinalDiachroniccomparison

    Evolution

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    7/25

    Using workflow data Page 7 of 25

    Our familiar statistical tools are well suited for working with properties, such as cycle

    time. Cross-sectional data takes a snap-shot of a routine during a particular time window, which

    is sufficient for a synchronic description or comparison (Barley, 1990). With longitudinal data,

    we can do diachronic comparisons and track how the attributes of the system change over time.

    When working with patterns, the empirical operations are less familiar. Typical social

    science methods work on scalar (or columnar) data like questionnaire items that can be fed

    into correlation-based analyses. Network models provide a prominent example of a more

    sophisticated approach, and we will illustrate a network-based approach later in this paper.

    Other techniques, such as classification and clustering algorithms, generally depend on

    computing a distance between patterns. In other words, we need to characterize the space of

    possible patterns and locate each observed pattern within that space.

    The ability to classify patterns directly would be an important aid in an empirical science

    of organizational routines. For example, we theorize that routines are composed of sub-routines,

    and that these sub-routines can be re-combined in various ways to create new routines. But how

    can we identify particular subroutines, given the enormous variability in a typical set of

    performances? The answer is not obvious, but the ability to identify and compare the structure

    of routines and map their change over time would be fundamental to empirical tests of our most

    influential theories about stability, change, and evolution in organizations. Nelson and Winter

    (1982) hypothesize that organizations change by mutation, and by recombining routines in

    various ways. Genetic models and metaphors continue to be an influential approach to

    theorizing about routines (Becker, 2004), but empirical evidence from real organizations has

    been difficult to obtain, and methods for analyzing changes in patterns over time have been

    lacking. The use of workflow data may afford the possibility of testing evolutionary concepts. In

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    8/25

    Using workflow data Page 8 of 25

    the following sections, we offer some modest examples of how workflow data might be used to

    explore the structure of an invoice processing routine.

    Methodology

    For this paper, we collected workflow data from an invoice-management system

    (Compello Software) for one company over an 11-month period. The data include 2072

    performances of the invoice processing routine, consisting of 62756 individual actions taken

    from a lexicon of 31 action types.

    Figure 2 shows a flow chart for the invoice process. Invoices can enter the system on

    paper (which is most common), or via an electronic portal. If the invoice conforms to a known

    Form type, it is scanned, registered in the system, and distributed for approval. Some invoices

    can be immediately sent to the financial system for payment; others require multiple approvals.

    Once these are complete, the invoice is paid.

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    9/25

    Using workflow data Page 9 of 25

    Figure 2: Flow chart of the invoice processing routine

    Table 2 shows one performance of the invoice process routine as it appears in the

    workflow event log. The first column in the table indicates that the routine consists of two main

    phases: entry and approval. The first portion of this routine, referred to as Entry, is almost

    entirely automated. When Person is -1, that means that a computer is taking the action

    automatically, according to a rule. In the approval phase, some actions are automatic, but

    others are manual. People are back in the picture. In the analysis that follows, we will

    sometimes refer to the routine as a whole, but at times it will be useful and interesting to

    compare the automated, initial entry section of the routine (Entry) to the more manual,

    approval phase (Approval). After the last approval, the invoice data is exported to a separate

    accounts payable system that is outside the scope of the workflow system. This is an example of

    Financial

    system

    Scanning OCR OCR Form def.

    Registration/Distribution

    AuthorizationStatus Fully posted

    Form?

    Electronic invoice

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    10/25

    Using workflow data Page 10 of 25

    a limitation to workflow event logs their scope is limited by the design and architecture of the

    underlying systems.

    Table 2: Example event log from workflow system1

    PHASE ACTIONTYPE OFACTION DATA DATE PERSON DATA CAPTURE

    Entry Enter document type Data entry CI 05-Apr-06 -1 OCR/XML

    Entry Enter invoicno. Data entry 27475 05-Apr-06 -1 OCR/XML

    Entry Enter invoicedate Data entry 3/31/2006 05-Apr-06 -1 OCR/XML

    Entry Enter duedate Data entry 4/20/2006 05-Apr-06 -1 OCR/XML

    Entry Enter period Data entry 33 05-Apr-06 -1 OCR/XML

    Entry Enter vendor account Data entry some text 05-Apr-06 -1 OCR/XML

    Entry Enter amount Data entry 1250 05-Apr-06 -1 OCR/XML

    Entry Enter currency Data entry 2 05-Apr-06 -1 System

    Entry Enter text Data entry some text 05-Apr-06 6 Manual

    Entry Approve Approve 5 05-Apr-06 -1 System

    Entry Enter amount Data entry 1250 05-Apr-06 6 Manual

    Approval Approve Approve 5 05-Apr-06 19 Manual

    Approval Enter period Data entry 33 05-Apr-06 -1 OCR/XML

    Approval Distribute to approver Workflow 2 05-Apr-06 6 Manual

    Approval Distribute to approver Workflow 2 05-Apr-06 6 Manual

    Approval Enter currency Data entry 2 05-Apr-06 -1 System

    Approval Enter account Data entry Acct 05-Apr-06 19 Manual

    Approval Enter Tax-code Data entry 16 05-Apr-06 19 System/manual

    Approval Notify Workflow 4/5/2006 05-Apr-06 -1 System

    Approval Enter value dim. 1 (DP) Data entry 131 05-Apr-06 19 Manual

    Approval Approve Approve 5 06-Apr-06 22 Manual

    Approval Enter text Data entry some text 18-Apr-06 44 Manual

    In some respects, workflow data is extremely convenient, because it is computer

    generated. Events are unitized, categorized and time-stamped. This largely eliminates the need

    for the arduous and time-consuming task of coding that would be required in a typical

    observational or archival field study. In other respects, depending on how the workflow event

    log is stored, it may require some rather sophisticated computer procedures (e.g., queries written

    in SQL) to retrieve it. Many software programs databases differ from installation to installation.

    Such differences introduce more complexity in making the data comparable across organizations.

    1 Some columns have been removed or values changed for confidentiality.

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    11/25

    Using workflow data Page 11 of 25

    To minimize such complications we chose a process and software which allowed a variation

    within a standardized framework. But once data has been retrieved, it is more or less ready for

    analysis.

    Describing the data: Lexical variety

    The most basic property of a routine is the list of events it generates. Events are fundamental

    because we need the events to describe and recognize the patterns. By analogy to human

    language, we refer to the number of different events generated by a routine as its lexical variety

    orlexical size. Table 2 provides some measures of the lexical variety each phase in the routine.

    These measures alone might suggest that entry and approval are basically the same they

    have almost the same number of actions in their lexicon (15 vs. 17); they have a similar

    average length; and a similar number of different lexical items appear in each performance.

    Table 2: Lexical variety measures

    Entry Approval

    Average Length 11.31532 12.2525

    Lexicon Size 15 17

    Average Lexicon Items 10.15916 9.634741

    While the numberof events in each phase is similar, the events themselves are not.

    Figure 3 shows which events are found in each phases of the routine. Some of events are shared

    between phases, yet other events are unique to a specific phase. Thus, one can easily

    discriminate between the phases based on the presence (or absence) of particular events. In other

    words, these two phases can be easily identifiedor recognized. In cases where there is more

    overlap, more elaborate classification (or pattern recognition) techniques can be used to

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    12/25

    Using workflow data Page 12 of 25

    discriminate between phases (e.g., markov and n-grams are two widely used techniques for

    classifying sequence data).

    Figure 3: Frequency of action types in the entry and approval phases

    Frequency of Actions in Entry and Approval Phase

    0 500 1000 1500 2000 2500

    Enter currency

    Enter period

    Enter text

    ApproveEnter amount

    Enter duedate

    Enter invoicedate

    Enter invoicno.

    Enter vendor account

    Enter document type

    Distribute to approver

    Enter account

    Enter Tax-code

    Enter value dim. 1 (DP)

    Notify

    Enter value dim. 3 (SP)Enter value dim. 2 (PR)

    Enter Last Comment

    Remove approver

    Reminder (last if more than one)

    Send back to accountant

    Enter value dim. 4 (EM)

    Approved by backup

    Forward

    New scanning

    Entry Phase Approval Phase

    Listing the sequences

    Lexical variety provides a foundation, but we are ultimately more interested in the

    performances, which are temporally orderedpatterns of events. To describe how performances

    of a routine vary, one could simply list all the sequences one observes. If the routine is simple

    enough, with fixed performances, this might be a good way to describe what the routine does.

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    13/25

    Using workflow data Page 13 of 25

    Table 3 shows the ten most frequent sequences for each phase of the routine. These sequences

    were produced by assigning an integer code to each event type in the lexicon (taken from the

    second column of table 2).

    Table 3: Ten most frequent sequences for each phase

    Entry Phase Approval Phase

    # % of total Action Sequence (119 unique, n=1998) # % of total Action Sequence (902 unique, n=2297)

    904 45.25% 9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7 138 6.01% 2, 14, 5, 8, 16, 6, 15, 20

    419 20.97% 9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 16 92 4.01% 14, 5, 5, 8, 25, 19, 2, 16, 6, 15, 17, 2

    147 7.36% 9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7, 13 51 2.22% 2, 14, 5, 5, 8, 25, 19, 2, 16, 6, 15, 17

    121 6.06% 9, 12, 11, 10, 14, 21, 7, 8, 2, 16, 7 44 1.92% 2, 14, 5, 5, 8, 25, 19, 2, 6, 15, 17, 16

    37 1.85% 9, 12, 11, 10, 21, 7, 8, 16, 2, 7, 14 44 1.92% 2, 14, 5, 5, 8, 25, 2, 16, 6, 15, 17

    36 1.80% 9, 12, 11, 10, 14, 8, 21, 7, 16, 2, 7 37 1.61% 2, 2, 14, 5, 5, 8, 16, 6, 15, 25, 17, 19

    30 1.50% 9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 16, 13 33 1.44% 2, 14, 5, 5, 8, 25, 2, 6, 15, 17, 16

    25 1.25% 9, 12, 11, 14, 21, 7, 8, 16, 2, 7, 10 32 1.39% 2, 2, 14, 5, 5, 8, 6, 15, 25, 17, 19, 16

    24 1.20% 9, 12, 11, 10, 14, 21, 7, 8, 2, 7, 13, 16 27 1.18% 2, 14, 5, 8, 16, 6, 15, 17

    17 0.85% 9, 12, 11, 10, 14, 21, 7, 8, 16, 2, 7, 7 25 1.09% 2, 14, 5, 5, 8, 16, 6, 15, 25, 17, 19, 2

    In the invoice routine, raw enumeration is apparently not a useful way to summarize the

    routine: there is too much variety. There are 119 distinct sequences in the entry phase, and 902

    distinct sequences in the approval phase. For the entry phase, the ten most common sequences

    account for 88% of the observed sequences. For the approval phase, the ten most common

    account for just under 23%.

    One implication of Table 3 is that, even for automated workflows such as the Entry

    phase, there can be quite a lot of variability in the performances. For the less automated

    Approval phase, variability is even higher. For this routine, collecting a small sample of

    performances is not likely to result in a good description. Even for the entry phase, which is

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    14/25

    Using workflow data Page 14 of 25

    almost entirely automated, there are over 100 distinct sequences. In the context of the invoice

    processing routine, these variations may or may not be particularly meaningful. In a more

    complex routine, there may be more than two phases (or subroutines), and they may be

    combined and recombined in many different ways. Clearly, we need tools to identify and

    compare such variations and combinations.

    Structure of routines

    To assess the structure of the routine, and possible changes over time, we need to analyze

    the performances. We would like to be able to compute properties of the routine that are

    sensitive to the patterns, so that we can perform cross-sectional and longitudinal comparisons.

    Better yet, we would like to be able to compare the patterns directly to support identification of

    routines and testing of evolutionary models.

    Sequential Variety

    To create a meaningful comparison that takes into account the pattern of events, we need

    to use methods that use information about the sequences. Sequential variety (Pentland, 2003)

    uses all of the data in a set of observed sequences (not just pairs of events) to create a single,

    scalar measure of the variability in the sample. It provides an index of the variability in the

    performances that can be used for either cross sectional or longitudinal comparisons. Sequential

    variety is closely correlated to the complexity of the observed sequences, as measured by the

    Lempel-Ziv complexity.2

    2Algorithmic complexity is a well-established approach to detecting and measuring spatio-temporal patterns in

    sequence data (Kaspar and Schuster, 1987). The Kolgomorov-Chaitin Complexity (often called the Kolgomorov

    complexity) refers to the length of the shortest self-terminating program required to produce a given sequence.

    While the Kolgomorov-Chaitin measure has great theoretical value, it is not readily computable (Kolgomorov,1965). To address this problem, Lempel and Ziv (1974) devised an algorithm that provides an index of the

    Kolgomorov complexity. The Lempel-Ziv measure has similar properties to the Kolgomorov-Chaitin measure, and

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    15/25

    Using workflow data Page 15 of 25

    Sequential variety is computed by comparing each sequence in a sample to every other

    sequence (Pentland, 2003). The distance between sequences is computed using optimal string

    matching (Sankoff and Kruskal, 1983; Abbott, 1995). This distance (also called a Levenshtein

    distance) can be used for subsequent clustering and pattern classification. Sequential variety is

    simply the average distance between sequences in a sample. Like the Lempel-Ziv complexity,

    this method uses entire sequences (not just pairs of actions). Figure 4 shows the sequential

    variety of the two phases of the invoice process over the 11 month period in our data set.

    Sequential variety over time

    0

    1

    2

    3

    4

    5

    6

    7

    8

    1 2 3 4 5 6 7 8 9 10 11

    Month

    Initial Phase Approval Phase

    Figure 4: Entry phase is less variable than the approval phase

    Figure 4 shows the sequential variety of each phase in the invoice routine. The numbers

    in Figure 4 are computed month by month. This is, of course, an artificial way of partitioning

    thus provides a practical measure of algorithmic complexity on empirical data (Kaspar and Schuster, 1987). For

    example, Butts (2001) uses the Lempel-Ziv measure to explore the effects of complexity in social networks.

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    16/25

    Using workflow data Page 16 of 25

    the data. But sequential variety depends on comparing sequences within a sample, such as a

    months worth of invoices. Figure 4 demonstrates that the entry phase is consistently less

    variable than the approval phase. This makes sense, because the entry phase is more highly

    automated and (as shown earlier) the sequences are more similar.

    Network Graphs and Measures

    The simplest way to summarize a large number of patterns is to break each sequence of

    action into pairs of actions. In this way, workflow data can be processed and displayed as a

    valued, directed network graph, as shown in figure 5. Each node in the graph represents a type

    of action, while the arcs represent the sequential relationship between actions. Formally, these

    graphs are first-order Markov models, except that the nodes are events rather than states.

    Pentland (1999) calls this kind of graph an action network because it indicates which actions

    follow which other actions. The action network perspective allows us to use methodologies that

    had previously been used for the study of social networks to ask new questions about the nature

    of the routine in the organization.

    Figure 5: Action network for entire routine (all 31 actions)

    If we graph the relationships between actions in each phase, we can visually compare

    how these graphs may be similar or different. Figure 5 shows the network of actions found in

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    17/25

    Using workflow data Page 17 of 25

    the initial entry phase, while Figure 6 refers to the approval phase. The thickness of the arcs is

    an indication of how often a particular sequence occurs, thicker lines indication more frequent

    connections between those actions.

    From first glance, it is easy to see differences between them. The approval phase is much

    more interconnected and varied than the entry phase. This is consistent with the comparison of

    sequential variety in figure 4. More numerous (denser) connections should be correlated with

    higher sequential variety.

    Figure 5 : Action Network for Entry Phase

    2

    5

    7

    8

    9

    10

    11

    12

    13

    14

    1621

    24

    25

    29

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    18/25

    Using workflow data Page 18 of 25

    Figure 6: Action Network for Approval Phase

    2

    3

    5

    6

    8

    14

    15

    16

    1718

    19

    20

    23

    25

    28

    29

    31

    Once the routine has been represented as a network, a large number of analytical tools

    and techniques become available (Wasserman and Faust, 1994). In addition to the overall

    density, we can compute centrality of particular events, block models and structural equivalence.

    On a practical level, we can detect potential bottlenecks and most common paths.

    Dynamic networks

    Because we have a sequences sampled over a long period of time, we can show how the

    pattern of action within the routine changes over time using a software application called SoNIA

    (Moody, McFarland and Bender-deMoll, 2005). SoNIA creates dynamic network movies that

    allow visualization of changes in a network over time. On paper, we cannot display the movies,

    but figure 7 shows three time slices from the entry phase, based on consecutive samples of 25

    invoices each. From these network graphs, the human eye can determine some common

    structures and gain further insight into the patterns of action over time. When the full set of

    invoices is animated, these patterns are even more apparent.

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    19/25

    Using workflow data Page 19 of 25

    2

    4

    7

    8

    9

    10

    11

    12

    13

    14 16

    21

    24

    30

    2

    4

    7

    8

    9

    10

    11

    12

    13

    14 16

    21

    24

    30

    2

    4

    7

    8

    9

    10

    11

    12

    13

    14 16

    21

    24

    30

    Time 1 Time 2 Time 3

    Figure 8: Network of action changes over time (entry phase)

    Dynamic network graphs give an exciting moving picture that allows the human brain to

    use its abstraction and induction capabilities to discover shapes and regularities in the movement

    of nodes and arcs. Time is vital to our understanding of the dynamics and flexibility found in an

    organizational routine, and the moving network graphs give us this.

    Discussion

    The ability to identify and compare organizational routines based on large samples of

    performances creates some exciting new possibilities that we have only begun to explore here.

    In the final section of the paper, we offer some thoughts on the limits and possibilities of this

    approach as we currently understand it.

    Data, data everywhere...

    The most exciting aspect of workflow data is the large and growing number of different

    organizations and processes for which it may be available. Workflow systems are built into

    enterprise software, such as SAP and Peoplesoft. In fact, basic workflow services are built into

    every copy of Microsoft Windows (MS Sharepoint is a workflow manager). So it is safe to say

    that workflow data is ubiquitous, but there may be difficulties preventing its universal use in

    research.

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    20/25

    Using workflow data Page 20 of 25

    The first challenge, of course, is getting it. In many operational systems, logging is

    turned off because it would rapidly consume too much disk space. If so, then recording

    workflow data would require that the organization agree to activate this functionality.

    Depending on how the event data are stored, it may contain data that are considered confidential

    or sensitive. If so, it may be difficult to get an organization to invest the resources required to

    extract sequences of events (or other abstract representations) that do not compromise the

    confidentiality of the organization or its members. Organizations are understandably reluctant

    to share this kind of data, and protracted negotiations and non-disclosure agreements may be

    required. Thus, while the marginal cost of a thousand extra performances may be very low, the

    cost of getting any data at all may still be quite high. Unless the event log is stored in a single

    file or database table, considerable technical skill may be required to extract the data.

    The second challenge is insuring that it is comparable between organizations. Workflow

    software can be configured differently by each organization using it. The utility of workflow

    event logs depends to a great extent on what kinds of events have been recorded. The event log

    records events from a particular point of view, and only to the extent that programmers (and

    system configuration) allows. Different events may be monitored in the event log. Different

    processing rules may be used to automate different steps in the process. These differences are

    interesting and potentially researchable in their own right, but they could diminish or destroy the

    comparability of the data in the event logs.

    The third challenge in working with event log data is the novelty of the methods. We

    have found that we are making rather extensive use of SQL and Visual Basic to create

    customized data analysis procedures. Some analytical tools exist (e.g., for string matching), and

    others are being developed. One of the most promising developments is the availability of an

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    21/25

    Using workflow data Page 21 of 25

    XML-based standard for representing workflow data (Van Dongen & van der Aalst, 2005) and

    an open-source software platform for analyzing these data structures(Van Dongen & van der

    Aalst, 2005) . This platform, called ProM, supports the use of plug-ins for analysis. While the

    availability of such tools lowers the barrier to entry somewhat, analyzing workflow data requires

    skills that are not part of a typical social science PhD program.

    Identifying structure is important

    While workflow data presents some challenges, the potential payoff is significant. For

    example, it can provide a better understanding of the underlying structure of routines. For a

    simple case like invoice processing, detailed analysis of the workflow might be perceived as

    overkill. But even in our simple example, some interesting issue arose. For example, is invoice

    processing one routine or two? How many routines are generating the patterns we observe, and

    how similar are they?

    Based on the lexicon, the entry and approval phase seem to be very similar. They have

    almost the same size lexicon, and nearly the same number of steps per performance. However,

    the actions in each phase are different and there is little overlap between the actions.

    Furthermore, the phases are very different in sequential variety, and in the actual patterns of

    action. Based on this analysis, invoice processing might be better characterized as two

    routines, and additional subroutines of these may be discovered with additional analysis.

    Combinatorics of organizational evolution

    This minor difference in description is important for a variety of reasons. First,

    organizations are hypothesized to adapt and evolve by combining and recombining routines.

    Routines are like genetic material, or perhaps building blocks, whose combinations can create

    meaningful differences in the overall organization. But what is the level of granularity at which

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    22/25

    Using workflow data Page 22 of 25

    these combinations can occur? Is it high-level processes, such as order-to-cash? Or is it the

    nominal process we see in the process map (e.g., invoice processing)? Or is it at an even lower

    level, such as the entry and approval phases identified here? By identifying distinct routines,

    we begin to set some boundaries on the combinatorics of organizational adaptation at the level of

    routines.

    While workflow event logs can help identify meaningful sub-routines, they may not be

    very reliable as a tool for observing novel combinations and re-combinations. This is because

    the event log is generated by a technical artifact (a software system) that could be swapped in (or

    out) of the picture at any time. It would only be possible to observe re-combinations at a lower

    level of granularity and a shorter time-scale than the life-span of the system itself.

    Sequential variety may indicate evolutionary potential

    Such smaller scale variations could be conceptualized as evolution within a routine, as

    well. Feldman and Pentland (2003) suggest the possibility of variation and selective retention

    within routines, as exception handling and improvisation create variations that may be

    recognized and retained as a part of the routine. Variability in routines is important because it

    creates opportunity for change. As new ways of doing things are tried out (accidentally or

    intentionally) and incorporated into the organizations repertoire of action.

    To observe this evolutionary process, we need detailed data on the patterns of action in

    each performance, over time. Of course, workflow data offers just such an opportunity. We

    expect that routines with higher sequential variety would have a greater possibility of displaying

    evidence of variation and selective retention. For example, the approval phase has much more

    variability than the entry phase, as shown in figure 4. If the organization is able to select

    desirable patterns, and retain them, they should be able to achieve improved outcomes over

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    23/25

    Using workflow data Page 23 of 25

    time. For organizations that wish to direct their evolution intelligently, the identification of

    patterns, causes and outcomes is necessary to selectively retain the best patterns for execution in

    specific circumstances.

    Antecedents and consequences

    Workflow patterns can also be linked to outcome or performance variables. Outcome

    measures, such as cycle time, are often available from workflow systems, using the same data

    collection techniques as described earlier. By utilizing such measures we could answer questions

    about the consequence of sequential variety and particular workflow patterns. Potentially, one

    could investigate the criteria that seem to drive selection of routines (cost, quality, cycle time,

    etc.).

    It is also possible to study the antecedents of the structure of routines. Do different

    environments and organizations tend to produce the same patterns, or are there systematic

    differences? Do different organizations, given similar environments, produce similar patterns?

    Are there characteristics of the persons or team responsible for the workflow system that may

    predict variation in patterns of actions? In other words, are routines shaped more by the external

    environment or by internal features of the organization? While answers to these questions seem

    a long way off at the moment, the ability to analyze workflow data lets us contemplate them in

    ways that were never possible before.

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    24/25

    Using workflow data Page 24 of 25

    References

    Abbott, A. (1995). Sequence Analysis: New Methods for Old Ideas. Annual Review of Sociology,21, 93-113.

    Barley, S. R. (1990). Images of Imaging: Notes on Doing Longitudinal Field Work.Organization Science, 1(3), 220-247.

    Basu, A., & Kumar, A. (2002). Research Commentary: Workflow Management Issues in e-Business.Information Systems Research, 13(1), 1-14.

    Becker, J., zur Muehlen, M., & Gille, M. (2002). Workflow Application Architectures:

    Classification and Characteristics of workflow-based Information Systems. Workflow

    Handbook, 39-50.

    Becker, M. C. (2004). Organizational routines: a review of the literature. Industrial and

    Corporate Change, 13(4), 643-678.

    Butts, C. T. (2001). The complexity of social networks: theoretical and empirical findings. Social

    Networks, 23 31-71.

    Chi, M. T. H., Feltovich, P. J., & Glaser, R. Categorization and Representation Physics Problems

    by Experts and N0vices. cOGNITIVE SCIENCE, 5, 121-152.

    Feldman, M. S., & Pentland, B. T. (2003). Reconceptualizing Organizational Routines as a

    Source of Flexibility and Change.Administrative Science Quarterly, 48(1), 94-118.

    Haerem, T., & Rau, D. (In Press). The Influence Of Degree Of Expertise And Objective TaskComplexity On Perceived Task Complexity And Performance.Journal of Applied

    Psychology.

    Howard-Grenville, J. A. (2005). The Persistence of Flexible Organizational Routines: The Role

    of Agency and Organizational Context. Organization Science, 16(6), 618.

    Kaspar, F., & Schuster, H. G. (1987). Easily Calculable Measure for the Complexity of

    Spatiotemporal Patterns. Physical Review A 36(2), 842-848.

    Lempel, A., & Ziv., J. (1976). On the Complexity of Finite Sequences.IEEE Transactions onInformation Theory, 22.

    Moody, J., McFarland, D., & Bender-deMoll, S. (2005). Dynamic Network Visualization.American Journal of Sociology, 4, 1206-1241.

    Nelson, S. G., & Winter, R. R. (1982).An Evolutionary Theory of Economic Change.Cambridge, MA: Harvard University Press.

  • 7/30/2019 Using Workflow Data to Explore the Structure of an Organizational Routine

    25/25

    Using workflow data Page 25 of 25

    Pentland, B. T. (1995). Grammatical Models of Organizational Processes. Organization Science,6(5), 541-556.

    Pentland, B. T. (1999). Organizations as Networks of Action. In J. A. C. Baum & W. McKelvey

    (Eds.), Variations in Organization Science: Essays in Honor of Donald T. Campbell .Thousand Oaks, CA: Sage.

    Pentland, B. T., & Feldman, M. S. (2005). Organizational Routines as a Unit of Analysis.

    Industrial and Corporate Change, 14(5), 793-815.

    Sankoff, D., & Kruskal, J. B. (1983). Time Warps, Strings Edits, and Macromolecules: The

    Theory and Practice of Sequence Comparison. Reading, MA: Addison-Wesley.

    van der Aalst, W. (2003). Business Process Management: Past, Present and Future.

    van der Aalst, W., & Weijters, A. J. M. M. (2004). Process Mining: A Research Agenda.

    Computers in Industry, 53(3), 231-244.

    Van Dongen, B. F., Medeiros, D. J., Verbeek, H. M. W., Weijters, A., & van der Aalst, W. M. P.

    (2005). The ProM framework: A new era in process mining tool support. Paperpresented at the 26th International Conference on Applications and Theory of Petri Nets

    (ICATPN 2005), Heidelberg.

    Van Dongen, B. F., & van der Aalst, W. M. P. (2005).A Meta Model for Process Mining Data.

    Paper presented at the CAiSE05 WORKSHOPS.

    Wasserman, S., & Faust, K. (1994). Social Network Analysis: methods and applications:

    Cambridge University Press.