ORIGINAL ARTICLEmontali/papers/demasellis-etal...e-mail: [email protected] R. De Masellis e-mail: [email protected] M. Montali · D. Solomakhin Free University of Bozen-Bolzano

J Data SemantDOI 10.1007/s13740-014-0036-6

ORIGINAL ARTICLE

Semantic Enrichment of GSM-Based Artifact-Centric Models

Riccardo De Masellis · Domenico Lembo ·Marco Montali · Dmitry Solomakhin

Received: 14 April 2013 / Revised: 20 February 2014 / Accepted: 25 February 2014© Springer-Verlag Berlin Heidelberg 2014

Abstract We provide a comprehensive framework forsemantic GSM artifacts, discuss in detail its properties, andpresent main software engineering architectures it is able tocapture. The distinguishing aspect of our framework is that itallows for expressing both the data and the lifecycle schemaof GSM artifacts in terms of an ontology, i.e., a shared and for-malized conceptualization of the domain of interest. To guidethe modeling of data and lifecycle we provide an upper ontol-ogy, which is specialized in each artifact with specific lifecy-cle elements, relations, and business objects. The frameworkthus obtained allows to achieve several advantages. On theone hand, it makes the specification of conditions on data andartifact status attribute fully declarative and enables semanticreasoning over them. On the other, it fosters the monitoringof artifacts and the interoperation and cooperation amongdifferent artifact systems. To fully achieve such an interoper-ation, we enrich our framework by enabling the linkage of theontology to autonomous database systems through the use ofmappings. We then discuss two scenarios of practical inter-est that show how mappings can be used in the presence ofmultiple systems. For one of these scenarios we also describe

R. De Masellis · D. Lembo (B)Sapienza Università di Roma,Via Ariosto 25, 00185 Rome, Italye-mail: [email protected]

R. De Masellise-mail: [email protected]

M. Montali · D. SolomakhinFree University of Bozen-Bolzano Piazza Domenicani3, 39100 Bolzano, Italye-mail: [email protected]

D. Solomakhine-mail: [email protected]

a concrete instantiation of the framework and its applicationto a real-world use case in the energy domain, investigatedin the context of the EU project ACSI.

Keywords Data-aware process modeling · Artifact-centricprocesses · Guard Stage Milestone lifecycle · Ontologies ·Ontology-based data access · Process monitoring

1 Introduction

Recent work in business processes, services, and databasesbrought the necessity of considering both data and processessimultaneously while designing enterprise systems. Thisholistic view of considering data and processes together hasgiven rise to a line of research known under the name ofdata-aware business processes [2,4,44,48], aiming to avoidthe notorious discrepancy of traditional activity-centric mod-els where these two aspects are considered separately. One ofthe first proposals in this area is the artifact-centric approach(see, e.g., [24,48]), which relies on the notion of artifact, abusiness-relevant conceptual entity in a given domain whichcombines both static properties, describing the data of inter-est, and the dynamics, induced by processes that manipulatesuch data.

Even though the tight integration of data and processes iscentral for artifact-centric systems, the information managedby artifacts is typically captured by means of rather simplestructures, such as lists of relevant attributes. A rich, concep-tual, and well-founded modeling of the data component is yetto come. Furthermore, a suitable balance between data andprocesses is often missing, leading to artifacts which rely ondata structures essentially tailored to the process they haveto serve, instead of richer structures able to fully reflect thecomplexity of the domain of interest.

123

R. De Masellis et al.

The main negative consequence is that it is difficult toexploit the artifact data that go beyond those of the specificprocess execution. This, in turn, makes difficult to(i) governthe entire enterprise system, (ii) interconnect the data manip-ulated by the different artifacts so as to construct a unique,high-level view of them, (iii) evolve the system so as to incor-porate new features impacting on the data component, and(iv) support interoperation with new processes and externalsystems.

By leveraging on the recent, extensive work on ontology-based data access (OBDA) (see, e.g., [14,40,50]), our pri-mary goal is to overcome these limitations by proposing aframework for the semantic enrichment, governance, andmanagement of artifact-centric systems. In particular, wepropose to shift artifacts to the business level of abstraction,bringing into artifact models the idea of modeling the domainof interest in terms of an ontology, which thus become theheart of the whole artifact system. At the same time, OBDAtechniques enable the (efficient) realization of several, recur-ring architectural solutions adopted in software engineeringto attack the complexity of the system-to-be.

Ontologies provide indeed a formal and explicit concep-tualization of the domain of interest [34] and are increas-ingly adopted in the development of information systems, asthey facilitate comprehension, sharing, and communicationof domain knowledge, at the same time providing a plethoraof reasoning services for intelligent data access and integra-tion. This is possible because (domain) ontologies have a for-mal underpinning in Description Logics (DLs) [7]1, whichare decidable fragments of first-order logic (FOL) that canbe used to represent structural knowledge of a domain ofinterest in an unambiguous, formally grounded way. Theplethora of reasoning services associated with DLs2, includ-ing, e.g., query answering, consequently allow for a run-time,live exploitation of ontologies, which goes far beyond theirusage for modeling purposes only.

Even though the contribution of this work is orthogonal tothe artifact/process modeling language of choice, we showhow our framework can be concretely exploited by groundingit in the recently proposed Guard-Stage-Milestone (GSM)artifact modeling language [25,38].

The choice of GSM is motivated by three main reasons:(i) GSM has a precise execution semantics which providesa solid basis supporting both implementation-related aspectsand formal investigation; (ii) it particularly benefits from cap-turing data at the conceptual level, as its declarative natureintensively relies on queries posed over the data to properly

1 DLs are the logical counterpart of OWL, the W3C standard for ontol-ogy specification http://www.w3.org/TR/owl2-overview/.2 See, for instance, the services offered by the DL-based reasonerspresented by [19,23,35,51,52,55].

drive the execution and evolution of processes and artifacts,and (iii) its main constructs have been recently adopted bythe Object Management Group in the standard for Case Man-agement Modeling Notation (cf. [49]).

The contributions of this paper can be summarized as fol-lows:

– We provide a formal definition of the semantic GSMmodel. Differently from the classical GSM, in seman-tic GSM the information model is given in terms ofan ontology, and conditions on data and artifact statusattributes, used in the specification of GSM lifecycles,are all expressed over the ontology. Furthermore, in ourformalization the GSM lifecycle schema itself is mod-eled through an ontology. The advantage of this featureis twofold: on the one hand it allows for advanced formsof querying over the status of GSM; on the other hand,the framework provides a common, uniform representa-tion for both the lifecycle and the data schema. To guidethe modeling of both such aspects, we provide an upperontology which constitutes the (abstract) core of the over-all conceptual schema to be defined in each artifact. Eachspecific artifact provides then its own specialization of thisupper layer, enriching the ontology with its own lifecycleelements, relations, and business objects.

– We enrich the semantic GSM framework by enabling thelinkage of the ontology towards autonomous database sys-tems, possibly with heterogeneous schemas. To this aim,we borrow the notion of mapping from the data integra-tion [28,41] and OBDA [50] literature. The mapping actu-ally establishes a semantic correspondence between datastored in data sources and the instances of the ontology.This correspondence consequently fosters collaborationand communication among different artifact-centric sys-tems, heterogeneous and legacy data management systems,and multiples applications, as they can continue to workwith their own data formats and schemas, but the data theymaintain can be understood in terms of the ontology. Inparticular, we discuss two main software engineering sce-narios of practical interest in which these needs clearlyarise:

– a system constituted by a back-end information sub-system, controlled by a set of semantic GSM artifacts,and multiple front-end applications running their ownprocesses, which, however, need to access data pro-duced by the back-end;

– a system encompassing multiple interacting subsys-tems running their own internal processes, on the topof which an ontology is posed to provide a globalview of the manipulated data, which in turn allows tomonitor and govern the underlying processes at thebusiness level.

123

http://www.w3.org/TR/owl2-overview/


– We discuss an instantiation of the framework for seman-tic monitoring and governance of artifact systems adoptedwithin the EU project ACSI—Artifact Centric ServiceInteroperation3, and its application to a real-world use casein the energy domain, investigated in such a project. Exam-ples provided throughout the paper are also taken from theACSI energy use case. In fact, this use case triggered theresearch presented in this paper, providing at the same timethe fundamental motivations for introducing our frame-work, and a valid test-bed for experiencing it.

In principle, the framework we present in this paper isparametric with respect to the language used for represent-ing the ontology and for querying it, as well as with respectto the forms of mappings that can be adopted to link theontology with external data management systems. The onlyrequirement we need to impose is the adoption of an ontologylanguage that is able to encode the upper ontology we put atthe core of the framework and to extend it to the domain-specific component of the artifact ontology. We notice thatthe language expressivity requested to meet this requirementis quite limited and that many common basic ontology lan-guages essentially provide it. As for the query language, weonly impose that it has to guarantee decidable query answer-ing. For the sake of concreteness, we propose accordinglythe use of a query language which allows for decidable queryanswering even over expressive ontologies and at the sametime ensures enough expressibility for modeling purposes.Even though a complete investigation of the computationalproblems related to reasoning that arise in the presence ofspecific choices for the mentioned languages is out of thescope of the present paper, we discuss in depth these issuesfor the instantiation of the framework that we investigated inthe ACSI project.

The rest of the paper is organized as follows: In Sect. 2and in Sect. 3, we provide some preliminaries on the GSMmodel, and on DLs and their linkage to data, respectively. InSect. 4, we propose our framework for semantic GSM arti-facts. In Sect. 5, we discuss the architecture in which multi-ple front-end applications access the data produced by a setof back-end GSM semantic artifacts. In Sect. 6, we presentthe architecture in which the ontology is used as a concep-tual, global entry-point to understand the data produced bymultiple running (relational) processes, and discuss how theframework can be exploited towards semantic monitoringand governance of such processes. In Sect. 7, we describethe instantiation of such framework for semantic governanceof processes as realized within the ACSI project. Then, inSect. 8, we discuss some related work, and, in Sect. 9, weclose the paper with a final discussion and conclusions.

3 http://www.acsi-project.eu/.

2 The GSM Model

In this section we describe the main characteristics of theGSM model. We first provide an informal description andthen give precise formalization of the model.

2.1 Informal Introduction

Artifacts, or artifact types, are key business entities of a givendomain, which are characterized by

– a data schema (also called information model) that cap-tures the data maintained by the artifact,

– a lifecycle schema that specifies the possible progressionsof the artifact, and how the underlying data are manipu-lated as the result of a progression step.

– a set of artifact instances, which are instantiations of thecorresponding data and lifecycle models of the artifacttype. The description of a particular business process mayinvolve several instances of different artifacts types.

The GSM artifact modeling language, recently introducedby [25] and [38], provides means for specifying businessartifact lifecycles in a declarative manner, using intuitivelynatural constructs that correspond closely to how executive-level stakeholders think about their business. The main GSMnotions and components of an artifact are

– the data schema (Att)—a set of (possibly nested)attributes, used to capture the domain of interest, whichcan be either

– data attributes, which represent data relevant to thebusiness,

– status attributes, which hold information about theprogress of the artifact instance along its lifecycle.

– Sentries: data-aware expressions, involving events andconditions over the artifact data schema. Sentries havethe form on e if cond, where e is an event and cond is acondition over data. Both parts are optional, supportingpure event-based or condition-based sentries.

– Tasks: units of atomic business-relevant work that areto be performed by an external agent (either human ormachine) in order to update the data schema of an artifactinstance.

– Milestones (Mst): a set of sentries, corresponding tobusiness operational objectives, achieved on the basis oftriggering events and/or conditions over the data schema.

– Stages (Stg): a set of elements, which correspond to clus-ters of activities intended to achieve milestones, and maybe organized into a hierarchy, as they can be either:

– atomic stages, containing exactly one atomic task.

123

http://www.acsi-project.eu/


– composite stages, containing other sub-stages.

At a given moment in time, a stage may be activated(having a status open), which corresponds to a state whenactivities within the stage are being executed. Along theprocess, any stage may be executed multiple times, butit cannot have two occurrences that are being executedsimultaneously.

– Guards (Grd): a set of sentries, which control when astage can be activated for execution.

– Events: set of typed events, which describe the interactionbetween artifact instances and the environment and whichcan be either:

– task invocation, whose instances are populated by thedata from data schema and then sent to the environ-ment to perform a task;

– task termination, whose instances represent the corre-sponding answer from the environment and are usedto incorporate the obtained result back into the artifactdata schema;

– status event, which correspond to any change of astatus attribute, such as opening a stage or achievinga milestone, and can be further used to govern theartifact lifecycle.

– one-way events, which are sent by the environmentand which are used to trigger specific guards or mile-stones.

The operational semantics for GSM is centered aroundtwo notions:

– snapshot—at any point in time it is the state of any givenartifact instance, which is stored according to its dataschema, and is characterized by: (i) values of attributesin the schema, (ii) status of its stages (open or closed),and (iii) status of its milestones (achieved or invalidated).

– business step, or B-step (formally defined in [25]), whichcorresponds to a transition from one snapshot of the sys-tem (before processing the event) to a new one, resultingfrom the incorporation of an event sent by the environ-ment. Incorporation of an event corresponds to processall the effects that the event triggers in the system. Sucheffects are determined based on a set of Event-Condition-Action (ECA) rules and result in issuing a set of statusevents, each of which can trigger further changes.

In order to guarantee that the set of status events generatedduring a B-step is actually finite, a GSM schema has to bewell-formed. Indeed, since the ECA rules can contain nega-tion, they suffer from the same well-known issues in logicprogramming and datalog, namely they can keep firing indef-initely. Such an undesired behavior is avoided by requiring a

sort of stratification (see, e.g., [6,31]), which imposes ECArules to be acyclic and fire in a specific order.

2.2 Formal Basis

This section formalizes the concepts introduced previouslyand gives a brief intuition of the incremental semantics forGSM.

Definition 1 (GSM schema) A GSM schema is a tuple(x, Att, Stg,Mst, Lcyc), where

1. x is a variable that ranges over (IDs of) instances of theartifact;

2. Att , Mst and Stg are the sets described above and theyare called the data schema of the artifact;

3. Lcyc = (Substage, T ask, Owns,Guards, Achv) isthe lifecycle schema, where

(a) Substage is a hierarchical relation over Stg;(b) T ask is a function from atomic stages in Stg to the

set of possible tasks;(c) Owns is a function from Stg to finite, non-empty

subsets of Mst ;(d) Guards is a function from Stg to finite sets of sen-

tries (see below);(e) Achv is a function from Mst to finite sets of sentries;

While sets Stg and Mst are simply the set of stages andmilestones, respectively, the set Att is the union of twodisjoint sets: attd , the data attributes and atts , the statusattributes, plus a special attribute Last I ncEventT ype thatstores the type of the event that is currently being consumed.Formally, Att = attd ∪atts ∪ Last I ncEventT ype. The setof status attributes is composed by boolean attributes s foreach stage s ∈ Stg, which is true if s is currently open orf alse otherwise, and boolean attributes m for each milestonem j ∈ Mst , which specifies whether m is achieved (true) orinvalidated ( f alse).

We now introduce some preliminary definitions requiredto define the lifecycle schema. We assume to have a set ofevent names event and a domain� that includes individualsused to interpret data attributes and the two boolean valuestrue and f alse.

Definition 2 (Snapshot) A snapshot of a GSM data schemais an assignment function from attribute names to the domainand the set of event names � : Att → � ∪ event such that

– �(a) ∈ {true, f alse} for each a ∈ atts and– �(Last I ncEventT ype) ∈ event.

A snapshot� is a snapshot for a GSM schema if it satisfiesthe following invariants:

123


– GSM-1: A stage and its milestone(s) cannot be both true,i.e., for each stage s and each milestone m owned by s,�(s) and �(m) cannot be both true.

– GSM-2: No activity in closed stage. If ¬�(s) for stages ∈ Stg and s′ is substage of s, then ¬�(s′).

We now briefly introduce the condition language that willbe used for specifying the sentries. The syntax of such a lan-guage is formally presented in [43] and is out of the scope ofthis paper. We just mention that the variables of the languagecorrespond to attributes in Att and event names in event.Hence, to evaluate a formula, we need to associate variables(x1 · · · xn) occurring in it to � ∪ event. Given a formula� with variables (x1 · · · xn), we write � |� �(x1 · · · xn)

when �(�(x1) · · ·�(xn)) evaluates to true accordingly tothe semantics of the condition language. The language canalso refer to the so-called status events. A status event fora GSM data schema is an expression of the form ¬a ∧ a′or a ∧ ¬a′ where a ∈ atts . To ease the notation, fromnow on we use +a as a shortcut for ¬a ∧ a′ and −a fora ∧ ¬a′. The intuitive meaning is that +a is true when ashifted from false to true during the course of a B-step, andanalogously for −a. A status event can hence refer to twosnapshots, � and �′, where we establish the convention, ascustomary in the verification community, that primed snap-shots �′ are constructed after �. Consequently, in formu-las, we will use primed variable symbols for variables thatshould be associated with elements in� according to�′ andunprimed variable symbols for variables that should be asso-ciated with elements in� according to �. Formally, given aformula �(x1 · · · xn, x ′

1 · · · x ′m) where x1 · · · xn, x ′

1 · · · x ′m ∈

Att , the pair (�,�′) satisfies �, denoted (�,�′) |�� if �(x1/�(x1) · · · xn/�(xn), x ′

1/�′(x1) · · · x ′

m/�′(xm))

evaluates to true, where (xi/�(xi ) substitutes to xi the value�(xi ) in �.

We are now ready to define the set sentry of sentries fora GSM schema. A sentry for a GSM data schema is a booleanformula of the form τ ∧ γ , where

– τ is either of the following:

– empty;– LastIncEventType = E or– {+,−}a for some status attribute a ∈ atts .

– γ is a formula that contains no event type variables norstatus events.

Notice that a sentry τ ∧γ can be expressed in the classicalform as on τ if γ .

We now turn to the notion of event. We already discussedstatus events above (and the way they can be used in sen-tries), so we now focus on events that allow artifacts to com-municate with the environment. The environment represents

the external world, or, in other words, everything that is notmodeled as an artifact. The environment performs externaltasks, such as human tasks, that are invoked by the artifactsthrough business events. One-way events are sent unsolicit-edly from the environment to an artifact or from an artifactto another artifact. An incoming one-way (event) type is atriple E = (N , O, ψ), where N ∈ event is the event name,O is the event payload structure which is a list of attributesin attd , and ψ is a (post-)condition whose variables refers toattributes in O . A one-way event instance, or simply a one-way message event is a pair e = (N , p) where p : O → �

is the payload such that p |� ψ . The condition ψ in a one-way event type formally represents restrictions on the outputattributes.

Before formally introducing task invocation and task ter-mination event types, we define tasks. Let task be a set oftask names, disjoint from the other sets of names alreadyestablished. A task is a tuple (T, I, O, ψ) where T ∈ taskis a task name, I ⊆ attd are the input attributes, O ⊆ attd arethe output attributes, and ψ is a logical formula in the con-dition language expressing the postconditions of the task.Given that the postcondition should refer to two differentsnapshots � and �′, where � is the snapshot of the systemwhen the task is invoked and �′ is the next snapshot whenit finishes, ψ refers to attributes in I without primes andattributes in O with primes. A task invocation event type isthen a pair E = (T, I ) where T is a task name and I arethe input attributes of T . A task invocation event instanceof type E is a tuple e = (T, p) where p : I → � is theinput payload of the event. A task termination event type isa triple E = (T, I, O) where T is a task name, and I and Oare the input and output attributes of T . A task terminationevent instance is a triple e = (T, p, p′) where (T, p) is atask invocation event instance, p′ : O → � is the output ofthe task, and (p, p′) |� ψ evaluates to true. Here p is calledthe input payload of e and p′ is called the output payload ofe.

Next, we present our running example, which is extractedby a real-world use case scenario developed for the FP7 Euro-pean Project ACSI.

Example 1 From a high-level perspective, the electric sup-ply system is a net of control points (CPs) which generate anddistribute the energy for a whole country. Each control pointis a point in the net where two electric companies exchangeenergy between each other. A centralized organization calledsystem operator is in charge of planning the production andmonitoring the energy trade. Every month, for a certain con-trol point, each company participating in the CP submits aso-called monthly report application to the system opera-tor, which contains a set of measurements, each describingenergy trade for a specific day with the other company con-nected to the control point. Such values are determined by

123


Fig. 1 Graphical representation of the control point assessment artifactdata schema described in Example 1

companies either automatically by hardware or manually byan energy manager. When the system operator receives twoapplications for each control point, it cross-checks data andpublishes a control point assessment, possibly after a manualinspection when values do not match.

We model such a process in GSM by introducing a con-trol point assessment (CPA) artifact. We assume a relationalrepresentation of data, and Fig. 1 provides a graphical rep-resentation of it. The data schema of the artifact providesinformation about both the assessment to be published for aspecific CP and the measures received by the two companiesconnected to the CP.

Relation C P As stores the id of the control point assess-ment (C P AI D); the id and location of the control point(C P I D and location); the month and the year the assess-ment refers to and its publication date. Relation C P AMeaskeeps track of the measures (Meas I D) chosen by the sys-tem operator for a given CPA. This measure is among thosewhich have been proposed (in the monthly report applica-tion) by the two companies connected to the control point.Later on we explain how such values are chosen. The othertwo relations store the set of measurements performed bythe companies. In particular, manually determined valuesare kept in the ManMeas table, which contains the id ofthe measure, the company name, the value of the measure,the date it refers to, the manager in charge of the manualinput, and the input date. Automatically determined values,on the other hand, are stored in the AutoMeas, in which theid of the measure, the company, the date, and the value of themeasure are the only relevant information.

Figure 2 shows a graphical representation of the CPA arti-fact lifecycle. When an instance of CPA artifact is created(for a specific CP) by a one-way creation event from the envi-ronment, Requesting stage opens. The scope of associatedactivity is to request monthly measures from the two com-panies connected to the CP. Indeed, ReqMR1 and ReqMR2open in parallel and their tasks send a task invocation eventto the environment. When a response event is consumed,its payload, containing all the information for a monthlyrequest application, is written in the data schema of the arti-fact (precisely in the ManMeas and AutoMeas relations inFig. 1) and milestone MR1Received (or MR2Received) isachieved. When both ReqMR1 and ReqMR2 close, mile-

stone MRAppsReceived is achieved and stage Requestingcloses. Stage CPADrafting opens when MRAppsReceivedand its three substages take care of analyzing the mea-surement for each day. In particular, EqualMeas opens ifthere exist a couple of measurements provided by the twocompanies which agree on their values for a specific date.In this case, indeed, such a measure is written in tableC P AMeas. In other words, task wrMeas takes care ofwriting measurements which the two companies agree on.Substage DiffMeasAuto opens when there are two measure-ments that disagree (for a specific day) but one of them hasbeen performed automatically. In this case, indeed, the sys-tem operator will chose the automatic measure to be writtenin C P AMeas. The last substage, DiffMeasMan, covers thecase in which the measurements disagree and they are bothperformed automatically or manually. A manual inspectionis then needed in order to choose one of those.

Tables 1 and 2 show sentries for guards and milestone forthe energy example, respectively. Such sentries are expressedas first order logic formulas.

3 Description Logic Ontologies and Their Linkage toData

In this section we recall some basic notions on DescriptionLogic ontologies, and on mechanisms to map ontologies todatabases, which are taken over from the research on dataintegration [28,41].

3.1 Description Logic Ontologies

Description Logic (DL) ontologies model the domain ofinterest in terms of objects (a.k.a. individuals), concepts,which are abstractions for sets of objects, roles, whichdenote binary relations between objects, value-domains,which denote sets of values, and attributes, which denotebinary relations between objects and values. In the rest ofthe paper we refer to an alphabet �, starting from which DLexpressions are built. � is the disjoint union of �P , con-taining symbols for atomic concepts, atomic value-domains,atomic attributes, and atomic roles, and �C , which containssymbols for constants (each denoting either an object or avalue). Complex expressions are constructed starting fromatomic elements, and applying suitable constructs. DifferentDLs allow for different constructs.

A DL ontology is constituted by two main components: aTBox (i.e.,“Terminological Box”), that stores a set of univer-sally quantified FOL assertions stating general properties ofconcepts and roles, thus representing intensional knowledgeof the domain, and an ABox (i.e.,“Assertional Box”), that isconstituted by assertions on individual objects, thus specify-ing extensional knowledge. Again, different DLs allow for

123


Fig. 2 Graphical representationof the control point assessmentartifact lifecycle described inExample 1

Table 1 Guards for the lifecycle in Fig. 2

Stage Guard sentry

on if

Requesting CPACreation Event −RequestMR1 +Requesting −RequestMR2 +Requesting −CPADrafting +MRAppsReceived −

EqualMeas +CPADrafting

∃id,m1id,m2id, c1, c2, d, v.C P As(id,−,−,−,−,−) ∧ c1 �= c2 ∧ m1id �= m2id∧((ManMeas(m1id, id, c1, d, v,−,−) ∧ ManMeas(m2id, id, c2, d, v,−,−))∨(Auto_meas(m1id, id, c1, d, v) ∧ Auto_meas(m2id, id, c2, d, v))∨(ManMeas(m1id, id, c1, d, v,−,−) ∧ AutoMeas(m2id, id, c2, d, v)))

DiffMeasAuto +CPADrafting∃id,m1id,m2id, c1, c2, d, v.C P As(id,−,−,−,−,−)∧

c1 �= c2 ∧ v1 �= v2 ∧ m1id �= m2id∧ManMeas(m1id, id, c1, d, v1,−,−) ∧ AutoMeas(m2id, id, c2, d, v2)

DiffMeasMan +CPADrafting

∃id,m1id,m2id, c1, c2, d, v1, v2.C P As(id,−,−,−,−,−)∧c1 �= c2 ∧ v1 �= v2 ∧ m1id �= m2id∧

((ManMeas(m1id, id, c1, d, v1,−,−) ∧ ManMeas(m2id, id, c2, d, v2,−,−))∨(AutoMeas(m1id, id, c1, d, v1) ∧ AutoMeas(m2id, id, c2, d, v2)))

Table 2 Milestones for the lifecycle in Fig. 2

Milestone Milestone sentry

on if

MR1Received Wait M R1T erm Event −MR2Received Wait M R2T erm Event −MRAppsReceived − MR1Received ∧ MR2Received

valsWrttn wr MeasT erm Event −valChsnAuto chs AutoT erm Event −valChsnMan chs ManT erm Event −published

+valsWrttn ∨ +valsChnsAuto∨+valsChsnMan

(valsWrttn ∨ ¬EqualMeas) ∧ (valsChsnAuto ∨ ¬DiffMeasAuto)∧(valsChsnMan ∨ ¬DiffMeasMan)

123


different kinds of TBox and/or ABox assertions. Formally, aDL ontology O over an alphabet � is a pair 〈T ,A〉, where Tis a TBox and A is an ABox, whose predicate symbols andconstants are from �.

The semantics of a DL ontology O over an alphabet �is given in terms of FOL interpretations for � (cf. [7]). Wedenote with Mod(O) the set of models of O, i.e., the set ofFOL-interpretations that satisfy all TBox axioms and ABoxassertions in O, where the definition of satisfaction dependson the DL language in which O is specified. An ontologyO is satisfiable if Mod(O) �= ∅. A logical sentence, i.e.,a closed formula, φ, expressed in a certain language L, isentailed by an ontologyO, denotedO |� φ, ifφ is satisfied byevery interpretation in Mod(O) (where, again, the definitionof satisfaction depends on the language L). All the abovenotions naturally apply to a TBox T alone.

Various reasoning services can be performed over DLontologies and are supported by state-of-the-art automatedreasoners (see, e.g., [35,52,55]). Among such services, inten-sional ones do not consider the ontology ABox, and discloseproperties only implicitly specified in the ontology, as well aspermit to verify the quality of the modeling. Instead, exten-sional reasoning also involve the ABox. The most importantreasoning service of this kind is query answering, which wedescribe below.

In presenting our framework, we do not refer to a spe-cific ontology language. We only assume that the languagewe use is expressive enough to capture the upper ontology(in fact a TBox) that we will introduce in Sect. 4 and forspecializing it into domain-specific ontologies. As alreadysaid in the introduction, also basic ontology languages essen-tially fulfill these requirements. Among such languages weoften refer to DL-Lite, which is in fact a family of light-weight DLs [18,21], constituting the formal underpinning ofOWL 2 QL, one of the tractable profiles of OWL 2 [46],the W3C standard language for ontology specification [47].DLs of the DL-Lite family allow for specifying basic ontol-ogy constructs, such as subsumption between concepts,subsumption between properties (i.e., roles or attributes),typings of properties, mandatory participation of conceptsinto properties, and functionality of properties. Such DLsguarantee tractability of standard reasoning services overontologies,and in particular enable for FOL-rewritable queryanswering, which is a crucial requirement when ontologiesare linked to databases, and which we formalize below in theparagraph on querying DL ontologies.

In the rest of the paper, for the sake of simplicity, wewill provide only graphical representation of ontologies (or,better, of approximations thereof) given through Entity–Relationship diagrams. Of course, such diagrams are in factencoded into suitable logical axioms, whose semantics isgiven in terms of FOL interpretations, as said above.

3.2 Querying DL Ontologies

Given a language L, an L-query over a DL ontology (orTBox) with alphabet � is a (possibly open) L-formula over�. Let q(x) be an L-query with free (a.k.a. distinguished)variables x over an ontology O. Then, a tuple t of constantsfrom �C is a certain answer for q(x) if O |� q(t), whereq(t) is the closed formula, i.e., a sentence, obtained by sub-stituting x with t. The above definition applies to a booleanL-query (i.e., a formula with no free variables) in the stan-dard way: 〈〉 is the certain answer to q if O |� q, and wesay that the query is certainly true; conversely, if O �|� q, theset of certain answers is empty, and we say that the queryis certainly false. Then, the query answering reasoning ser-vice is defined as follows: given a DL ontology O, and anL-query q over O, compute the set of certain answers to qover O. We denote such set by cert(q,O). It is easy to seethat this is a form of reasoning under incomplete informa-tion. An important notion related to query answering is thatof FOL-rewritability, which intuitively means that one canobtain the certain answers to a query q over an ontologyO = 〈T ,A〉 by first rewriting q into a new first-order queryqr over O and then evaluating qr over the ABox A seen as adatabase. Such a rewriting has to depend only on the TBox.We call qr the FOL-rewriting of q with respect to T (we referto [18] for the formal definition). Notably, a FOL query canbe translated into SQL and, therefore, can be evaluated overany relational DBMS managing the underlying ABox. Thishas of course a crucial impact in performances and even inpractical realizability of the query answering reasoning ser-vice for ontologies.

We notice that our framework is parametric with respectto the language used for querying ontologies. However,for computational reasons, the expressivity of such languagehas to be somehow controlled in the practice. For example, itis well known that answering FOL queries in the presence ofincomplete information is undecidable [3], whereas the mostexpressive language for which decidability of query answer-ing over various DL ontologies has been shown is that ofunion of conjunctive queries (UCQs) (e.g., [18,32]).

In the rest of the paper, we refer to a query language pro-posed in [17], which allows for the use of all FOL constructsin queries in a semantically controlled way. Intuitively, decid-ability (and even tractability, in some notable cases) of queryanswering is preserved in such language since reasoning overincomplete information is needed only to answer the UCQ-subcomponents of the queries. This is obtained by virtue ofa particular semantic interpretation of the queries allowed inthis language, based on the use of an epistemic operator. Moreprecisely, one such a query, called ECQ, over a DL ontologyO is a (possibly open) domain independent formula of thefollowing form:

123


Q −→ [q] | ¬Q | Q1 ∧ Q2 | ∃x .Q | x op y,

where q is a UCQ over O, op is one among =, �=, >, <, ≥,and ≤, and [q] denotes that q is evaluated under the (mini-mal) knowledge operator (cf. [17]). To compute the certainanswers cert(Q,O) to an ECQ Q over an ontology O, we cancompute the certain answers over O of each UCQ embeddedin Q and evaluate the first-order part of Q over the relationsobtained as the certain answers of the embedded UCQs.

Interestingly, query answering of UCQ in DL-Lite is FOL-rewritable, as shown by [18]. As a consequence of this,also query answering of ECQ queries in DL-Lite is FOL-rewritable.

3.3 Linking Data To Ontologies

In the past years, a new paradigm for information integration,called OBDA, has been proposed, which is based on the use ofan ontology (in fact a TBox) acting as mediated (a.k.a. global)schema suitably linked to data sources [50]. In OBDA, datasources are seen as a relational database. As in (virtual) dataintegration, linkage towards data sources is realized throughmapping assertions. The most expressive mapping assertionsconsidered in the data integration literature are the so-calledGLAV assertions [41], which are expressions of the formφ(x) � ψ(x), where φ(x) is a query over the data sourcesand ψ(x) is a query over the global schema (the ontologyin OBDA). Intuitively, such a mapping assertion specifiesthat the tuples returned by the evaluation of φ(x) over thesource database semantically correspond (in a sense that willbe clarified below) to the formulaψ(x) and, therefore, createthe bridge between the data in the sources and the objectssatisfying the predicates in the ontology. When the formulaφ(x) is a single atom formula of the form R(x), with R asource relational predicate, the mapping assertion is calledLocal-As-View (LAV). When the formula ψ(x) is a singleatom formula of the form S(x), with S an ontology predicate,the mapping assertion is called Global-As-View (GAV).

Formally, an OBDA specification is a triple S =〈T ,M,D〉, where T is a DL TBox, D is a database, andM is a set of mappings between T and D. Its semantics isgiven in terms of FOL interpretations I over the alphabet ofT , such that (i) I satisfies T , and (ii) I satisfies M, i.e., foreach assertion φ(x) � ψ(x) in M we have that for everytuple t in the evaluation ofφ(x) over D it holds thatψ(t) eval-uates to true in I , where the notions of evaluation of φ(x)over D and ψ(t) over I depend on the particular languagein which such queries are specified. Notice that the above(classical) notion of mapping satisfaction actually considersmapping assertions as sound implications from the databaseto the ontology. The different notion of complete mappingsconsiders them as opposite implications, i.e., in this case Isatisfies an assertion of the form above if for every tuple t

such that ψ(t) evaluates to true in I it holds that ψ(t) is inthe evaluation of φ(x) over D (cf. [41]). The interpretationssatisfying both the ontology and the mapping are the modelsof the OBDA specification S, and the set of such models isdenoted by Mod(S).

A query posed over and OBDA specification S =〈T ,M,D〉 is a query posed over its TBox T . Given onesuch query q, the notion of certain answers to q over S,denoted cert(q,S), is the natural generalization of the anal-ogous notion given for stand-alone ontologies. Analogously,we can naturally extend the notion of FOL-rewritability toOBDA specifications.

We notice that, for computational reasons, it is necessaryin practice to control the expressive power of the languagesused to specify queries in the mapping. We notice, however,that the query φ(x) in a mapping assertion is posed over adatabase, where query answering actually amounts to simplequery evaluation. We can, therefore, assume that such a queryis a generic FOL query (possibly expressed in SQL), whereasit will consider ψ(x), which is a query over the ontology,expressed in the language ECQ given above.

4 Semantic GSM

In this section we propose Semantic GSM, a novel artifact-centric model which merges the expressing power of ontolo-gies for modeling and querying the data of interest with thecapability of GSM to express data evolution. Two solutionscan be adopted to obtain such a coupling. In the first, theontology models the domain of interest only, whereas the life-cycle is defined as in classic GSM. The second one amountsto use an integrated ontology that describes both the data andthe lifecycle schema. Here, we present the second option,with the aim of providing a unified view of the whole frame-work that allows for answering complex queries. Therefore,arbitrary queries over the lifecycle are now enabled, as wecan access the whole lifecycle structure, i.e., both status anddata attributes. As an example, we can directly inquire whichatomic stages are waiting for a task termination event fromthe environment, or which composite stages have an achievedmilestone, namely, those which have already been executed.Notice that, in classic GSM, answering such queries requiresan extra effort since the lifecycle schema is not explicitlyrepresented.

Let us assume an alphabet � for concepts, value-domains,attributes, roles, and constants. Intuitively, the integratedontology describes, in common language, both the data andthe lifecycle schema and provides as its core, a domainindependent “upper” ontology which has to be specializedby means of a “lower” ontology, in order to represent theschema of a specific process. Figure 3 shows a graphical rep-resentation of the upper ontology, in which EventInstance,

123


Fig. 3 Upper ontology forsemantic GSM artifacts

ArtifactInstance, BusinessObject and StatusObject arethe main concepts. Intuitively, business objects represent (theclassic GSM) data attributes, while status objects (and rela-tionships between them) the lifecycle. An artifact instancebelongsTo a specific ArtifactType. An artifact type is a par-ticular BusinessObject in that it has a lifecycle, i.e., a set ofStatusObjects connected to itself (through the SOrefersTorole). A status object is either a milestone or a stage and thisspecialization is disjoint and complete. Stages, being eitheropen or closed, can be hierarchically organized by the transi-tive substage role and should have at least one guard (whichis a sentry) and a milestone. Milestones have an achievingsentry, and they are either true or false. We distinguish invo-cation events and task termination events, either of which canbe the event currently being consumed by the system. TheLastEvent class is actually a singleton class, i.e., it allowsfor one instance only, given that a GSM system processesone event at a time. Finally, tasks are performed by stagesand they have a invocation and a termination event.

The above upper TBox does not describe a specific arti-fact process as it lacks all the domain-dependent entities.However, it can be specialized to model an actual GSMschema. The main concept of the lower ontology, i.e., theone that is intended to have an associated lifecycle, spe-

cializes ArtifactType, while all other concepts specializeBusinessObject. Having two separate concepts for arti-fact types and instances allows us to decouple data fromthe execution, as we can have artifacts data even whenthey are not evolving. Notice that in classic GSM, instead,artifacts always have an associated lifecycle. Moreover,EventInstance generalizes all event type instances requiredby the process and the same holds for Task. Also roles canbe specialized to connect the specialized concepts. Precisely,inv, term, inputs and outputs should relate specific con-cepts, as well as substage, guards, ach and owns.

Definition 3 A semantic GSM schema is a TBox T over �containing the upper ontology in Fig. 3.

According to its definition, a semantic GSM schema hasto be expressed in an ontology language that allows for thespecification of the ontology in Fig. 3 and for specializ-ing it in the schema of the application at hand. Namely,this means that such language has to capture basic ontol-ogy constructs, such as ISA between named concepts orroles (e.g., LastEvent is a subconcept of EventInstance),disjointnesses between named concepts (e.g., Open is dis-joint from Closed), role typings (e.g., the role owns istyped on StageInstance and MilestoneInstance), cov-

123


ering of concepts (e.g., a StageInstance is either Openor Closed), mandatory participation of concepts to roles(e.g., each Task performs at least a StageInstance), andfunctionality of roles (e.g., each Task performs at mosta StageInstance). We point out that such constructs areall expressible in the W3C OWL standard, but also in lessexpressive DL languages such as those used to capture classi-cal conceptual modeling languages (see, e.g., the UML classdiagram encoding described in [13]). Notably, all the afore-mentioned constructs, with the exception of concept cover-ing, are all enabled also in lightweight ontology languagessuch as DL-Lite, or some languages in the Datalog+/− frame-work [14]. In such languages, however, the ability of express-ing concept covering can be re-gained, without increasingthe complexity of reasoning, through the use of some addi-tional constraints, i.e., logical axioms whose interpretation isbased on the use of an epistemic operator, in the spirit of theapproach proposed by [18]. Intuitively, each such constraintcan be seen as a boolean ECQ query q (cf. Sect. 3), which issatisfied by an ontologyO if and only ifO entails q. This actu-ally means that the constraint is not an axiom to be exploitedfor inferring implicit knowledge by the ontology, as enabledin expressive ontology languages, but is rather an assertionthat has to be satisfied over the actual data. In other terms,an ABox compliant with our upper ontology expressed inDL-Lite enriched with epistemic constraints has to alwaysexplicitly assert whether a StageInstance is Closed orOpen, and a MilestoneInstance is True or False. Weargue that in practical cases this is not a real approxima-tion since we expect to have always complete informationon the closed/open stages or true/false milestones.

We are now ready to provide the notion of snapshot for asemantic GSM schema is given below.

Definition 4 A semantic snapshot for a semantic dataschema T is an ABox A such that:

1. 〈T ,A〉 is satisfiable;2. (T ,A) |� ∀s,m.owns(s,m) → (T rue(m) →

Closed(s)) ;3. (T ,A) |� ∀s, s′.substage(s, s′) → (Closed(s′) →

Closed(s)) .

Intuitively, (2) and (3) corresponds to GSM-1 and GSM-2invariants, respectively, explained in Sect. 2.

The condition language used for specifying semantic sen-tries is a variation of the condition language adopted for clas-sic GSM (cf. Sect. 2), in the sense that the queries over theontologies it uses are interpreted under the certain answersemantics.

A semantic status event is an expression of the form+Stg(s) ≡ Stg(s) ∧ Close(s) ∧ Open′(s) or −Stg(s) ≡Stg(s) ∧ Open(s) ∧ Close′(s) where Stg is a subclass

of StageInstance, or +Mst(m) ≡ Mst(m) ∧ False(m) ∧True′(m) or −Mst(m) ≡ Mst(m) ∧ True(m) ∧ False′(m)where MstName is a subclass of MilestoneInstance. Tosimplify the notation, in sentries we write Stg instead ofStg(s) since, given an artifact instance, in the ABox thereis only one individual for each concept Stg subclass ofStageInstance (analogously for milestones).

Semantic status events are formulas that refers to twosemantic snapshots (A,A′). The intuitive semantics is simi-lar to that presented in Sect. 2, but, once more, formulas areinterpreted under the certain answer semantics.

Definition 5 A semantic sentry for a semantic GSM dataschema is a boolean formula of the form τ ∧ γ , where

– τ is either of the following:

– empty;– Event , where Event is the most specific class

of the (singleton) individual belonging to conceptLastEvent;

– {+,−}Stg {+,−}Mst, where Stg and Mst are asbefore;

– γ is a formula that contains neither Event nor statusevents.

Notice that in semantic GSM there is no need for formaldefinitions of events and tasks, as their properties are alreadycaptured by the ontology.

The operational semantics of semantic GSM is groundedon that for classic GSM, as its fundamental concepts are, froman high-level perspective, orthogonal to the logical represen-tation of data. Hence we again rely on the notion of snapshot,which is now as in Definition 4, and B-step, during which theECA rules are processed. From the technical viewpoint, insemantic GSM ECA rules contain queries over the ontology,and then answering them means reasoning to compute theircertain answers (cf. Sect. 3). For this reason, the check forwell-formedness is in general more involved than in classicGSM. However, when query answering is FOL-rewritable,checking well-formedness can be actually performed as inthe classical GSM setting, modulo the computation of FOL-rewritings of the queries occurring in the ECA rules. In otherterms, we are able in these cases to reduce a complex check,which requires to reason over incomplete information, to astandard GSM well-formedness check. This is, for example,the case of ECQ queries issued over DL-Lite ontologies (cf.Sect. 7). Well-formedness in other, more expressive, settingsrequires further investigation which we leave for future stud-ies.

Example 2 We model the energy process described in Exam-ple 1 with a semantic GSM schema T . Figure 4 shows agraphical representation of the portion of T which describes

123


Fig. 4 Fragment of thesemantic GSM schema for theenergy process in Example 1describing the domain

the domain of interest. As usual, such a graphical repre-sentation is useful for presentation purposes, whereas theontology TBox is in fact specified through a set of logi-cal axioms (possibly going beyond the ER expressivenessshowed in the figure). In this example, the reader may con-sider the ontology modeled in OWL, or even in DL-Lite, pro-vided some suitable approximations, as discussed in beforeand, more in detail, in Sect. 7. The CPA concept special-izes ArtifactType of the upper ontology in Fig. 3, while allother concepts are intended to specialize BusinessObjectas a disjoint and complete hierarchy. A monthly report con-tains a set of (energy) measures and is either a control pointassessment or an application. A control point assessment isbased on exactly two applications (each one edited by a com-pany) and refers to a control point, which connects exactlytwo companies. Measures can be either manual or automaticand they are taken at a specific control point by a specificcompany.

In Fig. 5, a partial fragment of T which describes thelifecycle of the process is graphically represented. Con-cepts Requesting, ReqMR1 and ReqMR2 specialize stageinstance. Their individuals represent a stage instance of a spe-cific artifact. Roles ReqMR1Substg and ReqMR2Substgspecialize the substage role of the upper ontology (oncemore such a generalization is not pictured) and ReqMst,ReqMR1Mst and ReqMR2Mst, specializing the owns role,make the relation between states and milestones explicit.

Fig. 5 Fragment of the semantic GSM schema for the energy processin Example 1 partially describing the process’ lifecycle

Table 3 shows the guards of the energy example, nowexpressed in ECQs over the ontology, where CPADrafting,EqualMeas, DiffMeasAuto and DiffMeasMan are sub-classes of StageInstance. It is easy to see that, despite their

123


Table 3 Guards for Example 2

Stage Guard sentry

on if

Requesting C P ACreationEvent −RequestMR1 +Requesting −RequestMR2 +Requesting −CPADrafting +MRAppsReceived −

EqualMeas +CPADrafting

∃ap1, ap2.App(ap1) �= App(ap2)∧[∃a,m1,m2, d, v.based On(a, ap1) ∧ based On(a, ap2) ∧ cont (ap1,m1) ∧ cont (ap2,m2)∧

date(m1, d) ∧ date(m2, d) ∧ val(m1, v) ∧ val(m2, v)]

DiffMeasAuto +CPADrafting

∃ap1, ap2, v1, v2.App(ap1) �= App(ap2) ∧ v1 �= v2∧[∃a,m1,m2, d.based On(a, ap1) ∧ based On(a, ap2) ∧ cont (ap1,m1) ∧ cont (ap2,m2)∧

date(m1, d) ∧ date(m2, d) ∧ val(m1, v1) ∧ val(m2, v2) ∧ Man(m1) ∧ Auto(m2)]

DiffMeasMan +CPADrafting

∃ap1, ap2, v1, v2.App(ap1) �= App(ap2) ∧ v1 �= v2∧[∃a,m1,m2, d.based On(a, ap1) ∧ based On(a, ap2) ∧ cont (ap1,m1) ∧ cont (ap2,m2)∧

date(m1, d) ∧ date(m2, d) ∧ val(m1, v1) ∧ val(m2, v2)∧((Man(m1) ∧ Man(m2)) ∨ (Auto(m1) ∧ Auto(m2)))]

length due to joins, they are easier to manage than the onesin Table 1, referring to the non-semantic GSM modeling ofthe same process. For example, no unions are requested inthe guard for EqualMeas stage.

We notice also that we can now easily pose over the ontol-ogy complex queries (possibly not among those designedfor the process that the artifact realizes) that could not beimmediately expressed over the data schema of a classicalGSM artifact as the one given in Fig. 1. For example, wecan easily get information about measures sent by a com-pany C to the system operator that are not included in thecontrol point assessment for which they are produced. Thequery Q1(id, y,m, d, v) described below returns indeed theCP identifier, the year, month, and date of the control pointassessment, and the value of the excluded measure4.

∃cpa,ms.[∃app, cp.CPA(cpa) ∧ refersTo(cpa, cp)∧ID(cp, id) ∧ month(cpa,m) ∧ year(cpa, y)∧basedOn(cpa, app) ∧ editedBy(app,C) ∧ cont(app,ms)∧date(ms, d) ∧ val(ms, v)] ∧ ¬[cont(cpa,ms)]

It should be easy to see that to extract the above infor-mation from the schema in Fig. 2, a more involved queryis needed. Indeed, while in the ontology we can exploit theconcept Meas, which generalizes both automatic and manualmeasures, and the role cont which relates generic measuresto generic reports, in the information model schema of clas-sic GSM we have to explicitly access both the AutoMeas

4 We assume that the constant C used in the query denotes the objectrepresenting the company C .

and ManMeas relations and separately consider the asso-ciation of such measures with reports. Such an advantagebecome more significant as the number of specialized con-cepts increases.

We also notice that queries over semantic GSM can natu-rally involve both data and the lifecycle, as done in the queryQ2(s) described below, which returns the closed stages foran artifact instance that is processing the control point assess-ment of January 2013 for CP with ID 17.

∃a, cpa, cp.artifactInstance(a) ∧ SOrefersTo(a, s)∧closed(s) ∧ belongsTo(a, cpa) ∧ month(cpa, ′January′)∧year(cpa, 2013) ∧ refersTo(cpa, cp) ∧ ID(cp, 17)

In classic GSM such a query is not naturally expressible,as stage instances cannot be returned as the result of a query.It is in principle possible to modify the classic GSM informa-tion model to represent the lifecycle schema, but this wouldrequire to use extra data structures in the data attributes, thuslosing the conceptual distinction between business data andlifecycle data. The upper ontology, instead, provides a naturaland clean way to combine such aspects.

Finally, query Q3(cpa) returns the CPAs for which thereexists a stage with a child that has already been executed andone still open. From the business perspective, Q3 returns theartifacts that are still active and are about to evolve. Thisinformation can be useful for the system operator to allocateor deallocate in advance some specific resources.

∃id, ps, s1, s2,m.belongsTo(id, cpa) ∧SOrefersTo(s1, id) ∧ SOrefersTo(s2, id) ∧SOrefersTo(ps, id) ∧ SOrefersTo(m, id) ∧

123


substage(s1, ps) ∧ substage(s2, ps) ∧Open(s1) ∧ Owns(s2,m) ∧ True(m)

Notably the above query is domain and lifecycle indepen-dent. Indeed, not only it does not refer to the lower ontology,but it is also general enough to fit any lifecycle. This is pos-sible by making use of distinctive constructs of ontologies,such as generalization.

Another interesting example of query over the ontologydescribed above is showed in Sect. 7. Such a further querymakes it even more evident the importance of reasoning inquery answering in semantic GSM.

The example above makes clear the advantages of usingsemantic GSM. In the first place, the ontology captures enti-ties of the domain and relationships between them in a clearerand more elegant way than a relational model, which isusually built to serve the implementation level, and not fordescribing the domain per se. For example, the ontology spec-ifies properties, as for instance the fact that a CPS is basedon two applications, which are hidden in the data schemaof classical GSM. Furthermore, it allows easier and moreexpressive queries: on the one hand sentries are more man-ageable as they can be formulated considering the reasoningservices the logics provides (such as certain answer com-putation), and on the other hand, due to the unified viewof the schema, user queries can now directly refer to bothdata and lifecycle. Finally, it allows for specifying businessrelevant high-level queries that can be used on all instan-tiations of semantic GSM, and they are robust to lifecyclerefinements.

5 Linking Semantic GSM With Multiple Front-EndApplications

In this section, we discuss a combination of GSM, ontologies,and mappings that is suitable in the common situation wherethe architecture of the system is decomposed into a uniqueback-end and multiple (possibly legacy) front-ends.

More specifically, we consider the case where

– a unique back-end hosts the business processes thatmanipulate (i.e., read, write and update) the whole datarelated to the entire application domain.

– Multiple front-end applications, conforming to differentlocal database schemas, are employed to show (i.e., readand visualize) the data produced by the back-end and torealize services on top of these data; each such an appli-cation can also write its own data into the correspond-ing local database, but without affecting the informationmaintained by the back-end, that is, local updates in thissetting should not be propagated towards the ontology.

Example 3 Consider a company whose main asset is knowl-edge management in e-agriculture. In particular, the companyemploys domain experts who manage live, evolving infor-mation about plants, insects, parasites, phytosanitary prod-ucts, weather forecasts, and so on. A plethora of web sitesand portals, possibly developed by third parties, rely on thisinformation to realize e-services in the agricultural domain.

A suitable architecture for the company’s information sys-tem is one for which a controlled set of back-end businessprocesses with restricted access is used to insert and updatethe relevant data. On the other hand, the web sites are front-end applications, completely decoupled from the lifecycleof the back-end processes, and relying on their own localdatabase schemas (independent from the “global” schemaemployed by the back-end). However, they need to accessthe back-end information system to fetch the relevant datastored there.

Figure 6 shows how the semantic technologies presentedin this work can be combined to support such an architecture,with a twofold advantage: the back-end can manipulate therelevant data at the conceptual level, while seamlessly accessand “understand” such data in terms of their local schemas.

More specifically, in Fig. 6 an ontology is used to capturethe domain knowledge. Semantic GSM is then employedto construct the back-end processes working on top of thedomain ontology, by retaining all the advantages discussedin Sect. 4. At the same time, multiple (external) databaseschemas are used by the front-end applications. The most crit-ical aspect of the architecture is, therefore, the link betweenthe ontology and such multiple databases. Fortunately, thetechniques recalled in Sect. 3 for linking data to ontologiescan be exploited to attack this challenging problem. Recallthat, in this setting, (part of) the data maintained by the front-end databases must be obtained from the back-end ontol-ogy (which is then accessed by front-end databases in read-mode). This suggests that the form of mapping assertions toformally capture the link is the one of LAV mappings. In fact,to specify that a relation R in one of the local, front-end data-bases, has to be fed with data taken from the ontology, we candevise a mapping assertion of the form R(x) � ψ(x), whichactually expresses that R(x) is a (local) view constructed ontop of the queryψ(x) posed over the ontology. In other terms,such a LAV assertion describes the content of the relation Rin terms of the ontology, which is exactly what we need here.Notice, however, that since in this setting the data flow isfrom the back-end ontology to the front-end databases, map-ping assertions are interpreted as complete rather then soundassertions (cf. Sect. 3). Also, to avoid that local updates ondata propagate towards the ontology, we impose that front-end relations mapped to the ontology are only accessible inread mode by local processes (except for the import of datacoming from the ontology—cf. below).

123


Fig. 6 Semantic GSM withLAV mappings exploited bymultiple front-end applications

(a)

(b)

Figure 6 provides an abstract representation of the evolu-tion of data present in the ontology and the local databases atexecution time. Every time a (semantic) action is performed,the ABox of the back-end ontology is updated according tothe action effects. Through the LAV mapping assertions, thischange in the ontology can be also understood by the front-end databases, putting together their own local data with thedata present in the ontology. From the operational point ofview, this abstract picture can be grounded in the system byexploiting the mapping assertions in two ways:

– effectively transfer data extracted from the ontology tothe local database.

– answer queries posed over the local database by trans-parently accessing the ontology on-demand.

5.1 Data Transfer

In data transfer approach, some data maintained by the ontol-ogy are now replicated in the local database, similarly to dataexchange (cf. [39]). The disadvantage of this approach isthat it introduces redundancy, and consequently correspond-ing mechanisms must be implemented to regularly align thedata maintained by the local database with the ones presentin the ontology. Remember, in fact, that there are back-endprocesses running on top of the ontology, which could lead tochanges that should be propagated to R. On the other hand,

the advantage of this approach is that the back-end and thefront-end only interact at specific, pre-determined momentsin time: a connection between the ontology and the localdatabase is required only when there is an alignment requestissued to the local database. Beside these synchronisationpoints, the two systems operate completely independentlyfrom each other.

As an example, let us consider again the e-agriculturalcompany of Example 3. Supposing that the back-end storesfresh forecast data every day before midnight, a front-endapplication requiring those data can simply trigger an align-ment just after midnight, importing the new information intoits own local database, then using this local “copy” to provideits specific service, without the need of further interactionwith the back-end.

5.2 Transparent Access

With the transparent access approach, the local database doesnot replicate the data present in the ontology. However, whenqueries are issued over the local database, mapping assertionsare exploited to suitably include in the returned result set alsodata present in the ontology. From the viewpoint of a front-end application, there is no difference between this approachand the data transfer one, i.e., transparent access constitutesa form of “virtual” data transfer.

123


While this approach requires a stable, long-running con-nection between each front-end application and the back-end ontology (making it possible to access the ontologyon-demand, every time a query is issued over one of thelocal databases), it has the advantage that front-end applica-tions always access the fresh, latest data, without incurringin alignment issues.

We point out that the architecture described in this sec-tion to link semantic GSM artifacts with a relational stor-age, independently from the approach adopted (data transferor transparent access), goes fairly beyond OBDA. Indeed,access to data in our setting is possible both through the(back-end) ontology and through the (front-end) databases.In particular, from the point of view of the front-end data-bases, the ontology is seen as a data source, but differentlyfrom OBDA, where data sources are always plain databases,it is a source with incomplete information. In this respect,both transferring and on-the-fly querying of data turn out tobe computationally challenging. A simple, but effective, wayto deal with this situation is to assume that in each mappingassertion R(x) � ψ(x), the relation R is a view correspond-ing to the certain answers of ψ(x) over the ontology. This infact means to weaken the semantic interpretation of the map-ping (w.r.t. the completeness assumption discussed above),and at the same time makes the front-end database rely onlyon the query answering service exported by the back-endontology (thus somehow hampering the modularization ofthe overall system in independent components). We pointout that a similar approach has been advocated in the con-text of peer-to-peer (P2P) information management and inte-gration (cf. [16] and [29]), where analogous computationalproblems have been faced and various solutions proposed,ranging from the above possible weakening of the mapping,to the devising of topological restrictions in the P2P networkand in the languages used in the peer schemas or ontologies(see also [5,15,26,30,36]).

Example 4 Let us now consider again our previous runningexample and have a closer look at the processes and datamanaged by the companies which provide monthly reportapplications to the system operator. Each such company hasindeed its own processes, possibly modeled as GSM arti-facts, which are executed independently from the processesof the system operator, as well as from the other companies.For these reasons, from the point of view of the control pointassessment artifact, such processes are operating in the exter-nal environment, and no details on them or on the informationschemas they use are needed for the control point assess-ment to be executed (cf. Figs. 1 and 2). At the same time,databases locally used by various companies can be seen, forthe processes they serve, as front-end databases fed from aback-end ontology for what concerns the official measures

published by the system operator in a control point assess-ment. Such a situation resembles exactly those in Fig. 6.

Assume now that a company C wants to store informa-tion about measures it sends to the system operator that arenot included in the control point assessment. To this aim Cmaintains locally a relation R(CP_ID, year,month,date,value), whose attributes denote, respectively, the identifica-tion number of the control point, the year and the month ofthe control point assessment, the date and the value of therejected measure. This information can be gathered from theontology through the mapping assertion R(id, y,m, d, v) �Q1(id, y,m, d, v), where Q1 is the ontology query in Exam-ple 2.

6 Semantic Monitoring and Governance of RelationalArtifacts

We discuss now an architectural solution that complementsthe one discussed in Sect. 5, but is as much common ina typical industrial setting. The operation of a company istypically encapsulated in a plethora of different intra- andinter-organisational processes, each meant to discipline thework of a branch/group/area inside the company, as well theinteraction with other related areas and/or external stake-holders. Such processes may have a very different nature(flexible, rigid, unpredictable,…), involve different personsand devices (employees, domain experts, consultants, man-agers,…), and be partly not under the control of the company,but of third parties (partner companies, customers, sellers,suppliers,…). Furthermore, from the architectural point ofview, the data they manipulate are typically scattered aroundinto several (typically relational) data sources with differentschemas.

Example 5 Consider a company that maintains a web mag-azine, accessed by a community of users and containingbanners and advertising information for partner companies.Different processes, possibly with different underlying data-bases, are designed and implemented by the company toaccomplish its business objectives. An internal process isexecuted to feed the web magazine information system withfresh news. A CRM system is used to record informationabout the partner companies. A related process is followedto negotiate advertising contracts with such companies andto store the banners to be shown on the web. Finally, a setof web processes are executed to let the users register to themagazine and surf the news, at the same time tracking statis-tical information of banners’ views and clicks.

Despite this architectural fragmentation, however, all theprocesses rely on and concur in the provision of data of inter-est for the company. In particular, the scattered data sourcescontribute altogether to provide the extensional information

123


used by business experts and managers to assess the state ofaffairs, take strategic decisions, refine the company’s goals,and restructure the processes. To understand and commu-nicate such information, a common conceptualization of thedomain is needed and is indeed sometimes adopted, typicallyrepresented using graphical specification languages such asE-R, UML, or ORM diagrams. Of course, such conceptual-ization can be naturally captured by a formal ontology.

Since in this case the purpose is to understand data in thedata sources through the ontology, i.e., (virtually) transferdata from the source schemas to the conceptual schema, themost natural form of mapping to adopt to interconnect thetwo layers is the one of GAV. To define a concept N in theontology in terms of queries posed over the underlying datasources, a set of assertions of the following form may beemployed: φ1(x) � N (x),…, φn(x) � N (x). Similarly,to define a role (i.e., a binary relation) P in the ontology,GAV mapping assertions of the following forms may be used:φ1(x, y) � P(x, y),…, φn(x, y) � P(x, y).

Example 6 Consider again our running example on the ACSIenergy use case. Assume now that the ontology we havedescribed in Example 2 is used for monitoring at the seman-tic level the relational artifact described in Example 1. We,therefore, have to specify suitable mappings from the ontol-ogy towards the data schema of the relational artifact, whichis described in Fig. 1.

We report below some simple GAV mapping assertionsthat specify how some instances of the ontology can bebuilt starting from the values stored in the underlying dataschema5.

SELECT CPAIDFROM CPAs�

CPA(CPAID)

SELECT MeasID, CPAIDFROM CPAMeas�

publishedIn(MeasID,CPAID)

SELECT MeasID,concat(CPAID,company)AS AppIDFROM ManMeas�

cont(MeasID,AppID)

We also notice that in this scenario we can also easily spec-ify mappings towards the data schema of various relationalartifacts, possibly run by different electric companies, thus

5 In the third assertion we construct the identifiers of instances ofMntRep by concatenating CPAID and company.

fostering semantic process integration. For example, let usassume that another company uses a different artifact whosedata schema contains the following two relational tables tostore reports containing claimed measures on a daily basisfor a certain control point in a certain month:

R1(MRA_ID,CP_ID,month, year)R2(MSR_ID,MRA_ID,date, time, value, type).

MRA_ID is the database code (indeed a primary key inR1) assigned to a control point application, CP_ID is thecode of the control point, month and year are, respectively,the month and the year the application refers to, MSR_IDis the database code assigned to measures (indeed a primarykey in R2), date, time, and value are, respectively, the date,the time, and the value associated with a measure, and typeindicates if the measure is taken manually, in this case it hasthe value ‘m’, or in an automatic way, in which case it assumesthe value ‘a’.

We notice that the company, for other own purposes, storesin fact various measures with the same MRA_ID and the samedate, all having a different times (i.e., MRA_ID, date, andtime form together a key in R2). Only the measure with thegreater value for time within a certain date is then commu-nicated to the system operator.

The following SQL query QSQL(msr) selects only sentmeasures taken manually:

SELECT X1.MSR_ID AS msr FROM R2 AS X1 WHEREX1.type = ’M’ AND not exists (SELECT *FROM R2 AS X2 WHERE X2.MSR_ID <>X1.MSR_ID AND X2.MRA_ID = X1.MRA_IDAND X2.date = X1.date AND X1.time > X2.time)

The above query can be then used as database query in amapping assertion QSQL(msr) � Man(msr), where Manis the concept denoting manual measures in the ontologygiven in Fig. 4.

As recalled in Sect. 3, OBDA techniques have been exten-sively employed to enable the concrete usage of a domainontology to integrate and access the company’s data. We dis-cuss here how these benefits carry over the setting where theunderlying data sources are manipulated by artifacts and theircorresponding processes. Figure 7 gives an overview of thesystem architecture that arises in this setting. The differencebetween a classical OBDA setting is that the underlying datasources are regularly subject to changes due to the runningprocesses. This can be appreciated by considering Fig. 7,which provides an abstract representation of the system evo-lution. Ideally, every action execution at the relational leveltriggers a change in at least one of the data sources. Throughthe mapping assertions, this translates into a correspondingchange in the (extensional knowledge of the) ontology (theABox). The new, resulting snapshot can then be queried atthe conceptual level through the ontology itself.

123


Fig. 7 Relational processeswith GAV mappings and aunifying ontology

(a)

(b)

Analogously to what we did in Sect. 5, this abstract pic-ture can be concretely instantiated in two ways: by applyingan effective data transfer from the data sources to the ontol-ogy, or by exploiting the ontology to access the underlyingrelational data on-demand.

6.1 Data Transfer

In the data transfer scenario, data are effectively migratedfrom the underlying data sources to the ontology. SinceGAV mapping assertions are sound w.r.t. the data sources,we can in this case effectively materialize the ABox of theontology by simply evaluating the queries used in the left-hand side of the corresponding assertions and populating theABox with the union of the obtained result sets.For exam-ple, for the aforementioned concept N , its population can beobtained as the answer of the query

∨i∈{1,...,n} ϕ(x) (simi-

larly for roles).This approach resembles the one of data warehousing,

though in this case the central repository is constituted bya rich, conceptual model. Business managers and analystscan in fact exploit the ontology to query the obtained inte-grated data at a high level of abstraction and by exploiting a“business-level” vocabulary. This, in turn, provides the basisfor reporting and analysis.

The data transfer approach can also be fruitfully exploitedto support external audits. In fact, part of an audit is typi-cally dedicated to analyze (a portion of) the real data main-tained by the company’s information system, to check com-pliance with regulations and best practices. Obviously, thisanalysis can be facilitated if, instead of directly access-ing the data sources with their heterogeneous schemas,compliance queries are expressed in terms of the ontology.

Another important application of the data transfer appro-ach is in process mining [1].See the discussion about the“semantic event log”, provided below.

6.2 Transparent Access

Transparent, on-demand access to the concrete data sourcesthrough the ontology can be obtained by relying on the clas-sical framework of OBDA. Here, to achieve transparency,no data are materialized in the ABox, but query answeringis realized through query rewriting, which reformulates theuser query into a new query expressed in the alphabet of thesources, whose evaluation over the source database returnsthe certain answer of the user query. Such reformulation takesinto account the TBox ontology and the mappings. [50] and[51] provide notable examples in which the above rewrit-ings can be expressed in SQL, thus allowing for delegatingits evaluation to the underlying relational data managementsystems.

123


Fig. 8 Ontology-basedgovernance of a relationalartifact system

The described framework can be used in our settingto obtain meaningful information about the current stateof affairs, reached as a result of the execution of (possi-bly multiple) business artifacts working over the relationalsources and the corresponding processes. Understanding thesemantics of data contained in this low-level sources isimportant to

– Conceptual query answering, with the same advantagesdiscussed in Sect. 4. In particular, if the underlying rela-tional processes are specified in terms of GSM artifactlifecycles, then part of the ontology can be dedicated tocapture GSM itself, as shown, e.g., in Figs. 3 and 5. Con-cepts and relationships used to model the lifecycle can beeasily mapped to the status attributes maintained by theunderlying GSM information schemas through suitableGAV mapping assertions, supporting the possibility offlexibly pose conceptual queries asking about the currentprocess status, possibly relating it also to the current data,as shown in Example 2.

– Govern the underlying processes, blocking the final-ization of those process actions that manipulate thedata leading to a globally inconsistent situation, wheresome semantic constraint in the ontology is violated(see Fig. 8). Obviously, this requires a mechanism toevaluate the action effects before effectively enforcingthem, triggering an exceptional behavior if a violationis detected. In particular, this must be propagated downto the process responsible of the action, which in turncan activate a compensation phase, finding an alterna-tive execution path. To show how this approach can beapplied also with classical process specifications, Fig. 9sketches the meta-model of a BPMN task executionthat exploits a transaction to coordinate with the ontologygovernance service, triggering a roll-back (and a corre-sponding compensation sub-process) in the case of non-conformance.

– Relate different artifacts that share information, thoughpossibly with very different representation, in their arti-fact instances. This is even more critical in the caseof inter-organizational processes combining artifacts ofmultiple companies.

– Discipline the introduction of new artifacts and processesin the system, checking whether they seamlessly integratewith the already existing artifacts and processes, and sup-porting various forms of conformance tests.

– Facilitate the realization and enforcement of authoriza-tion views, proposed by [42] in the context of so-calledartifact interoperation hubs (see [37]), to formally reg-ulate to which pieces of information the various stake-holders participating to the hub share an access.

An example of a real-world application in which weadopted the framework for semantic monitoring of relationalartifact discussed in this section is given in Sect. 7.

6.3 Semantic Event Log

We discuss now a particular scenario in which the approachpresented in this section is exploited to provide the basis forprocess analysis, improvement, and re-engineering. In par-ticular, we sketch how the combination of ontology and map-ping assertions from the process data sources to the ontologycan be used as a basis for process mining [1].

Process mining combines business process analysis withdata mining, to the aim of discovering, monitoring, diagnos-ing, and ultimately improving business processes. Tradition-ally, process mining is applied to post-mortem data, i.e., datarelated to already completed process instances. Recently, itsapplicability has been broadened including also a plethoraof operational decision support tasks that are exploited atrun-time, i.e., by considering live data of running processinstances.

123


Fig. 9 Meta-model of anontology-governed BPMN task

Fig. 10 E–R diagram capturinga portion of the XESmeta-model; the dashed part isreported for clarity, but isconcretely realized in XESthrough the notion of extensionand corresponding requiredattributes for the events

Independently from the phase in which process miningtechniques are exploited, central for their applicability is theavailability of process event logs, which explicitly trace allthe relevant events (and the corresponding data) occurred sofar due to the process execution. The availability of eventlogs of good quality poses a twofold challenge. On the onehand, the meaningful information associated with the eventsis typically scattered around different tables in the underly-ing database and possibly even in several data sources. Onthe other hand, standard formats for event logs, such as XES(http://www.xes-standard.org/), have been proposed to makeit possible to apply process mining algorithms and tools with-out the need of customizing how they are fed with input dataon a per-company basis.

For these reasons, the extraction of a unique, standardevent log from a company’s information system is far fromtrivial. In this respect, OBDA can be effectively appliedto

– include in the ontology a set of concepts and relationshipsdedicated to capture event logs according to the chosenformat representation standard (see, e.g., Fig. 10).

– Establish mapping assertions from the data sources tothis portion of the ontology, in such a way to be ableto understand event-related data in terms of the standardrepresentation.

In this setting, the data transfer scenario can be exploitedto extract an event log containing post-mortem data and toapply process mining techniques off-line. Conversely, theon-demand access approach can be used for monitoring pur-pose, writing compliance queries on top of the event log, andconsequently checking whether a process execution trace iscurrently respecting or violating certain business rules, in thestyle of [22,45].

7 Instantiation of the Framework for SemanticMonitoring in the ACSI Project

In this section we briefly describe how the framework pro-posed in this paper has been instantiated within the Euro-pean Project ACSI (Artifact-Centric Service Integration), forsemantic governance and verification of artifact systems.

123

http://www.xes-standard.org/


We point out that such instantiation has been concretelyapplied within a real-world use case, called the energy usecase, concerning the context of Spanish electric supply sys-tem and focused on the energy exchange between differentelectric companies, controlled by a central system operator.An excerpt of such a scenario has been already presentedthroughout the paper in the form of a running example. Todescribe semantic governance of artifact systems in ACSI,we continue to refer to such a scenario.

Within ACSI, the processes of interest in the energy usecase were extensively modeled in terms of (classical) GSMartifacts. Example 1 describes one of them, namely CPA arti-fact. Other artifacts that have been realized refer to submis-sion of possible objections issued by a company concerningthe measures published by the system operator and to thefinal liquidation process related to the energy exchange. Therealized artifacts were effectively deployed in the so-calledACSI Interoperation Hub, an environment for the executionof independent GSM artifacts operating for a common goal.

In a parallel activity, we designed an ontology for theenergy setting, by leveraging on the upper level ontologygiven in Fig. 3, and by specializing it into a specific ontologyrepresenting the domain of interest for the energy use case,similarly to what is described in Example 2. We specifiedthe resulting overall ontology in DL-Lite. A graphical rep-resentation of a portion of this ontology is given in Figs. 4and 5. We point out that some properties represented in suchfigures, non-expressible in native way in DL-Lite, such asthe covering of the concept Meas, or the maximum cardi-nality 2 on roles basedOn, connTo and cont, have beensuitably approximated by means of epistemic constraints (cf.Sect. 4). On the other hand, we fully exploited the expressivepower of DL-Lite, which also allows for specifying addi-tional constraints that are not rendered graphically in theform of ER-constructs, such as the following identificationassertions:

– no two applications based on the same CPA are edited bythe same company (i.e., App is identified by its partici-pation in the roles editedBy and basedOn);

– it is not possible for two CPAs to have the same year–month and refer to the same control point.

In a second phase, we connected the artifact and semanticlayers by linking the data schema of the underlying GSMartifacts to the concepts and relations of the ontology, forgovernance and monitoring purposes. In accordance with theframework described in Sect. 6, the linkage of the ontologytowards the underlying data schema was realized throughGAV mapping assertions. Each such assertion associates anelement of the ontology with an SQL query over the dataschema of the artifact layer (see Example 6). This allowed atransparent access to the data stored in the artifact layer, by

fetching information only on-demand, without performingany (costly) materialization of the data maintained by theartifacts in terms of an ABox instantiating the TBox of theontology.

Some prototype tools have been developed within ACSIfor the management and exploitation of the semantic layer.One such a tool has the purpose of maintaining and docu-menting the ontology, of storing and inspecting the mappingsand, most notably, of providing support for querying theontology. A snapshot of the querying environment providedby this tool is given in Fig. 11. We stress that the system werealized is coupled with an OBDA reasoner featuring a soundand complete technique for query answering over ontolo-gies. Specifically, within ACSI we employed Mastro [23],an OBDA reasoner able to process unions of conjunctivequeries expressed in SPARQL syntax, and posed over OBDAspecifications where the TBox is expressed in DL-Lite, andthe mappings are GAV. Mastro is also able to process ECQqueries specified in a proprietary syntax. To process queries,Mastro proceeds in two steps. In the first step, called ontologyrewriting, it computes the FOL-rewriting of the input querywith respect to the TBox of the OBDA specification. In thesecond step, called mapping rewriting, it further rewrites theFOL-rewriting obtained in the previous step by consider-ing the contribution of the mappings, following a classicalunfolding procedure [41]. Intuitively, this procedure substi-tutes each ontology predicate in the query with the SQL viewthat the mapping associates to it. Notably, the final rewritingis expressed into a standard SQL query that can be directlyprocessed at the artifact layer, i.e., OBDA specifications man-aged by Mastro support FOL-rewritable query answering (cf.Sect. 3).

We effectively exploited this tool for monitoring artifactsystems by issuing queries over the ontology. Through suchqueries, we were able to understand at the conceptual levelthe impact of the evolution of the artifact system in termsof the managed information and govern it. For example, wehad the possibility of relying on specific queries so as toidentify execution steps which, once executed at the artifactlayer, would lead to a new (virtual) semantic snapshot that isinconsistent with the ontology TBox.

An example of monitoring query is given in the following:

∃ rep,ms1,ms2, d.[MntRep(rep) ∧ cont(rep,ms1)∧cont(rep,ms2) ∧ date(ms1, d) ∧ date(ms2, d)]∧ms1 �= ms2

This query is an ECQ asking for the existence of a monthlyreport that contains two distinct measures referring to thesame date. This is an unwanted situation, since every reporthas to exhibit only a single measure per day6.

6 Notice that the above query in fact constitutes an epistemic constraintsspecified over the TBox.

123


Fig. 11 Semantic artifactmonitoring ACSI tool

To show the benefit of writing queries at the semanticlevel, we briefly discuss how Mastro processes the afore-mentioned query, and how the final SQL rewriting looks like.According to the semantics of ECQs, answering the abovequery amounts to first compute the certain answers of theCQ Q = MntRep(rep)∧cont(rep,ms1)∧cont(rep,ms2)∧date(ms1, d) ∧ date(ms2, d) and then verify the inequalityms1 �= ms2 over such certain answers. To this aim, Mastrofirst computes the ontology rewriting Qo, which is the FOLquery (in fact a UCQ with inequalities) partially reportedbelow:

∃ rep,ms1,ms2, d.((App(rep) ∧ cont(rep,ms1)∧cont(rep,ms2) ∧ date(ms1, d) ∧ date(ms2, d))∨(CPA(rep) ∧ publishedIn(rep,ms1)∧publishedIn(rep,ms2) ∧ date(ms1, d) ∧ date(ms2, d))∨ . . .) ∧ ms1 �= ms2

We notice that Qo is obtained by rewriting the conjunc-tive query Q and then imposing the inequality ms1 �= ms2

over the rewriting. Intuitively, to rewrite Q Mastro exploitsinclusions between concepts and roles, typings of roles, andmandatory participations of concepts into roles asserted in theontology (for further details we refer the reader to [18]). Inthe above example, since App is a sub-concept of MntRep,and publishedIn is a sub-role of cont, the rewriting algo-rithm substitutes the predicates MntRep and cont in Q withApp and publishedIn, respectively, generating all disjunctsthe stem from all possible substitutions. In total, Mastro gen-erates an overall number of approximately 50 disjuncts. Tofinalize the FOL-rewriting, Mastro then computes the map-ping rewriting of Qo by unfolding it according to the map-ping assertions (which we partially described in Example 4).This produces a final number of disjuncts which is in gen-eral exponential w.r.t. the number of atoms in Qo. We notice

that the ontology rewriting phase and, therefore, the reason-ing over the ontology, turned out to be crucial in this query:unfolding directly the original query would have returned anempty query, since MntRep and cont are in fact not directlyassociated with any query in the mappings.

We close this section by discussing the impact of choosingDL-Lite as the ontology language for expressing the dynam-ics of an artifact-centric system. The choice of DL-Lite issupported by the fact that the same benefits of the FOL-rewritability property for query answering and ODBA alsoapply to the system dynamics. Such benefits hold not onlywhen the dynamics of the artifact system is directly speci-fied at the semantic level, as in the case of semantic GSM(cf. Sect. 4), but also when the dynamics is specified at alower level, which is relational in the majority of cases (asin standard GSM), and declarative dynamic constraints areexpressed over the semantic layer posed on top of the rela-tional one. In the first setting, the semantic GSM specifica-tion of an artifact can be automatically manipulated by tech-nologies like Mastro to be translated into a correspondingstandard GSM specification (by simply rewriting each sen-try). The resulting specification can then be directly executedusing the ACSI Interoperation Hub (or any execution enginefor relational GSM artifacts). In the second setting, tempo-ral/dynamic properties are added to the semantic layer, soas to express conceptual constraints on the expected or for-bidden evolutions of concepts and relations. Within ACSI,we used in particular a first-order variant of μ-calculus, vir-tually the most powerful logic for verification, to expresssuch dynamic constraints. In this variant, local propertiesare in fact ECQs over the alphabet of the ontology TBox.Notably, [20] show that, by virtue of the FOL-rewritability,the rewriting of dynamic constraints (taking into accountboth TBox assertions and GAV mappings) can be directlyposed over the data schema of relational artifacts. This result

123


has been effectively implemented in the OBGSM tool [10]and it is applicable in the setting described in Sect. 6, thatis, when relational GSM artifacts are linked to a (DL-Lite)ontology through GAV mappings. This tool implements thereformulation technique described by [20] and returns a cor-responding temporal formula which, together with the speci-fication of GSM artifacts, can be directly fed into the GSMCmodel checker proposed by [33] for verification of GSMartifacts.

In summary, the key property of FOL-rewritabilityenjoyed by DL-Lite makes it possible to start from a richsemantic artifact framework and reformulate it so as to usetraditional ontology-agnostic tools for query answering, exe-cution, and verification of artifact systems.

8 Related Work

Since the introduction of artifact-centric processes as a meansto integrate processes and data [48], extensive research hasbeen done along two main lines: foundational approachesfor formally representing and verifying artifact-centric sys-tems (such as, e.g., [11,27]), or proposals of modeling lan-guages and corresponding execution environments (such as,e.g., [38,54]).

The majority of approaches related to artifact-centric sys-tems assumes that artifacts maintain relational data. Further-more, as far as we know only a few works exploit the artifactinformation model beyond just its use to support the artifactexecution. An example is [54], where a unifying view encom-passing two different artifact modeling languages (GSM andEZ-flow) is used to wrap the execution of hybrid artifact-centric systems, implemented partly in GSM and partly inEZ-flow. The reconciliation between the underlying informa-tion models and the common, abstract one is done by exploit-ing very simple mapping assertions and could be, therefore,accommodated by the framework here presented.

As for the verification of artifact-centric (or, more in gen-eral, data-centric) business processes, the same trend dis-cussed above can be seen: the majority of proposed frame-works relies on the relational model for the data component[9,11,27]. A notable exception is constituted by the so-calledDL Knowledge and Action Bases (KABs, see [8]). KABs areconstituted by a knowledge component modeled by means ofa data-oriented DL ontology called DL-Liteand by an actioncomponent containing a set of actions (and a process builton top of them) used to progress the knowledge compo-nent. In this respect, the work here presented can be consid-ered as a concretization of the KAB framework, where theaction component is grounded in the GSM language. Verifi-cation of (classical) GSM artifacts has been tackled only veryrecently, with some preliminary but promising theoreticalresults [12,53] and practical developments [33]. One could

wonder whether verification of GSM could be extended to thesemantic GSM presented here. Thanks to [53], we have, inprinciple, an affirmative answer. In fact, to show verifiabilityof GSM artifacts, [53] presents a translation mechanism intoa formal framework whose action component closely corre-sponds to the one of KABs. Since the translation is orthog-onal w.r.t. the data component, it can be seamlessly appliedto obtain a KAB starting from a semantic GSM specifica-tion, consequently enabling the possibility of applying theverification results obtained for KABs.

In [20], a framework for the verification of data-centricprocesses is proposed, which mixes a formal representationof relational artifacts with the notion of ontology as consid-ered here. Differently from KABs, processes progress theirexecution at the relational layer, and the ontology is usedto understand and govern the execution at a higher levelof abstraction. The link between the underlying relationalinformation models and the unified conceptual representa-tion provided by the ontology is established through mappingassertions, i.e., relying to OBDA. This work exactly providesthe formal underpinning for the governance and monitoringarchitectural framework presented in Sect. 6, showing thatthis two-level architecture can be subject to verification. Inparticular, it recasts the query unfolding approach typicallyused in DL-LiteOBDA to the more general case of tempo-ral properties composed by temporal operators combinedwith queries over the ontology. Such a result constitutedthe basis for the verification framework described in [10](cf. Sect. 7).

9 Discussion and Conclusion

In this paper we have provided a comprehensive frameworkfor semantic GSM artifacts. The key characteristic of theframework is that it allows for expressing both the data andthe lifecycle schema of GSM artifacts in terms of an ontol-ogy. We have provided an upper ontology for the semanticGSM model, which is specialized in each artifact with spe-cific lifecycle elements, relations, and business objects. Wehave then enriched our framework by enabling the linkageof the ontology to autonomous database systems through theuse of mapping assertions, as done in data integration andOBDA. We have discussed two main scenarios of practi-cal interest where the use of mappings turn out to be cru-cial for enabling collaboration and communication amongdifferent and heterogeneous systems. We have discussed aconcrete instantiation of our framework based on the useof DL-Lite ontologies and have shown a real-world appli-cation of it in the energy domain, investigated in the con-text of the EU project ACSI. Notably, FOL-rewritability ofquery answering enjoyed by DL-Lite ontologies allowed usto rely in the practice on the same artifact management tools

123


based on relational technology used for classical GSM arti-facts. We leave to future studies the investigation of thosecases where ontology languages are more expressive thanDL-Lite. However, from the computational point of viewwe can already notice that in these more expressive set-tings a purely intensional approach based on query rewrit-ing is difficult to achieve. Indeed, as shown in [21], DL-Liteis one of the maximal ontology languages enjoying FOL-rewritability of query answering, and even limited exten-sions of it lead to inherently more complex query answer-ing, which can, therefore, not be encoded into evaluation ofa FOL query (which can be directly translated into SQL and,therefore, managed under relational technology). We arguethat in these cases, approaches (even partially) based on somedata manipulation should be pursued, to achieve reasonableperformances.

A natural follow-up of this paper is to study, from boththe practical and the theoretical perspective, the scenariothat results from the merging of the systems described inSect. 5 and in Sect. 6. In this scenario, global semantic GSMartifacts, operating over the ontology, and relational GSMartifacts, operating over autonomous databases, coexist, andact both as consumers and producers of information. There-fore, all forms of mapping assertions and semantic assump-tions on them considered in this paper need to be adopted,to guarantee both semantic governance of relational artifactand access to ontology data by front-end systems. On theone hand, this scenario sets the stage for advanced and com-pletely distributed semantic reasoning over artifacts, and onthe other, it presents challenging computational issues. Forexample, data manipulation at the ontology level in this set-ting is closely related to the longstanding problem of viewupdates [56].

Acknowledgments This work has been supported by the EU FP7project ACSI (Grant No. 257593) and by the FP7 large-scale integratingproject Optique (Grant No. 318338).

References

1. van der Aalst WMP, et al (2011) Process mining manifesto. In:Proceedings of BPM Workshops, LNBIP. Springer, New York, pp169–194

2. van der Aalst WMP, Barthelmess P, Ellis CA, Wainer J (2001) Pro-clets: a framework for lightweight interacting workflow processes.Int J Coop Inf Syst 10(4):443–481

3. Abiteboul S, Hull R, Vianu V (1995) Foundations of databases.Addison-Wesley, UK

4. Abiteboul S, Bourhis P, Galland A, Marinoiu B (2009) The AXMLartifact model. In: Proceedings of TIME 2009, pp 11–17

5. Adjiman P, Chatalic P, Goasdoué F, Rousset MC, Simon L (2006)Distributed reasoning in a peer-to-peer setting: application to theSemantic Web. J Artif Intell Res 25:269–314

6. Apt KR, Blair HA, Walker A (1988) Towards a theory ofdeclarative knowledge. In: Foundations of deductive databases

and logic programming. Morgan Kaufmann, Massachusetts,pp 89–148

7. Baader F, Calvanese D, McGuinness D, Nardi D, Patel-SchneiderPF (eds) (2007) The description logic handbook: theory, imple-mentation and applications, 2nd edn. Cambridge University Press,Cambridge

8. Bagheri Hariri B, Calvanese D, De Giacomo G, De Masellis R,Felli P, Montali M (2013a) Description logic knowledge and actionbases. J Artif Intell Res 46:651–689

9. Bagheri Hariri B, Calvanese D, De Giacomo G, Deutsch A, MontaliM (2013b) Verification of relational data-centric dynamic systemswith external services. In: Proceedings of PODS, pp 163–174

10. Bagheri Hariri B, Calvanese D, Montali M, Santoso A, SolomakhinD (2013c) Verification of semantically-enhanced artifact systems.In: Proceedings of ICSOC, LNCS. Springer, New York

11. Belardinelli F, Lomuscio A, Patrizi F (2012a) An abstraction tech-nique for the verification of artifact-centric systems. In: Proceed-ings of KR, pp 319–328

12. Belardinelli F, Lomuscio A, Patrizi F (2012b) Verificationof GSM-based artifact-centric systems through finite abstrac-tion. In: Proceedings of ICSOC, LNCS. Springer, New York,pp 17–31

13. Berardi D, Calvanese D, De Giacomo G (2005) Reasoning on UMLclass diagrams. Artif Intell 168(1–2):70–118

14. Calì A, Gottlob G, Lukasiewicz T (2009) A general datalog-basedframework for tractable query answering over ontologies. In: Pro-ceedings of PODS, pp 77–86

15. Calvanese D, De Giacomo G, Lembo D, Lenzerini M, Rosati R(2004a) What to ask to a peer: ontology-based query reformulation.In: Proceedings of KR, pp 469–478

16. Calvanese D, De Giacomo G, Lenzerini M, Rosati R (2004b) Log-ical foundations of peer-to-peer data integration. In: Proceedingsof PODS, pp 241–251

17. Calvanese D, De Giacomo G, Lembo D, Lenzerini M, RosatiR (2007a) EQL-Lite: Effective first-order query processing indescription logics. In: Proceedings of IJCAI, pp 274–279

18. Calvanese D, De Giacomo G, Lembo D, Lenzerini M, RosatiR (2007b) Tractable reasoning and efficient query answering indescription logics: the DL-Lite family. J Autom Reason 39(3):385–429

19. Calvanese D, De Giacomo G, Lembo D, Lenzerini M, Poggi A,Rodriguez-Muro M, Rosati R, Ruzzi M, Savo DF (2011) The Mas-tro system for ontology-based data access. Semant Web J 2(1):43–53

20. Calvanese D, De Giacomo G, Lembo D, Montali M, Santoso A(2012) Ontology-based governance of data-aware processes. In:Proceedings of RR, LNCS, vol 7497. Springer, New York, pp 25–41

21. Calvanese D, De Giacomo G, Lembo D, Lenzerini M, Rosati R(2013) Data complexity of query answering in description logics.Artif Intell 195:335–360

22. Chesani F, Mello P, Montali M, Riguzzi F, Sebastianis M, Storari S(2009) Checking compliance of execution traces to business rules.In: Proceedings of BPM Workshops, LNBIP, vol 17. Springer, NewYork, pp 134–145

23. Civili C, Console M, De Giacomo G, Lembo D, Lenzerini M,Lepore L, Mancini R, Poggi A, Rosati R, Ruzzi M, Santarelli V,Savo DF (2013) MASTRO STUDIO: managing ontology-baseddata access applications. PVLDB 12:1314–1317

24. Cohn D, Hull R (2009) Business artifacts: a data-centric approachto modeling business operations and processes. IEEE Bull DataEng 32(3):3–9

25. Damaggio E, Hull R, Vaculín R (2011) On the equivalence ofincremental and fixpoint semantics for business artifacts withguard-stage-milestone lifecycles. In: Proceedings of BPM, LNCS.Springer, New York, pp 396–412

123


26. De Giacomo G, Lembo D, Lenzerini M, Rosati R (2007) On recon-ciling data exchange, data integration, and peer data management.In: Proceedings of PODS, pp 133–142

27. Deutsch A, Hull R, Patrizi F, Vianu V (2009) Automatic verificationof data-centric business processes. In: Proceedings of ICDT, ACM,pp 252–267

28. Doan A, Halevy AY, Ives ZG (2012) Principles of data integration.Morgan Kaufmann, Massachusetts

29. Franconi E, Kuper G, Lopatenko A, Serafini L (2003) A robustlogical and computational characterisation of peer-to-peer databasesystems. In: Proceedings of DBISP2P, pp 64–76

30. Fuxman A, Kolaitis PG, Miller RJ, Tan WC (2005) Peer dataexchange. ACM Trans Database Syst 31(4):1454–1498

31. Gelder AV (1989) Negation as failure using tight derivations forgeneral logic programs. J Log Program 6(1&2):109–133

32. Glimm B, Horrocks I, Lutz C, Sattler U (2008) Conjunctive queryanswering for the description logic SHIQ. J Artif Intell Res 31:151–198

33. Gonzalez P, Griesmayer A, Lomuscio A (2012) Verifying GSM-based business artifacts. In: Proceedings of ICWS, IEEE, pp 25–32

34. Gruber TR (1993) Formal ontology in conceptual analysis andknowledge representation. In: Poli R, Guarino N (eds) Towardsprinciples for the design of ontologies used for knowledge sharing.Kluwer Academic Publishers, New York

35. Haarslev V, Möller R (2001) RACER system description. In: Pro-ceedings of IJCAR, LNAI, vol 2083. Springer, pp 701–705

36. Halevy A, Ives Z, Suciu D, Tatarinov I (2003) Schema mediationin peer data management systems. In: Proceedings of ICDE, pp505–516

37. Hull R, Narendra NC, Nigam A (2009) Facilitating workflow inter-operation using artifact-centric hubs. In: Proceedings of ICSOC,LNCS, vol 5900. Springer, New york, pp 1–18

38. Hull R, Damaggio E, De Masellis R, Fournier F, Gupta M,Heath FT III, Hobson S, Linehan M, Maradugu S, Nigam A,Sukaviriya PN, Vaculin R (2011) Business artifacts with guard-stage-milestone lifecycles: managing artifact interactions with con-ditions and events. In: Proceedings of DEBS, ACM, pp 51–62

39. Kolaitis PG (2005) Schema mappings, data exchange, and metadatamanagement. In: Proceedings of PODS, pp 61–75

40. Kontchakov R, Lutz C, Toman D, Wolter F, Zakharyaschev M(2010) The combined approach to query answering in DL-Lite.In: Proceedings of KR, pp 247–257

41. Lenzerini M (2002) Data integration: a theoretical perspective. In:Proceedings of PODS, pp 233–246

42. Limonad L, Boaz D, Hull R, Vaculín R, Heath F (2012) Ageneric business artifacts based authorization framework for cross-enterprise collaboration. In: Proceedings of SRII, pp 70–79

43. Linehan M (2011) GSM expression language. Technical report,IBM Research, available on request

44. Meyer A, Smirnov S, Weske M (2011) Data in business processes.EMISA. Forum 31(3):5–31

45. Montali M, Maggi FM, Chesani F, Mello P, van der Aalst WMP(2013) Monitoring business constraints with the event calculus.ACM Trans Intell Syst Technol 5(1):17

46. Motik B, Fokoue A, Horrocks I, Wu Z, Lutz C, Cuenca Grau B(2009) OWL 2 web ontology language profiles. W3C Recommen-dation, World Wide Web Consortium. Available at http://www.w3.org/TR/owl-profiles/

47. Motik B, Cuenca Grau B, Horrocks I, Wu Z, Fokoue A, Lutz C(2012) OWL 2 Web Ontology Language profiles, 2nd edn. W3Crecommendation, World Wide Web Consortium. Available at http://www.w3.org/TR/owl2-profiles/

48. Nigam A, Caswell NS (2003) Business artifacts: an approach tooperational specification. IBM Syst J 42(3):428–445

49. Object Management Group (OMG) (2013) Case managementmodel and notation vers. 1.0 - Beta 1. http://www.omg.org/spec/CMMN/1.0/Beta1/PDF/

50. Poggi A, Lembo D, Calvanese D, De Giacomo G, Lenzerini M,Rosati R (2008) Linking data to ontologies. J Data Semant X:133–173

51. Rodriguez-Muro M, Calvanese D (2012) High performance queryanswering over DL-Lite ontologies. In: Proceedings of KR, pp308–318

52. Sirin E, Parsia B, Kalyanpur A, Katz Y (2007) Pellet: a practicalOWL-DL reasoner. J Web Semant 5(2):51–53

53. Solomakhin D, Montali M, De Masellis R, Tessaris S (2013) Verifi-cation of artifact-centric systems: decidability and modeling issues.In: ICSOC, pp 252–266

54. Sun Y, Xu W, Su J, Yang J (2012) Sega: A mediator for artifact-centric business processes. In: Proceedings of of CoopIS, LNCS,vol 7567. Springer, New York, pp 658–661

55. Tsarkov D, Horrocks I (2006) FaCT++ description logic reasoner:system description. In: Proceedings of IJCAR, pp 292–297

56. Winslett M (1988) A model-based approach to updating data-bases with incomplete information. ACM Trans Database Syst13(2):167–196

123

http://www.w3.org/TR/owl-profiles/

http://www.w3.org/TR/owl-profiles/

http://www.w3.org/TR/owl2-profiles/

http://www.w3.org/TR/owl2-profiles/

http://www.omg.org/spec/CMMN/1.0/Beta1/PDF/

http://www.omg.org/spec/CMMN/1.0/Beta1/PDF/

Documents

ORIGINAL ARTICLEmontali/papers/demasellis-etal...e-mail: [email protected] R. De Masellis e-mail: [email protected] M. Montali · D. Solomakhin Free University of Bozen-Bolzano