A Process Catalog for Workflow Generation Michael Wolverton, David Martin, Ian Harrison, Jerome Thomere SRI International


ISWC 2008 A Process Catalog for Workflow Generation David Martin

Outline

Program overview
Project overview
Qualitative (capabilities) layer *
  – Modeling & query handling
Quantitative (“quality of service”) layers
  – Modeling & query handling
Implementation

* Primary focus in this talk


Tangram Program Objectives

Support the intelligence analyst in using data analysis tools effectively

Automatic instantiation of data analysis workflows
  – Maximize performance within acceptable resource constraints
  – Reusable workflow templates
  – Flexible workflow requests

Automatic selection of data analysis components and datasets

Quick & easy characterization of component descriptions
  – By non-experts
  – Supporting precise capabilities queries
  – Incorporating empirical measures of speed and effectiveness


Example Workflow Template

[Figure: workflow template diagram. Components: EntityEquivalence (Alias Resolution), SuspicionScoring, GroupDetection (two instances), GroupHypothesisMerging, LikelihoodRatioDetection, InexactGraphMatching, LogicalInference, EventEquivalence. Data labels: Entity/Transaction Data, GroupSeedSet, Recognized Events/Alerts.]

• Backwards sweep

• Forwards sweep


Simple Example: “Backward Sweep”

[Figure: backward-sweep example. Labels: VulnExp Pattern; Threat Resource Acquire Pattern; LAW; CADRE; NetKit; UWisc Suspicion Scoring; 'memberOf; 'suspiciousEntity; Process Descriptions; Accuracy Models; “Qualitative Query” and “Process + Preconditions” arrows. Conditions shown: (containsNodeType ?DS 'SuspiciousEvent), (containsNodeType ?DS 'memberOf), (containsNodeType ?DS 'suspiciousEntity), (containsLinkType ?DS 'suspiciousEntity), (containsNodeType ?DS 'Group).]


Simple Example: “Forward Sweep”

[Figure: forward-sweep example. Labels: VulnExp Pattern; Threat Resource Acquire Pattern; LAW; CADRE; NetKit; UWisc Suspicion Scoring; 'memberOf; 'suspiciousEntity; Data Model; “Query: Process + Problem + Data Model” arrows; Process Descriptions; Accuracy Models.]


Program Architecture


Project Overview: Objectives and Approach

Challenge: Characterize individual components in a way that allows a workflow management component to reason about them effectively

Approach: Characterize processes & answer queries in terms of:
  – Process capabilities
    • What kinds of problems they are capable of answering
  – How they modify the available data
    • What data looks like before running the process and what it looks like after
      – Content
      – Accuracy
  – Performance
    • System requirements (memory, OS, etc.)
    • Time, memory use, etc.


Approach: Layered Process Description

Layer | Contents | Formalism | Source of knowledge
Capabilities | Qualitative “functional” descriptions, hard resource constraints, invocation details | Static characteristics in OWL; pre- & postconditions in rules | Hand-coded by component developers
Data Modification | Statistical “before/after” descriptions of data | Problem × Data Model => Data Model | Experimental analysis, theoretical analysis
Accuracy | Statistical description of expected accuracy of algorithm results | Problem × Data Model × Accuracy Model => Accuracy Model | Experimental analysis, theoretical analysis
Performance | Statistical prediction of algorithm performance | Problem × Data Model × Resource Model => Performance Model | Experimental analysis, theoretical analysis
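The Formalism column reads naturally as a set of function signatures. A minimal Python sketch, assuming each model is just a dictionary of named statistics; the type names, `forward_sweep`, and the two toy models with their scaling factors are hypothetical illustrations, not ProCat's representation. Chaining data modification models step by step is one way to picture the “forward sweep” over a workflow template.

```python
from typing import Callable, Dict, List

# Hypothetical representations: each model is a dict of named statistics.
DataModel = Dict[str, float]          # e.g. expected node counts per node type
AccuracyModel = Dict[str, float]      # e.g. expected precision/recall
ResourceModel = Dict[str, float]      # e.g. available memory, CPUs
PerformanceModel = Dict[str, float]   # e.g. expected runtime, memory use
Problem = str

# The layer formalisms from the table, written as callable types.
DataModificationFn = Callable[[Problem, DataModel], DataModel]
AccuracyFn = Callable[[Problem, DataModel, AccuracyModel], AccuracyModel]
PerformanceFn = Callable[[Problem, DataModel, ResourceModel], PerformanceModel]


def forward_sweep(problem: Problem, initial: DataModel,
                  steps: List[DataModificationFn]) -> DataModel:
    """Propagate a data model through a sequence of processes, first to last."""
    model = dict(initial)
    for step in steps:
        model = step(problem, model)
    return model


# Made-up data modification models named after two workflow-template components.
def entity_equivalence(problem: Problem, dm: DataModel) -> DataModel:
    out = dict(dm)
    out["Person"] = dm.get("Person", 0.0) * 0.8   # assume 20% of persons are aliases
    return out


def suspicion_scoring(problem: Problem, dm: DataModel) -> DataModel:
    out = dict(dm)
    out["suspiciousEntity"] = dm.get("Person", 0.0) * 0.05   # assume 5% get flagged
    return out


if __name__ == "__main__":
    print(forward_sweep("threat-detection", {"Person": 1000.0},
                        [entity_equivalence, suspicion_scoring]))
```

Keeping each layer a pure function of its inputs is what makes the models composable along a workflow.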


Relationship to Service Discovery Problem

Easier in some ways (simplifying assumptions)
  – Components operate on data only
    • No side effects “in the world”
  – Simple patterns of I/O shared by most components
  – Smallish domain model (ontology)

Harder in some ways
  – Need to return preconditions related to specific needs
    • “Least sufficient conditions”
  – Need high-fidelity (quantitative) “Quality of Service” models
  – Compute QoS for specific datasets at query time


ProCat Architecture

[Figure: ProCat architecture. Labels: Process Query Handler; CL Reasoner; Capabilities Layer KB (process entries PM1, PM2, GD1, GD2, ...); Ontologies (Data, TEO); Quantitative Layer Prediction (Linear Predictor, Non-Linear Predictor, PM Search Model); Quantitative Models Repository (coefficients and data for PM1, PM2, GD1, GD2, ...). Annotations: OWL; RDF/XML syntax with extension; RDFS++ reasoning; SPARQL.]


Capabilities Layer

[Figure: the ProCat architecture diagram with the Capabilities Layer KB expanded. A process entry (e.g. PM1) records its Class, Requirements, and I/O Behavior (I/O patterns such as Pattern 1 and Pattern 2, plus I/O rules), and lists process instances (Proc Inst 1, Proc Inst 2, ...), each with an invocation command, resource requirements, and site.]


Example: Capabilities Query

<pcat:FindInputDataRequirements>
  <pcat:component>
    <rdf:Description rdf:about="http://...#?component2">
      <rdf:type rdf:resource="http://.../Process.owl#PatternMatchingProcess"/>
      <pdl:hasOutput rdf:resource="http://...#?dataVariable5"/>
      <pdl:hasInput rdf:resource="http://...#?dataVariable4"/>
      <pdl:hasInput rdf:resource="http://...#?dataVariable3"/>
    </rdf:Description>
  </pcat:component>
  <pcat:constraints>
    <rdf:Description rdf:about="http://...#?dataVariable5">
      <pdl:hasRole rdf:resource="http://.../Process.owl#HypothesisOutputRole"/>
      <rdf:type rdf:resource="http://...#Hypothesis"/>
      <pdl:containsNodeType rdf:resource="http://...#MoneyLaunderingEvent"/>
    </rdf:Description>
    …..
  </pcat:constraints>
</pcat:FindInputDataRequirements>
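A client could read such a request with Python's standard `xml.etree.ElementTree`, flattening each `rdf:Description` into (subject, property, object) triples. This is a minimal sketch: the `pcat:` and `pdl:` namespace URIs are not given on the slide, so the `http://example.org/...` ones below are placeholders, as are the shortened resource URIs; only the `rdf:` namespace is the standard one.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PCAT = "http://example.org/pcat#"   # placeholder namespace URI (assumption)
PDL = "http://example.org/pdl#"     # placeholder namespace URI (assumption)

QUERY = f"""<pcat:FindInputDataRequirements xmlns:pcat="{PCAT}" xmlns:pdl="{PDL}" xmlns:rdf="{RDF}">
  <pcat:component>
    <rdf:Description rdf:about="http://example.org/q#?component2">
      <rdf:type rdf:resource="http://example.org/Process.owl#PatternMatchingProcess"/>
      <pdl:hasOutput rdf:resource="http://example.org/q#?dataVariable5"/>
      <pdl:hasInput rdf:resource="http://example.org/q#?dataVariable4"/>
      <pdl:hasInput rdf:resource="http://example.org/q#?dataVariable3"/>
    </rdf:Description>
  </pcat:component>
  <pcat:constraints>
    <rdf:Description rdf:about="http://example.org/q#?dataVariable5">
      <pdl:hasRole rdf:resource="http://example.org/Process.owl#HypothesisOutputRole"/>
      <rdf:type rdf:resource="http://example.org/q#Hypothesis"/>
      <pdl:containsNodeType rdf:resource="http://example.org/q#MoneyLaunderingEvent"/>
    </rdf:Description>
  </pcat:constraints>
</pcat:FindInputDataRequirements>"""


def local(uri: str) -> str:
    """Reduce a URI or a '{namespace}name' tag to its local name."""
    return uri.rsplit("#", 1)[-1].rsplit("}", 1)[-1]


def parse_query(xml_text: str) -> Tuple[str, List[Tuple[str, str, str]]]:
    """Return the request name and the (subject, property, object) triples."""
    root = ET.fromstring(xml_text)
    triples = []
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = local(desc.get(f"{{{RDF}}}about"))
        for prop in desc:
            triples.append((subject, local(prop.tag), local(prop.get(f"{{{RDF}}}resource"))))
    return local(root.tag), triples


if __name__ == "__main__":
    name, triples = parse_query(QUERY)
    print(name)
    for t in triples:
        print(t)
```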


Process Description Ontology

Process
  – Class hierarchy
  – Parameters
    • Types
    • Roles
    • Default values
    • Multiple inheritance
  – Pre- and post-conditions

Process Usage Template

Process installation
  – Resource requirements
    • Memory, disk space, libraries, etc.
  – Invocation conventions
    • Environment variables, paths
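To make the ontology's parts concrete, they can be pictured as plain data structures. This is a minimal Python sketch with hypothetical class and field names (ProCat expresses these in OWL, not Python); a condition is a (subject, property, object) triple with `?`-prefixed strings as variables, and the `Dataset` class and `DataInputRole` role in the example are made up.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, property, object); "?x" marks a variable


@dataclass
class Parameter:
    name: str
    param_type: str              # an ontology class, e.g. "Hypothesis"
    role: str                    # e.g. "HypothesisOutputRole"
    default: Optional[str] = None


@dataclass
class ProcessUsageTemplate:
    pre: List[Triple]            # existentially quantified precondition
    post: List[Triple]           # existentially quantified postcondition
    effects: List[Tuple[Triple, List[Triple]]] = field(default_factory=list)  # (head, body) rules


@dataclass
class Installation:
    site: str
    invocation_command: str
    resources: Dict[str, str] = field(default_factory=dict)     # memory, disk space, libraries
    environment: Dict[str, str] = field(default_factory=dict)   # environment variables, paths


@dataclass
class Process:
    name: str
    superclasses: List[str]      # position in the process class hierarchy
    inputs: List[Parameter]
    outputs: List[Parameter]
    templates: List[ProcessUsageTemplate] = field(default_factory=list)
    installations: List[Installation] = field(default_factory=list)


pm1 = Process(
    name="PM1",
    superclasses=["PatternMatchingProcess"],
    inputs=[Parameter("input", "Dataset", "DataInputRole")],       # hypothetical names
    outputs=[Parameter("output", "Hypothesis", "HypothesisOutputRole")],
    templates=[ProcessUsageTemplate(pre=[("input", "containsNodeType", "?T")],
                                    post=[("output", "containsNodeType", "?T")])],
)
```

Separating templates from installations mirrors the split between what a process does and where and how it runs.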


Process Ontology


Process


Capabilities Layer Challenges

Pre- and post-conditions
  – Hypothetical in nature
  – Inherently “reified”
  – Applicable to execution instances of a process

pre: (input containsNodeType Person)


Propagation of values (in “backwards sweep”)

pre: (input containsNodeType ?T)

post: (output containsNodeType ?T)
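Value propagation amounts to matching a requested postcondition against the template's postcondition and carrying the resulting variable bindings into its precondition. This is a minimal Python sketch, assuming conditions are (subject, property, object) triples and `?`-prefixed strings are variables; the function names are illustrative, and ProCat itself evaluates conditions in Prolog.

```python
from typing import Dict, Optional, Tuple

Triple = Tuple[str, str, str]   # (subject, property, object); "?x" is a variable
Bindings = Dict[str, str]


def unify(pattern: Triple, fact: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Match a pattern triple against a ground triple, extending the bindings."""
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if p in result and result[p] != f:
                return None          # variable already bound to a different value
            result[p] = f
        elif p != f:
            return None              # constant mismatch
    return result


def substitute(triple: Triple, bindings: Bindings) -> Triple:
    """Replace bound variables in a triple with their values."""
    return tuple(bindings.get(t, t) for t in triple)


# Template from the slide: pre (input containsNodeType ?T), post (output containsNodeType ?T)
PRE: Triple = ("input", "containsNodeType", "?T")
POST: Triple = ("output", "containsNodeType", "?T")

# Backward sweep: the requester needs an output containing SuspiciousEvent nodes.
goal: Triple = ("output", "containsNodeType", "SuspiciousEvent")
b = unify(POST, goal, {})
required = substitute(PRE, b) if b is not None else None
print(required)  # ('input', 'containsNodeType', 'SuspiciousEvent')
```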

Universally quantified conditional rules

(output containsNodeType ?T) :- (input1 containsNodeType ?T), (input2 containsNodeType ?T).
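Used backwards, such a rule turns one requested output property into a requirement on every input named in its body. This is a minimal Python sketch under the same assumptions (triples, `?` variables, hypothetical function names); a real Prolog evaluation would also try alternative rules and backtrack, which this sketch omits.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Rule = Tuple[Triple, List[Triple]]      # (head, body): head :- body


def unify(pattern: Triple, fact: Triple) -> Optional[Dict[str, str]]:
    """Bind variables in pattern so that it equals the ground triple fact."""
    bindings: Dict[str, str] = {}
    for p, f in zip(pattern, fact):
        if p.startswith("?"):
            if bindings.setdefault(p, f) != f:
                return None              # same variable, conflicting values
        elif p != f:
            return None
    return bindings


def regress(goal: Triple, rule: Rule) -> Optional[List[Triple]]:
    """Return the instantiated rule body needed to establish goal, or None."""
    head, body = rule
    bindings = unify(head, goal)
    if bindings is None:
        return None
    return [tuple(bindings.get(t, t) for t in clause) for clause in body]


# (output containsNodeType ?T) :- (input1 containsNodeType ?T), (input2 containsNodeType ?T).
RULE: Rule = (("output", "containsNodeType", "?T"),
              [("input1", "containsNodeType", "?T"),
               ("input2", "containsNodeType", "?T")])

print(regress(("output", "containsNodeType", "Group"), RULE))
# [('input1', 'containsNodeType', 'Group'), ('input2', 'containsNodeType', 'Group')]
```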

Queries may contain pre- and post-condition elements (including arbitrary precondition elements)

pre: (input1 rdf:type PersonDataset) (input2 rdf:type EventDataset) (input2 temporalRange <...>)
post: (output containsLinkType ParticipatedIn)

Least sufficient precondition is desired


Solution

Process Usage Template (PUT)
  – “Snapshot” of an arbitrary successful occurrence of a process
  – Each process can have multiple PUTs

Two declarative units
  – Pre/postcondition (existentially quantified)
  – Conditional effect (CE) rules (universally quantified)

Two-stage query processing
  – SPARQL queries identify candidate processes based on “static” properties
  – Prolog-based evaluation of pre/postcondition query clauses

Asymmetric treatment of pre vs. post
  – Query postcondition clauses must be derivable from the PUT postcondition (or a conditional effect)
  – Query precondition clauses must be consistent with the PUT precondition

Result precondition is the accumulation of
  – The PUT precondition (with propagated variable bindings)
  – Bodies of CE rules used to establish postcondition clauses (with propagated variable bindings)
  – Precondition clauses given in the query
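The second (Prolog) stage of query handling can be sketched for a single candidate PUT. This is a minimal Python illustration under stated assumptions: query clauses are ground triples; each query postcondition clause is matched first against the PUT postcondition, then against CE rule heads, taking the first match without variable renaming or backtracking; and “consistent with the PUT precondition” is approximated by a hypothetical `conflicts` test over a made-up disjointness list. The SPARQL candidate-selection stage is assumed to have already run. All names and the example PUT are illustrative, not ProCat's API.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]          # (subject, property, object); "?x" is a variable
Bindings = Dict[str, str]
Rule = Tuple[Triple, List[Triple]]      # conditional effect: (head, body)

# Hypothetical disjointness axioms used by the stand-in consistency check.
DISJOINT: Set[FrozenSet[str]] = {frozenset({"PersonDataset", "EventDataset"})}


def unify(pattern: Triple, goal: Triple, b: Bindings) -> Optional[Bindings]:
    """Match a PUT pattern against a ground query clause, extending bindings."""
    out = dict(b)
    for p, g in zip(pattern, goal):
        p = out.get(p, p)                # apply bindings made so far
        if p.startswith("?"):
            out[p] = g
        elif p != g:
            return None
    return out


def subst(clause: Triple, b: Bindings) -> Triple:
    return tuple(b.get(t, t) for t in clause)


def conflicts(a: Triple, c: Triple) -> bool:
    """Stand-in consistency test: disjoint rdf:type classes on the same subject."""
    return (a[0] == c[0] and a[1] == c[1] == "rdf:type"
            and frozenset({a[2], c[2]}) in DISJOINT)


def answer(pre: List[Triple], post: List[Triple], effects: List[Rule],
           q_pre: List[Triple], q_post: List[Triple]) -> Optional[List[Triple]]:
    """Result precondition for one PUT, or None if the PUT cannot serve the query."""
    b: Bindings = {}
    used_bodies: List[Triple] = []
    for goal in q_post:
        for clause in post:                      # derivable from the postcondition?
            nb = unify(clause, goal, b)
            if nb is not None:
                b = nb
                break
        else:
            for head, body in effects:           # ...or from a conditional effect?
                nb = unify(head, goal, b)
                if nb is not None:
                    b = nb
                    used_bodies.extend(body)
                    break
            else:
                return None                      # not derivable: reject this PUT
    # Accumulate the PUT precondition and the bodies of the CE rules actually used.
    result = [subst(c, b) for c in pre + used_bodies]
    for qc in q_pre:                             # query preconditions must be consistent
        if any(conflicts(qc, c) for c in result):
            return None
        if qc not in result:
            result.append(qc)
    return result


# Hypothetical PUT for a link-analysis process.
PRE = [("input1", "rdf:type", "PersonDataset"), ("input2", "rdf:type", "EventDataset")]
POST = [("output", "containsLinkType", "ParticipatedIn")]
EFFECTS: List[Rule] = [(("output", "containsNodeType", "?T"),
                        [("input1", "containsNodeType", "?T"),
                         ("input2", "containsNodeType", "?T")])]

print(answer(PRE, POST, EFFECTS,
             q_pre=[("input1", "rdf:type", "PersonDataset")],
             q_post=[("output", "containsLinkType", "ParticipatedIn"),
                     ("output", "containsNodeType", "Person")]))
```

Only the bodies of rules actually used enter the result, which is what keeps the returned precondition close to “least sufficient”.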


Quantitative Layers

Layer | Contents | Formalism | Source of knowledge
Capabilities | Qualitative “functional” descriptions, hard resource constraints, invocation details | Static characteristics in OWL; pre- & postconditions in rules | Hand-coded by component developers
Data Modification | Statistical “before/after” descriptions of data | Problem × Data Model => Data Model | Experimental analysis, theoretical analysis
Accuracy | Statistical description of expected accuracy of algorithm results | Problem × Data Model × Accuracy Model => Accuracy Model | Experimental analysis, theoretical analysis
Performance | Statistical prediction of algorithm performance | Problem × Data Model × Resource Model => Performance Model | Experimental analysis, theoretical analysis


ProCat Quantitative Layers Architecture

[Figure: ProCat quantitative layers architecture. Labels: Query Handler; Quantitative Layer Prediction (Linear Predictor, Nonlinear Predictor, PM Search Model); Quantitative Models Repository (coefficients and data for PM1, PM2, GD1, GD2, ...); SR; “4.2 and 5.2 Queries”; “Quantitative Data, Accuracy, and Performance Predictions”; DC Metrics Ontology; TEE; Experimental Results; Prediction Engine; Component Execution; Data; Models + Data Characterizations.]


Quantitative Layers

Requirements
  – Precise
  – Efficient
  – Composable

Quantitative models represented declaratively
  – Tabular format (not in OWL)

Query result generation done procedurally
  – Using Lisp functions

Coefficients for the linear model can be learned through a regression method
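As one illustration of learning such coefficients, here is a minimal Python sketch of ordinary least squares (normal equations solved by Gauss-Jordan elimination, standard library only). The slides do not name the regression method or features, so the features (input node and link counts), the target (runtime in seconds), and all numbers below are made up.

```python
from typing import List


def fit_linear(X: List[List[float]], y: List[float]) -> List[float]:
    """Ordinary least squares with an intercept, via the normal equations."""
    A = [[1.0] + list(row) for row in X]        # prepend an intercept column
    n = len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(a[i] * a[j] for a in A) for j in range(n)]
         + [sum(a[i] * t for a, t in zip(A, y))]
         for i in range(n)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def predict(w: List[float], x: List[float]) -> float:
    """Apply learned coefficients [intercept, w1, w2, ...] to a feature vector."""
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


# Made-up training data: [node count, link count] -> runtime (seconds).
X = [[100.0, 500.0], [200.0, 800.0], [400.0, 1500.0], [800.0, 2000.0], [1600.0, 5000.0]]
y = [4.0, 5.6, 9.0, 14.0, 28.0]

if __name__ == "__main__":
    w = fit_linear(X, y)
    print("coefficients:", w)
    print("predicted runtime for [300, 1000]:", predict(w, [300.0, 1000.0]))
```

Because the fitted model is just a row of coefficients, it fits the tabular, non-OWL storage the slide describes.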


Process-specific Prediction Models

[Figure: two panels comparing Predicted and Actual values on datasets H_Y4_4018, H_Y4_4019, H_Y4_4020, H_Y4_4021, and H_Y4_4028. Data Modification panel: y-axis “Results”. Performance panel: y-axis “States Expanded”.]

Recurrence relation Pattern Matcher model compared to LAW actual results

Mean error:
  – Data Modification: 20%
  – Performance: 19%

Runtime differs from LAW by over 2 orders of magnitude
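The slides report mean error without stating the formula. Assuming it is the mean absolute percentage error between predicted and actual values across datasets, a minimal Python sketch follows; the dataset keys and values are invented placeholders, not the measurements behind the 20% and 19% figures.

```python
from typing import Dict


def mean_abs_pct_error(predicted: Dict[str, float], actual: Dict[str, float]) -> float:
    """Average of |predicted - actual| / |actual| over datasets present in both."""
    keys = [k for k in actual if k in predicted and actual[k] != 0]
    if not keys:
        raise ValueError("no comparable datasets")
    return sum(abs(predicted[k] - actual[k]) / abs(actual[k]) for k in keys) / len(keys)


# Invented values for illustration only; not the measurements behind the slide.
actual = {"dataset_1": 400.0, "dataset_2": 200.0}
predicted = {"dataset_1": 500.0, "dataset_2": 170.0}

if __name__ == "__main__":
    print(f"mean error: {mean_abs_pct_error(predicted, actual):.0%}")
```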


Implementation

[Figure: implementation architecture. Labels: ProCat Server; Triple Store: AllegroGraph (SPARQL, RDFS++, Prolog, Access API); SOAP; ProCat API; Tangram Workflow Services API; ProCat infrastructure; Domain ontologies; Component descriptions; WINGS; TEE; ProCat GUI.]

Concurrent queries

Logging

Web service API

GUI


Future directions

Validity checking of ontology updates
Validity checking of new / updated process characterizations
Allow for disjunction in pre- and post-conditions
Process characterization editor
Automation of quantitative model acquisition
Assistance for updating process descriptions against ontology changes
Better online browsing and catalog management


Summary

Design & implementation of a Process Catalog for Workflow Generation
  – Qualitative (capabilities) layer
  – Quantitative (“quality of service”) layers

Novel elements
  – Quantitative layers (“Quality of Service”)
    • Numeric models for data modification, accuracy, performance
  – Novel approach to reasoning about pre- and post-conditions
    • Propagation of values (in “backwards sweep”)
    • Universally quantified rules
    • Queries may contain pre- and post-condition elements
    • Computation of least sufficient precondition