Sequential Action Patterns in Collaborative Ontology Engineering Projects: A Case-study in the Biomedical Domain

DESCRIPTION

Simon Walk's talk at CIKM '14 about our paper titled "Sequential Action Patterns in Collaborative Ontology Engineering Projects: A Case-study in the Biomedical Domain"

Graz University of Technology, CIKM 2014

SCIENCE PASSION TECHNOLOGY

Sequential Action Patterns in Collaborative Ontology-Engineering Projects: A Case-Study in the Biomedical Domain

Simon Walk 1, Philipp Singer 2 and Markus Strohmaier 2,3

1 Graz University of Technology
2 GESIS – Leibniz Institute for the Social Sciences
3 University of Koblenz


2 Introduction & Motivation

The importance of collaborative ontology-engineering projects has increased in recent years due to an increase in

• the complexity of the modeled domains
• the requirements for the resulting ontology

No individual can single-handedly cover the increased complexity and requirements.

Hence, it is crucial to better understand and steer the underlying processes of how users collaboratively work on an ontology (i.e., via predictive models).


3 Approach & Objective

To that end, we analyzed five collaborative ontology-engineering projects from the biomedical domain to:

1. explore regularities and common patterns in user action sequences
2. fit and select models using Markov chains of varying order
3. predict user actions via the fitted Markov chains

Our main objective is to predict future user actions in collaborative ontology-engineering projects.


4 Datasets

Five collaborative ontology-engineering projects from the biomedical domain, with features of varying size.

Note that all ontologies were created with WebProtégé or derivatives of WebProtégé!


5 Types of Action Paths


6 Types of Action Paths


7 Types of Action Paths


8 Types of Action Paths


9 Extracted Action Paths

1. Users for Classes
   Sequences of users that changed a class.
2. Change Types for Users & Classes
   Sequences of change types performed by a user / on a class.
3. Properties for Users & Classes
   Sequences of properties changed by a user / for a class.
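
The three path types above can be sketched as simple group-by operations over a change log. The record layout `(timestamp, user, class, change_type, property)` and all values below are hypothetical and chosen for illustration only; the actual WebProtégé change-log format is not shown in the slides.

```python
from collections import defaultdict

# Hypothetical change-log records: (timestamp, user, class, change_type, property)
change_log = [
    (1, "alice", "C1", "EDIT", "title"),
    (2, "bob", "C1", "EDIT", "definition"),
    (3, "alice", "C2", "CREATE", "title"),
    (4, "alice", "C1", "MOVE", "parent"),
]

# Column indices of the (assumed) record layout
USER, CLASS, CHANGE_TYPE, PROPERTY = 1, 2, 3, 4


def extract_paths(log, group_by, emit):
    """Group records by one column and emit another column in chronological order."""
    paths = defaultdict(list)
    for record in sorted(log, key=lambda r: r[0]):
        paths[record[group_by]].append(record[emit])
    return dict(paths)


# 1. Users for Classes: who changed each class, in order
users_for_classes = extract_paths(change_log, CLASS, USER)
# 2. Change Types for Users: which change types each user performed, in order
change_types_for_users = extract_paths(change_log, USER, CHANGE_TYPE)
# 3. Properties for Classes: which properties were changed on each class, in order
properties_for_classes = extract_paths(change_log, CLASS, PROPERTY)
```

Swapping the `group_by` and `emit` columns yields the remaining variants (change types per class, properties per user).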


10 Exploring Regularities and Sequential Patterns


11 Exploring Regularities

Randomness & Regularities

Wald-Wolfowitz runs test, as adapted by O’Brien and Dyck (1985).

For ~60% of our paths, regularities could be detected.1
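
To illustrate the idea behind a runs test, here is a minimal sketch of the classic two-category Wald-Wolfowitz test using the normal approximation. It is not the O’Brien and Dyck adaptation used in the talk, and it is not the code from the RunsTest repository; the function name `runs_test` is an assumption made for this example.

```python
import math


def runs_test(sequence):
    """Classic two-category Wald-Wolfowitz runs test (normal approximation).

    Returns (number_of_runs, z_score, two_sided_p_value).
    """
    sequence = list(sequence)
    categories = sorted(set(sequence))
    if len(categories) != 2:
        raise ValueError("this sketch handles exactly two categories")
    n1 = sequence.count(categories[0])
    n2 = len(sequence) - n1
    n = n1 + n2
    # A new run starts wherever two neighbouring elements differ
    runs = 1 + sum(1 for a, b in zip(sequence, sequence[1:]) if a != b)
    mean = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    if variance == 0:
        raise ValueError("sequence too short for the normal approximation")
    z = (runs - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided
    return runs, z, p_value


runs_blocky, z_blocky, p_blocky = runs_test("AAAABBBB")
runs_alt, z_alt, p_alt = runs_test("ABABABAB")
```

Too few runs (negative z) indicate clustering, and too many runs (positive z) indicate alternation; both are departures from randomness.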

Sequential Pattern Mining

PrefixSpan is used to investigate commonly used sequential patterns.

Only immediately succeeding states form patterns, e.g., “A B C” contains “A B” and “B C” but not “A C”.

1 https://github.com/psinger/RunsTest
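
The contiguity constraint can be illustrated with a simple support count over contiguous sub-sequences. This is a simplified stand-in, not PrefixSpan itself; the function name `contiguous_patterns` and the toy paths are assumptions made for this example.

```python
from collections import Counter


def contiguous_patterns(paths, length=2, min_support=2):
    """Count in how many paths each contiguous pattern of a given length occurs."""
    support = Counter()
    for path in paths:
        # A set ensures each pattern is counted at most once per path
        seen = {tuple(path[i:i + length]) for i in range(len(path) - length + 1)}
        support.update(seen)
    return {pattern: count for pattern, count in support.items() if count >= min_support}


paths = [["A", "B", "C"], ["A", "B"], ["B", "C", "A"]]
frequent = contiguous_patterns(paths, length=2, min_support=2)
```

Here `("A", "C")` never appears as a pattern, even though A precedes C in the first path, because the two states are not adjacent.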


12 Results for the Sequential Pattern Analysis

Users for Classes Paths


13 Results for the Sequential Pattern Analysis

Users for Classes Paths


14 Model Fitting & Selection


15 Model Fitting

Markov chains are stochastic processes representing transition probabilities between a countable number of known states. A Markov chain consists of:

• A state space: listing all possible states
• A transition matrix: listing all transition probabilities between states

A Markov chain of n-th order means that the n previous states contain predictive information about the next state.
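
A minimal sketch of fitting such a chain by counting, assuming plain maximum-likelihood estimates (relative frequencies). The function name `fit_markov_chain` and the toy paths are hypothetical and not taken from the PathTools code.

```python
from collections import Counter, defaultdict


def fit_markov_chain(paths, order=1):
    """Estimate transition probabilities of an order-`order` Markov chain.

    Returns {history_tuple: {next_state: probability}}.
    """
    counts = defaultdict(Counter)
    for path in paths:
        for i in range(order, len(path)):
            history = tuple(path[i - order:i])  # the `order` previous states
            counts[history][path[i]] += 1
    return {
        history: {state: c / sum(nxt.values()) for state, c in nxt.items()}
        for history, nxt in counts.items()
    }


toy_paths = [["A", "B", "A", "B"], ["A", "A"]]
first_order = fit_markov_chain(toy_paths, order=1)
zero_order = fit_markov_chain(toy_paths, order=0)
```

With `order=0` the history is the empty tuple, so the model reduces to the overall state frequencies.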


16 Model Fitting & Selection

We fitted models of orders zero to five.2

Lower-order models are nested within higher-order models.

Higher orders need exponentially more parameters and may result in overfitting.
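
The exponential growth can be made concrete: a chain of order k over |S| states has |S|^k possible histories, each with |S| - 1 free transition probabilities. The helper name below is hypothetical.

```python
def num_free_parameters(num_states, order):
    """Free parameters of a Markov chain: one row per history, each row sums to 1."""
    return num_states ** order * (num_states - 1)


# Parameter counts for a (hypothetical) state space of 5 states, orders 0 to 5
parameter_counts = [num_free_parameters(5, k) for k in range(6)]
```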

Bayesian model selection (Singer et al. 2014)2

Higher-order models receive a penalty due to higher complexity.

2 https://github.com/psinger/PathTools
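
One way to sketch Bayesian comparison of Markov chain orders is the log marginal likelihood (evidence) under a symmetric Dirichlet prior on each row of the transition matrix. The code below is an illustration under that assumption and is not the PathTools API; the function name, the `alpha=1.0` prior, and the choice to score only positions from `max_order` onward (so that all orders are evaluated on the same observations) are simplifications made for this example.

```python
from collections import Counter, defaultdict
from math import lgamma


def log_evidence(paths, order, num_states, max_order, alpha=1.0):
    """Log marginal likelihood of an order-`order` chain with Dirichlet(alpha) rows."""
    counts = defaultdict(Counter)
    for path in paths:
        # Start at max_order so every candidate order scores the same transitions
        for i in range(max_order, len(path)):
            counts[tuple(path[i - order:i])][path[i]] += 1
    total = 0.0
    for nxt in counts.values():
        n = sum(nxt.values())
        total += lgamma(num_states * alpha) - lgamma(num_states * alpha + n)
        # Unobserved next states contribute zero, so only observed ones are summed
        total += sum(lgamma(alpha + c) - lgamma(alpha) for c in nxt.values())
    return total


alternating = [list("ABABABABABABABABABAB")]
evidence = {k: log_evidence(alternating, k, num_states=2, max_order=2) for k in range(3)}
```

On this strictly alternating toy path, the first-order model has higher evidence than the zero-order model, since knowing the previous state fully determines the next one.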


17 Results for the Bayesian Model Selection


18 Predicting User Actions


19 K-Fold Cross-Validation Prediction Experiment

1. Fit Markov chain model.
   • Split paths into training and test set (stratified).
   • Rank transitions for each row in the transition matrix.
2. Determine position of test set transition in the fitted Markov chain model.
3. Calculate average over all positions.

An average position of 1 equals best prediction accuracy.
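
The steps above can be sketched for a single train/test split as follows. Function names and toy data are hypothetical. Two details are assumptions not stated in the slides: transitions unseen in training are assigned the worst possible rank (the number of states), and ties are broken by first occurrence in the training data.

```python
from collections import Counter, defaultdict


def fit_ranks(train_paths, order=1):
    """Rank next states per history; rank 1 is the most frequent transition."""
    counts = defaultdict(Counter)
    for path in train_paths:
        for i in range(order, len(path)):
            counts[tuple(path[i - order:i])][path[i]] += 1
    return {
        history: {state: rank for rank, (state, _) in enumerate(nxt.most_common(), start=1)}
        for history, nxt in counts.items()
    }


def average_position(ranks, test_paths, num_states, order=1):
    """Average rank of the transitions actually observed in the test paths."""
    positions = []
    for path in test_paths:
        for i in range(order, len(path)):
            row = ranks.get(tuple(path[i - order:i]), {})
            # Assumption: unseen transitions receive the worst rank
            positions.append(row.get(path[i], num_states))
    return sum(positions) / len(positions)


train = [["A", "B", "A", "B", "A", "C"]]
test = [["A", "B", "A", "C"]]
ranks = fit_ranks(train, order=1)
avg = average_position(ranks, test, num_states=3, order=1)
```

For k-fold cross-validation, this procedure would be repeated once per fold and the resulting averages combined.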


20 K-Fold Cross-Validation Prediction Results


21 Results for the Prediction Task


22 Conclusions

A number of sequences were produced in a non-random way, and frequent patterns can be extracted.

Memory effects (serial dependence) can increase prediction accuracy.

The resulting prediction models can (potentially) be used for the creation of various recommendations as well as to assess the impact of potential changes on the ontology and the community.


23 Future Work

Include additional data sources (e.g., Semantic MediaWikis).

Analyze higher-order patterns and compare patterns across different data sources.

Conduct live-lab experiments with generated prediction models (recommendations).


24 Questions?


Thank you for your attention!


26 References

A. Wald and J. Wolfowitz. On a test whether two samples are from the same population. The Annals of Mathematical Statistics, 11(2):147–162, 1940.

P. C. O’Brien and P. J. Dyck. A runs test based on run lengths. Biometrics, pages 237–244, 1985.

P. Singer, D. Helic, B. Taraghi, and M. Strohmaier. Detecting memory and structure in human navigation patterns using Markov chain models of varying order. PLoS ONE, 9(7):e102070, 2014.