Reference Representation in Large Metamodel-based Datasets

Markus Scheidgen

Model representations for large meta-model based data-sets

￭ Introduction: Technological spaces and model representations
￭ Comparison of representations
￭ Implementation
￭ Application

Introduction: Technological Spaces

[Figure: technological spaces surrounding software models. Labels: Software Models; reverse engineering; code generation; persistence/exchange; databases; persistence/versioning; processing (via ORMs, e.g. JPA); Objects (e.g. POJOs); profiling; processing (e.g. DOM/JAXB); exchange (e.g. in web services); XSLT/XSL/XQuery/XPath; model transformation/constraints/queries; static analysis/compilation/refactoring; running programs; other data]

Introduction: State of the Art

Meta-Models / Models

Schemas / XML

Grammars / Code

Classes / Objects

ER-Schemas / Relational Data

visualization and editing by human users

processing in computer programs

exchange

large data-sets/persistence and querying

Introduction: New Class of DBMS

Meta-Models / Models

Schemas / XML

Grammars / Code

Classes / Objects

- Big Data

- Graphs

ER-Schemas / Big Relational Data

Representation: Strategies

                 Object-by-object        Fragments
Part-of-source   Morsa, (Java) Objects   XMI, EMF-Frag
Relations        CDO                     ?

Representation: Object-by-object vs. Fragmentation (considering traversal, theoretical results)

[Plot: labels "Number of loaded objects [l]" (10^0 to 10^6) and "Fragment size [f]" (1e+00 to 1e+06); curves: no fragmentation [f=m], optimal fragmentation, total fragmentation [f=1]]

Representation: Object-by-object vs. Fragmentation (considering traversal, theoretical results vs. implementation)

[Plots: theoretical results next to measured implementation; label "Fragment size [f]" (1e+00 to 1e+06); curves: no fragmentation [f=m], total fragmentation [f=1]]

Representation: Object-by-object vs. Fragmentation (considering traversal, implementation with actual model)

￭ Model traversal of Grabats models with four different sizes and different characteristics

[Charts: model sets set0 to set4; series: CDO/Morsa, EMFFrag coarse, EMFFrag fine]

Representation: Object-by-object vs. Fragmentation (considering query, implementation with actual model)

￭ Query of Grabats models with four different sizes and different characteristics

[Charts: series: CDO/Morsa, EMFFrag coarse, EMFFrag fine, CDO w/o SQL, Morsa w/o index]

Representation: Part-of-source vs. Relations (real implementation, artificial model)

[Plots: part-of-source implementation vs. relation implementation with individual access; x-axis: number of outgoing references (10^0 to 10^6); measured operations: access of one outgoing reference, traversal of all outgoing references]

Representation: Part-of-source vs. Relations (real implementation, artificial model)

[Plots: part-of-source implementation vs. relation implementation with scanning; x-axis: 10^0 to 10^6]
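The difference between the two representations compared on these slides can be illustrated with a minimal Python sketch. It is not the EMF-Fragments implementation: the names `SourceObject` and `RelationStore` are hypothetical, and a plain dict stands in for a key-value store. The sketch only shows why accessing one reference is cheap in a relation, while a part-of-source representation has to bring the whole source object (with all its references) into memory first.

```python
from collections import defaultdict


class SourceObject:
    """Part-of-source: outgoing references are stored inside the source object."""

    def __init__(self, name):
        self.name = name
        self.targets = []  # loaded together with the object itself


class RelationStore:
    """Relation: every reference is its own (source, index) -> target entry."""

    def __init__(self):
        self.entries = {}               # stands in for rows of a key-value store
        self.sizes = defaultdict(int)   # number of references per source

    def add(self, source, target):
        self.entries[(source, self.sizes[source])] = target
        self.sizes[source] += 1

    def get(self, source, index):
        # Individual access: a single lookup, independent of the reference count.
        return self.entries[(source, index)]

    def traverse(self, source):
        # Traversal: one lookup per outgoing reference of the source.
        return [self.entries[(source, i)] for i in range(self.sizes[source])]


obj = SourceObject("a")
rel = RelationStore()
for i in range(1000):
    obj.targets.append(f"t{i}")
    rel.add("a", f"t{i}")

one_from_source = obj.targets[500]     # needs the whole object in memory
one_from_relation = rel.get("a", 500)  # touches exactly one entry
all_from_relation = rel.traverse("a")
```

A real store could also answer `traverse` with a range scan over adjacent keys instead of individual lookups, which is the "scanning" variant named on the slide.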

Implementation: EMF-Fragments

map/reduce (Hadoop)

“Share Nothing” nodes (cluster, ad-hoc network)

DFS (HDFS)

key-value store (HBase)

structured data

data-sets

applications

meta-model

structured data

model transformations

Implementation: Datastore mapping

regular containment

metamodel

part of source fragmentation

relation based fragmentation

Implementation: Meta-model-based declaration of representations

Project

Package

CompilationUnit

Field, Method

«fragments»

Call «relation»

Implementation: Architecture

FragmentedModel extends Resource

ResourceSet

FObject extends EObject «reflective feature delegation»

FStore extends EStore «singleton, stateless»

ResourceSet

Fragment extends Resource

FInternalObject extends DynamicEObject

URIHandler

DataStore

«derived»

«delegates»

1 database

Legend: EMF-Fragments classes, regular EMF classes

1 EList

EObjectEList, FValueSetList

Applications: Mining and Analyzing Software Repositories

￭ Software repositories contain more information than the current software code:
  ￭ “developers who changed class/method/statement X also changed class/method/statement Y”
  ￭ this information leads to knowledge about dependencies that cannot be determined through static or even dynamic analysis
  ￭ this can be used to
    • predict/find bugs
    • understand/improve the code-base

￭ dependency information should be stored as relational data

￭ When a piece of software evolves, its metrics change. Such dynamic metrics describe software better than static code metrics. This could lead to a better assessment of methodologies or a better understanding of software engineering in general.
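The "developers who changed X also changed Y" idea can be sketched as a co-change count over revisions. The following Python example is an illustration only: the function name `co_changes` and the file names are hypothetical, and each revision is assumed to be available as a plain list of changed elements rather than as a JGit/MoDisco model.

```python
from collections import Counter
from itertools import combinations


def co_changes(revisions):
    """Count how often two elements were changed in the same revision."""
    counts = Counter()
    for changed in revisions:
        # Sort so that every pair has one canonical order.
        for pair in combinations(sorted(set(changed)), 2):
            counts[pair] += 1
    return counts


# Hypothetical revision history: each entry lists the elements changed together.
revisions = [
    ["Parser.java", "Lexer.java"],
    ["Parser.java", "Lexer.java", "Ast.java"],
    ["Ast.java", "Printer.java"],
]

pairs = co_changes(revisions)
strongest = pairs.most_common(1)[0]
```

The resulting pairs form a relation between code elements, which is one reason the slide suggests storing dependency information as relational data.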

Applications: Mining and Analyzing Software Repositories

￭ JGit: Java implementation of the Git version control system
￭ MoDisco: reverse engineering framework for Eclipse Java projects, based on EMF
￭ EMF-Compare: determines matches and differences between models
￭ EMF-Fragments: my own framework for large models
￭ over 300 Git repositories with Eclipse plug-ins that constitute the whole Eclipse Foundation source base as “example” data-set

Applications: Model of a Software Repository

[Figure: meta-model of a software repository, combining JGit and MoDisco. Classes: Repository, Revision, Diff, CompilationUnit, Package, Class. References: prev/next, usageInPackageAccess, package (1), extends (1), several with multiplicity *. Annotations: «fragmentation», «relation», «relation,fragmentation». Other labels: PB1.R1, metamodel]

Summary

￭ Choosing the right representation makes a difference
￭ Meta-model-based declaration of representations works (might not be good enough)
￭ There are applications that can benefit from different representations

                 Object-by-object        Fragments
Part-of-source   Morsa, (Java) Objects   XMI, EMF-Frag
Relations        CDO                     ?

Backup

Possible Approaches: Different Target Platforms

Schemas / XML

- Big Data

- Graphs

CAP-Theorem¹

XMI + Resources

ACID, structured data

ER-Schemas / Big Relational Data

BASE, structured data

¹ Eric A. Brewer: Towards robust distributed systems; 19th ACM Symposium on Principles of Distributed Computing, 2000
² K. Barmpis and D.S. Kolovos: Comparative Analysis of Data Persistence Technologies for Large-Scale Models; XM 2012

Possible Approaches: Different Types of Mapping

object mapping: fast query, slow traversal, slow entry (fine transactions)

per object mapping: fast query, slow traversal, slow entry (fine transactions)¹

fragmentation: slow query, fast traversal, fast entry (coarse transactions)

ER-Schemas / Big Relational Data

¹ Javier Espinazo-Pagán, Jesús Sánchez Cuadrado, Jesús García Molina: Morsa, A Scalable Approach for Persisting and Accessing Large Models; MoDELS 2011

Fragmentation: Types of references

￭ organizing large artifacts in different resources is already implemented in EMF
￭ resources are loaded if necessary; objects in unloaded resources are represented by proxy objects
￭ objects in different resources (like all related objects) are related through references; therefore models are fragmented along references
￭ EMF-Fragments automatically fragments large models based on annotations in the meta-model
￭ resources are identified via URIs and can be serialized (e.g. XMI); therefore resources can be stored in a key-value store
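The mechanism described in these bullets can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not EMF or EMF-Fragments code: a dict stands in for the key-value store, JSON stands in for XMI, the set `FRAGMENTING` plays the role of the meta-model annotations, and the URI scheme `frag:/...` is made up for the example.

```python
import json

store = {}  # stands in for a key-value store such as HBase

# Hypothetical stand-in for the meta-model annotations:
# references with these names split the model into separate fragments.
FRAGMENTING = {"packages", "units"}


def save(uri, obj):
    """Serialize obj as one fragment; children reached through a fragmenting
    reference are saved as their own fragments and replaced by proxies."""
    data = {}
    for key, value in obj.items():
        if key in FRAGMENTING:
            proxies = []
            for i, child in enumerate(value):
                child_uri = f"{uri}/{key}.{i}"
                save(child_uri, child)
                proxies.append({"proxy": child_uri})
            data[key] = proxies
        else:
            data[key] = value  # regular containment stays inside the fragment
    store[uri] = json.dumps(data)


loaded = []  # records which fragments were actually loaded


def resolve(proxy):
    """Load the fragment a proxy points to, only when it is needed."""
    loaded.append(proxy["proxy"])
    return json.loads(store[proxy["proxy"]])


model = {"name": "project", "packages": [
    {"name": "p1", "units": [{"name": "A", "methods": ["m1", "m2"]}]},
    {"name": "p2", "units": []},
]}

save("frag:/0", model)
root = json.loads(store["frag:/0"])
p1 = resolve(root["packages"][0])
```

Only the root fragment and `p1` are loaded here; the compilation unit `A` stays in the store until its proxy is resolved.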

Fragmentation: Types of references

normal references (*)

«fragments» fragmenting references (*)

large value sets (*)

Applications

￭ HWL sensor and network operation data (or experiment data in general)
  • real-time persistence required ➜ fast data entry
  • hierarchically structured data (different sensors and other data sources) ➜ meta-modeling
  • queries for experiments, sensors, specific time periods ➜ only coarse, simple queries
  • traversal of larger sub-trees, mostly applications based on data aggregation
  • actual demand for big data depends on the size of the sensor network ➜ scalability

￭ CityGML models (or geo-spatial data in general)
  • standardized as XML schemas ➜ XML-based data
  • special proprietary indexes (e.g. spatial indexes like R-trees) and corresponding queries
  • rather query-intensive applications
  • actual demand for big data depends on the level of detail (LOD) of the models ➜ scalability

￭ Software Engineering
  • Code/Model Version Control
  • Mining Software Repositories (MSR)
  • revisions of ASTs and differences between ASTs ➜ existing meta-model-based frameworks (e.g. designed for reverse engineering purposes)
  • large number of revisions causes many large value sets
  • queries for revisions, compilation units ➜ rather coarse queries
  • aggregations and statistics ➜ can be expressed in an OCL-like language
  • immediate demand for processing in (at least smaller) clusters
  • has to be mixed with relational data for some applications

Applications: Scientific Data

click *

xml-to-model

text-to-model*

Applications: CityGML

￭ XML-based standard ➜ meta-models can be generated (1-to-1 mapping)
￭ different standards define XML schemas that extend each other: GML⇽CityGML⇽extensions
￭ transparent use of spatial indexes
￭ map onto existing platforms (e.g. SpatialHadoop)
￭ use existing implementations and persist into the key-value store
￭ extensions to CityGML can be used to reference CityGML models as spatial context for sensor data

backup

Research Overview

WIRELESS SENSOR NETWORKS

DATA ANALYSIS FRAMEW

GEO INFORMATION SYSTEMS

sensor data

heterogeneous networks

mesh networks

cellular networks

spatial data

regular databases

spatial databases

distributed data stores

distributed analysis

data homogenisation

domain-specific analysis languages

HWL: Commodity Hardware

‣ 120+ nodes

‣ indoor and outdoor

‣ dense and sparse

‣ short and long links

‣ stationary and mobile nodes


Toward Groß-Berliner Damm

Toward the institute

Markus Scheidgen: HWL – A High-Performance Wireless Sensor Research Network

Experiments: The Test Site

§ simplest case: two-lane, newly paved road

§ spatially equally distributed nodes on both sides of the road

§ 2x5 nodes

§ homogeneous test-bed: same nodes, equally calibrated, same stone ground

§ one camera to record control data

[Plot: Single-sided Amplitude Spectrum; x-axis: Frequency (Hz), 0 to 200; series: Channel X, Channel Y, Channel Z]

[Plot: Time signal of all 3 channels; x-axis: Time sample (1/400 sec), 0 to 3000; series: Channel X, Channel Y, Channel Z]


Experiments: Example Data

Amplitudes Frequencies


Experiment: Algorithm

§ Similar to earthquake detection: comparison of short and long moving averages (S=0.2s, L=4s)

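\begin{align}
s_x &= x\text{th acceleration value} \tag{1}\\
\operatorname{mavg}(s_x, W) &= \dots \sum_{i=x-W}^{\dots} \lvert \dots - \operatorname{avg}(s_x, L) \rvert \tag{3}\\
\dots &= \operatorname{mavg}(s_x, S) \tag{4}\\
\dots &= \operatorname{mavg}(s_x, L) \tag{5}\\
\dots w &= w \dots
\end{align}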
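The comparison of a short and a long moving average can be sketched as a generic STA/LTA-style trigger in Python. This is not a reconstruction of the slide's exact formulas: the function name `sta_lta`, the use of the global mean as baseline, and the threshold value are assumptions for illustration. The window lengths S = 0.2 s and L = 4 s are taken from the slide, and the sample rate of 400 Hz from the "Time sample (1/400 sec)" axis label.

```python
from itertools import accumulate


def sta_lta(samples, rate=400, short_s=0.2, long_s=4.0):
    """Ratio of short to long moving average of the absolute deviation
    from the mean, for every position with a full long window."""
    short_w = round(short_s * rate)  # 80 samples
    long_w = round(long_s * rate)    # 1600 samples
    mean = sum(samples) / len(samples)
    deviation = [abs(s - mean) for s in samples]
    prefix = [0.0] + list(accumulate(deviation))  # prefix sums for O(1) windows
    ratios = []
    for x in range(long_w, len(samples) + 1):
        short_avg = (prefix[x] - prefix[x - short_w]) / short_w
        long_avg = (prefix[x] - prefix[x - long_w]) / long_w
        ratios.append(short_avg / long_avg if long_avg > 0 else 0.0)
    return ratios


# Synthetic signal: 5 s of low-level noise followed by a 0.2 s burst.
quiet = [0.01 * (-1) ** i for i in range(2000)]
burst = [1.0 * (-1) ** i for i in range(80)]
ratios = sta_lta(quiet + burst)

THRESHOLD = 3.0  # hypothetical trigger level
events = [i for i, r in enumerate(ratios) if r > THRESHOLD]
```

While the signal is quiet, both averages agree and the ratio stays near 1; once the burst fills the short window, the ratio rises well above the threshold.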

Data Management


[Figure: Technological Infrastructure and Logical Infrastructure. Labels: internet, cellular, zigbee; sensors, actions, visualization, information; information/knowledge, distributed programming models, data bases, data representation, algorithms, processes, programming languages, machine code, radios, network protocols, hard drives, software engineering]

Complex Data Types

➡ complex data structures
➡ lots of links between data objects
➡ evolving structures
➡ requires a type-safe programming environment that promotes reuse

Large Amounts of Data

➡ a certain amount of data needs to be stored per second (HWL: 120 nodes): ~140×10³ data objects per second, ~7 MB/s serialized

➡ a certain amount of data needs to be stored in total (24 h): ~12×10⁹ data objects, ~600 GB serialized

➡ Data analysis must complete in reasonable time; for live applications, in real time.
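The 24-hour totals follow directly from the per-second rates, as this short Python check shows. The variable names are chosen for illustration; the input figures are the approximate values stated on the slide, and decimal units (1 MB = 10⁶ bytes) are assumed.

```python
objects_per_second = 140_000      # ~140 x 10^3 data objects per second
bytes_per_second = 7_000_000      # ~7 MB/s serialized
seconds_per_day = 24 * 60 * 60    # 86,400 s

objects_per_day = objects_per_second * seconds_per_day   # ~12 x 10^9 objects
bytes_per_day = bytes_per_second * seconds_per_day       # ~600 GB
bytes_per_object = bytes_per_second / objects_per_second # average serialized size

print(f"{objects_per_day:.3e} objects, {bytes_per_day / 1e9:.1f} GB per day, "
      f"{bytes_per_object:.0f} bytes per object")
```

The rates also imply an average serialized size of about 50 bytes per data object.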

From Click to ClickWatch

Click API software

Element

CompoundHandler

Complex Data Types: Meta-Modeling

This [ ] happens all the time in software modeling

state charts, class diagrams, MSCs, OCL

context Foo
inv: self.properties->forAll(a | a.x <> a.y)

Eclipse Modeling Framework (EMF)

➡ Distributed storage and links between different types of data are only a simple extension of existing technology: multi-resource persistence is already implemented

DFS (HDFS)

key-value store¹ (HBase)

Large Amounts of Data: Problem Statement

1. Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Michael Burrows, Tushar Chandra, Andrew Fikes, and Robert Gruber. Bigtable: A distributed storage system for structured data (awarded best paper!). In Brian N. Bershad and Jeffrey C. Mogul, editors, OSDI, pages 205–218. USENIX Association, 2006.

2. Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137–150. USENIX Association, 2004.

map/reduce² (Hadoop)

hierarchical data (XML, OGC standards)

data series (sensor data)

signal analysis, statistics, sensor fusion

Large Amounts of Data: Approach

map/reduce (Hadoop)

DFS (HDFS)

key-value store (HBase)

hierarchical data (XML, OGC standards)

data series (sensor data)

signal analysis, statistics, sensor fusion

meta-model

structured data

model transformations
