IoT Data Modelling with DDS
Angelo Corsaro, PhD, Chief Technology Officer, ADLINK Technologies Inc.

Data Modeling for the Internet of Things with the Data Distribution Service

Information Management in IIoT Reference Architectures

Copyright PrismTech, 2014

Shared Data Models

• Consumer and Industrial IoT are about extracting value from data

• To enable this, it is essential for IoT systems to share common data models / taxonomies

• Without common data models, systems quickly evolve into integration bazaars

RAMI 4.0

The Industrie 4.0 Reference Architecture (RAMI 4.0) is three-dimensional: it organises the life-cycle/value streams and the manufacturing hierarchy levels across the six layers of the IT representation of Industrie 4.0

Information Layer

• Persistence of the data which represent the models

• Data integrity

• Consistent integration of different data

I4.0 Component

The ability to virtualise physical entities and make information available is key to RAMI4.0 and captured as part of the I4.0 Component

IIRA Functional Domains

• The IIRA decomposes an Industrial Internet System (IIS) into five functional domains: Control, Operation, Information, Application and Business

• Data flows and control flows take place in and between these functional domains.

Information Domain

The Information Domain represents the collection of functions for gathering data from various domains, most significantly from the control domain, and transforming, persisting, and modelling or analysing those data to acquire high-level intelligence about the overall system.

Data Modelling in IIRA and RAMI4.0

• IIRA and RAMI4.0 do not recommend a specific approach to data modelling

• That said, Object Oriented and Relational are the two mainstream data modelling techniques

• The data-centric nature of IoT applications makes — from my perspective — relational models more appropriate

DDS in IoT

DDS in RAMI 4.0

• DDS addresses the requirements of the Communication and Information Layers

• Thus it can be used as the interoperable means of representing and sharing data

DDS in IIRA

• DDS is widely used for horizontal (east-to-west) communication on the Control and Information Layers and is in general applicable to horizontal communication across any view

• Thanks to its ability to deal with regular as well as real-time data flows, DDS is applicable from the Control to the Business Layer

[Figure: DDS spanning Real-Time, Soft Real-Time and Interactive data flows]

Data Modelling in DDS

A Recurring Question

• People new to DDS often ask: what techniques and patterns can we use to design DDS-based systems?

• My answer is usually: start with the powerful tools and techniques provided by relational data modelling and then add some DDS-specific spice

• I’ve come to the conclusion that many people are not very familiar with relational data modelling, or perhaps it has been too long since they studied or reviewed these concepts

• This webcast provides a relatively good introduction to the relational data model

The Relational Model

Relational Model

• Introduced by Edgar F. Codd in 1970 as a way of representing data models for databases

• Simple and elegant: a database becomes a collection of one or more relations, where each relation is a table with rows and columns

Relation

• The relation is the construct used to represent data in the relational model; it consists of a two-dimensional table

• The columns of a relation are called attributes

• The name of the relation along with the set of attributes defines the relation schema

• The rows of the relation, other than the header containing the attribute names, are called tuples

Relation’s Schema

• The relation schema specifies:
  - the relation’s name
  - the name of each field/attribute, i.e. column
  - the domain of each field, i.e. the type of the field

• Example:
  - Student(sid:string,name:string,age:integer,gpa:real)

Tuples

• An instance of a relation is a set of tuples (records) in which each tuple has the same number of fields as in the relation schema.

• A relation’s instance can be visualised as a table where each tuple is a row and all rows have the same number of fields (columns)

• Notice that rows are all different. This is a requirement of the relational model, as a relation instance is a collection of unique tuples (or rows)

sid   name          age  gpa
1234  Peter Parker  21   4.0
2345  Tony Stark    15   4.0
3456  Bruce Wayne   23   3.5

Cardinality and Degree

• The cardinality of a relation R is defined as the number of tuples belonging to the relation

• The degree, or arity, of a relation R is defined as the number of its fields
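
To make the definitions of relation instance, cardinality and degree concrete, here is a minimal Python sketch. It is only an illustration: the `Student` tuple type and the use of a Python `set` to hold the relation instance are my assumptions, not something DDS or a DBMS prescribes.

```python
from typing import NamedTuple


class Student(NamedTuple):
    """Schema of the Student relation: one field per attribute."""
    sid: int
    name: str
    age: int
    gpa: float


# A relation instance is a set of tuples, so duplicates cannot exist.
students = {
    Student(1234, "Peter Parker", 21, 4.0),
    Student(2345, "Tony Stark", 15, 4.0),
    Student(3456, "Bruce Wayne", 23, 3.5),
}
students.add(Student(1234, "Peter Parker", 21, 4.0))  # duplicate: no effect

cardinality = len(students)     # number of tuples in the relation
degree = len(Student._fields)   # number of attributes (arity)
```

Using a set rather than a list is what enforces the "all rows are different" requirement here.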

Keys

• A key of a relation is a minimal set of fields that uniquely identifies a tuple

• A superkey is a set of attributes that includes a key (for instance the primary key)

• Example:
  - The sid field is the key for the Students relation

sid   name          age  gpa
1234  Peter Parker  21   4.0
2345  Tony Stark    15   4.0
3456  Bruce Wayne   23   3.5
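
The uniqueness requirement behind keys can be checked mechanically on a given relation instance. The sketch below is a rough illustration; the helper name `is_superkey` and the dict-based row representation are assumptions of mine. Keep in mind that a key is a constraint on the schema, so passing this check on one instance does not prove that a set of attributes is a key in general.

```python
def is_superkey(rows, attrs):
    """Return True if no two rows agree on all the attributes in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False  # two tuples share the same value: not unique
        seen.add(value)
    return True


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5},
]

sid_is_unique = is_superkey(students, ["sid"])   # sid identifies each tuple
gpa_is_unique = is_superkey(students, ["gpa"])   # two students share 4.0
```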

Foreign Keys

• A foreign key introduces a link between two relations

• For instance, the sid in the Courses relation is a foreign key: it refers to the Students relation and introduces an integrity constraint with respect to it

Students
sid   name          age  gpa
1234  Peter Parker  21   4.0
2345  Tony Stark    15   4.0
3456  Bruce Wayne   23   3.5

Courses
cid          sid   grade
Physics303   1234  A+
Robotics323  2345  A+
Calculus343  2345  A
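
The integrity constraint introduced by a foreign key can be sketched as a simple check: every sid appearing in Courses must exist in Students. The function name `dangling_references` and the in-memory representation are hypothetical, chosen only for this illustration.

```python
# Students keyed by sid; Courses rows are (cid, sid, grade).
students = {
    1234: ("Peter Parker", 21, 4.0),
    2345: ("Tony Stark", 15, 4.0),
    3456: ("Bruce Wayne", 23, 3.5),
}
courses = [
    ("Physics303", 1234, "A+"),
    ("Robotics323", 2345, "A+"),
    ("Calculus343", 2345, "A"),
]


def dangling_references(courses, students):
    """Return the Courses rows whose sid does not exist in Students."""
    return [row for row in courses if row[1] not in students]


violations = dangling_references(courses, students)

# A row pointing at an unknown student violates the constraint.
bad_row = ("Chemistry101", 9999, "B")
violations_with_bad_row = dangling_references(courses + [bad_row], students)
```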

Quick DDS Intro

Data Distribution Service (DDS)

• DDS provides a Global Data Space abstraction that allows applications to autonomously, anonymously, securely and efficiently share data

• DDS’ Global Data Space is fully distributed, highly efficient and scalable

Data Distribution Service (DDS)

• DataWriters and DataReaders are automatically and dynamically matched by the DDS Discovery

• A rich set of QoS policies allows control over the existential, temporal, and spatial properties of data

Information Definition

Topic

• A Topic defines a domain-wide class of information

• A Topic is defined by means of a (name, type, qos) tuple, where
  - name: identifies the topic within the domain
  - type: is the programming language type associated with the topic. Types are extensible and evolvable
  - qos: is a collection of policies that express the non-functional properties of this topic, e.g. reliability, persistence, etc.

Topic and Instances

• As explained in the previous slide, a topic defines a class/type of information

• Topics can be defined as Singleton or can have multiple Instances

• Topic Instances are identified by means of the topic key

• A Topic Key is identified by a tuple of attributes, like in databases

• Remarks:
  - A Singleton topic has a single domain-wide instance
  - A “regular” Topic can have as many instances as the number of different key values, e.g., if the key is an 8-bit character then the topic can have 256 different instances

Topic Example

• A Topic type can be defined in different syntaxes

• IDL is the most commonly used syntax

• Example:

struct Student {
    long sid;      // topic key, see the keylist pragma
    string name;
    long age;
    float gpa;
};
#pragma keylist Student sid

Topics as Relations

• A Topic can be seen as defining a relation

struct Student {
    long sid;      // topic key, see the keylist pragma
    string name;
    long age;
    float gpa;
};
#pragma keylist Student sid

Student(sid,name,age,gpa)

sid   name          age  gpa
1234  Peter Parker  21   4.0
2345  Tony Stark    15   4.0
3456  Bruce Wayne   23   3.5

Mapping DDS to the Relational Model

• Topic Type => Relation Schema

• Topic Instance => Key

• Topic Sample => Tuple
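
The mapping above can be sketched in plain Python. This is not the DDS API: the `Student` dataclass, the `KEY` tuple and the `instance_key` helper are hypothetical names used only to show how samples group into instances by key value, and how the latest sample per instance can be read as the current tuple of the relation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    """Topic type, i.e. the relation schema."""
    sid: int
    name: str
    age: int
    gpa: float


KEY = ("sid",)  # topic key attributes, as in '#pragma keylist Student sid'


def instance_key(sample):
    """Value of the key attributes: identifies the topic instance."""
    return tuple(getattr(sample, field) for field in KEY)


# Samples in the order they were written; the third updates instance 1234.
samples = [
    Student(1234, "Peter Parker", 21, 4.0),
    Student(2345, "Tony Stark", 15, 4.0),
    Student(1234, "Peter Parker", 22, 4.0),
]

instances = defaultdict(list)
for sample in samples:
    instances[instance_key(sample)].append(sample)

# Keeping only the last sample per instance yields one tuple per key value.
latest = {key: history[-1] for key, history in instances.items()}
```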

Relational Design

• Start by identifying coarse relations and properties of data

• Then decompose based on properties

• Apply a normal form
  - Functional Dependencies => Boyce-Codd Normal Form
  - Multivalued Dependencies => Fourth Normal Form

UML Data Modelling

UML Data Modelling

• A subset of UML can be used to define data models

• The resulting model can be easily translated into a relational model and then used in a DBMS or DDS

• The allowed UML constructs are:
  - Classes (with only attributes)
  - Associations
  - Association Classes
  - Subclasses
  - Composition and Aggregation

• UML Data Models can be automatically translated into a relational model as long as each “regular” class defines a primary key

Class

• A UML class is mapped to a relation that has the same name as the class and shares its key and attributes

UML class Student:
  sid: int
  name: string
  age: int
  gpa: float

Relation: Student(sid,name,age,gpa)

Association

• By default an association can be mapped as shown here; depending on the multiplicity of the association, however, different mappings may be possible or desirable

• The key definition in the association depends on the multiplicity

UML: C1 (K1: PK, O1) -- A -- C2 (K2: PK, O2)

Relations: C1(K1,O1), C2(K2,O2), A(K1,K2)

1-to-many Association

There are two ways of mapping a 1-to-many association to the relational model

M1 Use a relation to capture the association: C1(K1,O1), C2(K2,O2), A(K1,K2)

M2 Embed the association on the many side of the association: C1(K1,O1), C2(K2,O2,K1)

UML: C1 (K1: PK, O1) 0..1 -- A -- * C2 (K2: PK, O2)
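
The two mappings carry the same information, which a small Python sketch can show. The dictionaries and the `owner_m1` / `owner_m2` helpers are hypothetical; I assume C1 is the "one" side and C2 the "many" side, and that a C2 tuple may be unassociated (multiplicity 0..1).

```python
# C1(K1, O1): the "one" side.
c1 = {"k1a": "o1a", "k1b": "o1b"}

# M1: C2(K2, O2) plus a separate association relation A(K1, K2).
c2_m1 = {"k2a": "o2a", "k2b": "o2b", "k2c": "o2c"}
# Each K2 appears at most once in A, so K2 is the key of A.
a_m1 = {"k2a": "k1a", "k2b": "k1a"}  # k2c is not associated with any C1

# M2: the association is embedded on the many side as C2(K2, O2, K1).
# K1 is None (null) for tuples that take part in no association.
c2_m2 = {k2: (o2, a_m1.get(k2)) for k2, o2 in c2_m1.items()}


def owner_m1(k2):
    """C1 key associated with k2 under mapping M1 (None if absent)."""
    return a_m1.get(k2)


def owner_m2(k2):
    """C1 key associated with k2 under mapping M2 (None if absent)."""
    return c2_m2[k2][1]
```

M2 saves a relation and a join, at the price of a nullable K1 field on the many side.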

many-to-many Associations

UML: C1 (K1: PK, O1) * -- A -- * C2 (K2: PK, O2)

Relations: C1(K1,O1), C2(K2,O2), A(K1,K2)

Relationships -arity

Key of the association relation A(K1, K2), depending on the multiplicity:

Multiplicity   Key
One to One     _
One to Many    K2
Many to Many   K1, K2

Association Classes

UML: C1 (K1: PK, O1) -- A -- C2 (K2: PK, O2), with an association class on A carrying the attributes a1 and a2

Relations: C1(K1,O1), C2(K2,O2), A(K1,K2,a1,a2)

Self Association

• Self-associations are modelled like ordinary association relations, the only difference being that attributes may be conserved

UML: class Student (sid: int, name: string, age: int, gpa: float) with a * to * self-association Sibling

Relations:
Student(sid,name,age,gpa)
Sibling(sidParent,sidSibling)

Subclasses

Three ways of mapping subclassing to the relational model

T1 Subclass relations contain the superclass key and the specialised attributes: A(K, X), B(K, Y), C(K, Z)

T2 Subclass relations contain all attributes: A(K, X), B(K, X, Y), C(K, X, Z)

T3 One relation containing all superclass and subclass attributes: A(K, X, Y, Z)

The best translation may depend on the context, e.g. T3 is good for heavily overlapping subclasses, T2 is good for disjoint and complete subclasses

UML: superclass A (K: PK, X) with subclasses B (Y) and C (Z)
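
As a rough illustration of how the translations relate, the sketch below converts a T1 representation into the single T3 relation. The sample data and the `t1_to_t3` helper are hypothetical; attributes that do not apply to an object are represented with None.

```python
# T1: A(K, X), B(K, Y), C(K, Z), each keyed by K.
a = {1: "x1", 2: "x2", 3: "x3"}   # every object has an entry in A
b = {2: "y2"}                      # object 2 is also a B
c = {3: "z3"}                      # object 3 is also a C


def t1_to_t3(a, b, c):
    """Build the single relation A(K, X, Y, Z) from the T1 relations."""
    return {k: (x, b.get(k), c.get(k)) for k, x in a.items()}


t3 = t1_to_t3(a, b, c)
```

With mostly disjoint subclasses, as here, T3 ends up with many null fields, which is one reason it suits heavily overlapping subclasses better.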

Composition and Aggregation

• The precondition to easily map composition to the relational model is for the part not to have a key

UML: Whole (K: PK, W) composed of Part (P)

Relations: Whole(K, W), Part(P, K)

• When mapping aggregation (unfilled diamond), the key K on the Part should have a domain that allows for null values

Summing Up

• A subset of UML can be used to model relational data models

• The mapping rules can be used to help translate existing Object Oriented data models into their relational counterpart

Refinement

Why Relation Refinement?

• The UML/ER Data Models usually provide a good starting point toward the data model that we’ll actually use in the system

• The relations implied by the UML/ER Data Model often need to be normalised and re-organised to address performance and workload considerations

• The goal of relation refinement is to remove redundancy and/or decompose a relation into smaller relations

• Normal forms provide a way to measure the amount of redundancy that may be in our data model

Redundancy

• Redundant Storage: Information may be stored multiple times leading to space, and perhaps time, inefficiencies

• Update Anomalies: If one copy of the redundant information is updated, this may create inconsistencies with the other copies, unless all copies are updated at the same time

• Insertion Anomalies: It may not be possible to store some information, unless some other information is stored as well

• Deletion Anomalies: It may not be possible to delete some information without losing some other information as well

Decomposition

• Ill-considered decomposition can lead to more problems than benefits, thus when decomposing you always want to ensure that:
  - You really need to decompose the relation
  - You fully understand the implications of the decomposition (lossless join, dependency preservation)

• Normal Forms provide good guidelines for decomposing relations, as they guarantee that certain classes of problems cannot be introduced

• Notice that decomposition can have a performance impact as it may lead to an increase in joins
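
The lossless-join property mentioned above can be illustrated with a small check: project a relation onto two attribute sets, join the projections back, and compare with the original. This is only a sketch on one instance, with hypothetical helper names (`project`, `natural_join`, `is_lossless`); a real design argument is made on the schema and its dependencies, not on sample data.

```python
def project(rows, attrs):
    """Keep only attrs, dropping duplicate rows."""
    unique = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in unique]


def natural_join(r, s):
    """Combine rows of r and s that agree on all shared attribute names."""
    out = []
    for a in r:
        for b in s:
            if all(a[k] == b[k] for k in a.keys() & b.keys()):
                out.append({**a, **b})
    return out


def same_rows(r, s):
    """Compare two relations as sets of rows, ignoring order."""
    canon = lambda rows: {tuple(sorted(row.items())) for row in rows}
    return canon(r) == canon(s)


def is_lossless(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly rows."""
    rejoined = natural_join(project(rows, attrs1), project(rows, attrs2))
    return same_rows(rejoined, rows)


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5},
]

# Splitting on the key sid is lossless on this instance.
split_on_key = is_lossless(students, ["sid", "name"], ["sid", "age", "gpa"])
# Splitting on gpa is lossy: the join produces spurious rows.
split_on_gpa = is_lossless(students, ["sid", "name", "gpa"], ["gpa", "age"])
```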

Functional Dependencies

• A Functional Dependency (FD) is a kind of Integrity Constraint (IC) that generalises the concept of a key

• Given a relation R along with two nonempty sets of attributes X and Y in R, we say that R satisfies the FD X ⟶ Y if the following holds for every pair of tuples t1 and t2 in R:

if t1.X = t2.X then t1.Y = t2.Y

• In other terms, the FD says that if two tuples agree on the attributes in X they also agree on the attributes in Y

• Notice that a primary key constraint is a special kind of FD

Example

• Let’s assume our Student relation now includes a new attribute that measures the percentile of the student’s GPA, i.e. the percentage of students whose GPA is smaller or equal

• Clearly the percentile attribute functionally depends on gpa, or equivalently gpa ⟶ percentile

sid   name          age  gpa  percentile
1234  Peter Parker  21   4.0  100
2345  Tony Stark    15   4.0  100
3456  Bruce Wayne   23   3.5  75
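
The FD definition translates almost directly into code. The sketch below checks whether a relation instance satisfies X ⟶ Y, using the example data from this slide; the function name `satisfies_fd` and the dict-based rows are my own choices. An FD is a statement about all legal instances, so a check on one instance can refute an FD but never prove it.

```python
def satisfies_fd(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y value seen for each X value and compare.
        if seen.setdefault(x, y) != y:
            return False
    return True


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0, "percentile": 100},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0, "percentile": 100},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5, "percentile": 75},
]

gpa_determines_percentile = satisfies_fd(students, ["gpa"], ["percentile"])
gpa_determines_name = satisfies_fd(students, ["gpa"], ["name"])
```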

Normal Forms

Normal Forms

• Different Normal Forms (NF) exist that provide guidance on how to decompose relations

• If a relation is in a given normal form then we are guaranteed that some anomalies cannot arise, e.g. update anomalies

• The normal forms based on functional dependencies are the first normal form (1NF), second normal form (2NF), third normal form (3NF) and the Boyce-Codd normal form (BCNF)

• Every relation in BCNF is also in 3NF, every relation in 3NF is also in 2NF and finally every relation in 2NF is also in 1NF

• The 2NF and 3NF have only historical interest, while the BCNF has important practical applicability

1NF

• A relation is in 1NF if every field contains only atomic values, that is, no lists or sets

Boyce-Codd Normal Form (BCNF)

Let R be a relation, X a subset of the attributes of R and a an attribute of R. R is in Boyce-Codd Normal Form (BCNF) if for every FD X ⟶ {a} that holds over R, one of the following is true:

• a ∊ X, that is, it is a trivial FD, or

• X is a superkey

Intuitively, in a BCNF relation the only nontrivial dependencies are those in which a key determines some attributes. Each attribute must describe the key, the whole key, and nothing but the key

[Figure: Functional Dependencies in BCNF, with the key determining attr 1, attr 2, ..., attr k]
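
The BCNF condition can be tested with the classic attribute-closure computation. The following is a minimal sketch under the assumption that the functional dependencies are given explicitly as (left-hand side, right-hand side) pairs; the names `closure` and `bcnf_violations` are hypothetical. It flags the gpa ⟶ percentile dependency from the earlier example.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the given FDs that are neither trivial nor have a superkey LHS."""
    violations = []
    for lhs, rhs in fds:
        trivial = set(rhs) <= set(lhs)
        lhs_is_superkey = closure(lhs, fds) >= set(schema)
        if not trivial and not lhs_is_superkey:
            violations.append((lhs, rhs))
    return violations


schema = ("sid", "name", "age", "gpa", "percentile")
fds = [
    (("sid",), ("name", "age", "gpa")),
    (("gpa",), ("percentile",)),
]
violations = bcnf_violations(schema, fds)

# Decomposing into Student(sid, name, age, gpa) and Rank(gpa, percentile)
# leaves each relation with only key-based dependencies.
student_ok = bcnf_violations(("sid", "name", "age", "gpa"), [fds[0]])
rank_ok = bcnf_violations(("gpa", "percentile"), [fds[1]])
```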

Shortcomings of BCNF and 4NF

• Dependency enforcement may require joins

• Query workload — due to excessive joins

• Over-decomposition

Relational Algebra

Selection and Projection

• Relational algebra provides operators to select rows (σ) and to project columns (π) from a relation

• These operations operate on a single relation

Examples:

Student
sid   name          age  gpa
1234  Peter Parker  21   4.0
2345  Tony Stark    15   4.0
3456  Bruce Wayne   23   3.5

σ age<20 (Student)
sid   name        age  gpa
2345  Tony Stark  15   4.0

π name,gpa (Student)
name          gpa
Peter Parker  4.0
Tony Stark    4.0
Bruce Wayne   3.5
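
Selection and projection are easy to sketch in Python over rows represented as dictionaries. The helper names `select` and `project` are mine; note that projection must drop duplicate rows, because the result is again a relation.

```python
def select(rows, predicate):
    """σ: keep the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]


def project(rows, attrs):
    """π: keep only attrs, dropping duplicates and preserving first-seen order."""
    unique = dict.fromkeys(tuple(row[a] for a in attrs) for row in rows)
    return [dict(zip(attrs, values)) for values in unique]


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5},
]

under_20 = select(students, lambda row: row["age"] < 20)
names_and_gpas = project(students, ["name", "gpa"])
distinct_gpas = project(students, ["gpa"])  # 4.0 appears only once
```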

Joins

• Join is one of the most useful operators in relational algebra and is most commonly used to combine/reassemble information from two or more relations

• Join is conceptually a cross product followed by a selection and projection

Condition Joins

• Condition joins are the most general form of joins. This operation takes a condition and two relations and is defined as follows:

R ⋈c S = σc(R × S)
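
The definition above, a cross product followed by a selection, can be written down literally. In this sketch the function name `condition_join` is hypothetical, and I assume the two relations use disjoint attribute names (the foreign key in Courses is called `student` here) so that merged rows do not overwrite each other.

```python
from itertools import product


def condition_join(r, s, condition):
    """Cross product of r and s, keeping pairs for which condition holds."""
    return [{**a, **b} for a, b in product(r, s) if condition(a, b)]


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5},
]
courses = [
    {"cid": "Physics303", "student": 1234, "grade": "A+"},
    {"cid": "Robotics323", "student": 2345, "grade": "A+"},
    {"cid": "Calculus343", "student": 2345, "grade": "A"},
]

# The condition can be any predicate over a pair of rows.
enrolled = condition_join(students, courses, lambda s, c: s["sid"] == c["student"])
everything = condition_join(students, courses, lambda s, c: True)  # plain R × S
```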

Equijoin

• Equijoin is a special case of the Condition Join, where the condition consists solely of attribute equalities

Natural Join

• A Natural Join is a special Equijoin that operates on all the attributes having the same name in R and S
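
A natural join can be sketched by matching on every attribute name the two relations share, here sid in Students and Courses. The function name `natural_join` and the dict-based rows are illustrative assumptions.

```python
def natural_join(r, s):
    """Combine rows of r and s that agree on all shared attribute names."""
    out = []
    for a in r:
        for b in s:
            shared = a.keys() & b.keys()
            if all(a[k] == b[k] for k in shared):
                out.append({**a, **b})  # shared attributes appear once
    return out


students = [
    {"sid": 1234, "name": "Peter Parker", "age": 21, "gpa": 4.0},
    {"sid": 2345, "name": "Tony Stark", "age": 15, "gpa": 4.0},
    {"sid": 3456, "name": "Bruce Wayne", "age": 23, "gpa": 3.5},
]
courses = [
    {"cid": "Physics303", "sid": 1234, "grade": "A+"},
    {"cid": "Robotics323", "sid": 2345, "grade": "A+"},
    {"cid": "Calculus343", "sid": 2345, "grade": "A"},
]

joined = natural_join(students, courses)
```

Students without a matching course, Bruce Wayne in this data, do not appear in the result.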

Back to DDS

Relational Design in DDS

• Start by identifying coarse relations and properties of data

• Then decompose based on properties (you can use UML for this)

• Apply a normal form
  - Functional Dependencies => Boyce-Codd Normal Form
  - Multivalued Dependencies => Fourth Normal Form

• Define QoS for the resulting relations and decompose further if you run into a QoS Mix (more later)

Relational Algebra

• DDS supports:
  - Selection for a given Topic via DDS queries and filters
  - Condition Joins across multiple Topics via Multi-Topics

• DDS uses a subset of SQL-92 to express selections, projections and joins

DDS Specific Decomposition

• In some instances you may find that a topic (relation) R has two disjoint sets of attributes X and Y that have conflicting temporal, reliability or durability requirements

• In this case this relation has to be further decomposed

Frequency Mix

• Suppose you have a relation R(K, X, Y) where the set of attributes X changes far more frequently than the set of attributes Y (e.g. position vs. velocity)

• In this case you should decompose the relation R into:

R1(K, X), R2(K, Y)

• This will reduce the resource usage in your system, e.g. bandwidth as well as CPU, but may introduce consistency issues. If consistency is essential then coherent updates should be used to atomically update R1 and R2
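
A back-of-the-envelope sketch of the bandwidth argument: it compares the payload bytes published when every change republishes the whole R(K, X, Y) sample against publishing R1(K, X) and R2(K, Y) separately. The wire layouts (a 4-byte key, three doubles for each of X and Y) and the update counts are purely hypothetical, and protocol overhead is ignored.

```python
import struct

# Hypothetical layouts, packed little-endian without padding.
KEY = "i"    # 4-byte integer key
X = "3d"     # frequently changing attributes (three doubles)
Y = "3d"     # rarely changing attributes (three doubles)


def sample_size(*parts):
    """Size in bytes of a sample made of the given parts."""
    return struct.calcsize("<" + "".join(parts))


def bytes_unsplit(x_updates, y_updates):
    """Payload when any change republishes the whole R(K, X, Y) sample."""
    return (x_updates + y_updates) * sample_size(KEY, X, Y)


def bytes_split(x_updates, y_updates):
    """Payload when X and Y travel separately as R1(K, X) and R2(K, Y)."""
    return x_updates * sample_size(KEY, X) + y_updates * sample_size(KEY, Y)


# Example workload: X changes 1000 times while Y changes 10 times.
unsplit = bytes_unsplit(1000, 10)
split = bytes_split(1000, 10)
```

The more the update rates differ, the larger the gap; the key is repeated in both topics, so the saving shrinks when X and Y change at similar rates.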

Reliability Mix

• Suppose you have a relation R(K, X, Y) where the set of attributes Y represents some soft state.

• In this case you should decompose the relation R into:

R1(K, X), R2(K, Y)

• This decomposition allows reliable distribution to be used only for R1 and best-effort for R2, thus reducing resource usage in the system

Durability Mix

• Suppose you have a relation R(K, X, Y) where the set of attributes X requires a different durability than the set of attributes Y, e.g. X needs to be persistent while Y is volatile

• In this case you should decompose the relation R into:

R1(K, X), R2(K, Y)

• This will reduce the resource usage in your system and reduce the pressure on the Durability Service

Summing Up

Concluding Remarks

• The relational model provides a powerful toolkit for data-modelling in IoT

• DDS Topics are relations and DDS supports a subset of relational algebra to manipulate these relations (topics)

• The design process is as follows:
  - Start modelling your system using the UML Data Modelling subset
  - Ensure your model is in BCNF or 4NF, and make sure you understand why some violations are necessary/desirable for your system
  - Add QoS to your relations
  - Once your data model is properly normalised, evaluate whether further decomposition is required due to QoS mixes