Sensor Data Storage Performance: SQL or NoSQL, Physical or Virtual van der Veen et al 2012 Luca Rossetto


Sensor Data Storage Performance:

SQL or NoSQL, Physical or Virtual van der Veen et al 2012

Luca Rossetto


Introduction

Large increase in sensor usage

Leads to more sensor data

Data has to be stored and processed

How to do that best?

29.11.2012 Luca Rossetto 2


Sensor Data


Generated by small microcontroller based devices

Usually raw data from one or many different instruments

Measurements taken periodically or on certain events

Generates lots of small data packets

Usually composed of

Instrument/Sensor/Node Id

Timestamp

Measured value
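
The packet layout listed above can be sketched as a small record type. This is only an illustration: the class name, the field names and the comma-separated text encoding are assumptions, not something taken from the paper.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SensorReading:
    """One small data packet: who measured, when, and what (names are illustrative)."""
    node_id: str         # instrument/sensor/node id
    timestamp: datetime  # when the measurement was taken
    value: float         # measured value

    def to_line(self) -> str:
        # Hypothetical compact text encoding: id,ISO-8601 timestamp,value
        return f"{self.node_id},{self.timestamp.isoformat()},{self.value}"

    @classmethod
    def from_line(cls, line: str) -> "SensorReading":
        # Inverse of to_line(); assumes the node id contains no comma
        node_id, ts, value = line.strip().split(",")
        return cls(node_id, datetime.fromisoformat(ts), float(value))


reading = SensorReading(
    "node-07", datetime(2012, 11, 29, 12, 0, tzinfo=timezone.utc), 21.5
)
print(reading.to_line())
```

Each record is only a few dozen bytes, which is why the workload is dominated by the number of operations rather than by data volume.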


The Goal

Compare three database systems on relative performance with sensor data, both natively and virtualized (using XenServer)

PostgreSQL

Cassandra

MongoDB

Suggest which system to use for which kind of application


PostgreSQL

SQL database

Mostly SQL:2008 compliant

Maintained by an open source community since 1997

Primarily designed to run on one server

Written in C


Apache Cassandra

NoSQL database

Structured key-value Store

Designed to scale over a very large number of commodity servers

Open Source since 2008

Originally developed by Facebook, now maintained by Apache

Written in Java

Currently used by Twitter, Digg and Reddit


MongoDB

NoSQL database

Key-value store for BSON objects

Designed for one or few commodity servers

Developed since 2007, public since 2009

Written in C++


The Setup

Up to 16 sensor nodes

24 CPU cores @ 2.2GHz

64GB RAM

2 x 2TB HDD

1Gb/s Ethernet


Scenarios

Single client, single operation

One sensor node writes data whenever a new measurement is taken

One analyst accesses single entries

Single client, multiple operations

One sensor node writes data in batches

One analyst accesses many data items at once

Multiple clients, single operation

Multiple clients, multiple operations
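
The difference between the "single operation" and "multiple operations" write scenarios can be sketched as follows. This is a minimal illustration that uses Python's built-in `sqlite3` module as a stand-in, since none of the three benchmarked systems can be assumed to be installed; the table layout, the helper names and the row count are assumptions, and the timings say nothing about PostgreSQL, Cassandra or MongoDB.

```python
import sqlite3
import time


def make_db() -> sqlite3.Connection:
    # In-memory stand-in database with a hypothetical sensor table
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (node_id TEXT, ts REAL, value REAL)")
    return conn


def write_single(conn, rows):
    """Single operation: one INSERT and one commit per measurement."""
    for row in rows:
        conn.execute("INSERT INTO readings VALUES (?, ?, ?)", row)
        conn.commit()


def write_batch(conn, rows):
    """Multiple operations: all measurements sent as one batch."""
    conn.executemany("INSERT INTO readings VALUES (?, ?, ?)", rows)
    conn.commit()


def timed(fn, *args) -> float:
    # Wall-clock duration of one call, in seconds
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


rows = [("node-1", float(i), 20.0 + i % 5) for i in range(10_000)]
single_conn, batch_conn = make_db(), make_db()
t_single = timed(write_single, single_conn, rows)
t_batch = timed(write_batch, batch_conn, rows)
print(f"single: {t_single:.3f}s  batch: {t_batch:.3f}s")
```

The multiple-client scenarios would run several such writers concurrently against one server, which this single-process sketch does not attempt.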


Use of indexes

MongoDB and PostgreSQL can be used with or without indexes

Cassandra automatically generates indexes

«…The speed difference between writes with and without indexes is very low…»

Read performance with indexes is significantly higher

Therefore, indexes are used in all three database systems
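
Why an index helps reads can be illustrated by comparing query plans before and after creating one. The sketch below again uses the built-in `sqlite3` module as a stand-in for the tested systems; the table, the index name `idx_node_ts` and the data are assumptions made for illustration, and the exact plan text depends on the SQLite version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (node_id TEXT, ts REAL, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [(f"node-{i % 16}", float(i), 20.0) for i in range(50_000)],
)

QUERY = "SELECT value FROM readings WHERE node_id = ? AND ts = ?"


def query_plan(conn: sqlite3.Connection) -> str:
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, ("node-3", 3.0)).fetchall()
    return "; ".join(row[-1] for row in rows)


plan_without_index = query_plan(conn)  # expected to be a full table scan
conn.execute("CREATE INDEX idx_node_ts ON readings (node_id, ts)")
plan_with_index = query_plan(conn)     # expected to be an index search

print("without index:", plan_without_index)
print("with index:   ", plan_with_index)
```

Without the index the lookup has to scan every row; with it the engine can search directly for the requested node and timestamp.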


Single client, single operation

[Figure: two charts, Write and Read]


Single client, multiple operations

[Figure: two charts, Write and Read]


Multiple clients, single operation

[Figure: two charts, Write and Read, marked with a question mark]


Multiple clients, multiple operations

[Figure: two charts, Write and Read, marked with a question mark]


Recommendations


Use Cassandra for large-scale critical sensor applications

Use MongoDB for small to medium-sized non-critical sensor applications

Use PostgreSQL for applications which require flexible or complex querying and high read performance
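
The three recommendations can be read as a simple decision rule. The sketch below is one possible encoding; the function name, the boolean flags and especially the order of the checks are assumptions, since the slides do not say which recommendation takes precedence when several apply.

```python
from typing import Optional


def recommend_database(
    large_scale: bool, critical: bool, needs_complex_queries: bool
) -> Optional[str]:
    """Map coarse application traits to one of the three recommendations.

    Returns None for combinations the recommendations do not cover.
    """
    # Assumption: querying needs are checked first
    if needs_complex_queries:
        return "PostgreSQL"
    if large_scale and critical:
        return "Cassandra"
    if not large_scale and not critical:
        return "MongoDB"
    return None  # e.g. large-scale but non-critical: not covered


print(recommend_database(large_scale=True, critical=True, needs_complex_queries=False))
```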


Some comments


Test setup does not accurately represent actual use case

No simulation of real network

Too few nodes

Inappropriate assumptions concerning node capabilities

Too few measurements

Only one server used

Cloud application planned as future work

Results directly reflect properties of tested database systems


SELECT questions FROM audience


Thank you for your attention

Questions?