THE ERA OF BIG DATA: From IoT to NewSQL Daniela Barreiro Claro

Outline

The era of Big Data

RDBMS

NOSQL

NewSQL

Big Data Analytics

Where is our course?

Introduction

Are you ready for the BigData era?

Big Data = cloud+social+mobile

What is BIG DATA?

Big data is data that exceeds the processing capacity of conventional database systems.

The data is too big, moves too fast, or doesn’t fit the structures of a database architecture.

The buzzword emerged around 2012.

Internet of Things

Physical Objects + Controllers, Sensors, and Actuators + Internet = Internet of Things

1. Adrian McEwen & Hakim Cassimally. Designing the Internet of Things.

Integrate things into the existing web

HTML and REST

Smart things
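One way to picture a smart thing integrated into the existing web is as a resource that has a URL and is read with an ordinary GET request. The sketch below is only illustrative: the `/things/...` paths, the `THINGS` registry, and the `handle_get` helper are hypothetical names that do not come from the slides, and no real network server is started.

```python
import json

# Hypothetical registry of smart things, keyed by a REST-style resource path.
THINGS = {
    "/things/lamp": {"type": "actuator", "state": "off"},
    "/things/thermometer": {"type": "sensor", "celsius": 21.5},
}


def handle_get(path):
    """Return (status_code, json_body) for a GET on a thing's resource path."""
    thing = THINGS.get(path)
    if thing is None:
        return 404, json.dumps({"error": "unknown thing"})
    return 200, json.dumps(thing)


status, body = handle_get("/things/thermometer")
print(status, body)
```

In a real deployment the same mapping from path to representation would sit behind an HTTP server running on, or in front of, the device.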

Introduction

RDBMSs are 25-year-old legacy code lines that should be retired in favor of a collection of from-scratch specialized engines (Stonebraker et al.)

Are we really prepared for the death of the relational era?

RDBMS

One-size-fits-all

If you wanted to build

  an e-commerce shop

  a banking core

  a car rental website

Database skills: you need to know a single RDBMS in depth

Strengths:

  Experts in only one database technology

  Standard

  SQL

  Security (ACID)

  Triggers

  Joins

  Composite keys

  Structured

Drawbacks:

  Experts in only one database technology

  Vertical scalability

  Horizontal scalability is hard and costly

  Models do not fit all cases

  Structured

  Does not deal well with unstructured data
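Several of the strengths listed above (SQL, joins, composite keys, triggers) can be tried in a few lines. The following is a minimal sketch using Python's built-in `sqlite3` module with an in-memory database; the `customer`, `order_item`, and `audit_log` tables and their contents are invented for illustration and are not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE order_item (
    order_id    INTEGER NOT NULL,
    line_no     INTEGER NOT NULL,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    amount      REAL NOT NULL,
    PRIMARY KEY (order_id, line_no)   -- composite key
);
CREATE TABLE audit_log (order_id INTEGER, line_no INTEGER);
-- Trigger: record every inserted order line in the audit table.
CREATE TRIGGER log_item AFTER INSERT ON order_item
BEGIN
    INSERT INTO audit_log VALUES (NEW.order_id, NEW.line_no);
END;
""")

conn.execute("INSERT INTO customer VALUES (1, 'Ana')")
conn.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?, ?)",
    [(10, 1, 1, 25.0), (10, 2, 1, 5.0)],
)

# Join: total amount per customer.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer c JOIN order_item o ON o.customer_id = c.id
    GROUP BY c.name
""").fetchall()

# The composite key rejects a second row with the same (order_id, line_no).
try:
    conn.execute("INSERT INTO order_item VALUES (10, 1, 1, 9.99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

audit_rows = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
print(totals, duplicate_rejected, audit_rows)
```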

ACID properties are absolutely essential for most operational systems and online transaction processing systems, including retail, banking, and finance.

ACID compliance may not be important to a search engine that may return different results to two users simultaneously, or to Amazon when returning different sets of reviews to two users.

In these applications, speed and performance trump the consistency of the results.
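To make the banking case concrete, here is a small sketch of atomicity using Python's built-in `sqlite3` module. The `account` table, the balances, and the `transfer` helper are hypothetical; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates apply or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            # This debit violates the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT id, balance FROM account"))


ok = transfer(conn, 1, 2, 30)        # enough funds: both updates commit
failed = transfer(conn, 1, 2, 500)   # overdraft: the credit is rolled back too
print(ok, failed, balances(conn))
```

The credit is deliberately executed before the debit so that the rollback is visible: without the transaction, account 2 would keep money that account 1 never paid.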

NOSQL

First read as "No SQL", later as "Not Only SQL"

Unstructured data

Eventual consistency

CAP theorem (Consistency, Availability, Partition tolerance)

Main memory

Data stored in graph, key-value, and column formats
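The three storage formats named above can be imitated with plain Python structures to see how they differ. This is a toy sketch, not how any particular NOSQL product stores data; all keys, names, and the `friends_of_friends` helper are made up for illustration.

```python
# Key-value: an opaque value looked up by a single key.
key_value = {"user:42": '{"name": "Ana", "city": "Salvador"}'}

# Column-oriented: each column is stored separately, keyed by row id.
# Rows may lack some columns (flexible schema).
columns = {
    "name": {42: "Ana", 43: "Bruno"},
    "city": {42: "Salvador"},
}

# Graph: nodes with adjacency sets describing relationships.
graph = {
    "Ana": {"Bruno"},
    "Bruno": {"Ana", "Carla"},
    "Carla": {"Bruno"},
}


def friends_of_friends(graph, person):
    """People two hops away who are not already direct friends."""
    direct = graph.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= graph.get(friend, set())
    return two_hops - direct - {person}


print(key_value["user:42"])
print(columns["city"].get(43))          # None: row 43 has no city column
print(friends_of_friends(graph, "Ana"))
```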

Strengths:

  High performance

  Horizontal scalability

  Diversity of models

  Flexible schema

  High availability

  Handles unstructured data and big data well

Drawbacks:

  Flexible schema

  Not secure (typically no full ACID guarantees)

  Eventual consistency

  No standard query language

3-4 “V”s

Volume

Variety

Velocity

Value

CAP theorem: you can only have two out of three of Consistency, Availability, and Partition tolerance.

[Figure: CAP diagram. Annotations: "Most NOSQL lives here"; "Few solutions are here".]
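The trade-off can be illustrated with a toy two-replica store. This is a simplified sketch under stated assumptions, not a real replication protocol: the `Replica` and `TwoNodeStore` classes are hypothetical, "CP" mode refuses writes during a partition, and "AP" mode accepts them and reconciles afterwards with a naive highest-version-wins rule (which is what makes it eventually consistent).

```python
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0


class TwoNodeStore:
    def __init__(self, mode):
        assert mode in ("CP", "AP")
        self.mode = mode
        self.a, self.b = Replica(), Replica()
        self.partitioned = False

    def write(self, replica, value):
        """Write through one replica; return True if the write was accepted."""
        if self.partitioned and self.mode == "CP":
            return False  # stay consistent by giving up availability
        replica.version += 1
        replica.value = value
        if not self.partitioned:
            self._sync()
        return True

    def _sync(self):
        # Naive reconciliation: highest version wins (ties favor replica a).
        newest = max(self.a, self.b, key=lambda r: r.version)
        for r in (self.a, self.b):
            r.value, r.version = newest.value, newest.version

    def heal(self):
        self.partitioned = False
        self._sync()


cp = TwoNodeStore("CP")
cp.write(cp.a, "v1")
cp.partitioned = True
cp_accepted = cp.write(cp.a, "v2")   # rejected: unavailable but consistent

ap = TwoNodeStore("AP")
ap.write(ap.a, "v1")
ap.partitioned = True
ap_accepted = ap.write(ap.a, "v2")   # accepted: replicas now disagree
diverged = ap.a.value != ap.b.value
ap.heal()                            # replicas converge after the partition
print(cp_accepted, ap_accepted, diverged, ap.b.value)
```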

Analytical queries

SELECT SUM(salary) FROM customerperson;
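The slide's figure is not available, so the following sketch assumes it contrasts a row layout with a column layout for the `customerperson` table. The sample records are invented. The idea shown is that an aggregate such as `sum(salary)` only needs one column, which a column layout can read without touching the other fields.

```python
# Row layout: each record is stored together.
rows = [
    {"id": 1, "name": "Ana", "salary": 3000},
    {"id": 2, "name": "Bruno", "salary": 4500},
    {"id": 3, "name": "Carla", "salary": 5200},
]

# Column layout: each attribute is stored as its own contiguous list.
columns = {
    "id": [1, 2, 3],
    "name": ["Ana", "Bruno", "Carla"],
    "salary": [3000, 4500, 5200],
}


def sum_salary_rows(rows):
    """Visits every record (and thus every field) to pick out one value."""
    return sum(record["salary"] for record in rows)


def sum_salary_columns(columns):
    """Reads only the salary column."""
    return sum(columns["salary"])


print(sum_salary_rows(rows), sum_salary_columns(columns))
```

Both functions return the same total; the difference is how much data has to be read to get there.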

Compression

Poor compression ratio (low repetition)

Good compression ratio (high repetition)
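Why repetition matters can be seen with run-length encoding, one simple scheme that benefits from repeated neighboring values. The slides do not say which compression method is meant, so this is an assumed example; the column contents are invented and the ratio counts entries rather than bytes.

```python
def run_length_encode(values):
    """Collapse consecutive equal values into (value, count) pairs."""
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [(value, count) for value, count in runs]


def compression_ratio(values):
    """Original entries divided by encoded runs (higher is better)."""
    return len(values) / len(run_length_encode(values))


country_column = ["BR"] * 6 + ["US"] * 2     # high repetition
id_column = [1, 2, 3, 4, 5, 6, 7, 8]         # low repetition

print(run_length_encode(country_column))
print(compression_ratio(country_column), compression_ratio(id_column))
```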

Insertion

INSERT INTO customerperson VALUES (...);
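Assuming this slide continues the row-versus-column comparison, inserting a full record is the case where a row layout is cheaper. The sketch below simply counts how many separate structures each layout has to touch; the helper names and the sample record are hypothetical.

```python
def insert_row_store(rows, record):
    """Row layout: the whole record is appended in one place."""
    rows.append(record)
    return 1  # structures touched


def insert_column_store(columns, record):
    """Column layout: every column receives one value."""
    touched = 0
    for name, value in record.items():
        columns.setdefault(name, []).append(value)
        touched += 1
    return touched


rows, columns = [], {}
record = {"id": 4, "name": "Diego", "salary": 3900}
row_writes = insert_row_store(rows, record)
column_writes = insert_column_store(columns, record)
print(row_writes, column_writes)
```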

NewSQL

A problem situation

Perhaps you have gigabytes to terabytes of data that needs high-speed transactional access.

You have an incoming event stream (sensors, mobile phones, network access points) and need per-event transactions to compute responses and analytics in real time.

Your problem follows a pattern of “ingest, analyze, decide,” where the analytics and the decisions must be calculated per-request and not post hoc in batch processing.
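The "ingest, analyze, decide" pattern can be sketched as a handler that updates state and returns a decision for every single event, instead of analyzing a batch later. This is a single-process toy under assumed names (`SpendingLimiter`, the card ids, and the limit are invented); a NewSQL system would run the equivalent logic as a transaction inside the database.

```python
from collections import defaultdict


class SpendingLimiter:
    """Approve or deny each payment event as it arrives."""

    def __init__(self, limit):
        self.limit = limit
        self.spent = defaultdict(float)  # running total per card

    def on_event(self, card, amount):
        # Ingest the event, analyze it against current state, decide at once.
        projected = self.spent[card] + amount
        if projected > self.limit:
            return "deny"
        self.spent[card] = projected
        return "approve"


limiter = SpendingLimiter(limit=100.0)
decisions = [
    limiter.on_event("card-1", 60.0),
    limiter.on_event("card-1", 50.0),   # would exceed the limit
    limiter.on_event("card-1", 40.0),
]
print(decisions, limiter.spent["card-1"])
```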

It is a new concept, dating from 2011

It brings together the best of relational databases and the best of NOSQL

More tables…distributed database

Strengths:

  ACID

  SQL

  Standard

  Structured

  High performance

  Horizontal scalability

  High availability

Drawbacks:

  Model does not fit all cases

  Does not handle unstructured data well

  Structured

  New concept (2011)

  Fewer resources and tools than relational and NOSQL systems

NuoDB

  A cluster-first SQL database with a focus on the cloud:

    runs on many nodes across many datacenters

    lets the underlying system manage data locality and consistency for you

  NuoDB is the closest to being called eventually consistent of the NewSQL systems

Hekaton

  Adds sophisticated in-memory processing to the more traditional Microsoft SQL Server

MemSQL

  Often offers faster OLAP analytics than all-in-one OldSQL systems, with higher concurrency and the ability to update data as it’s being analyzed

  Focus on clustered analytics

  Distributed, with MySQL compatibility

VoltDB

  The most mature of these systems; combines streaming analytics, strong ACID guarantees, and native clustering

  Serves as the system of record for data-intensive applications, while offering an integrated high-throughput, low-latency ingestion engine

  It’s a great choice for policy enforcement, fraud/anomaly detection, or other fast-decisioning apps

RDBMS x NOSQL x NewSQL

Data Analytics

Traditional approach

  Decision makers wait for reports from disparate OLTP systems

  Put it all together in a spreadsheet

  Highly manual process

In the Web context

  Data is captured at the user-interaction level, in contrast to the client-transaction level in the enterprise context

  As a consequence, the amount of data increases significantly

  Greater need to analyze such data to understand user behavior

Scalability to large data volumes:

Scan 100 TB on 1 node @ 50 MB/sec = 23 days

Scan on 1000-node cluster = 33 minutes

Divide-And-Conquer (i.e., data partitioning)

Cost-efficiency:

Commodity nodes (cheap, but unreliable)

Commodity network

Automatic fault-tolerance (fewer admins)

Easy to use (fewer programmers)
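The scan times quoted above follow from simple arithmetic, reproduced here as a check. The sketch assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes), that each node scans at 50 MB/sec, and perfect linear scaling across nodes; `partitioned_sum` is a hypothetical helper showing the divide-and-conquer idea on a small list.

```python
TB = 10**12
MB = 10**6


def scan_seconds(total_bytes, bytes_per_sec_per_node, nodes=1):
    """Time to scan the data when it is split evenly across the nodes."""
    return total_bytes / (bytes_per_sec_per_node * nodes)


one_node = scan_seconds(100 * TB, 50 * MB)
cluster = scan_seconds(100 * TB, 50 * MB, nodes=1000)
print(f"1 node: {one_node / 86400:.1f} days")
print(f"1000 nodes: {cluster / 60:.1f} minutes")


def partitioned_sum(values, nodes):
    """Divide and conquer: each node sums its partition, then combine."""
    partitions = [values[i::nodes] for i in range(nodes)]
    partial_sums = [sum(partition) for partition in partitions]
    return sum(partial_sums)


print(partitioned_sum(list(range(1, 101)), nodes=4))
```

Real clusters fall short of linear scaling because of coordination, skewed partitions, and node failures, which is why fault tolerance appears in the same list.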

Evolution

Operational: Reporting

Tactical: Data Analysis

Strategic: Mining & Statistics

Future: Learning?

Big Data Analytics

Big data analytics is the process of examining large data sets containing a variety of data types (i.e., Big Data) to discover hidden patterns, unknown correlations, market trends, customer preferences, and other useful business information.

The goal is to analyze large volumes of transaction data, as well as other forms of data.

Examples: Web server logs and Internet stream data, social media content and social network activity reports, text from customer emails and survey responses, mobile-phone call detail records, and machine data captured by sensors connected to the Internet of Things.

Traditional analytical tools comprise basic business intelligence: they examine historical data.

Tools for advanced analytics focus on forecasting future events and behaviors, allowing businesses to conduct what-if analyses to predict the effects of potential changes in business strategies.

Predictive analytics, data mining, big data analytics, and location intelligence are just some of the analytical categories that fall under the heading of advanced analytics.

These technologies are widely used in industries including marketing, healthcare, risk management, and economics.

Where is our course?

Data Analytics

Big Data Analytics

Data Mining for Structured Data

/formasresearchgroup

www.formas.ufba.br

Semantic Applications and Formalisms Research Group

Prof. Daniela Barreiro Claro

Email: [email protected]