Apache Phoenix with Actor Model (Akka.io) for Real-time Big Data Programming Stack

Why do we still need SQL for Big Data? How can we make Big Data more responsive and faster?

By http://nguyentantrieu.info, Tech Lead at eClick team, FPT Online

Contents

1. What is Big Data, and why?
2. When a standard relational database (Oracle, MySQL, ...) is not good enough
3. Common problems in Big Data systems
4. Introducing open-source tools in a Big Data system
   a. Apache Phoenix for ad-hoc queries
   b. Actor Model and Akka.io for reactive data processing

What Does Big Data Actually Mean?

“Big data means data that cannot fit easily into a standard relational database.”

Hal Varian, Chief Economist, Google (http://www.brookings.edu/blogs/techtank/posts/2014/09/11-big-data-definition)

When a standard relational database (Oracle, MySQL, ...) is not good enough

The “analytic system” of a startup: a MySQL database tracking all actions in mobile games on iOS, Android, ...

A complex analytic system and the pain of “scale”

Definition from the crowd

“Big data is a term describing the storage and analysis of large and or complex data sets using a series of techniques including, but not limited to: NoSQL, MapReduce and machine learning.”

Jonathan Stuart Ward and Adam Barker (sources: http://arxiv.org/abs/1309.5821, http://www.technologyreview.com/view/519851/the-big-data-conundrum-how-to-define-it/)

The “chaotic” fact and the demand

80% of the data is unstructured, or “chaotic”: photos, videos and social media posts, data that says so much about us but cannot be analyzed with traditional methods.

Demand:

“Finding order among chaos”

3 common problems in Big Data systems

1. Size: the volume of the datasets is a critical factor.

2. Complexity: the structure, behaviour and permutations of the datasets are a critical factor.

3. Technologies: the tools and techniques used to process a sizable or complex dataset are a critical factor.

Introducing open-source tools in a Big Data system

● Apache Phoenix as an SQL ad-hoc query engine
● Actor Model as nano-services for reactive data computation, at the dawn of “Fast Data”

Some innovative tools were born at the dawn of the Big Data age

But could an elephant fly without wings?

But a phoenix can fly!

What is Apache Phoenix?

Apache Phoenix is an SQL skin over HBase. This means Phoenix scales the way HBase does: scale HBase up and out, and Phoenix scales with it.
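A toy Python model can make the “SQL skin” idea concrete. HBase keeps rows sorted by row key, so a predicate on the key can be answered with a range scan between a start key and a stop key rather than by reading the whole table. The sketch below only illustrates that assumption: `SortedStore` and `select_range` are hypothetical names, not Phoenix or HBase APIs.

```python
# Hypothetical toy model, not the Phoenix or HBase API.
import bisect


class SortedStore:
    """Stand-in for an HBase table: rows are kept sorted by row key."""

    def __init__(self):
        self._keys = []  # sorted list of row keys
        self._rows = {}  # row key -> dict of column values

    def put(self, key, columns):
        # New keys go into their sorted position; existing rows are overwritten.
        if key not in self._rows:
            bisect.insort(self._keys, key)
        self._rows[key] = dict(columns)

    def scan(self, start, stop):
        # Rows with start <= key < stop, in key order (like a scan with start/stop rows).
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


def select_range(store, low, high):
    """Answer `SELECT * ... WHERE key >= low AND key < high` with one range scan."""
    return store.scan(low, high)


store = SortedStore()
for day, clicks in [("2014-09-03", 7), ("2014-09-01", 5), ("2014-09-02", 9)]:
    store.put(day, {"clicks": clicks})

print(select_range(store, "2014-09-01", "2014-09-03"))
```

A real Phoenix query goes through its JDBC driver and is compiled into a series of HBase scans; the point here is only why a sorted key layout keeps SQL range predicates cheap as the table grows.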

Phoenix SQL Engine

Interesting features of Apache Phoenix

● Embedded JDBC driver implements the majority of java.sql interfaces, including the metadata APIs.
● Allows columns to be modeled as a multi-part row key or key/value cells.
● Full query support with predicate push down and optimal scan key formation.
● DDL support: CREATE TABLE, DROP TABLE, and ALTER TABLE for adding/removing columns.
● Versioned schema repository. Snapshot queries use the schema that was in place when data was written.
● DML support: UPSERT VALUES for row-by-row insertion, UPSERT SELECT for mass data transfer between the same or different tables, and DELETE for deleting rows.
● Limited transaction support through client-side batching.
● Single table only: no joins yet, and secondary indexes are a work in progress.
● Follows ANSI SQL standards whenever possible.
● Requires HBase v0.94.2 or above.
● 100% Java.
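Two of the features above, multi-part row keys and UPSERT with client-side batching, can be modeled in a few lines. This is a simplified Python sketch, not Phoenix's implementation: it joins the primary-key columns with a zero byte (an assumed encoding chosen for illustration), treats UPSERT as insert-or-overwrite, and keeps mutations in a client-side buffer until `commit()`. `encode_row_key` and `BatchingClient` are hypothetical names.

```python
# Hypothetical, simplified model of multi-part row keys and UPSERT batching;
# not Phoenix's real key encoding or client code.
SEP = b"\x00"  # assumed separator between variable-length key parts


def encode_row_key(*parts):
    """Join primary-key columns into one byte-string row key."""
    return SEP.join(p.encode("utf-8") for p in parts)


class BatchingClient:
    """UPSERTs are buffered on the client and only reach the table on commit()."""

    def __init__(self, table):
        self.table = table  # dict standing in for the HBase table: row key -> columns
        self.pending = []   # mutations not yet sent

    def upsert(self, key_parts, columns):
        # UPSERT semantics: insert the row if absent, otherwise overwrite the given columns.
        self.pending.append((encode_row_key(*key_parts), dict(columns)))

    def commit(self):
        # Send every buffered mutation in one batch and report how many were sent.
        for key, columns in self.pending:
            self.table.setdefault(key, {}).update(columns)
        sent = len(self.pending)
        self.pending.clear()
        return sent

    def rollback(self):
        # Discarding the local buffer is the only "rollback" this model offers.
        self.pending.clear()


table = {}
client = BatchingClient(table)
client.upsert(("ios", "game-a"), {"installs": 10})
client.upsert(("android", "game-a"), {"installs": 4})
client.upsert(("ios", "game-a"), {"installs": 12})  # same key: last write wins
print(len(table))       # 0: nothing is visible before commit
print(client.commit())  # 3
print(table[encode_row_key("ios", "game-a")])  # {'installs': 12}

# Rows sharing the leading key column are contiguous in sorted key order,
# so a filter such as platform = 'ios' can become a prefix scan.
ios_prefix = encode_row_key("ios") + SEP
ios_rows = sorted(k for k in table if k.startswith(ios_prefix))
print(ios_rows)  # [b'ios\x00game-a']
```

Buffering on the client keeps round trips to the cluster low, which is also why this style of “transaction” is limited: nothing coordinates concurrent writers, and a failure mid-batch is not rolled back on the server.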

The Phoenix table schema

Setting up the Phoenix JDBC driver

Phoenix with an SQL tool in Eclipse 4

Phoenix vs Hive (running over HDFS and HBase)

http://phoenix.apache.org/performance.html

Actor Model at the dawn of “Fast Data”

http://youtu.be/TnLiEWglqHk - Google I/O 2014 - The dawn of "Fast Data"

What is the Actor Model?

● Carl Hewitt defined the Actor Model in 1973 as a mathematical theory that treats “Actors” as the universal primitives of concurrent digital computation.

● A good fit for heavily parallel processing in a cloud environment.
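The core idea, an actor that owns private state and handles one message at a time from its mailbox, fits in a short standard-library Python sketch. This is not Akka's API: `Actor`, `tell`, `CounterActor` and `send_clicks` are hypothetical names, and one thread plus a `queue.Queue` stand in for Akka's dispatcher and mailbox.

```python
# Hypothetical minimal actor, not Akka's API: one thread drains one mailbox.
import queue
import threading


class Actor:
    """Private mailbox plus one dedicated thread that handles messages in order."""

    _STOP = object()  # sentinel that ends the message loop

    def __init__(self):
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def tell(self, message):
        # Fire-and-forget send: the sender never waits for the message to be handled.
        self._mailbox.put(message)

    def stop(self):
        # Messages already in the mailbox are handled before the loop exits.
        self._mailbox.put(self._STOP)
        self._thread.join()

    def _run(self):
        while True:
            message = self._mailbox.get()
            if message is self._STOP:
                break
            self.receive(message)  # strictly one message at a time

    def receive(self, message):
        raise NotImplementedError


class CounterActor(Actor):
    """Counts messages by value; `counts` is only mutated on the actor's thread."""

    def __init__(self):
        self.counts = {}  # set before Actor.__init__ starts the thread
        super().__init__()

    def receive(self, message):
        self.counts[message] = self.counts.get(message, 0) + 1


counter = CounterActor()


def send_clicks(n):
    for _ in range(n):
        counter.tell("click")


senders = [threading.Thread(target=send_clicks, args=(1000,)) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()
counter.stop()
print(counter.counts)  # {'click': 4000}
```

Because only the actor's own thread mutates `counts`, no lock is needed even with four concurrent senders. Akka applies the same one-message-at-a-time rule, but schedules many actors over a shared thread pool instead of giving each actor its own thread.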

What actor model?

Akka is the framework for implementing Actor computation

Inspired by Google's MillWheel and Twitter's Storm, I developed my own framework, “Rfx” (Reactive Functor Extension), with Akka at its core.
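Storm (through its “fields grouping”) and MillWheel (through per-key computation state) share one idea: every event with the same key goes to the same worker, so each key's state lives in one place and needs no locking. The Python sketch below shows that routing with a stable hash. It is an illustration under that assumption, not code from Rfx, Storm or MillWheel; `route`, `partitioned_count` and `NUM_WORKERS` are hypothetical.

```python
# Hypothetical key-partitioned routing, not Rfx, Storm or MillWheel code.
import zlib
from collections import Counter, defaultdict

NUM_WORKERS = 3  # hypothetical number of parallel workers (e.g. actors)


def route(key, num_workers=NUM_WORKERS):
    """Map a key to a worker id; the same key always maps to the same worker."""
    # zlib.crc32 is stable across runs, unlike Python's salted built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_workers


def partitioned_count(events):
    """Count events per key, keeping each key's counter on exactly one worker."""
    workers = defaultdict(Counter)  # worker id -> that worker's private state
    for key in events:
        workers[route(key)][key] += 1
    return workers


events = ["#phoenix", "#akka", "#phoenix", "#hbase", "#akka", "#phoenix"]
workers = partitioned_count(events)
for worker_id, state in sorted(workers.items()):
    print(worker_id, dict(state))
```

Since no two workers ever hold a counter for the same key, a key's total can be read from a single worker without merging. The trade-off is that a very popular key (a hot hashtag) lands entirely on one worker.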

The pipeline for finding social trends in real-time analytics
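A trend-finding pipeline of this kind can be sketched as three stages: extract hashtags from each post, keep counts over a sliding time window, and report the top-k tags. The Python below is a single-process sketch under stated assumptions (hashtags stand in for “social trends”, timestamps arrive in non-decreasing order, the window length is a parameter). `TrendingWindow` is a hypothetical class, not the pipeline on the slide; in an actor-based system each stage could run as its own actor.

```python
# Hypothetical sliding-window trend counter; assumes non-decreasing timestamps.
import re
from collections import Counter, deque

HASHTAG = re.compile(r"#\w+")


class TrendingWindow:
    """Hashtag counts for posts seen within the last `window_seconds`."""

    def __init__(self, window_seconds):
        self.window = window_seconds
        self.events = deque()  # (timestamp, hashtag) in arrival order
        self.counts = Counter()

    def add_post(self, timestamp, text):
        # Stage 1: extract hashtags (lower-cased so #Akka and #akka are the same trend).
        for tag in HASHTAG.findall(text.lower()):
            self.events.append((timestamp, tag))
            self.counts[tag] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Stage 2: drop events that have slid out of the window.
        while self.events and self.events[0][0] <= now - self.window:
            _, tag = self.events.popleft()
            self.counts[tag] -= 1
            if self.counts[tag] == 0:
                del self.counts[tag]

    def top(self, k):
        # Stage 3: current trends are the k most frequent tags in the window.
        return self.counts.most_common(k)


trends = TrendingWindow(window_seconds=60)
trends.add_post(0, "Loving #BigData with #Phoenix")
trends.add_post(30, "#phoenix on #hbase is fast")
trends.add_post(70, "#Akka actors + #phoenix")
print(trends.top(1))  # [('#phoenix', 2)]
```

Keeping a running `Counter` and subtracting expired events means each post costs work proportional to its own hashtags, rather than recounting the whole window on every update.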

Facebook Social Trending from a website

Quick demo

Using Akka (Rfx) and Apache Phoenix for Social Media Real-time Analytics

Links for self-study and research

Actor Model and Programming:
● http://nguyentantrieu.info/blog/the-architecture-for-real-time-event-processing-with-reactive-actor-model
● http://www.slideshare.net/drorbr/the-actor-model-towards-better-concurrency
● http://www.infoq.com/articles/reactive-cloud-actors
● http://www.mc2ads.com/p/rfx-for-big-data-developer.html

Apache Phoenix:
● http://java.dzone.com/articles/apache-phoenix-sql-driver
● http://phoenix.apache.org/Phoenix-in-15-minutes-or-less.html

Big Data and Data Science:
● http://www.mc2ads.com and http://www.mc2ads.org
● http://datascience101.wordpress.com
● http://lambda-architecture.net
● http://www.bigdata-startups.com
● https://www.coursera.org/course/datasci