Big data - Overview - 2016/03/04 Mulodo Vietnam Co., Ltd.

Big data (overview) - (MOSG)


Page 1

Big data - Overview -

2016/03/04 Mulodo Vietnam Co., Ltd.

Page 2

“Big data”

Page 3

What is “Big data”?

Types
- Science: LHC (Large Hadron Collider)
- Medical: gene analysis
- Market (IT?): business use


Page 5

History of data processing

50's
- “BI: Business Intelligence” (1958)

80's
- “DSS: Decision Support System” (80's)
- “SQL-86” (1986)
- “Knowledge Discovery in Databases” (1989)
- “BI (redefinition)” (1989)

90's
- “Data Warehouse” (1990)
- “OLAP: Online Analytical Processing” (1993)
- “Improvement of computing power” (90's)
- “Price reduction of storage” (90's)
- “Data Mining” (1996)

Page 6

History of data processing

2000's
- “Spread of the Internet” (00's)
- “Google: Big data stack 1.0” (00's)
- “MapReduce framework” (2004)
- “Hadoop project becomes independent from Nutch” (2006)
- “Amazon: S3” (2006)
- “Explosive prosperity of EC” (00's)

2010's
- “Big data” in The Economist (UK) (2010)
- “Google: BigQuery” (2010)
- “Fluentd” (2011)
- “Amazon: Redshift” (2012)
- “DMP: Data Management Platform” (10's)
- “Google: Big data stack 2.0-3.0” (10's)
- “Apache Crunch, Impala, Presto, ...” (10's)

Page 7

[Timeline figure: 80's, 90's, 00's, 10's]

Let's look back on the history of Big data (especially storage and query engines).

Page 8

[Timeline figure, 80's: SQL (86)]

SQL (86): easy to use, structured and rule-based.

Independent of the underlying storage.

Page 9

[Timeline figure, 00's: MapReduce, Google big data stack / GFS]

MapReduce processes HUGE data in a batch-like way (e.g., huge logs).

But it was proprietary.

Data had become too huge to handle with a usual RDBMS.
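A minimal word-count sketch of the MapReduce model (map, shuffle, reduce) over a few hypothetical log lines, run in a single process. In the real systems the framework distributes each phase across many machines and GFS/HDFS holds the data; the function names here are illustrative only, not Google's or Hadoop's API.

```python
from collections import defaultdict
from typing import Iterable, Iterator


# Map phase: emit (key, 1) for every word in one input record (e.g. one log line).
def map_phase(line: str) -> Iterator[tuple[str, int]]:
    for word in line.split():
        yield word.lower(), 1


# Shuffle phase: group all emitted values by key, as the framework
# would do between mappers and reducers.
def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


# Reduce phase: combine all values for one key into a single result.
def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def map_reduce(lines: Iterable[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


logs = ["GET /index", "GET /login", "POST /login"]
print(map_reduce(logs))  # {'get': 2, '/index': 1, '/login': 2, 'post': 1}
```

Because each map call only sees one record and each reduce call only sees one key, both phases can be spread across many servers; that independence is what makes the model suit huge batch jobs.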

Page 10

[Timeline figure, 00's: + Hadoop, HBase]

Open source products!

We need source. We love freedom.

Page 11

[Timeline figure, 00's: + Hive, Pig]

Easy to use

E-commerce requires huge data analysis.

M/R is too heavy to use directly...

Page 12

[Timeline figure, focus on Hive and Pig]

Hive: SQL -> (M/R) -> Result

Pig: its own language (Pig Latin) <=> (M/R)
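To illustrate what "Hive: SQL -> (M/R)" means, here is a simplified, single-process sketch of how a GROUP BY query can be expressed as a mapper plus a reducer. The `access_log` table and its columns are hypothetical, and Hive's real planner produces more elaborate plans (combiners, multiple stages).

```python
from collections import defaultdict

# Hypothetical table rows for the query:
#   SELECT page, COUNT(*), AVG(ms) FROM access_log GROUP BY page
access_log = [
    {"page": "/index", "ms": 120},
    {"page": "/login", "ms": 300},
    {"page": "/index", "ms": 80},
]


# Map: the GROUP BY column becomes the key, the aggregated column the value.
def mapper(row):
    yield row["page"], row["ms"]


# Reduce: apply the aggregate functions to all values that share one key.
def reducer(page, values):
    return page, len(values), sum(values) / len(values)


def run_group_by(rows):
    groups = defaultdict(list)  # shuffle: group mapper output by key
    for row in rows:
        for key, value in mapper(row):
            groups[key].append(value)
    return [reducer(key, values) for key, values in sorted(groups.items())]


for page, count, avg_ms in run_group_by(access_log):
    print(page, count, avg_ms)
# /index 2 100.0
# /login 1 300.0
```

The user only writes the SQL; generating the mapper and reducer is the engine's job, which is why Hive felt much lighter to use than hand-written M/R.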

Page 13

[Timeline figure, 10's: + big data stack / CFS, Dremel, BigQuery]

Google announced Dremel for interactive analysis of huge data; BigQuery is the public service built on it.

We want to analyze huge data interactively.

Page 14

[Timeline figure, focus on Dremel / BigQuery]

Dremel: 1. divides the SQL query across shards, 2. processes them in parallel.

It's not a wrapper around M/R; it processes SQL massively in parallel (i.e., a full scan for each query on thousands of servers, without indexes).
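A rough sketch of the "divide across shards, process in parallel, merge" idea: each worker fully scans its own shard (no index) and returns a partial aggregate, and a root step merges the partials. This is a deliberate simplification under stated assumptions: real Dremel uses a multi-level serving tree and columnar storage across thousands of servers, while this sketch uses a thread pool over hypothetical in-memory shards.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical (country, n) rows split into shards; in a real system
# each shard would live on a different server.
shards = [
    [("JP", 3), ("VN", 5)],
    [("VN", 2), ("US", 7)],
    [("JP", 1), ("US", 1)],
]


# Leaf step: full scan of one shard (no index), producing a partial result of
#   SELECT country, SUM(n) ... GROUP BY country
def scan_shard(rows):
    partial = Counter()
    for country, n in rows:
        partial[country] += n
    return partial


# Root step: run the leaf scans in parallel, then merge the partial results.
def parallel_query(shards):
    total = Counter()
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        for partial in pool.map(scan_shard, shards):
            total.update(partial)  # Counter.update adds counts
    return dict(sorted(total.items()))


print(parallel_query(shards))  # {'JP': 4, 'US': 8, 'VN': 7}
```

Aggregates such as SUM and COUNT merge cleanly from partial results, which is what lets a full scan be split over many servers and still return one answer quickly.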

Page 15

[Timeline figure, 10's: + Presto, Impala]

Open source products!

We need source. We love freedom.

Page 16

[Timeline figure: storage and query engines, 80's to 10's]

Let's add the social circumstances to this figure.

Page 17

[Timeline figure with social circumstances added. Storage and query engines: SQL (86), MapReduce, GFS, CFS, Hadoop, HDFS, HBase, Hive, Pig, Dremel, BigQuery, Presto, Impala, S3, Redshift. Context: BI, DSS, BI (redefinition), DWH, Data Mining, DMP; improvement of computing power; price reduction of storage; spread of the Internet; explosive prosperity of EC.]

Page 18

Many requests, many solutions...

Page 19

Many requests, many solutions...

But now you can think about which solution is better for your project (I hope).

Page 20

How to use Big data

A) How to aggregate data?
- Huge amount of data
- Too-high-frequency data

B) How to maintain data?
- Data will keep increasing...
- Query engine cost, storage cost
- Data check cost

C) How to analyze data? (What for?)
- UI / UX
- Understanding of business requirements

Page 21

How to aggregate data

<Libevent shock>
parallel -> event driven
* similar to the shift from "parallel (port) -> USB"

Fluentd
- Async
- (Pseudo) real-time <-> periodic batch

Others
- Logstash
- Lambda and Kinesis (AWS)
- ...
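A minimal asyncio sketch of the event-driven, buffered style of log aggregation described above: producers never block, events are buffered, and the buffer is flushed either when it is full or on a timer, giving pseudo real-time delivery instead of a periodic batch job. The `BufferedForwarder` class and its parameters are hypothetical; they are only loosely inspired by Fluentd's size- and interval-based buffer flushing, not its actual API or configuration.

```python
import asyncio


class BufferedForwarder:
    """Hypothetical event-driven log forwarder (not Fluentd's real API)."""

    def __init__(self, max_events: int, flush_interval: float):
        self.max_events = max_events
        self.flush_interval = flush_interval
        self.buffer: list[str] = []
        self.flushed: list[list[str]] = []  # stands in for the output destination

    def emit(self, event: str) -> None:
        # Non-blocking: append, and flush immediately if the buffer is full.
        self.buffer.append(event)
        if len(self.buffer) >= self.max_events:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flushed.append(self.buffer)
            self.buffer = []

    async def run_timer(self) -> None:
        # Time-based flush so a slow trickle of events is not delayed forever.
        while True:
            await asyncio.sleep(self.flush_interval)
            self.flush()


async def main() -> list[list[str]]:
    fwd = BufferedForwarder(max_events=3, flush_interval=0.05)
    timer = asyncio.create_task(fwd.run_timer())
    for i in range(4):
        fwd.emit(f"event-{i}")  # the first 3 events trigger a size-based flush
    await asyncio.sleep(0.1)    # the 4th event is flushed by the timer
    timer.cancel()
    return fwd.flushed


print(asyncio.run(main()))
```

A single event loop handles both incoming events and the flush timer without one thread per connection, which is the "parallel -> event driven" shift the slide refers to.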

Page 22

How to analyze data

UI / UX

<Solution sets for log monitoring>
* ELK: Logstash + Elasticsearch + Kibana
* Fluentd + Norikra + GrowthForecast
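As we understand it, Norikra runs SQL-like queries over time windows of a log stream (for example, "errors in the last 60 seconds"), and GrowthForecast graphs the resulting numbers. The core of such a windowed count can be sketched as follows; `SlidingWindowCounter` is a hypothetical helper, not part of either tool, and it assumes timestamps arrive in order.

```python
from collections import deque


class SlidingWindowCounter:
    """Count events whose timestamps fall within the last `window` seconds."""

    def __init__(self, window: float):
        self.window = window
        self.timestamps: deque[float] = deque()

    def add(self, ts: float) -> int:
        # Assumes timestamps arrive in non-decreasing order.
        self.timestamps.append(ts)
        # Evict events that have slid out of the window (ts itself always stays).
        while self.timestamps[0] <= ts - self.window:
            self.timestamps.popleft()
        return len(self.timestamps)


errors = SlidingWindowCounter(window=60.0)
for ts in [0.0, 10.0, 30.0, 65.0, 200.0]:
    print(ts, errors.add(ts))
# 0.0 1
# 10.0 2
# 30.0 3
# 65.0 3
# 200.0 1
```

Each returned count is the kind of value a monitoring graph would plot, or compare against a threshold to raise an alert.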

Page 23

Next:
* Trying out some storage
* Trying to build a system design
* Diving into some solutions