Advances and Challenges of Big Data Computing Platforms

Liqiang Wang, Associate Professor
Department of Computer Science, UCF



Page 2

THE EVOLUTION OF BUSINESS INTELLIGENCE

  • 1990’s: BI Reporting, OLAP & Data Warehouse (Business Objects, SAS, Informatica, Cognos, other SQL Reporting Tools)
  • 2000’s: Interactive Business Intelligence & In-memory RDBMS (Tableau, HANA)
  • 2010’s: Big Data: Batch Processing & Distributed Data Store (Hadoop/Spark; HBase/Cassandra)
  • Ongoing: Big Data: More Intelligent and Real Time

Page 3

Source: Dion Hinchcliffe, “The enterprise opportunity of Big Data: Closing the ‘clue gap’”

Page 4

Essential Training at UCF (Pending)

  • Training Concepts: Fundamentals of Cyberinfrastructure; Programming Models and Languages; Data Exploration and Visualization; Big Data Computing; Data Mining & Machine Learning; Data Analytics Case Studies
  • Enhancement Methods: Adaptive Learning; Virtualization-based Lab Training
  • Training Aims: Sustainability; Effectiveness; Scalability

Page 5

Hadoop Architecture

Hadoop consists of:

  • Hadoop 1.0: HDFS and MapReduce
  • Hadoop 2.0: HDFS, YARN, and MapReduce

Page 6

Hadoop 1 vs 2

                       Hadoop 1           Hadoop 2
  Components           HDFS, MapReduce    HDFS, YARN, MapReduce, other modules
  Scalability          Less               More
  Name Node            Single             Multiple
  Resource Management  Slot               Container
  Job Type             MapReduce          MapReduce, MPI, Spark
  Reliability          Worse              Better
  JVM re-use           Yes                No
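
The "Slot" versus "Container" row can be illustrated with a deliberately simplified sketch. It assumes a hypothetical 8 GB node and first-come, first-served allocation; the function names and all numbers are illustrative only, and real Hadoop schedulers are considerably more involved. In Hadoop 1 a node is statically split into map slots and reduce slots, so reduce slots sit idle while only map tasks are pending. In Hadoop 2, YARN grants containers sized by each request for as long as the node has free resources.

```python
def run_with_slots(map_slots, reduce_slots, pending_maps, pending_reduces):
    """Hadoop 1 style: each slot type only accepts its own task type."""
    return min(map_slots, pending_maps) + min(reduce_slots, pending_reduces)


def run_with_containers(node_memory_mb, requests_mb):
    """Hadoop 2 style: grant containers in request order while memory remains."""
    free_mb = node_memory_mb
    granted = []
    for request in requests_mb:
        if request <= free_mb:
            granted.append(request)
            free_mb -= request
    return granted


# Hypothetical node: 8 GB, statically split into 4 map + 4 reduce slots of 1 GB each.
pending_map_tasks = 8
slots_running = run_with_slots(4, 4, pending_map_tasks, 0)
containers = run_with_containers(8192, [1024] * pending_map_tasks)

print(slots_running)    # 4: the reduce slots cannot be used by map tasks
print(len(containers))  # 8: all eight 1 GB requests fit in 8 GB
```

Under these assumptions the same node runs twice as many map tasks with containers, which is one way to read the "Scalability: Less / More" row above.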

Page 7

YARN & HDFS

Page 8

[Figure: MapReduce data flow]

  • Input: (k1, v1), (k2, v2), (k3, v3), (k4, v4), (k5, v5), (k6, v6), split across four map tasks
  • map output: [a→1, b→2]  [c→3, c→6]  [a→5, c→2]  [b→7, c→8]
  • combine output: [a→1, b→2]  [c→9]  [a→5, c→2]  [b→7, c→8]
  • partition: each combined pair is assigned to a reducer by key
  • Shuffle and Sort: aggregate values by keys: a → [1, 5]; b → [2, 7]; c → [2, 8, 9]
  • reduce output: (r1, s1), (r2, s2), (r3, s3)
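
The data flow in the figure can be traced with a small single-process sketch. It is a simulation under stated assumptions, not Hadoop code: the combiner and the reducer are both assumed to sum values (consistent with c→3 and c→6 combining to c→9), the CRC32-based `partition` function is a hypothetical stand-in for a real partitioner, and three reducers are assumed.

```python
import zlib
from collections import defaultdict


def combine(pairs):
    """Pre-aggregate one map task's output locally by summing values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def partition(key, num_reducers):
    """Pick the reducer for a key (hypothetical CRC32-based partitioner)."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle_and_sort(task_outputs, num_reducers):
    """Route every pair to its reducer and group the values by key."""
    reducers = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in task_outputs:
        for key, value in pairs:
            reducers[partition(key, num_reducers)][key].append(value)
    return [dict(sorted(groups.items())) for groups in reducers]


def reduce_sum(key, values):
    """Assumed reduce function: sum all values that share a key."""
    return key, sum(values)


# Map outputs of the four map tasks, as shown in the figure.
map_outputs = [
    [("a", 1), ("b", 2)],
    [("c", 3), ("c", 6)],
    [("a", 5), ("c", 2)],
    [("b", 7), ("c", 8)],
]

combined = [combine(pairs) for pairs in map_outputs]
grouped = shuffle_and_sort(combined, num_reducers=3)
results = dict(
    reduce_sum(key, values)
    for groups in grouped
    for key, values in groups.items()
)

print(combined[1])              # [('c', 9)]
print(sorted(results.items()))  # [('a', 6), ('b', 9), ('c', 19)]
```

Which reducer handles which key depends on the partitioner, but all values for one key always arrive at the same reducer, which is what makes the per-key aggregation correct.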

Page 9

Why Use MapReduce Instead of Classical Supercomputing?

Page 10

Comparison: MPI vs Hadoop/Spark

  • Node Communication
      MPI: Supports more frequent node communication (tightly coupled)
      Hadoop/Spark: Nodes usually do not communicate directly (loosely coupled)
  • Disk I/O
      MPI: Usually loads data once
      Hadoop/Spark: Every node reads/writes its own data
  • Fault tolerance
      MPI: No
      Hadoop/Spark: Yes
  • Auto-Scaling
      MPI: No
      Hadoop/Spark: Yes
  • Applications
      MPI: CPU-Intensive Scientific Computing
      Hadoop/Spark: Data-Intensive Analytics
  • Challenging Research Issues
      MPI: Scalability; Resilience (including checkpointing); Energy-efficiency
      Hadoop/Spark: Performance Tuning; Integration with Edge Computing & IoT

Page 11

Hadoop is Slow in Machine Learning!

[Figure: Logistic regression in Hadoop and Spark]
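
Logistic regression makes the point because it is iterative: every gradient-descent step is a full pass over the same training set. With Hadoop MapReduce each pass is typically a separate job that reads its input from disk again, whereas Spark can keep the data in memory between passes. The sketch below is plain single-process Python, not Hadoop or Spark code; the toy data set, learning rate, and iteration count are illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient(weights, point):
    """Per-record gradient of the logistic loss (the 'map' step)."""
    features, label = point
    error = sigmoid(sum(w * x for w, x in zip(weights, features))) - label
    return [error * x for x in features]


def train(points, iterations, learning_rate):
    """Batch gradient descent; returns the weights and the number of data passes."""
    weights = [0.0] * len(points[0][0])
    passes = 0
    for _ in range(iterations):
        # One full pass over the data set per iteration.
        grads = [gradient(weights, p) for p in points]   # map
        total = [sum(column) for column in zip(*grads)]  # reduce
        weights = [w - learning_rate * g / len(points)
                   for w, g in zip(weights, total)]
        passes += 1
    return weights, passes


# Toy data (hypothetical): features are (bias, v); the label is 1 when v > 0.
points = [((1.0, v), 1 if v > 0 else 0)
          for v in (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)]

weights, passes = train(points, iterations=50, learning_rate=0.5)
predictions = [sigmoid(sum(w * x for w, x in zip(weights, f))) > 0.5
               for f, _ in points]

print(passes)       # 50 passes over the same data
print(predictions)  # [False, False, False, True, True, True]
```

The `passes` counter is the quantity that matters for the comparison: the cost of each pass is dominated by how the data is fetched, so rereading from disk on every iteration multiplies that cost by the number of iterations.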

Page 12

Spark vs Hadoop

  Spark key features   Apache Spark                                                     Hadoop MapReduce
  Speed                Ten to a hundred times faster than MapReduce                     Slower
  Analytics            Supports streaming, machine learning, complex analytics, etc.    Simple Map and Reduce tasks
  Suitable for         Real-time streaming                                              Batch processing
  Coding               Fewer lines of code                                              More lines of code
  Processing location  In-memory                                                        Local disk

Page 13

Spark is Based on Hadoop

COSC 4010/5010 Introduction to HPC

Page 14

Why is Machine Learning Booming Now?

  • Big Data
  • Big Computing Power

Page 15

Evolution of Machine Learning

Page 16

Distributed Machine Learning

  • Examples: TensorFlow
  • Simple structure
  • Based on MPI

Page 17

Thank you!