Advances and Challenges of Big Data Computing Platforms

Liqiang Wang, Associate Professor

Department of Computer Science, UCF

THE EVOLUTION OF BUSINESS INTELLIGENCE

1990’s: BI Reporting, OLAP & Data Warehouse (Business Objects, SAS, Informatica, Cognos, other SQL reporting tools)

2000’s: Interactive Business Intelligence & In-memory RDBMS (Tableau, HANA)

2010’s: Big Data: Batch Processing & Distributed Data Store (Hadoop/Spark; HBase/Cassandra)

Ongoing: Big Data: More Intelligent and Real Time

Source: Dion Hinchcliffe, “The enterprise opportunity of Big Data: Closing the ‘clue gap’”

Essential Training at UCF (Pending)

Training Concepts: Fundamentals of Cyberinfrastructure; Programming Models and Languages; Data Exploration and Visualization; Big Data Computing; Data Mining & Machine Learning; Data Analytics Case Studies

Enhancement Methods: Adaptive Learning; Virtualization-based Lab Training

Training Aims: Sustainability; Effectiveness; Scalability

Hadoop Architecture

Hadoop consists of:

Hadoop 1.0: HDFS and MapReduce

Hadoop 2.0: HDFS, YARN, and MapReduce

Hadoop 1 vs 2

| | Hadoop 1 | Hadoop 2 |
|---|---|---|
| Components | HDFS, MapReduce | HDFS, YARN, MapReduce, other modules |
| Scalability | Less | More |
| Name Node | Single | Multiple |
| Resource Management | Slot | Container |
| Job Type | MapReduce | MapReduce, MPI, Spark |
| Reliability | Worse | Better |
| JVM re-use | Yes | No |

YARN & HDFS

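MapReduce data flow (figure), in execution order:

Input: (k1, v1), (k2, v2), (k3, v3), (k4, v4), (k5, v5), (k6, v6)

map (four map tasks), output: [a 1, b 2] [c 3, c 6] [a 5, c 2] [b 7, c 8]

combine (one per map task), output: [a 1, b 2] [c 9] [a 5, c 2] [b 7, c 8]

partition (one per map task)

Shuffle and Sort: aggregate values by keys: a: [1, 5]; b: [2, 7]; c: [2, 8, 9]

reduce, output: (r1, s1), (r2, s2), (r3, s3)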
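The stages named in this diagram (map, combine, partition, shuffle and sort, reduce) can be simulated in a few lines of plain Python. This is only a single-process sketch under stated assumptions: the mapper outputs are the toy key-value pairs read off the diagram, summation is assumed as both the combine and reduce function, and the `partition` rule is a made-up deterministic stand-in, not Hadoop's actual partitioner.

```python
from collections import defaultdict

def combine(pairs):
    """Local pre-aggregation on one mapper: sum the values that share a key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def partition(key, num_reducers):
    """Choose a reducer for a key (hypothetical toy rule, for illustration only)."""
    return sum(key.encode("utf-8")) % num_reducers

def shuffle_and_sort(mapper_outputs, num_reducers):
    """Route every pair to its reducer, grouping values by key with keys sorted."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in mapper_outputs:
        for key, value in pairs:
            buckets[partition(key, num_reducers)][key].append(value)
    return [dict(sorted(bucket.items())) for bucket in buckets]

def reduce_sum(key, values):
    """Reduce function assumed here: add up all values for one key."""
    return key, sum(values)

# Outputs of the four map tasks, taken from the diagram's toy data.
map_outputs = [
    [("a", 1), ("b", 2)],
    [("c", 3), ("c", 6)],
    [("a", 5), ("c", 2)],
    [("b", 7), ("c", 8)],
]

combined = [combine(pairs) for pairs in map_outputs]    # e.g. c 3, c 6 becomes c 9
grouped = shuffle_and_sort(combined, num_reducers=3)    # one dict per reducer
results = dict(reduce_sum(k, vs) for bucket in grouped for k, vs in bucket.items())
print(results)
```

The combiner shrinks the second mapper's output from two pairs to one before anything crosses the network, which is the reason the combine stage sits between map and partition.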

Why Use MapReduce Instead of Classical Supercomputing?

| Comparison | MPI | Hadoop/Spark |
|---|---|---|
| Node Communication | Supports more frequent node communication (tightly coupled) | Usually nodes do not communicate directly (loosely coupled) |
| Disk I/O | Usually loads data once | Every node reads/writes its own data |
| Fault tolerance | No | Yes |
| Auto-Scaling | No | Yes |
| Applications | CPU-Intensive Scientific Computing | Data-Intensive Analytics |

Challenging Research Issues

Scalability

Resilience (including checkpointing)

Energy-efficiency

Performance Tuning

Integration with Edge Computing & IoT

Hadoop is Slow in Machine Learning!

Logistic regression in Hadoop and Spark

Spark vs Hadoop

| Spark key features | Apache Spark | Hadoop MapReduce |
|---|---|---|
| Speed | Ten to a hundred times faster than MapReduce | Slower |
| Analytics | Supports streaming, machine learning, complex analytics, etc. | Simple Map and Reduce tasks |
| Suitable for | Real-time streaming | Batch processing |
| Coding | Fewer lines of code | More lines of code |
| Processing location | In-memory | Local disk |
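The in-memory versus local-disk distinction matters most for iterative algorithms such as logistic regression, where every iteration scans the same data. The sketch below is a plain-Python analogy, not Hadoop or Spark code: `train_reloading` re-reads its input file on every iteration (as a chain of separate disk-based jobs would), while `train_cached` reads it once and iterates in memory. The function names, the toy data set, and the learning rate are all assumptions made for illustration.

```python
import json
import math
import os
import tempfile

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_step(w, b, data, lr):
    """One full-batch gradient descent step for 1-D logistic regression."""
    grad_w = grad_b = 0.0
    for x, y in data:
        err = sigmoid(w * x + b) - y
        grad_w += err * x
        grad_b += err
    n = len(data)
    return w - lr * grad_w / n, b - lr * grad_b / n

def load(path):
    """Read the whole data set from disk."""
    with open(path) as f:
        return [tuple(row) for row in json.load(f)]

def train_reloading(path, iterations, lr=0.5):
    """Disk-based style: every iteration re-reads its input from disk."""
    w = b = 0.0
    reads = 0
    for _ in range(iterations):
        data = load(path)
        reads += 1
        w, b = gradient_step(w, b, data, lr)
    return w, b, reads

def train_cached(path, iterations, lr=0.5):
    """In-memory style: read the data once, then iterate over the cached copy."""
    data = load(path)
    reads = 1
    w = b = 0.0
    for _ in range(iterations):
        w, b = gradient_step(w, b, data, lr)
    return w, b, reads

# Hypothetical toy data: (feature, label) pairs written to a temporary file.
points = [(-2.0, 0), (-1.0, 0), (-0.5, 0), (0.5, 1), (1.0, 1), (2.0, 1)]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(points, f)
    path = f.name

w1, b1, reads1 = train_reloading(path, iterations=50)
w2, b2, reads2 = train_cached(path, iterations=50)
os.remove(path)
print("disk reads:", reads1, "vs", reads2)
```

Both variants compute the same model; only the number of passes over storage differs, and that gap grows with the iteration count.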

Spark is Based on Hadoop

COSC 4010/5010 Introduction to HPC 13

Why is Machine Learning Booming Now?

Big Data

Big Computing Power

Evolution of Machine Learning

Distributed Machine Learning

Examples: TensorFlow

Simple structure

Based on MPI

Thank you!
