BigDataBench: Benchmarking Big Data Systems

Lei Wang

Institute of Computing Technology, CAS

2013-10-31

http://prof.ict.ac.cn/BigDataBench/

BPOE HPCChina 2013

Big opportunity in the big data era

It is a chance to innovate, but how do we seize it?

By measuring big data architectures, systems and data management quantitatively.

What is BigDataBench

An open source project on big data benchmarking:

• http://prof.ict.ac.cn/BigDataBench/

• Six real-world raw data sets
– Synthetic data can scale up to the PB level

• Six application scenarios
– Three micro-benchmark scenarios (Basic Operations, Basic Cloud OLTP, Basic Relational Query), plus Search Engine, Social Network and E-commerce

• A full spectrum of system software stacks
– Hadoop, MPI, Spark, Hive, Impala, ...

Who can use BigDataBench

BigDataBench supports:

• Architecture design: innovative processors, memory, networks, ...
• System design: innovative OSes and file systems for big data, ...
• Data management design: ...
• Performance optimization: micro-architecture characterization
• Distributed system optimization: scheduling policy, programming model

Outline

• Benchmarking Methodology and Decision
• Scalable Data Generation Tool
• Case Study
• How to use

Methodology

[Figure: methodology diagram. Recoverable labels:]

• Investigate typical application domains
• Representative real data sets
– Data sources: text data, graph data, table data, extended ...
– Data types: structured, semi-structured, unstructured
• Diverse and important workloads
– Application types: offline analytics, realtime analytics, online services
– Basic and important operations and algorithms, extended ...
– Representative software stacks, extended ...
• Synthetic data generation tool preserving data characteristics
• Big data sets preserving the 4V properties
• Big data workloads
• BigDataBench

Typical Application Domains

Search Engine, Social Network and Electronic Commerce account for 80% of the page views of all Internet services.

Data Sets Chosen

• Data type: pay equal attention to structured, semi-structured and unstructured data
• Data source: important data sources in the application domain
• Application domain: Search Engine, Social Network and E-commerce

Representative Data sets

| Application Domain | Data Type | Data Source | Data Set |
|---|---|---|---|
| Search Engine | Unstructured | Text data | Wikipedia Entries |
| Search Engine | Unstructured | Graph data | Google Web Graph |
| Search Engine | Semi-structured | Table data | ProfSearch Person Resume |
| E-commerce | Semi-structured | Text data | Amazon Movie Reviews |
| E-commerce | Structured | Table data | ABC Transaction Data |
| Social Network | Unstructured | Graph data | Facebook Social Graph |

Workloads Chosen

• Cover workloads in diverse and representative application scenarios
– Search Engine, E-commerce, Social Network
• Pay equal attention to different application types
– Online services, realtime data analysis, offline data analysis
• Include different data sources
– Text data, graph data, table data
• Cover the representative software stacks
– Data store systems, data management systems, programming frameworks

Chosen Workloads Summary

Application scenarios:

• Micro-benchmarks (operations and algorithms)
– Basic Operations
– Basic Cloud OLTP
– Basic Relational Query
• Search Engine
• E-commerce
• Social Network

Basic Operations

| Operations & Algorithm | Data Type | Data Source | Software Stack | Application Type |
|---|---|---|---|---|
| Sort | Unstructured | Text | MapReduce, Spark, MPI | Offline Analytics |
| Grep | Unstructured | Text | MapReduce, Spark, MPI | Offline Analytics |
| WordCount | Unstructured | Text | MapReduce, Spark, MPI | Offline Analytics |
| BFS | Unstructured | Graph | MapReduce, Spark, MPI | Offline Analytics |

Basic Cloud OLTP

| Operations & Algorithm | Data Type | Data Source | Software Stack | Application Type |
|---|---|---|---|---|
| Read | Semi-structured | Table | HBase, Cassandra, MongoDB, MySQL | Online Services |
| Write | Semi-structured | Table | HBase, Cassandra, MongoDB, MySQL | Online Services |
| Scan | Semi-structured | Table | HBase, Cassandra, MongoDB, MySQL | Online Services |

Basic Relational Query

| Operations & Algorithm | Data Type | Data Source | Software Stack | Application Type |
|---|---|---|---|---|
| Select Query | Structured | Table | Impala, Shark, MySQL, Hive | Realtime Analytics |
| Aggregate Query | Structured | Table | Impala, Shark, MySQL, Hive | Realtime Analytics |
| Join Query | Structured | Table | Impala, Shark, MySQL, Hive | Realtime Analytics |

Operations & Algorithms in Search Engine

| Operations & Algorithm | Data Type | Data Source | Software Stack | Application Type |
|---|---|---|---|---|
| Nutch Server | Structured | Table | Hadoop | Online Services |
| PageRank | Unstructured | Graph | Hadoop, MPI, Spark | Offline Analytics |
| Index | Unstructured | Text | Hadoop, MPI, Spark | Offline Analytics |

Operations & Algorithms in Social Network

| Operations & Algorithm | Data Type | Data Source | Software Stack | Application Type |
|---|---|---|---|---|
| Olio Server | Structured | Table | MySQL | Online Services |
| Kmeans | Unstructured | Graph | Hadoop, MPI, Spark | Offline Analytics |
| Connected Components | Unstructured | Graph | Hadoop, MPI, Spark | Offline Analytics |

Operations & Algorithms in E-commerce

| Operations & Algorithm | Data Type | Data Source | Software Stack | Application Type |
|---|---|---|---|---|
| Rubis Server | Structured | Table | MySQL | Online Services |
| Collaborative Filtering | Unstructured | Text | Hadoop, MPI, Spark | Offline Analytics |
| Naive Bayes | Unstructured | Text | Hadoop, MPI, Spark | Offline Analytics |

Data Generation Tools

• Seed data sources: text, graph and table
– Six real raw data sets
• Synthetic data scale: from GB to PB
• Feature of the synthetic data: it preserves the characteristics of real-world data

Text generator

Uses latent Dirichlet allocation (LDA), a topic model and generative probabilistic model, to generate the text corpus.

David M. Blei et al., "Latent Dirichlet allocation," Journal of Machine Learning Research, vol. 3, pp. 993–1022, 2003.
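The generative side of LDA can be sketched in a few lines. This is a minimal illustration, not the BigDataBench text generator: the vocabulary, the topic-word probabilities and the `alpha` value are hypothetical toy values written by hand, whereas a real generator would presumably learn them from the seed data first.

```python
import random

def sample_dirichlet(alpha, k, rng):
    """Draw from a symmetric k-dimensional Dirichlet via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]

def generate_document(topic_word_probs, vocab, alpha, n_words, rng):
    """Generate one document following the LDA generative process."""
    k = len(topic_word_probs)
    theta = sample_dirichlet(alpha, k, rng)  # per-document topic mixture
    words = []
    for _ in range(n_words):
        z = rng.choices(range(k), weights=theta)[0]              # pick a topic
        w = rng.choices(vocab, weights=topic_word_probs[z])[0]   # pick a word from it
        words.append(w)
    return words

# Toy model (hypothetical values): 2 topics over a 4-word vocabulary.
vocab = ["data", "graph", "query", "index"]
topic_word_probs = [
    [0.7, 0.1, 0.1, 0.1],
    [0.1, 0.1, 0.4, 0.4],
]
rng = random.Random(42)
doc = generate_document(topic_word_probs, vocab, alpha=0.5, n_words=10, rng=rng)
print(" ".join(doc))
```

Because each document is sampled independently once the model is fixed, generation of this kind can be repeated to reach an arbitrary corpus size.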

Graph generator

Uses the Stochastic Kronecker Graph model (Jure Leskovec et al.) to generate graphs.

Application-specific: obtained from real, representative data sets of specific applications.
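As a rough illustration of the Stochastic Kronecker idea, the sketch below derives every edge probability from a 2x2 initiator matrix and samples each edge independently. The initiator values are hypothetical, and the quadratic loop over all node pairs is only suitable for tiny graphs; it is not the generator shipped with BigDataBench.

```python
import random

def edge_probability(u, v, initiator, k):
    """Edge probability for (u, v) in the k-th Kronecker power of a 2x2 initiator."""
    p = 1.0
    for level in range(k):
        # Bit `level` of each node id selects a row/column of the initiator.
        p *= initiator[(u >> level) & 1][(v >> level) & 1]
    return p

def stochastic_kronecker_edges(initiator, k, rng):
    """Sample a directed graph on 2**k nodes, keeping each edge independently."""
    n = 2 ** k
    return [
        (u, v)
        for u in range(n)
        for v in range(n)
        if rng.random() < edge_probability(u, v, initiator, k)
    ]

# Hypothetical initiator values; in practice they would be fitted to a seed graph.
initiator = [[0.9, 0.5], [0.5, 0.1]]
edges = stochastic_kronecker_edges(initiator, k=4, rng=random.Random(7))
print(len(edges), "edges on", 2 ** 4, "nodes")
```

Increasing `k` by one doubles the number of nodes, which is how a small fitted initiator can be scaled to much larger synthetic graphs.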

Table generator

Generates related structured tables with the Parallel Data Generation Framework (Tilmann Rabl et al.).

Case study of BigDataBench

Use cases of BigDataBench: evaluating different platforms, performance evaluation, characterizing workloads, performance diagnosis, evaluating energy efficiency.

Institutions: USTC; ICT, CAS; SIAT, CAS; CNCERT; XJTU; SJTU.

Evaluating Different Platforms

Evaluating the performance of different system platforms in big data computing
• University of Science and Technology of China

"The Implications from Benchmarking Three Different Data Center Platforms," First BPOE, in conjunction with IEEE Big Data 2013

Big Data Workload Characterization

Analyzing the redundancy among big data benchmarks
• Shenzhen Institutes of Advanced Technology, CAS

"The Implications from Benchmarking Three Different Data Center Platforms," First BPOE, in conjunction with IEEE Big Data 2013

Performance diagnosis

An ensemble MIC (Maximum Information Criterion)-based approach to pinpoint the culprits of performance problems in a big data platform
• Xi'an Jiaotong University

"An Ensemble MIC-based Approach for Performance Diagnosis in Big Data Platform," First BPOE, in conjunction with IEEE Big Data 2013

Evaluating energy efficiency

New metrics that measure the power usage effectiveness of IT equipment and data center systems
• National Computer Network Emergency Response Technical Team/Coordination Center of China (CNCERT)

"AxPUE: Application Level Metrics for Power Usage Effectiveness in Data Centers," First BPOE, in conjunction with IEEE Big Data 2013

Evaluating Virtualization Systems

A new network socket library for virtualization scenarios that uses shared memory for data transmission
• Shanghai Jiao Tong University

"Virtualization I/O Optimization Based on Shared Memory," First BPOE, in conjunction with IEEE Big Data 2013

Big Data Workload Characterization

"BigDataBench: a Big Data Benchmark Suite from Internet Services," Lei Wang et al., ICT Technical Report

Big data workloads have very low floating-point operation intensities (0.009 on average), two orders of magnitude lower than the theoretical value of state-of-practice CPUs.

Big Data Workload Characterization

"BigDataBench: a Big Data Benchmark Suite from Web Search Engines," Wanling Gao et al., ASBD 2013, in conjunction with the 40th ISCA

Architecture research that uses only simple applications and limited data sets is not feasible for big data scenarios.

BigDataBench

• One international benchmark workshop: http://prof.ict.ac.cn/bpoe
• A full-day tutorial at HPCA 2013: http://prof.ict.ac.cn/HPCA/
• Two invited talks at WBDB (Workshop on Big Data Benchmarking)

BigDataBench Website

We hope more users will join us, and we will do our best to support you.

English website: http://prof.ict.ac.cn/BigDataBench/

Chinese website: http://prof.ict.ac.cn/BigDataBench/zh/

Highlights:
• Benchmark introduction
• Benchmark download
• Publications & news
• Users, ...

BigDataBench Class

• For architecture: 19 of 19 workloads
• For OS: 19 of 19 workloads
• For runtime environment (Hadoop): 9 of 19 workloads
– Sort, Grep, WordCount, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering and Naive Bayes
• For data management: 6 of 19 workloads
– Read, Write, Scan, Select Query, Aggregate Query and Join Query

BigDataBench Class: data source

• Text related: 6 of 19 workloads
– Sort, Grep, WordCount, Index, Collaborative Filtering and Naive Bayes
• Graph related: 4 of 19 workloads
– BFS, PageRank, Kmeans and Connected Components
• Table related: 9 of 19 workloads
– Read, Write, Scan, Select Query, Aggregate Query, Join Query, Nutch Server, Olio Server and Rubis Server

BigDataBench Class: application type

• Online Services: 6 of 19 workloads
– Read, Write, Scan, Nutch Server, Olio Server and Rubis Server
• Offline Analytics: 10 of 19 workloads
– Sort, Grep, WordCount, BFS, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering and Naive Bayes
• Realtime Analytics: 3 of 19 workloads
– Select Query, Aggregate Query and Join Query

BigDataBench Class: application domains

• Search engine related (Basic Operations + Search Engine): 7 of 19 workloads
– Sort, Grep, WordCount, BFS, PageRank, Index and Nutch Server
• Social network related (Basic Cloud OLTP + Basic Relational Query + Social Network): 9 of 19 workloads
– Read, Write, Scan, Select Query, Aggregate Query, Join Query, Olio Server, Kmeans and Connected Components
• E-commerce related (Basic Cloud OLTP + Basic Relational Query + E-commerce): 9 of 19 workloads
– Read, Write, Scan, Select Query, Aggregate Query, Join Query, Rubis Server, Collaborative Filtering and Naive Bayes
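The classifications on these slides amount to filtering one workload catalog by different attributes. The sketch below transcribes the 19 workloads with their data source and application type from the workload tables and recomputes the per-class counts; the `WORKLOADS` structure and the `select` helper are illustrative names, not part of BigDataBench.

```python
from collections import Counter

# (workload, data source, application type), transcribed from the workload tables.
WORKLOADS = [
    ("Sort", "Text", "Offline Analytics"),
    ("Grep", "Text", "Offline Analytics"),
    ("WordCount", "Text", "Offline Analytics"),
    ("BFS", "Graph", "Offline Analytics"),
    ("Read", "Table", "Online Services"),
    ("Write", "Table", "Online Services"),
    ("Scan", "Table", "Online Services"),
    ("Select Query", "Table", "Realtime Analytics"),
    ("Aggregate Query", "Table", "Realtime Analytics"),
    ("Join Query", "Table", "Realtime Analytics"),
    ("Nutch Server", "Table", "Online Services"),
    ("PageRank", "Graph", "Offline Analytics"),
    ("Index", "Text", "Offline Analytics"),
    ("Olio Server", "Table", "Online Services"),
    ("Kmeans", "Graph", "Offline Analytics"),
    ("Connected Components", "Graph", "Offline Analytics"),
    ("Rubis Server", "Table", "Online Services"),
    ("Collaborative Filtering", "Text", "Offline Analytics"),
    ("Naive Bayes", "Text", "Offline Analytics"),
]

def select(source=None, app_type=None):
    """Return workload names matching the given data source and/or application type."""
    return [
        name
        for name, s, t in WORKLOADS
        if (source is None or s == source) and (app_type is None or t == app_type)
    ]

by_source = Counter(source for _, source, _ in WORKLOADS)
by_type = Counter(app_type for _, _, app_type in WORKLOADS)
print(dict(by_source))
print(dict(by_type))
print(select(source="Graph"))
```

Keeping the catalog in one place makes it easy to check that each classification covers all 19 workloads exactly once.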

Usage Examples

• Designing the experiments
– What will I do?
• Choosing workloads and data sets
– Workloads are chosen according to your needs
– Data sets are chosen according to your platform scale and workload requirements
• Configuring the experiments
• Running the experiments and analyzing the results

One Example

Motivation

Suppose I have a cluster of five Xeon nodes and want to evaluate the performance of an optimized version of Hadoop against native Hadoop.

How should their performance be evaluated at different data scales?

Step 1: Designing Experiments

• Test bed: the five-node cluster
• Setup: native Hadoop and optimized Hadoop
• Metric: DPS (data processed per second)
– DPS = (input data size) / (wall time)
• Data scale: 1 GB to 500 GB
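The DPS metric is a single division, shown here so the units are explicit. The timings are made-up numbers used only to illustrate a native versus optimized comparison, and the `dps` helper is an illustrative name.

```python
def dps(input_bytes, wall_seconds):
    """Data processed per second: input data size divided by wall-clock time."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    return input_bytes / wall_seconds

GB = 1024 ** 3

# Hypothetical timings for one workload at one data scale (100 GB input).
native = dps(100 * GB, 500.0)
optimized = dps(100 * GB, 400.0)
speedup = optimized / native

print(f"native: {native / GB:.3f} GB/s")
print(f"optimized: {optimized / GB:.3f} GB/s")
print(f"speedup: {speedup:.2f}x")
```

Because DPS normalizes by input size, results measured at different data scales can be placed on the same axis.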

Step 2-1: Choosing workloads

MapReduce-related workloads: 9 (of 19) workloads in BigDataBench

• Sort, Grep, WordCount, PageRank, Index, Kmeans, Connected Components, Collaborative Filtering and Naive Bayes

Step 2-2: Choosing data sets

• Text and graph data
– Wikipedia Entries: Sort, Grep, WordCount, Index, Naive Bayes
– Google Web Graph: PageRank, Connected Components, Kmeans
– Amazon Movie Reviews: Collaborative Filtering
• Data scale: varies from 1 GB to 500 GB
• Generating data: use the data generation tool
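The choices so far define an experiment matrix: every chosen workload, on its seed data set, at several data scales, on both Hadoop setups. A minimal sketch of enumerating that matrix follows; the four scale points are an assumption (the slide only gives the 1 GB to 500 GB range), and the names `SEED_DATA`, `SCALES_GB` and `experiment_plan` are illustrative.

```python
from itertools import product

# Data set to workload mapping, as listed on the slide.
SEED_DATA = {
    "Wikipedia Entries": ["Sort", "Grep", "WordCount", "Index", "Naive Bayes"],
    "Google Web Graph": ["PageRank", "Connected Components", "Kmeans"],
    "Amazon Movie Reviews": ["Collaborative Filtering"],
}

# Hypothetical sample points inside the 1 GB to 500 GB range.
SCALES_GB = [1, 10, 100, 500]
SETUPS = ["native Hadoop", "optimized Hadoop"]

def experiment_plan():
    """Enumerate (setup, workload, data set, scale in GB) for every planned run."""
    plan = []
    for data_set, workloads in SEED_DATA.items():
        for workload, scale, setup in product(workloads, SCALES_GB, SETUPS):
            plan.append((setup, workload, data_set, scale))
    return plan

plan = experiment_plan()
print(len(plan), "runs planned")
```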

Step 3: Experiments configurations

• Hadoop configuration
– One master node, four slave nodes
– Map slots, reduce slots, Java heap size, ...
– http://hadoop.apache.org/
• Monitoring
– perf: architecture level
– Linux /proc: OS level
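For OS-level monitoring through /proc, one common approach is to sample the aggregate `cpu` line of `/proc/stat` before and after a run and compare the counters. The sketch below works on text snapshots so it can be tried anywhere; the sample values are invented, and the helper names are illustrative rather than part of any BigDataBench tooling.

```python
def parse_cpu_line(stat_text):
    """Return the aggregate CPU counters from the 'cpu' line of /proc/stat text."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # user, nice, system, idle, iowait, irq, softirq, steal
            return [int(x) for x in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")

def cpu_utilization(before_text, after_text):
    """Fraction of CPU time spent non-idle between two /proc/stat snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    idle = deltas[3] + deltas[4]  # idle + iowait
    return 0.0 if total == 0 else 1.0 - idle / total

# Invented snapshots; on Linux they would come from open("/proc/stat").read().
before = "cpu  100 0 50 800 50 0 0 0 0 0\ncpu0 100 0 50 800 50 0 0 0 0 0\n"
after = "cpu  300 0 100 1500 100 0 0 0 0 0\ncpu0 300 0 100 1500 100 0 0 0 0 0\n"
print(f"CPU utilization: {cpu_utilization(before, after):.2%}")
```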

Step 4: Doing the experiments

• Run the workloads one by one
• Clear the runtime environment after each experiment
• Repeat each experiment multiple times
• Analyze the results

Visit http://prof.ict.ac.cn/BigDataBench/ for more.
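The procedure above (run, clean up, repeat, then analyze) can be wrapped in a small harness. This is a sketch under assumptions: `run_workload` and `clear_environment` are placeholders for whatever launches the job and resets the cluster at a given site, and the stand-in functions below do no real work.

```python
import statistics
import time

def run_repeated(run_workload, clear_environment, repeats=3):
    """Run a workload several times, clearing the environment after each run."""
    wall_times = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_workload()
        wall_times.append(time.perf_counter() - start)
        clear_environment()  # site-specific cleanup, e.g. removing job output
    return wall_times

# Stand-in callables; a real setup would launch the Hadoop job and clean up after it.
calls = {"run": 0, "clear": 0}

def fake_workload():
    calls["run"] += 1

def fake_clear():
    calls["clear"] += 1

times = run_repeated(fake_workload, fake_clear, repeats=3)
print(f"mean wall time: {statistics.mean(times):.6f} s over {len(times)} runs")
```

Timing only the workload call, and not the cleanup, keeps the wall time comparable to the one used in the DPS metric.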

Quick Tutorial

Using Naive Bayes as the example:

• Generating text data
– Analyzing the seed data
– Generating data
• Running the workload

Generating Text Data

Analyzing the Seed Data

Generating Big Data

Generating Data

• An example

– $HADOOP_HOME/bin/hadoop jar TextProduce.jar test file-100G 20 75000000 5

Running programs

Training

./run-train.sh <in-dir> <out-dir>
• <in-dir>: the directory of the training data
• <out-dir>: the output directory for the trained model

Classification

./run-bayes.sh <in-dir> <out-dir>
• <in-dir>: the trained model directory
• <out-dir>: the input data directory

An example

./run-train.sh file-1G file-1G-Model

./run-bayes.sh file-1G-Model file-100G

THANKS

Visit http://prof.ict.ac.cn/BigDataBench/ for more.

BACKUP

Metrics

User-observable metrics:

• For Cloud OLTP online services: the number of operations processed per second (OPS)
• For other online services: the number of requests processed per second (RPS)
• For analytics applications: the amount of data processed per second (DPS)
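All three metrics are throughput figures that differ only in what is counted. A small sketch of selecting the metric by application kind is given below; the dictionary keys and the `throughput` helper are illustrative names, and the sample numbers are invented.

```python
# Metric name and counted unit per application kind, following the slide.
METRIC_BY_APPLICATION = {
    "cloud_oltp": ("OPS", "operations"),
    "online_service": ("RPS", "requests"),
    "analytics": ("DPS", "bytes"),
}

def throughput(application_kind, amount, wall_seconds):
    """Return (metric name, value, unit) for the given application kind."""
    if wall_seconds <= 0:
        raise ValueError("wall_seconds must be positive")
    name, unit = METRIC_BY_APPLICATION[application_kind]
    return name, amount / wall_seconds, f"{unit}/s"

# Invented sample: 120,000 operations completed in 60 seconds.
print(throughput("cloud_oltp", 120_000, 60.0))
```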