BigDataBench First Part-1213 - Big Data and AI Benchmark...

Preview:

Citation preview

BigDataBench Tutorial 

Jianfeng Zhan, Zhen Jia, and Gang Lu

INS

TITUT

http://prof.ict.ac.cn/BigDataBench

TE OF C

OM

PU

TING

Institute of Computing Technology, Chinese  Academy of Sciences  and University of Chinese 

Academy of Sciences

G TEC

HN

OLO

GY

BigDataBench TutorialMICRO 2014  Cambridge, UK

AcknowledgementsAcknowledgements 

BigDataBench contributorsLei Wang, Chunjie Luo, Zhen Jia, Wanling Gao, Dr. g, j , , g ,Rui Han, Dr. Yuqing Zhu, Qiang Yang, XinlongLin, Jingwei Li, Wei  Zhu, g ,Shujie Zhang, Dr. Chuliang Weng     Dr Yongqiang HeDr. Yongqiang HeXiaona Li      Bizhu QiuKent Zhan, Zijian Ming

BigDataBench MICRO  2014

, j g

Acknowledgements (Cont’)Acknowledgements (Cont ) 

Great thanks for  Prof. Jason Mars to invite us to give this tutorial at Micro’14. g

BigDataBench MICRO  2014

BigDataBench Tutorial Program (1)BigDataBench Tutorial Program (1)

9:00‐9:40 Jianfeng ZhanWhat is BigDataBench?  gBigDataBench benchmarking methodology

9 40 10 00 G L9:40‐10:00   Gang LuBigDataBench data sets and workloads

10:00‐10:30 Coffee break 

BigDataBench MICRO  2014

BigDataBench Tutorial Program (2)BigDataBench Tutorial Program (2)

10:30‐11:00  Gang LuHow to use BigDataBench data sets  and gworkloads?How to generate Large‐scale data sets?How to generate Large scale data sets?Multi‐tenancy version of BigDataBench 

11:00‐12:00  Zhen Jia BigDataBench subsettingg gHow to use the simulator versions of BigDataBench ?

BigDataBench MICRO  2014

BigDataBench ? 

Handbook of BigDataBenchHandbook of BigDataBench 

Please feel free to download and distribute this handbook.

http://prof.ict.ac.cn/BigDataBench_micro_14/Draft versionDraft version93 pages

BigDataBench MICRO  2014

BigDataBench handbook: 1st partBigDataBench handbook: 1st part

Section 1 Summary of BigDataBench 3.1Section 2 Benchmarking methodologySection 2 Benchmarking methodologySection 3 BigDataBench specificationSection 4 BigDataBench implementationsSection 5 BigDataBench subsettingSection 5 BigDataBench subsetting

BigDataBench MICRO  2014

BigDataBench handbook: 2th partBigDataBench handbook: 2th part

Section 6 BigDataBench simulator versionSection 7 Multi‐tenancy version ofSection 7 Multi tenancy version of BigDataBenchS i 8 U lSection 8 User manual Section 9 BigDataBench usersgSection 10 Q&A

BigDataBench MICRO  2014

OutlineOutline

What is BigDataBench?  

BigDataBench benchmarking methodology

BigDataBench MICRO  2014

Why Big Data Benchmarking?Why Big Data Benchmarking?

Measuring big data systems and architectures quantitatively

BigDataBench MICRO  2014

quantitatively

What is BigDataBench?What is BigDataBench? An open source big data benchmarking project p g g p j• http://prof.ict.ac.cn/BigDataBench

• Search Google using “BigDataBench”

BigDataBench MICRO  2014

What is BigDataBench (cont’)?g ( )BDGS(Big Data Generator Suite) for scalable data

Facebook Social Network

ImageNet

Wikipedia  Entries

E‐commerce  Transaction

English broadcasting audio

Amazon Movie Reviews

ProfSearch Resumes

DVD Input Streams

Google Web Graph

g p

SoGou Data

Image scene

MNIST

Genome sequence data Assembly of the human genome

ImpalaNoSql

14  Real‐world Data Sets

Shark

Impala

Search Engine

H d RDMA

SocialNetwork

E-commerce

MPI

Software Stacks33 Workloads

DataMPI

Hadoop RDMAMultimedia Bioinformatics

BigDataBench MICRO  2014

Software Stacks33 Workloads

BigDataBench evolutionBigDataBench evolution 

5 application domains: 14 data sets and 33 workloadsSame specifications: diverse implementationsMulti‐tenancy version

h b d l

BigDataBench 3.1

Multidisciplinary effort

BigDataBench subset and simulator version  2014.12

BigDataBench 3.0

Typical Internet service domains

2014.4p y

32 workloads: diverse implementations 

BigDataBench 2.0

2013.12

Typical Internet service domainsAn architectural perspective19 workloads & data generation tools

CloudRank 1.0DCBench 1.0BigDataBench 1.02013.7

Search engine6 workloads 

11 data analytics workloads

Mixed data analytics workloads 

BigDataBench MICRO  2014

Why BigDataBench?Why BigDataBench?Specification

Application  domains

Workload Types

Workloads

Scalable data sets (from real 

Multipleimpleme

Multitenancy

Subsets

Simulatorca o do a s ypes oads se s ( o ea

data)p e e

ntationsa cy e s o

version

BigDataBench Y Five Four 33 8 Y Y Y Y

BigBench Y One Three 10 3 N N N NCloudSuite N N/A Two 8 3 N N N Y

HiBench N N/A Two 10 3 N N N N

CALDA Y N/A One 5 1 Y N N N/YCSB Y N/A One 6 N/A Y N N NLinkBench Y One One 10 1 Y N N N

AMPBenchmarks

N N/A One 4 1 Y N N N

BigDataBench MICRO  2014

Observations from CloudSuiteObservations from CloudSuiteThe characteristic of big data workloadsThe characteristic of  big data workloads

High L1I miss • Frontend inefficiencies 

Low ILPC i ffi i i• Core inefficiencies 

LLC  is good but overprovision• Data access inefficiencies• Data‐access inefficiencies

Less off‐chip bandwidth and MLP• Bandwidth inefficienciesBandwidth inefficiencies

Clearing the Clouds, ASPLOS 2012 Best paper

BigDataBench MICRO  2014

System Behaviors of BigDataBenchSystem Behaviors of BigDataBenchCPU utilization I/O wait ratio

Diversified system level behaviors:

20%

40%

60%

80%

100%CPU utilization I/O wait ratio

erce

ntag

e

0%

20%

H-G

rep(

7)Km

eans

(1)

geR

ank(

1)dC

ount

(1)

H-B

ayes

(1)

M-B

ayes

M-K

mea

nsPa

geR

ank

-Rea

d(10

)fe

renc

e(9)

tQue

ry(9

) dC

ount

(8)

-Pro

ject

(4)

Ord

erBy

(3)

S-G

rep(

1)M

-Gre

p-T

PC-D

S-…

Ord

erBy

(7)

-TPC

-DS-

…-T

PC-D

S-…

S-So

rt(1)

Wor

dCou

ntM

-Sor

tS_

BigD

ata

Pe

S-K

S-Pa

gH

-Wor H M

M-P H-

H-D

iffI-S

elec

tS-

Wor S- S-O H I-O S S

M-W

AVG

_S

10

100

time

0.01

0.1

1

10

… … …

Wei

ghte

d I/O

0.01H

-Gre

p(7)

S-Km

eans

(1)

S-Pa

geR

ank(

1)H

-Wor

dCou

nt(1

)H

-Bay

es(1

)M

-Bay

esM

-Km

eans

M-P

ageR

ank

H-R

ead(

10)

H-D

iffer

ence

(9)

Sele

ctQ

uery

(9)

S-W

ordC

ount

(8)

S-Pr

ojec

t(4)

S-O

rder

By(3

)S-

Gre

p(1)

M-G

rep

H-T

PC-D

S-…

I-Ord

erBy

(7)

S-TP

C-D

S-…

S-TP

C-D

S-…

S-So

rt(1)

M-W

ordC

ount

M-S

ort

AVG

_S_B

igD

ata

BigDataBench MICRO  2014

Weighted disk I/O time ratio

S H I-S S A

System Behaviors of BigDataBenchSystem Behaviors of BigDataBenchCPU utilization I/O wait ratio

Diversified system level behaviors:

20%

40%

60%

80%

100%CPU utilization I/O wait ratio

erce

ntag

e

High CPU utilization & less I/O time

0%

20%

H-G

rep(

7)Km

eans

(1)

geR

ank(

1)dC

ount

(1)

H-B

ayes

(1)

M-B

ayes

M-K

mea

nsPa

geR

ank

-Rea

d(10

)fe

renc

e(9)

tQue

ry(9

) dC

ount

(8)

-Pro

ject

(4)

Ord

erBy

(3)

S-G

rep(

1)M

-Gre

p-T

PC-D

S-…

Ord

erBy

(7)

-TPC

-DS-

…-T

PC-D

S-…

S-So

rt(1)

Wor

dCou

ntM

-Sor

tS_

BigD

ata

Pe

S-K

S-Pa

gH

-Wor H M

M-P H-

H-D

iffI-S

elec

tS-

Wor S- S-O H I-O S S

M-W

AVG

_S

10

100

time

0.01

0.1

1

10

… … …

Wei

ghte

d I/O

0.01H

-Gre

p(7)

S-Km

eans

(1)

S-Pa

geR

ank(

1)H

-Wor

dCou

nt(1

)H

-Bay

es(1

)M

-Bay

esM

-Km

eans

M-P

ageR

ank

H-R

ead(

10)

H-D

iffer

ence

(9)

Sele

ctQ

uery

(9)

S-W

ordC

ount

(8)

S-Pr

ojec

t(4)

S-O

rder

By(3

)S-

Gre

p(1)

M-G

rep

H-T

PC-D

S-…

I-Ord

erBy

(7)

S-TP

C-D

S-…

S-TP

C-D

S-…

S-So

rt(1)

M-W

ordC

ount

M-S

ort

AVG

_S_B

igD

ata

BigDataBench MICRO  2014

Weighted disk I/O time ratio

S H I-S S A

System Behaviors of BigDataBenchSystem Behaviors of BigDataBenchCPU utilization I/O wait ratio

Diversified system level behaviors:

20%

40%

60%

80%

100%CPU utilization I/O wait ratio

erce

ntag

e

High CPU utilization & less I/O timeLow CPU utilization

0%

20%

H-G

rep(

7)Km

eans

(1)

geR

ank(

1)dC

ount

(1)

H-B

ayes

(1)

M-B

ayes

M-K

mea

nsPa

geR

ank

-Rea

d(10

)fe

renc

e(9)

tQue

ry(9

) dC

ount

(8)

-Pro

ject

(4)

Ord

erBy

(3)

S-G

rep(

1)M

-Gre

p-T

PC-D

S-…

Ord

erBy

(7)

-TPC

-DS-

…-T

PC-D

S-…

S-So

rt(1)

Wor

dCou

ntM

-Sor

tS_

BigD

ata

Pe

Low CPU utilization relatively and lots of I/O time

S-K

S-Pa

gH

-Wor H M

M-P H-

H-D

iffI-S

elec

tS-

Wor S- S-O H I-O S S

M-W

AVG

_S

10

100

time

0.01

0.1

1

10

… … …

Wei

ghte

d I/O

0.01H

-Gre

p(7)

S-Km

eans

(1)

S-Pa

geR

ank(

1)H

-Wor

dCou

nt(1

)H

-Bay

es(1

)M

-Bay

esM

-Km

eans

M-P

ageR

ank

H-R

ead(

10)

H-D

iffer

ence

(9)

Sele

ctQ

uery

(9)

S-W

ordC

ount

(8)

S-Pr

ojec

t(4)

S-O

rder

By(3

)S-

Gre

p(1)

M-G

rep

H-T

PC-D

S-…

I-Ord

erBy

(7)

S-TP

C-D

S-…

S-TP

C-D

S-…

S-So

rt(1)

M-W

ordC

ount

M-S

ort

AVG

_S_B

igD

ata

BigDataBench MICRO  2014

Weighted disk I/O time ratio

S H I-S S A

System Behaviors Of BigDataBenchSystem Behaviors Of BigDataBench CPU utilization I/O wait ratio

Diversified system level behaviors:

20%

40%

60%

80%

100%CPU utilization I/O wait ratio

erce

ntag

e

High CPU utilization & less I/O timeLow CPU utilization

0%

20%

H-G

rep(

7)Km

eans

(1)

geR

ank(

1)dC

ount

(1)

H-B

ayes

(1)

M-B

ayes

M-K

mea

nsPa

geR

ank

-Rea

d(10

)fe

renc

e(9)

tQue

ry(9

) dC

ount

(8)

-Pro

ject

(4)

Ord

erBy

(3)

S-G

rep(

1)M

-Gre

p-T

PC-D

S-…

Ord

erBy

(7)

-TPC

-DS-

…-T

PC-D

S-…

S-So

rt(1)

Wor

dCou

ntM

-Sor

tS_

BigD

ata

Pe

Low CPU utilization relatively and lots of I/O timeM di CPU tili ti

S-K

S-Pa

gH

-Wor H M

M-P H-

H-D

iffI-S

elec

tS-

Wor S- S-O H I-O S S

M-W

AVG

_S

10

100

time

Medium CPU utilization and I/O

0.01

0.1

1

10

… … …

Wei

ghte

d I/O

0.01H

-Gre

p(7)

S-Km

eans

(1)

S-Pa

geR

ank(

1)H

-Wor

dCou

nt(1

)H

-Bay

es(1

)M

-Bay

esM

-Km

eans

M-P

ageR

ank

H-R

ead(

10)

H-D

iffer

ence

(9)

Sele

ctQ

uery

(9)

S-W

ordC

ount

(8)

S-Pr

ojec

t(4)

S-O

rder

By(3

)S-

Gre

p(1)

M-G

rep

H-T

PC-D

S-…

I-Ord

erBy

(7)

S-TP

C-D

S-…

S-TP

C-D

S-…

S-So

rt(1)

M-W

ordC

ount

M-S

ort

AVG

_S_B

igD

ata

BigDataBench MICRO  2014

Weighted disk I/O time ratio

S H I-S S A

Workloads Classification Of BigDataBench

Finding from system behaviorsSystem behaviors vary across different workloads y yWorkloads can be divided into 3 categories:

BigDataBench MICRO  2014

IPC of BigDataBench vs. other 

2

benchmarks

1

1.5

IPC

0

0.5

rep(

7)an

s(1)

ank(

1)un

t(1)

yes(

1)Ba

yes

mea

nsR

ank

d(10

)ce

(9)

ry(9

) un

t(8)

ect(4

)By

(3)

rep(

1)-G

rep

ry3(

9)By

(7)

C-D

S-…

ry8(

1)or

t(1)

Cou

ntM

-Sor

tgD

ata

PC-C

Suite

HPC

CR

SEC

PEC

fpEC

int

H-G

rS-

Kmea

S-Pa

geR

aH

-Wor

dCou

H-B

ay M-B

M-K

mM

-Pag

eH

-Rea

H-D

iffer

enI-S

elec

tQue

S-W

ordC

ouS-

Proj

eS-

Ord

erS-

Gr

M-

-TPC

-DS-

quer

I-Ord

erS-

TPC

-TPC

-DS-

quer

S-So

M-W

ordC M

AVG

_S_B

ig TPAV

G_C

loud

Avg_

HAv

g_PA

RAV

G_S

PAV

G_S

PE

H- S-

The average IPC of the big data workloads are larger than CloudSuite, SPECFP and SPECINT,  same as PARSEC and slightly smaller than HPCCPARSEC and slightly smaller than HPCC

The avrerage IPC of BigDataBench is 1.3 times of that of CloudSuite

BigDataBench MICRO  2014

Some workloads have high IPC (M_Kmeans, S‐TPC‐DS‐Query8)

Instructions Mix of BigDataBench vs. other benchmarks

For big data workloads:b hMore branch instructions

The percentage is 20% (1.5 times larger than others), except for TPC‐CThe ratio of integer instructions to FP instructions is very high

BigDataBench MICRO  2014

The average is 73

Cache Behaviors of BigDataBench th b h kvs. other benchmarks 

CPU‐intensive I/O‐intensive hybrid

L1I MPKIL1I MPKILarger than traditional benchmarks, but lower than that of CloudSuite (12 Vs. 31)

Different among big data workloadsDifferent among big data workloadsCPU‐intensive(8), I/O intensive(22), and hybrid workloads(9)

MPI workloads have less instruction cache miss 

BigDataBench MICRO  2014

Only 3.4 on the average 

Cache Behaviors of BigDataBenchCache Behaviors of BigDataBench

L2 Cache:The IO‐intensive workloads undergo more L2 MPKI

L3 Cache:The average L3 MPKI of the big data workloads is smallerThe average L3 MPKI of the big data workloads is smaller than all of the other workloads

The underlying software stacks impact dataThe underlying software stacks impact data locality

MPI workloads have better data locality and less cache misses

BigDataBench MICRO  2014

Our observation from BigDataBenchOur observation from BigDataBench

Unique characteristicMore branch instructions &  Higher ratio of integer to FP instructions

Different behaviors between Big Data workloadsDisparity of ILP and memory access behaviorsDisparity  of ILP and memory access behaviors• Several workloads can achieve higher IPC • Several workloads can achieve higher Off‐chip bandwidth 

CloudSuite is a subclass of Big Data

High front‐end stall is not the unique characteristics of big data workloads 

Related with software stacks

BigDataBench MICRO  2014

BigDataBench PublicationsBigDataBench Publicationsi h i h k S i f S i 20 hBigDataBench: a Big Data Benchmark Suite from Internet Services.  20th IEEE 

International Symposium On High Performance Computer Architecture (HPCA‐2014).Characterizing and Subsetting Big Data Workloads. Zhen Jia, Jianfeng Zhan, Wang Lei, Rui Han, Sally A. McKee, Qiang Yang, Chunjie Luo, and Jingwei Li. In 2014 IEEE International Symposium on Workload Characterization (IISWC). IEEE, 2014.Characterizing data analysis workloads in data centers.  2013 IEEE International Symposium on Workload Characterization (IISWC 2013)(Best paper award)BigOP: generating comprehensive big data workloads as a benchmarking framework.  19th International Conference on Database Systems for Advanced Applications (DASFAA 2014)BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The Fourth workshop on big data benchmarking (WBDB 2014)

BigDataBench MICRO  2014

BigDataBench usersBigDataBench users 

More than 20 groups have published papers  using BigDataBenchg g

h // f i /Bi D B h/ /http://prof.ict.ac.cn/BigDataBench/users/

BigDataBench MICRO  2014

OutlineOutline

What is BigDataBench?  

BigDataBench benchmarking methodology

BigDataBench MICRO  2014

Five StepsFive Steps

Multi‐tenancy

Big data 

Diverse implementation

yor subset

Investigate 

Typical workloads and data set

benchmark specification

important application domain

BigDataBench MICRO  2014

BigDataBench MethodologyBigDataBench Methodology

Application  Benchmark  Real‐world data sets 

Multi‐tenancy 

Domain 1

Application 

Data models of different 

types & semantics

specification 1

Benchmark  Data generation 

version

Mix with different percentages

pp

Domain … Data operations & 

workload patterns

specification …

tools

Reduce benchmarking cost

Application 

Domain N

Benchmark 

specification N

Workloads with diverse implementations 

BigDataBench

subset

benchmarking cost

BigDataBench MICRO  2014

Investigate Application DomainsInvestigate Application DomainsWh t li tiInternet Services What application domains should we pay attention to?

Internet Services 

BigDataBench MICRO  2014

Internet ServicesInternet Services

Search Engine Social Network

Taking up 80% of internet services 

5%15%

Search Engine Social NetworkElectronic Commerce Media StreamingOthers

according to page views and daily visitors

40%

15%

5%

25%

http://www.alexa.com/topsites/global;0Top 20 websites

BigDataBench MICRO  2014

Multimedia DataMultimedia Data

BigDataBench MICRO  2014

The Explosive Growth of Multimedia Data

new 

VIDEOS on YouTube 

new 

PHOTOS on FLICKR every hours MUSIC streaming every minute minute on PANDORA every minute

VIDEO feeds fromdata growth 

arei t VOICE llVIDEO feeds from surveillance cameras

are IMAGES, VIDEOS, documents, …

minutes VOICE calls on Skype every minute

htt // ld l / t t/ l d /2014/11/ h ti bi d t DKB 2 df

BigDataBench MICRO  2014

http://www.oldcolony.us/wp‐content/uploads/2014/11/whatisbigdata‐DKB‐v2.pdf

The Explosive Growth of Human Genome Data

htt // h / it /d f lt/fil / i d bi d t 1 0 df

BigDataBench MICRO  2014

http://www.osehra.org/sites/default/files/genomics_and_big_data_1_0.pdf

Application Domain We ChooseApplication Domain We Choose

Internet ServicesServices

Search EngineSocial NetworkSocial NetworkE‐commerceMultimedia

Emerging 

MultimediaBioinformatics

but Important

BigDataBench MICRO  2014

BigDataBench MethodologyBigDataBench Methodology

Application 

Domain 1

Application 

Data models of different 

types & semantics

Domain … Data operations & 

workload patterns

Application 

Domain N

BigDataBench MICRO  2014

Success story: Relational model of data

E. F. Codd, A relational Model of Data for Large shared data banks Communication ofLarge shared data banks. Communication of ACM, vol 13. no.6, 1970.Set concept : general mathematical meaning

General representation of datapBasis of relational algebra (theoretical foundation of database)of database)5 basic operations

S l P j P d U i Diff• Select, Project, Product, Union, Difference 

BigDataBench MICRO  2014

Success story: parallel computing

By a multidisciplinary group of well‐known researchers

e.g.:Jim Gray,Michael JordanDavid A. Patterson

Operations & PatternsAbstracted from 13representative parallelcomputation patternsParallel computation  inherent demand for big data 

BigDataBench MICRO  2014processing (volume & complexity)

Primitive Operations & Patterns in Big DataPrimitive Operations & Patterns in Big Data 

3 Categories of Operations11 basic operations

3 P i P tt3 Processing Patterns

BigOP: generating comprehensive big data workloads as a benchmarking framework. 

BigDataBench MICRO  2014

g g g p g gDASFAA 2014

Some Examples (not exhausted)Some Examples (not exhausted)

BigOP: generating comprehensive big data workloads as a benchmarking framework.  g g g p g g19th International Conference on Database Systems for Advanced Applications (DASFAA 2014)

BigDataBench MICRO  2014

Image SearchImage  Search

Perceptual Hash  Algorithm 

Input Image

samping Basic information

of image

SIFT Imagefeatures

Hash Imagefingerprint Database

Set Ope

SimilarSort

ration

SimilarimagesOutput

Sort

BigDataBench MICRO  2014

42

Feature Extraction SIFTFeature Extraction‐‐SIFT

GaussianFilter

InputImage Convolution

Image ScaleSpace

Sampling ImagePyramid

MatrixSubtraction DOG

Image

Sort

Key PointOf Image

GaussianWindow

CountFeatureVectorsOutput Sampling

BigDataBench MICRO  2014 43

Workload & Data Set in BigDataBench

•Text•Graph•Table•Multimedia

• Structured• Semi‐Structured•Unstructured

Data Model Semantics

Workload Patterns

Data Operations

•Different combination of units of computation  

•Unit of computation

BigDataBench MICRO  2014

BigDataBench MethodologyBigDataBench Methodology

Application  Benchmark 

Domain 1

Application 

Data models of different 

types & semantics

specification 1

Benchmark pp

Domain … Data operations & 

workload patterns

specification …

Application 

Domain N

Benchmark 

specification N

BigDataBench MICRO  2014

Data management’s traditionData management s tradition 

Specification First. Functions of abstraction are units ofFunctions of abstraction are units of computation that appear frequently in the application domain being benchmarkedapplication domain being benchmarked. They are expressed in a generic form that is independent of the underlying system implementation.implementation.

BigDataBench MICRO  2014

TPC C examplesTPC‐C examples 

BigDataBench MICRO  2014

BigDataBench MethodologyBigDataBench Methodology

Application  Benchmark  Real‐world data sets 

Domain 1

Application 

Data models of different 

types & semantics

specification 1

Benchmark  Data generation pp

Domain … Data operations & 

workload patterns

specification …

tools

Application 

Domain N

Benchmark 

specification N

Workloads with diverse implementations 

BigDataBench MICRO  2014

Real World Data setsReal‐World Data sets

14 real‐world data sets

BigDataBench MICRO  2014

Big Data Generation Tool BDGSBig Data Generation Tool‐‐BDGS

Provide scalable data set extracted from real‐world data sets

Text

BDGSbased on real data

TableGraph

BigDataBench MICRO  2014

Naïve Text generatorNaïve Text generator

select word randomly

big

hit t

system

miningdata

evaluatemachine

select word randomlyarchitectureCPU

benchmarkingmemory

learning

cpu

wordsfollowing multinomial distribution

documnet

Only modeling on word level;Words are selected according to the same gdistribution

BigDataBench MICRO  201451/

Improved Text generatorImproved Text generator

big CPUevaluatemachine

topic2 CPUselect word randomly

architecture

CPU

benchmarking

miningdata

t

topic1

topic3

select topic randomlyCPU

words

systemmemorylearning

documnet

topic3

topicsfollowing multinomial distribution under topic2following multinomial distribution

Modeling on topic and word levelWords are drew from distribution under particular topicTopics are drew from same distribution, as a result, each document has same topic proportion

BigDataBench MICRO  201452/

Optimized Text GeneratorOptimized Text Generator

big systemdata

evaluatemachine

topic1topic2

select word randomly

architecture

benchmarking

miningdata

CPU

memory

l i

CPU

topic3

select  topic randomlyselect topic 

proportion randomly

words

following multinomial distribution under topic1

learning

documnettopics

following multinomial distribution

Topic distribution parameters

following dirichlet distribution

Modeling on topic and word levelWords are drew from distribution under particular topicTopics are selected from different distribution with parameters following a dirichlet distribution

BigDataBench MICRO  201453/

Workloads With Diverse Implementations

MapReduce

Software Stack

DataMPIMPI

Spark

BigDataBench MICRO  2014

BigDataBench MethodologyBigDataBench Methodology

Application  Benchmark  Real‐world data sets 

Multi‐tenancy 

Domain 1

Application 

Data models of different 

types & semantics

specification 1

Benchmark  Data generation 

version

Mix with different percentages

pp

Domain … Data operations & 

workload patterns

specification …

tools

Reduce benchmarking cost

Application 

Domain N

Benchmark 

specification N

Workloads with diverse implementations 

BigDataBench

subset

benchmarking cost

BigDataBench MICRO  2014

Multi‐tenancy version of BigDataBench

Scenarios of multiple tenants running heterogeneous applications in cloud datacenters

Latency‐critical online servicesLatency‐insensitive offline batch applications

Mining real‐world Workload traces Mixed workloads

Benchmarking scenarios

Workload traces (Google and Facebook) Workload matching 

using Parametric workload 

Mixed workloads in public clouds

Profiling Real‐world Workload 

t

Machine learning techniques

generation tool Data analytical 

workloads in private clouds

BigDataBench MICRO  2014

traces

BigDataBench SubsetBigDataBench Subset

MotivationExpensive to run all the benchmarks for system p yand architecture researches

• multiplied by different implementationsmultiplied by different implementations • BigDataBench 3.0 provides about 77 workloads

BigDataBench MICRO  2014

Subsetting MethodologySubsetting Methodology

• Identify a comprehensive set of workload characteristics from a specific perspective

• Eliminate the correlation data in those metricsEliminate the correlation data in those metrics• Map the high dimension metrics to a low dimension

• Use the clustering method to classifyCh t ti kl d f h t• Choose representative workloads from each category

BigDataBench MICRO  2014

BigDataBench MICRO  2014

Recommended