BigDataBench Tutorial
Jianfeng Zhan, Zhen Jia, and Gang Lu
Institute of Computing Technology, Chinese Academy of Sciences, and University of Chinese Academy of Sciences
http://prof.ict.ac.cn/BigDataBench
MICRO 2014, Cambridge, UK


Acknowledgements

BigDataBench contributors: Lei Wang, Chunjie Luo, Zhen Jia, Wanling Gao, Dr. Rui Han, Dr. Yuqing Zhu, Qiang Yang, Xinlong Lin, Jingwei Li, Wei Zhu, Shujie Zhang, Dr. Chuliang Weng, Dr. Yongqiang He, Xiaona Li, Bizhu Qiu, Kent Zhan, Zijian Ming

Acknowledgements (Cont.)

Many thanks to Prof. Jason Mars for inviting us to give this tutorial at MICRO 2014.

BigDataBench Tutorial Program (1)

9:00-9:40 Jianfeng Zhan: What is BigDataBench? BigDataBench benchmarking methodology
9:40-10:00 Gang Lu: BigDataBench data sets and workloads
10:00-10:30 Coffee break

BigDataBench Tutorial Program (2)

10:30-11:00 Gang Lu: How to use BigDataBench data sets and workloads? How to generate large-scale data sets? Multi-tenancy version of BigDataBench
11:00-12:00 Zhen Jia: BigDataBench subsetting; how to use the simulator versions of BigDataBench?

Handbook of BigDataBench

Please feel free to download and distribute this handbook (draft version, 93 pages):
http://prof.ict.ac.cn/BigDataBench_micro_14/

BigDataBench Handbook: 1st Part

Section 1 Summary of BigDataBench 3.1
Section 2 Benchmarking methodology
Section 3 BigDataBench specification
Section 4 BigDataBench implementations
Section 5 BigDataBench subsetting

BigDataBench Handbook: 2nd Part

Section 6 BigDataBench simulator version
Section 7 Multi-tenancy version of BigDataBench
Section 8 User manual
Section 9 BigDataBench users
Section 10 Q&A

Outline

• What is BigDataBench?
• BigDataBench benchmarking methodology

Why Big Data Benchmarking?

Measuring big data systems and architectures quantitatively

What is BigDataBench?

An open source big data benchmarking project
• http://prof.ict.ac.cn/BigDataBench
• Search Google using "BigDataBench"

What is BigDataBench (Cont.)?

• BDGS (Big Data Generator Suite) for scalable data
• 14 real-world data sets: Wikipedia Entries, Amazon Movie Reviews, Google Web Graph, Facebook Social Network, E-commerce Transaction, ProfSearch Resumes, SoGou Data, ImageNet, Image scene, MNIST, English broadcasting audio, DVD Input Streams, Genome sequence data, Assembly of the human genome
• 33 workloads across five application domains: Search Engine, Social Network, E-commerce, Multimedia, and Bioinformatics
• Software stacks including NoSQL, Impala, Shark, Hadoop RDMA, MPI, and DataMPI

BigDataBench Evolution

• 2013.7: CloudRank 1.0 (mixed data analytics workloads), DCBench 1.0 (11 data analytics workloads), and BigDataBench 1.0 (search engine, 6 workloads)
• 2013.12: BigDataBench 2.0: typical Internet service domains, an architectural perspective, 19 workloads and data generation tools
• 2014.4: BigDataBench 3.0: typical Internet service domains, 32 workloads with diverse implementations
• 2014.12: BigDataBench 3.1: a multidisciplinary effort; 5 application domains with 14 data sets and 33 workloads; the same specifications with diverse implementations; a multi-tenancy version; the BigDataBench subset and simulator version

Why BigDataBench?

Benchmark       Specification  App. domains  Workload types  Workloads  Scalable data sets*  Multiple impl.  Multi-tenancy  Subsets  Simulator version
BigDataBench    Y              Five          Four            33         8                    Y               Y              Y        Y
BigBench        Y              One           Three           10         3                    N               N              N        N
CloudSuite      N              N/A           Two             8          3                    N               N              N        Y
HiBench         N              N/A           Two             10         3                    N               N              N        N
CALDA           Y              N/A           One             5          1                    Y               N              N        N
YCSB            Y              N/A           One             6          N/A                  Y               N              N        N
LinkBench       Y              One           One             10         1                    Y               N              N        N
AMPBenchmarks   N              N/A           One             4          1                    Y               N              N        N

* Scalable data sets derived from real data.

Observations from CloudSuite: Characteristics of Big Data Workloads

• High L1I miss rate: front-end inefficiencies
• Low ILP: core inefficiencies
• LLC is good but overprovisioned: data-access inefficiencies
• Low use of off-chip bandwidth and MLP: bandwidth inefficiencies

Source: Clearing the Clouds, ASPLOS 2012 (best paper)

System Behaviors of BigDataBench

Diversified system-level behaviors:

[Figure: CPU utilization and I/O wait ratio (percentage, top) and weighted disk I/O time ratio (log scale, bottom) for each BigDataBench workload and their average.]


• High CPU utilization and little I/O time
• Relatively low CPU utilization and lots of I/O time
• Medium CPU utilization and I/O

Workloads Classification of BigDataBench

Findings from system behaviors:
• System behaviors vary across different workloads.
• Workloads can be divided into three categories: CPU-intensive, I/O-intensive, and hybrid.
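A minimal sketch of how such a three-way classification could be automated from per-workload system metrics. The thresholds (`CPU_HIGH`, `IO_WAIT_HIGH`, `IO_WAIT_LOW`) and the sample readings are hypothetical illustrations, not values from BigDataBench.

```python
from dataclasses import dataclass

# Hypothetical thresholds in percent; not values prescribed by BigDataBench.
CPU_HIGH = 70.0      # CPU utilization at or above this counts as high
IO_WAIT_HIGH = 20.0  # I/O wait ratio at or above this counts as high
IO_WAIT_LOW = 5.0    # I/O wait ratio below this counts as low


@dataclass
class SystemProfile:
    name: str
    cpu_util: float  # average CPU utilization, percent
    io_wait: float   # average I/O wait ratio, percent


def classify(p: SystemProfile) -> str:
    """Map a workload's system-level profile to one of three categories."""
    if p.cpu_util >= CPU_HIGH and p.io_wait < IO_WAIT_LOW:
        return "CPU-intensive"
    if p.cpu_util < CPU_HIGH and p.io_wait >= IO_WAIT_HIGH:
        return "I/O-intensive"
    return "hybrid"  # everything in between


if __name__ == "__main__":
    # Made-up readings for illustration only.
    for prof in [
        SystemProfile("workload-a", cpu_util=92.0, io_wait=1.5),
        SystemProfile("workload-b", cpu_util=35.0, io_wait=40.0),
        SystemProfile("workload-c", cpu_util=60.0, io_wait=10.0),
    ]:
        print(prof.name, classify(prof))
```

Fixed cut-offs keep the sketch readable; a real study would more likely derive the boundaries from the measured distribution of CPU utilization and I/O wait.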

IPC of BigDataBench vs. Other Benchmarks

[Figure: IPC of each BigDataBench workload, compared with the averages of TPC-C, CloudSuite, HPCC, PARSEC, SPECFP, and SPECINT.]

• The average IPC of the big data workloads is larger than that of CloudSuite, SPECFP, and SPECINT, the same as PARSEC, and slightly smaller than HPCC.
• The average IPC of BigDataBench is 1.3 times that of CloudSuite.
• Some workloads have high IPC (M_Kmeans, S-TPC-DS-Query8).

Instruction Mix of BigDataBench vs. Other Benchmarks

For big data workloads:
• More branch instructions: about 20% of all instructions, 1.5 times the share in the other benchmarks (TPC-C excepted).
• A very high ratio of integer to FP instructions: 73 on average.
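The two instruction-mix metrics above are simple ratios over dynamic instruction counts. A minimal sketch, assuming a hypothetical, disjoint taxonomy of instruction classes (real profilers classify instructions differently); the counts in the example are made up.

```python
from typing import Dict


def instruction_mix(counts: Dict[str, int]) -> Dict[str, float]:
    """Summarize per-class dynamic instruction counts.

    The class names ("branch", "integer", "fp", ...) are a hypothetical,
    disjoint taxonomy chosen for illustration.
    """
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no instructions counted")
    fp = counts.get("fp", 0)
    return {
        # Share of branch instructions among all retired instructions, in percent.
        "branch_pct": 100.0 * counts.get("branch", 0) / total,
        # Integer-to-floating-point ratio; infinite when no FP instruction retired.
        "int_to_fp": counts.get("integer", 0) / fp if fp else float("inf"),
    }


if __name__ == "__main__":
    # Made-up counts.
    mix = instruction_mix({"branch": 200, "integer": 730, "fp": 10, "load": 40, "store": 20})
    print(mix)  # {'branch_pct': 20.0, 'int_to_fp': 73.0}
```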

Cache Behaviors of BigDataBench vs. Other Benchmarks

L1I MPKI (workloads grouped as CPU-intensive, I/O-intensive, and hybrid):
• Larger than in traditional benchmarks, but lower than in CloudSuite (12 vs. 31)
• Different among big data workloads: CPU-intensive (8), I/O-intensive (22), and hybrid (9)
• MPI workloads have fewer instruction cache misses: only 3.4 on average

Cache Behaviors of BigDataBench

• L2 cache: the I/O-intensive workloads incur higher L2 MPKI.
• L3 cache: the average L3 MPKI of the big data workloads is smaller than that of all the other benchmarks.
• The underlying software stacks affect data locality: MPI workloads have better data locality and fewer cache misses.
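IPC and MPKI, the metrics used in the last few slides, are derived from hardware performance counters; MPKI normalizes misses by instruction count so that runs of different lengths compare fairly. A minimal sketch; the raw counts could come from a tool such as Linux `perf stat`, and the numbers below are made up.

```python
def ipc(instructions: int, cycles: int) -> float:
    """Instructions retired per cycle."""
    return instructions / cycles


def mpki(misses: int, instructions: int) -> float:
    """Misses per kilo-instruction: misses per 1,000 retired instructions."""
    return misses * 1000.0 / instructions


if __name__ == "__main__":
    # Made-up counter readings for one run.
    instr, cycles = 2_000_000_000, 1_600_000_000
    l1i_misses, l2_misses = 24_000_000, 9_000_000
    print(f"IPC      = {ipc(instr, cycles):.2f}")       # 1.25
    print(f"L1I MPKI = {mpki(l1i_misses, instr):.1f}")  # 12.0
    print(f"L2 MPKI  = {mpki(l2_misses, instr):.1f}")   # 4.5
```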

Our Observations from BigDataBench

• Unique characteristics: more branch instructions and a higher ratio of integer to FP instructions
• Different behaviors among big data workloads: disparity of ILP and memory access behaviors
  - Several workloads achieve higher IPC
  - Several workloads achieve higher off-chip bandwidth
• CloudSuite is a subclass of big data workloads
• High front-end stall is not a unique characteristic of big data workloads; it is related to the software stacks

BigDataBench Publications

• BigDataBench: a Big Data Benchmark Suite from Internet Services. 20th IEEE International Symposium on High Performance Computer Architecture (HPCA 2014).
• Characterizing and Subsetting Big Data Workloads. Zhen Jia, Jianfeng Zhan, Lei Wang, Rui Han, Sally A. McKee, Qiang Yang, Chunjie Luo, and Jingwei Li. 2014 IEEE International Symposium on Workload Characterization (IISWC 2014).
• Characterizing data analysis workloads in data centers. 2013 IEEE International Symposium on Workload Characterization (IISWC 2013). Best paper award.
• BigOP: generating comprehensive big data workloads as a benchmarking framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014).
• BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking. The Fourth Workshop on Big Data Benchmarking (WBDB 2014).

BigDataBench Users

More than 20 groups have published papers using BigDataBench:
http://prof.ict.ac.cn/BigDataBench/users/

Outline

• What is BigDataBench?
• BigDataBench benchmarking methodology

Five Steps

1. Investigate important application domains
2. Typical workloads and data sets
3. Big data benchmark specification
4. Diverse implementations
5. Multi-tenancy version or subset

BigDataBench Methodology

Application domains 1 … N
→ Data models of different types & semantics; data operations & workload patterns
→ Benchmark specifications 1 … N
→ Real-world data sets, data generation tools, and workloads with diverse implementations
→ Multi-tenancy version (mix workloads with different percentages) and BigDataBench subset (reduce benchmarking cost)

Investigate Application Domains

What application domains should we pay attention to? Internet services.

Internet Services

Search engine, social network, and electronic commerce take up 80% of Internet services, according to page views and daily visitors.

[Figure: breakdown of the top 20 websites (http://www.alexa.com/topsites/global;0) into search engine, social network, electronic commerce, media streaming, and others.]

Multimedia Data

The Explosive Growth of Multimedia Data

[Figure: infographic on the rate of new YouTube videos, Flickr photos, Pandora music streaming, Skype voice-call minutes, and surveillance-camera video feeds; data growth is in images, videos, documents, …]
Source: http://www.oldcolony.us/wp-content/uploads/2014/11/whatisbigdata-DKB-v2.pdf

The Explosive Growth of Human Genome Data

[Figure: growth of human genome data.]
Source: http://www.osehra.org/sites/default/files/genomics_and_big_data_1_0.pdf

Application Domains We Choose

• Internet services: search engine, social network, e-commerce
• Emerging but important: multimedia, bioinformatics

BigDataBench Methodology

Application domains 1 … N
→ Data models of different types & semantics; data operations & workload patterns

Success Story: Relational Model of Data

E. F. Codd, "A Relational Model of Data for Large Shared Data Banks," Communications of the ACM, vol. 13, no. 6, 1970.
• Set concept: general mathematical meaning
• General representation of data
• Basis of relational algebra (the theoretical foundation of databases)
• 5 basic operations: select, project, product, union, difference
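To make the five basic operations concrete, here is a minimal Python sketch over relations encoded as sets of tuples. The `(schema, rows)` encoding and the function names are illustrative choices, not Codd's notation; derived operations such as join would be built from product, select, and project.

```python
from itertools import product as cartesian

# Illustrative encoding: a relation is (schema, rows), where schema is a tuple of
# attribute names and rows is a frozenset of tuples (so duplicates cannot occur).


def select(rel, predicate):
    """Selection: keep the rows for which predicate(row as dict) holds."""
    schema, rows = rel
    return schema, frozenset(r for r in rows if predicate(dict(zip(schema, r))))


def project(rel, attrs):
    """Projection: keep only the named attributes; duplicate rows collapse."""
    schema, rows = rel
    idx = [schema.index(a) for a in attrs]
    return tuple(attrs), frozenset(tuple(r[i] for i in idx) for r in rows)


def product(a, b):
    """Cartesian product; assumes the two schemas share no attribute names."""
    (sa, ra), (sb, rb) = a, b
    return sa + sb, frozenset(x + y for x, y in cartesian(ra, rb))


def union(a, b):
    """Union of two relations with identical schemas."""
    if a[0] != b[0]:
        raise ValueError("union requires identical schemas")
    return a[0], a[1] | b[1]


def difference(a, b):
    """Rows of a that are not in b; schemas must be identical."""
    if a[0] != b[0]:
        raise ValueError("difference requires identical schemas")
    return a[0], a[1] - b[1]


if __name__ == "__main__":
    emp = (("name", "dept"), frozenset({("ann", "db"), ("bob", "os"), ("cy", "db")}))
    # Names of everyone in the "db" department.
    print(sorted(project(select(emp, lambda r: r["dept"] == "db"), ["name"])[1]))
    # [('ann',), ('cy',)]
```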

Success Story: Parallel Computing

• By a multidisciplinary group of well-known researchers, e.g., Jim Gray, Michael Jordan, and David A. Patterson
• Operations & patterns: abstracted from 13 representative parallel computation patterns
• Parallel computation is an inherent demand of big data processing (volume & complexity)

Primitive Operations & Patterns in Big Data

• 3 categories of operations: 11 basic operations
• 3 processing patterns

BigOP: generating comprehensive big data workloads as a benchmarking framework. DASFAA 2014.

Some Examples (Not Exhaustive)

BigOP: generating comprehensive big data workloads as a benchmarking framework. 19th International Conference on Database Systems for Advanced Applications (DASFAA 2014).

Image Search

Perceptual hash algorithm:
Input image → sampling → basic information of image → SIFT → image features → hash → image fingerprint → set operation against the database → similar images → sort → output
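The fingerprint-and-compare idea behind this pipeline can be sketched with a much simpler "average hash" in place of SIFT features: downsample the image, emit one bit per cell depending on whether it is brighter than the mean, then rank database images by Hamming distance between fingerprints. This substitution, the 8x8 grid, and all function names are assumptions for illustration, not BigDataBench's implementation.

```python
def downsample(img, size=8):
    """Average-pool a 2-D grayscale image (list of rows) to size x size.

    Assumes both image dimensions are multiples of `size`.
    """
    h, w = len(img), len(img[0])
    bh, bw = h // size, w // size
    out = []
    for i in range(size):
        row = []
        for j in range(size):
            block = [img[y][x]
                     for y in range(i * bh, (i + 1) * bh)
                     for x in range(j * bw, (j + 1) * bw)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(img, size=8):
    """Fingerprint: one bit per cell, set when the cell is brighter than the mean."""
    cells = [v for row in downsample(img, size) for v in row]
    mean = sum(cells) / len(cells)
    bits = 0
    for v in cells:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing fingerprint bits."""
    return bin(a ^ b).count("1")


def search(query, database, k=3):
    """Return the names of the k database images whose fingerprints are closest."""
    q = average_hash(query)
    scored = sorted((hamming(q, average_hash(img)), name) for name, img in database.items())
    return [name for _, name in scored[:k]]


if __name__ == "__main__":
    grad = [[x * 16 for x in range(16)] for _ in range(16)]  # horizontal gradient
    db = {
        "brighter": [[v + 10 for v in row] for row in grad],  # same structure, shifted
        "inverted": [[255 - v for v in row] for row in grad],
    }
    print(search(grad, db, k=2))  # ['brighter', 'inverted']
```

Because each bit is relative to the image's own mean, a uniform brightness change leaves the fingerprint unchanged, which is the property that lets similar images match.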

Feature Extraction: SIFT

Input image → convolution with a Gaussian filter → image scale space → sampling → image pyramid → matrix subtraction → DoG image → sort → key points of image → Gaussian window (sampling) → count → feature vectors → output
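The Gaussian-filter, scale-space, and matrix-subtraction steps can be sketched in pure Python. This is a single-scale simplification written for clarity (no image pyramid, orientation assignment, or descriptor); the function names and the choice k = sqrt(2) are assumptions, not taken from BigDataBench's SIFT workload.

```python
import math


def gaussian_kernel(sigma):
    """1-D Gaussian weights over a radius of about 3*sigma, normalized to sum to 1."""
    r = max(1, math.ceil(3 * sigma))
    w = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-r, r + 1)]
    total = sum(w)
    return [v / total for v in w]


def convolve_1d(row, kernel):
    """Convolve a 1-D signal, clamping indices at the borders to keep its length."""
    r = len(kernel) // 2
    n = len(row)
    return [
        sum(kernel[j] * row[min(max(i + j - r, 0), n - 1)] for j in range(len(kernel)))
        for i in range(n)
    ]


def gaussian_blur(img, sigma):
    """Separable 2-D Gaussian filter: convolve every row, then every column."""
    k = gaussian_kernel(sigma)
    rows = [convolve_1d(row, k) for row in img]
    cols = [convolve_1d(list(col), k) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def difference_of_gaussians(img, sigma, k=math.sqrt(2)):
    """D = L(k*sigma) - L(sigma): element-wise subtraction of two blurred images."""
    fine = gaussian_blur(img, sigma)
    coarse = gaussian_blur(img, k * sigma)
    return [[c - f for c, f in zip(rc, rf)] for rc, rf in zip(coarse, fine)]


def keypoints(d, threshold=1e-3):
    """Strict local extrema of one DoG image over its 8 neighbors.

    Simplification: full SIFT also compares against the adjacent scales (26 neighbors).
    """
    pts = []
    for y in range(1, len(d) - 1):
        for x in range(1, len(d[0]) - 1):
            v = d[y][x]
            nbrs = [d[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1) if dy or dx]
            if abs(v) > threshold and (v > max(nbrs) or v < min(nbrs)):
                pts.append((y, x))
    return pts
```

For example, a single bright pixel in the middle of a dark image produces a DoG minimum at that pixel, so it is reported as a key point.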

Workload & Data Set in BigDataBench

• Data model: text, graph, table, multimedia
• Semantics: structured, semi-structured, unstructured
• Data operations: units of computation
• Workload patterns: different combinations of units of computation

BigDataBench Methodology

Application domains 1 … N
→ Data models of different types & semantics; data operations & workload patterns
→ Benchmark specifications 1 … N

Data Management's Tradition

Specification first. Functions of abstraction are units of computation that appear frequently in the application domain being benchmarked. They are expressed in a generic form that is independent of the underlying system implementation.
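A minimal sketch of the "specification first" idea, using word count as a hypothetical unit of computation. The specification is a plain reference function that states what the result must be, with no commitment to how it is computed; any implementation (here a single-process `Counter` version and a map/reduce-style version, both illustrative) conforms if it produces the same output.

```python
from collections import Counter
from functools import reduce


def wordcount_spec(docs):
    """Specification: map each word to its total number of occurrences across docs."""
    result = {}
    for doc in docs:
        for w in doc.split():
            result[w] = result.get(w, 0) + 1
    return result


def wordcount_counter(docs):
    """Implementation 1: a single process using collections.Counter."""
    return dict(Counter(w for doc in docs for w in doc.split()))


def wordcount_mapreduce(docs):
    """Implementation 2: map each document to partial counts, then reduce by merging."""
    partials = map(lambda d: Counter(d.split()), docs)
    return dict(reduce(lambda a, b: a + b, partials, Counter()))


def conforms(impl, docs):
    """An implementation conforms if it matches the specification on this input."""
    return impl(docs) == wordcount_spec(docs)


if __name__ == "__main__":
    docs = ["big data", "data system big data"]
    print(conforms(wordcount_counter, docs), conforms(wordcount_mapreduce, docs))  # True True
```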

TPC-C Examples

BigDataBench Methodology

Application domains 1 … N
→ Data models of different types & semantics; data operations & workload patterns
→ Benchmark specifications 1 … N
→ Real-world data sets, data generation tools, and workloads with diverse implementations

Real-World Data Sets

14 real-world data sets

Big Data Generation Tool: BDGS

Provides scalable data sets extracted from real-world data sets. Based on real data, BDGS generates text, table, and graph data.

Naive Text Generator

Words (e.g. big, data, system, mining, evaluate, machine, architecture, CPU, benchmarking, memory, learning) are selected randomly, following a multinomial distribution, to form a document.
• Models only the word level
• Words are selected according to the same distribution
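A minimal sketch of the naive generator, assuming the word distribution is estimated from word frequencies in some seed text (the seed text and function names are hypothetical). Because every word is drawn independently from one distribution, the output has realistic word frequencies but no topical structure.

```python
import random
from collections import Counter


def fit_unigram(tokens):
    """Estimate one multinomial over the vocabulary from word frequencies in seed text."""
    counts = Counter(tokens)
    vocab = sorted(counts)
    total = sum(counts.values())
    return vocab, [counts[w] / total for w in vocab]


def naive_document(vocab, probs, length, rng):
    """Draw every word independently from the same multinomial distribution."""
    return rng.choices(vocab, weights=probs, k=length)


if __name__ == "__main__":
    seed_text = "big data system benchmarking big data mining machine learning cpu memory"
    vocab, probs = fit_unigram(seed_text.split())
    print(" ".join(naive_document(vocab, probs, 12, random.Random(42))))
```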

Improved Text Generator

A topic is selected randomly, following a multinomial distribution; then a word is selected randomly, following the multinomial distribution under that topic.
• Models the topic and word levels
• Words are drawn from the distribution under a particular topic
• Topics are drawn from the same distribution; as a result, every document has the same topic proportions
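A minimal sketch of this two-level process: one corpus-wide topic distribution `theta`, and a word distribution per topic. The topic tables below are hypothetical, built from the words on the slide.

```python
import random


def improved_document(theta, topic_word, length, rng):
    """Generate one document under a fixed, corpus-wide topic distribution.

    theta: probability of each topic, shared by every document.
    topic_word: for each topic, a (vocab, probs) multinomial over words.
    """
    topics = list(range(len(theta)))
    words = []
    for _ in range(length):
        z = rng.choices(topics, weights=theta, k=1)[0]           # pick a topic
        vocab, probs = topic_word[z]
        words.append(rng.choices(vocab, weights=probs, k=1)[0])  # pick a word under it
    return words


if __name__ == "__main__":
    # Hypothetical topics.
    topic_word = [
        (["cpu", "memory", "architecture"], [0.5, 0.3, 0.2]),
        (["data", "mining", "learning"], [0.5, 0.3, 0.2]),
        (["benchmarking", "evaluate", "system"], [0.4, 0.3, 0.3]),
    ]
    theta = [0.5, 0.3, 0.2]  # identical for every generated document
    rng = random.Random(0)
    for _ in range(2):
        print(" ".join(improved_document(theta, topic_word, 8, rng)))
```

Since `theta` is reused for every call, all generated documents share the same expected topic mix, which is exactly the limitation the next generator removes.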

Optimized Text Generator

For each document, the topic proportions are selected randomly, with the topic distribution parameters following a Dirichlet distribution; topics are then selected following that multinomial distribution, and words following the multinomial distribution under each topic.
• Models the topic and word levels
• Words are drawn from the distribution under a particular topic
• Topics are selected from different distributions whose parameters follow a Dirichlet distribution
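This is the generative process of latent Dirichlet allocation (LDA). A minimal sketch that samples the Dirichlet through normalized Gamma draws from Python's standard library; `alpha` and the topic-word tables are hypothetical inputs that a real generator would estimate from the seed corpus.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw topic proportions from Dirichlet(alpha) by normalizing Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [g / total for g in draws]


def lda_document(alpha, topic_word, length, rng):
    """Generate one document; each document gets its own topic proportions."""
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    topics = list(range(len(alpha)))
    words = []
    for _ in range(length):
        z = rng.choices(topics, weights=theta, k=1)[0]           # topic for this word
        vocab, probs = topic_word[z]
        words.append(rng.choices(vocab, weights=probs, k=1)[0])  # word under that topic
    return theta, words


if __name__ == "__main__":
    topic_word = [(["cpu", "memory"], [0.6, 0.4]), (["data", "mining"], [0.7, 0.3])]
    rng = random.Random(0)
    for _ in range(2):
        theta, words = lda_document([0.5, 0.5], topic_word, 8, rng)
        print([round(t, 2) for t in theta], " ".join(words))
```

Smaller `alpha` values push each document toward a few dominant topics; larger values make documents look more alike.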

Workloads With Diverse Implementations

Software stacks: MapReduce, Spark, MPI, DataMPI


Multi-tenancy Version of BigDataBench

Scenarios: multiple tenants running heterogeneous applications in cloud data centers
• Latency-critical online services
• Latency-insensitive offline batch applications

• Mining real-world workload traces: profiling workload traces (Google and Facebook) using machine learning techniques
• Mixed workloads: workload matching using a parametric workload generation tool
• Benchmarking scenarios: mixed workloads in public clouds; data analytical workloads in private clouds
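A minimal sketch of what a parametric mix generator could look like: workload types are drawn according to given percentages, and submissions arrive as a Poisson process. The workload names, the Poisson-arrival assumption, and the function name are hypothetical; this is not BigDataBench's workload generation tool.

```python
import random
from collections import Counter


def mixed_workload(mix, n, rate, rng):
    """Generate n (arrival_time, workload) pairs for a multi-tenant mix.

    mix: workload name -> percentage of submissions (must sum to 100).
    rate: mean arrivals per second; arrivals form a Poisson process (an assumption).
    """
    if abs(sum(mix.values()) - 100.0) > 1e-9:
        raise ValueError("percentages must sum to 100")
    names = list(mix)
    kinds = rng.choices(names, weights=[mix[k] for k in names], k=n)
    t, schedule = 0.0, []
    for kind in kinds:
        t += rng.expovariate(rate)  # exponential inter-arrival gap
        schedule.append((t, kind))
    return schedule


if __name__ == "__main__":
    # Hypothetical mix: a latency-critical service and an offline batch job.
    sched = mixed_workload({"online-search": 70.0, "offline-sort": 30.0}, 1000, 5.0, random.Random(0))
    print(Counter(kind for _, kind in sched))
```

Matching real traces would then amount to fitting `mix` and `rate` (or a richer arrival model) to the profiled Google or Facebook data.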

BigDataBench Subset

Motivation: it is expensive to run all the benchmarks for system and architecture research.
• The workload count is multiplied by the number of implementations: BigDataBench 3.0 provides about 77 workloads.

Subsetting Methodology

• Identify a comprehensive set of workload characteristics from a specific perspective
• Eliminate the correlation among those metrics and map the high-dimensional metrics to a low dimension
• Use a clustering method to classify the workloads
• Choose representative workloads from each category
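The slide does not name specific techniques. A common way to realize these four steps is principal component analysis (PCA), whose components are mutually uncorrelated, followed by k-means clustering, picking the workload nearest each centroid as the representative. The sketch below assumes NumPy and those two techniques; the 90% variance threshold and all names are hypothetical.

```python
import numpy as np


def reduce_dimensions(X, var_kept=0.9):
    """Standardize the metrics and project onto enough principal components
    to explain `var_kept` of the variance (components are uncorrelated)."""
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant metrics carry no information
    Z = (X - X.mean(axis=0)) / std
    _, s, vt = np.linalg.svd(Z, full_matrices=False)
    explained = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(explained, var_kept)) + 1, len(s))
    return Z @ vt[:k].T


def kmeans(Y, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns cluster labels and centroids."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(iters):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers


def representatives(names, X, k, var_kept=0.9, seed=0):
    """From each cluster, pick the workload closest to its centroid."""
    Y = reduce_dimensions(np.asarray(X, dtype=float), var_kept)
    labels, centers = kmeans(Y, k, seed=seed)
    reps = []
    for j in range(k):
        members = np.flatnonzero(labels == j)
        if members.size:
            d = ((Y[members] - centers[j]) ** 2).sum(axis=1)
            reps.append(names[members[d.argmin()]])
    return reps
```

Each row of `X` would hold one workload's metrics (for example IPC, MPKI values, and instruction-mix ratios); the returned names form the subset.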
