Mbench: Benchmarking a
Multicore Operating System
Using Mixed Workloads
Gang Lu and Xinlong Lin
Institute of Computing Technology,
Chinese Academy of Sciences
BPOE-6, Sep 4, 2015
Background

Fast evolution of hardware
Intel Xeon E5-4669 v3: 18 cores
Intel Xeon E5-4667 v3: 16 cores
With 4 NUMA nodes: 72 and 64 cores, 144 and 128 hardware threads
Booming big data applications
More cores can accommodate more applications
Motivation

"In the OS community, the lack of a representative 'desktop mix' benchmark prompted a call for better multi-core benchmarks, and in a similar vein, we believe the cloud computing community needs representative cluster benchmarks." ---The seven deadly sins of cloud computing research*

Not only in cloud computing but also in general data centers, we need new multi-core OS benchmarks.

* Malte Schwarzkopf, Derek G. Murray, and Steven Hand. The seven deadly sins of cloud computing research. HotCloud 2012.
Current OS benchmark suites
Focus on performance or scalability
single core -> multi-core -> many core
Few used mixed workloads
What are OS researchers using?
What are OS researchers using?

1999, Cellular Disco: TPC-D + RayTrace
2003, Xen: OSDB + SPEC WEB99 + dd + fork bomb
2005, K42: SPEC SDET + streaming applications
2009, Helios: SAT solver + a disk indexer
2013, Tessellation: video player + Dropbox
Why use mixed workloads?

In industry, admins tend to consolidate workloads to improve resource utilization
Virtualization systems
Even monolithic OSes like Linux
The performance curve of a single workload can be largely impacted
Tail latency is quite sensitive
Why use mixed workloads?

The performance curve of a single workload can be largely affected

(memcached, PARSEC.streamcluster)
Ratios of the performance of co-locating workload A with B to the performance of a solo run of workload A, denoted as (A, B).
Slowdown for mixed MOSBENCH workloads*. (Background workload is gmake)
Ideal curve (running alone)

* Ihor Kuz, Zachary Anderson, Pravin Shinde, and Timothy Roscoe. Multicore OS benchmarks: we can do better. HotOS 2011.
Tail latency is sensitive

Average performance is not as sensitive as worst-case performance

(Search, bodytracker) and (bodytracker, streamcluster)
Ratios of the performance of co-locating workload A with B to the performance of a solo run of workload A, denoted as (A, B).
highly sensitive vs. not sensitive
LXC: Linux Containers
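The contrast between average and worst-case behavior can be made concrete with a small nearest-rank percentile computation; this is a generic sketch with made-up latency samples, not Mbench code:

```python
# Nearest-rank percentile: the "worst performance" on this slide is a high
# percentile of the latency distribution, which reacts to interference far
# more strongly than the mean does.
def percentile(samples, p):
    """p-th percentile (0 < p <= 100) of samples, by the nearest-rank method."""
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))  # 1-based rank
    return ordered[rank - 1]

# 1.5% slow outliers barely move the mean but dominate the 99th percentile.
latencies_ms = [10.0] * 985 + [200.0] * 15
mean_ms = sum(latencies_ms) / len(latencies_ms)   # 12.85 ms
p99_ms = percentile(latencies_ms, 99)             # 200.0 ms
```

Under interference, a few slow requests like these leave the average almost unchanged while the 99th percentile jumps by an order of magnitude.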
Mbench

Goal: providing real mixed workloads
What properties?
Benchmark selection
Full coverage
Micro and application benchmarks
Latency-critical workloads
Minimal redundancy
Experiment control
Tunable workload composition
Performance monitoring and analysis
What do we include in Mbench?
Summary of the benchmarks
Construction of Mbench
Micro benchmarks
Key principles
Covering each subsystem
CPU, memory, file system, network
Covering kernel functionalities
system calls, in-kernel mechanisms
Construction of Mbench
Micro benchmarks
SPEC CPU
Construction of Mbench
Micro benchmarks
cachebench
Construction of Mbench
Micro benchmarks
IOzone
Construction of Mbench
Micro benchmarks
netperf
Construction of Mbench
Micro benchmarks
Will-It-Scale
Iterations of: brk, dup, eventfd, fallocate, futex, getppid, lock, lseek, ...
We extended it to export per-invocation latencies
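The latency extension can be sketched as follows; this is an illustrative Python re-creation (Will-It-Scale itself is written in C), and the function name is hypothetical:

```python
# Illustrative sketch of exporting per-invocation latencies instead of only
# an iterations-per-second count: time each call of a cheap system call.
import os
import time

def measure_syscall_latencies(iterations):
    """Time each getppid() call and return the raw latencies in nanoseconds."""
    latencies = []
    for _ in range(iterations):
        start = time.perf_counter_ns()
        os.getppid()  # the benchmarked kernel entry point
        latencies.append(time.perf_counter_ns() - start)
    return latencies

lats = measure_syscall_latencies(10_000)
lats.sort()
p99 = lats[int(0.99 * len(lats))]  # raw samples enable tail metrics, not just the mean
```

Keeping the raw samples is what makes tail-latency analysis possible; a pure throughput counter discards exactly the information the later slides care about.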
Construction of Mbench
Application benchmarks
Key principles
Real applications
Different workload types
Different resource priorities
BigDataBench is a good choice!
BigDataBench
With 33 workloads, different programming
models
Real-world big data
Lei Wang, Jianfeng Zhan, et al. BigDataBench: A big data benchmark suite from internet services. HPCA'14.
Construction of Mbench
Application benchmarks
Search
Front end: tomcat
Back end: nutch
Construction of Mbench
Application benchmarks
Hadoop
sort, grep
Spark
kmeans, pagerank
Construction of Mbench
Application benchmarks
PARSEC
small offline batches
Construction of Mbench
Application benchmarks
memcached
Derived from MOSBENCH*
Modified to report tail latency and to support different running models
* Silas Boyd-Wickizer, Austin T. Clements, Yandong Mao, Aleksey Pesterev, M. Frans Kaashoek, Robert Morris,
and Nickolai Zeldovich. An analysis of Linux scalability to many cores. OSDI’10.
Construction of Mbench
Application benchmarks
PostgreSQL
pgbench
How to use?

Two example problems
What workload mixes yield the most information when used to evaluate a multicore OS?
How do services behave when running concurrently with other applications?
Use cases (1)
Micro benchmarks
Performance degradation of co-running two benchmarks on a single server. The numbers 0~10 on the axes denote SPECCPU.{bzip2, sphinx3}, cachebench.{read, write, modify}, IOzone.{write, read, modify}, and netperf.{tcp_stream, tcp_rr, tcp_crr}, respectively. The numbers in the grid cells are the performance degradation percentage (%) of the benchmark on the y-axis interfered with by the background benchmark on the x-axis.
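Each cell of this heat map can be reproduced from two measurements per pair; a minimal sketch with made-up throughput numbers, assuming a higher-is-better metric:

```python
# How each grid cell is derived: compare foreground benchmark A's
# performance co-run with background B against its solo run. The throughput
# figures below are invented for illustration, not measured data.
def degradation_pct(solo_perf, colocated_perf):
    """Performance loss as a percentage, for higher-is-better metrics
    (e.g. throughput); a latency metric would invert the ratio."""
    return (solo_perf - colocated_perf) / solo_perf * 100.0

solo_tps = 1000.0       # e.g. netperf tcp_rr transactions/s, running alone
colocated_tps = 620.0   # the same metric with a background IOzone.write
loss = degradation_pct(solo_tps, colocated_tps)  # ~38%
```

The same two-run procedure, repeated for every (foreground, background) pair, fills the full matrix shown on the slide.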
Use cases (1)
Micro benchmarks
A new system

Performance degradation of co-running two benchmarks on a single server. The numbers 0~10 on the axes denote SPECCPU.{bzip2, sphinx3}, cachebench.{read, write, modify}, IOzone.{write, read, modify}, and netperf.{tcp_stream, tcp_rr, tcp_crr}, respectively. The numbers in the grid cells are the performance degradation percentage (%) of the benchmark on the y-axis interfered with by the background benchmark on the x-axis.
Use cases (2)
Application benchmarks
Performance degradation of the tail latency of the Search workload co-located with the PARSEC benchmarks. Each time, we run an individual benchmark in the background. For the three kernels, the tail latencies of running Search at 300 req/s on a 12-core server are 108.6, 134.5, and 127.7 ms, respectively.
Use cases (2)
Application benchmarks
A new system

Performance degradation of the tail latency of the Search workload co-located with the PARSEC benchmarks. Each time, we run an individual benchmark in the background. For the three systems, the tail latencies of running Search at 300 req/s on a 12-core server are 129.3, 210.9, and 128.5 ms, respectively.
What else should we do?

Experiment control
Controlling mixed workloads is more difficult
selection, time synchronization
Resource allocation policies
CPU pinning, NUMA allocation
Performance monitoring
Monolithic kernel, virtualization, multi-kernel
System & architecture levels
Log collecting and analysis
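The CPU-pinning part of resource allocation can be sketched with facilities reachable from Python on Linux; `os.sched_setaffinity` handles the pinning, while NUMA placement would in practice be delegated to `numactl` (shown only as a comment, since the Python standard library has no NUMA API):

```python
# CPU pinning from Python (Linux only): restrict a process to a CPU set so
# co-located workloads do not share cores.
import os

def pin_to_cpus(pid, cpus):
    """Bind process `pid` (0 = the calling process) to the given CPU set."""
    os.sched_setaffinity(pid, cpus)

available = os.sched_getaffinity(0)  # CPUs this process may currently use
target = {min(available)}            # pin to the lowest-numbered core
pin_to_cpus(0, target)

# NUMA-node binding would typically shell out to numactl, e.g.:
#   numactl --cpunodebind=0 --membind=0 ./workload
```

Pinning the latency-critical workload and the background workload to disjoint core sets is what makes interference experiments like the ones above repeatable.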
An experiment control tool
Mcontroller
Structure of the experiment control tool (Mcontroller)
What can Mcontroller do?

Experiment control
Customizable control
workload selection, time series of each workload
batch running of multiple experiments
Resource allocation
CPU pinning, NUMA allocation
Performance monitoring
Supports Linux, Linux Containers, Xen
Nearly all system & architecture characteristics
proc, perf, OProfile
Interfaces are extensible!
Benchmarks can be easily added!
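A toy version of the time-series control listed above might look as follows; the commands and offsets are illustrative, and `run_mix` is a hypothetical name, not Mcontroller's actual interface:

```python
# Minimal workload-mix launcher: start each benchmark command at a fixed
# offset from a common origin, wait for all of them, return exit codes.
import subprocess
import threading
import time

def run_mix(plan):
    """plan: list of (start_offset_s, argv); returns exit codes in plan order."""
    t0 = time.monotonic()
    procs = [None] * len(plan)

    def launch(i, offset, argv):
        # Sleep until this workload's scheduled start time, then spawn it.
        time.sleep(max(0.0, offset - (time.monotonic() - t0)))
        procs[i] = subprocess.Popen(argv)

    threads = [threading.Thread(target=launch, args=(i, off, argv))
               for i, (off, argv) in enumerate(plan)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                      # all workloads have been launched
    return [p.wait() for p in procs]  # all workloads have finished

# Latency-critical foreground at t=0, batch background joining at t=1s.
codes = run_mix([(0.0, ["sleep", "1"]), (1.0, ["sleep", "1"])])
```

A real controller would additionally synchronize clocks across machines, collect per-workload logs, and drive batches of such plans, as the slide describes.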
Conclusions

Current OS benchmarks do not evaluate performance isolation!
OS researchers choose mixed workloads at will
Mbench: an OS benchmark suite with mixed workloads
micro & application benchmarks
latency-critical workloads
diverse resource & workload types
We developed an experiment control tool
It supports many useful features and is extensible
Thank you! Mahalo!