EECS 750: Advanced Operating Systems

01/24/2014

Heechul Yun



Administrative

• Sign up for presentations

– Email me two papers that you want to present

– First come, first served

• The first paper reading begins next week

– Borrowed-Virtual-Time (BVT) scheduling: supporting latency-sensitive threads in a general-purpose scheduler, SOSP’99

– Email me the summary by 11:59 p.m., Sunday

– Please include “[EECS750]” in the subject line


Today

• In-depth introduction of overall topics

• Project ideas & available resources


Topics

• Performance

– How to manage Multicore, cache, DRAM, SSD, and GPU for good throughput/fairness/QoS?

• Power/Energy

– How to save power/energy while still getting enough performance?

• Reliability

– How to make systems more predictable and less buggy?

– If you have bugs, how to find them automatically?


Source: H. Sutter, “The Free Lunch Is Over”, Dr. Dobb's Journal, 2005 (updated in 2009)


Multicore

Server Desktop Mobile RT/Embedded


Multicore

• Lots of parallelism

• More performance at lower cost

NVIDIA Tegra K1 SoC: 4xCPU cores + 128 GPU cores + … (Source: nvidia.com)


Operating Systems Perspective

[Figure: unicore (tasks T1 and T2 on a single CPU) vs. multicore (tasks T1 to T8 on Core1 to Core4)]

• Time-sharing

– Unicore: multiple tasks share a single processor

– Multicore: multiple tasks share multiple processors
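
A minimal sketch of the two cases above, assuming a simple round-robin placement policy (the slide does not name one). The function name `assign_round_robin` is illustrative only; the task and core names follow the figure.

```python
from collections import defaultdict


def assign_round_robin(tasks, cores):
    """Map each task to a core in round-robin order (illustrative policy)."""
    runqueues = defaultdict(list)
    for i, task in enumerate(tasks):
        runqueues[cores[i % len(cores)]].append(task)
    return dict(runqueues)


# Unicore: every task time-shares the single CPU.
unicore = assign_round_robin(["T1", "T2"], ["CPU"])

# Multicore: eight tasks time-share four cores, two tasks per core.
multicore = assign_round_robin(
    [f"T{i}" for i in range(1, 9)],
    [f"Core{i}" for i in range(1, 5)],
)

print(unicore)
print(multicore)
```

Each core still time-shares among the tasks on its own run queue; what changes on multicore is that the scheduler must also decide which queue a task goes to.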


Challenges: Shared Resources

[Figure: unicore (tasks T1 and T2 on one CPU with its cache, DRAM, and disk) vs. multicore (tasks T1 to T8 on Core1 to Core4 with shared cache, DRAM, and disk)]

Performance Impact


Data Intensive Applications

• Multimedia processing, object tracking, games, big data (*), …

• More stress on the memory hierarchy

(*) Source: Intel, The Growing Importance of Big Data and Real-Time Analytics, 2012


Inter-Core Memory Interference

• Significant slowdown: >6x slowdown on 4 cores

[Figure: slowdown ratio due to interference. Y-axis: runtime slowdown (0.00 to 7.00). X-axis: foreground benchmark, running on one core of an Intel Xeon while 470.lbm runs in the background on the other three cores; all cores share the L3 cache and memory.]


Inter-Core Memory Interference

• Performance depends on both cores and co-runners

L. Tang et al., “The Impact of Memory Subsystem Resource Sharing on Datacenter Applications”, ISCA’11


Questions

• How to maximize overall throughput?

• How to provide QoS (Quality-of-Service)?

• How to guarantee performance?

• Time-sharing based scheduling is not sufficient in this multi-core era

• Need to deal with parallelism and shared resources


In This Course

• CPU scheduling (Week 2)

• Contention-aware scheduling (Week 3-4)

• Cache and DRAM management (Week 5-6)

• Shared Disk and GPU management (Week 7-8)


Power/Energy

• Mobile

– Battery powered devices

• Data center

– 61 billion kWh in 2006 (1.5% of total U.S. electricity) (*)

(*) Source: EPA Report to Congress on Server and Data Center Energy Efficiency, 2007


Data Center Operating Cost Breakdown

Source: J. Koomey et al., “Assessing Trends over Time in Performance, Costs, and Energy Use for Servers”, 2009

Over 40% on energy

Power Consumption

• CPU and memory consume significant power

– Intel Core i7 (Haswell) CPU: 15 W; 2 x 4 GB DDR3 DRAM modules: 10 W

– Calxeda EnergyCore (ARM server) CPU: 5 W

Figure source: Luiz André Barroso and Urs Hölzle, The Datacenter as a Computer: An Introduction to the Design of Warehouse-Scale Machines, Morgan & Claypool, 2009


Energy Reduction

• Hardware

– Process technology improvement

• 28 nm → 20 nm

– Clock gating, power gating

– …

• Software

– DVFS (Dynamic voltage and frequency scaling)

– DPM (Dynamic power management)

– …


Example of Previous Work (CPU and Memory DVFS)

Memxfer5b: memory benchmark program

CPU (MHz)   Mem (MHz)   Time (s)   Energy (mJ)
200         100         3.46       1690
100         100         3.55       1182

Half of CPU clock: energy saved 30%, exec. time increased only 3%


Motivation

Dhrystone: CPU benchmark program

CPU (MHz)   Mem (MHz)   Time (s)   Energy (mJ)
200         100         4.26       2364
200         50          4.28       2106

Half of Mem clock: energy saved about 11%, exec. time increased only about 0.5%
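
The percentages for both benchmarks (Memxfer5b on the previous slide, Dhrystone here) can be recomputed from the table rows. A small sketch; the dictionary keys and helper names are illustrative, the numbers are copied from the two tables.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100.0


# (time in s, energy in mJ) per clock setting, copied from the two tables.
memxfer5b = {"cpu200_mem100": (3.46, 1690), "cpu100_mem100": (3.55, 1182)}
dhrystone = {"cpu200_mem100": (4.26, 2364), "cpu200_mem50": (4.28, 2106)}


def summarize(runs, baseline, scaled):
    """Energy saved and time added when moving from `baseline` to `scaled`."""
    t0, e0 = runs[baseline]
    t1, e1 = runs[scaled]
    return {
        "energy_saved_pct": -percent_change(e0, e1),
        "time_increase_pct": percent_change(t0, t1),
    }


memxfer_summary = summarize(memxfer5b, "cpu200_mem100", "cpu100_mem100")
dhrystone_summary = summarize(dhrystone, "cpu200_mem100", "cpu200_mem50")

for name, s in (("memxfer5b", memxfer_summary), ("dhrystone", dhrystone_summary)):
    print(f"{name}: energy saved {s['energy_saved_pct']:.1f}%, "
          f"time +{s['time_increase_pct']:.1f}%")
```

The memory-bound program barely slows down when the CPU clock is halved, and the CPU-bound program barely slows down when the memory clock is halved, which is why scaling each clock according to the workload pays off.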


Energy Equation and Validation

Capacitance (nF)                        Power (mW)
Kca      Kcs      Kma*     Kms*         I        R
0.505    0.224    0.540    0.210        6.570    67.434

\[
E = \frac{C}{f_c}\left(k_{ca} V^2 f_c + k_{ms}^{*} V^2 f_m + R\right)
  + \frac{M}{f_m}\left(k_{cs} V_{cpu}^2 f_c + k_{ma}^{*} V^2 f_m + R\right)
  + (I + R)(P - e)
\]

Obtained coefficients in the energy equation

• Validated on an ARM926EJ-S based platform
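
A minimal sketch of how the energy equation might be evaluated with the coefficients above. Several things here are assumptions, not taken from the slide: C and M are read as CPU and memory cycle counts, P as the task period, e = C/f_c + M/f_m as the execution time, every voltage in the equation is treated as one supply voltage V, and the idle interval (P - e) is charged at I + R. With capacitance in nF, voltage in V, and frequency in MHz, each k·V²·f term comes out in mW; cycles divided by MHz gives microseconds, so the result is in nJ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Coeffs:
    """Coefficients from the slide's table (capacitance in nF, power in mW)."""
    k_ca: float = 0.505    # CPU term while the CPU is computing
    k_cs: float = 0.224    # CPU term while waiting on memory
    k_ma: float = 0.540    # memory term while memory is accessed (Kma*)
    k_ms: float = 0.210    # memory term while the CPU is computing (Kms*)
    idle: float = 6.570    # I
    static: float = 67.434  # R


def energy_nj(C, M, f_c, f_m, V, P, k=Coeffs()):
    """Energy per period in nJ (C, M in cycles; f_c, f_m in MHz; P in us).

    Assumed reading of the equation:
      E = (C/f_c)(k_ca V^2 f_c + k_ms V^2 f_m + R)
        + (M/f_m)(k_cs V^2 f_c + k_ma V^2 f_m + R)
        + (I + R)(P - e),   with e = C/f_c + M/f_m
    """
    t_cpu = C / f_c                      # us spent on CPU cycles
    t_mem = M / f_m                      # us spent on memory cycles
    e = t_cpu + t_mem
    if e > P:
        raise ValueError("execution time exceeds the period")
    p_cpu_phase = k.k_ca * V**2 * f_c + k.k_ms * V**2 * f_m + k.static
    p_mem_phase = k.k_cs * V**2 * f_c + k.k_ma * V**2 * f_m + k.static
    return t_cpu * p_cpu_phase + t_mem * p_mem_phase + (k.idle + k.static) * (P - e)


# Hypothetical workload: 200 CPU cycles, 50 memory cycles, 10 us period, V = 1.0.
for f_c, f_m in ((200, 100), (100, 100), (200, 50)):
    print(f_c, f_m, round(energy_nj(200, 50, f_c, f_m, V=1.0, P=10.0), 3))
```

Sweeping `f_c` and `f_m` like this is one way to see which clock setting minimizes energy for a given mix of CPU and memory cycles.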


Energy vs. Utilization

[Figure: normalized average power consumption (y-axis, 0.5 to 1) vs. utilization (x-axis, 0.1 to 0.9) for MAX, CPU-only, Static, and MultiDVFS. Task set cache stall ratio (MH/(CH+MH)): 0.3]


In This Course

• How to save power/energy? (Week 10)

– Mobile device level

– Server cluster level


Reliability

• Multi-threaded applications

– Hard to program

– Easy to produce subtle bugs

– Stress testing is ineffective and time-consuming

– Costs lots of $$$


State Space Explosion

• Example

Initially: V1 = V2 = … = V10 = 0

Thread 1:   1: V1 = 1    2: stop

Thread 2:   1: V2 = 1    2: stop

…

Thread 10:  1: V10 = 1   2: stop

3,628,800 (10!) interleavings, 1024 (2^10) states

Exhaustive testing is impractical

(*) Slide adapted from Godefroid’s 2004 talk at PASTE, with minor modifications for our purpose
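
The two counts on this slide can be checked by brute force for small thread counts. A sketch, assuming each thread's only visible step is its single write `Vi = 1`; the function name `explore` is illustrative.

```python
from itertools import permutations
from math import factorial


def explore(n_threads):
    """Enumerate every interleaving of n threads that each do `Vi = 1` once.

    Returns (number of interleavings, number of distinct states visited).
    """
    interleavings = 0
    states = set()
    for order in permutations(range(n_threads)):
        interleavings += 1
        state = [0] * n_threads
        states.add(tuple(state))      # initial state: all variables are 0
        for tid in order:
            state[tid] = 1            # thread `tid` executes its write
            states.add(tuple(state))
    return interleavings, len(states)


# Brute force is feasible only for small n; it agrees with n! and 2**n.
for n in range(1, 7):
    assert explore(n) == (factorial(n), 2 ** n)

# For the slide's 10 threads, use the closed forms instead.
print(factorial(10), 2 ** 10)
```

The gap between the two numbers is the point: a tester that enumerates interleavings does far more work than one that explores distinct states, which motivates the systematic testing and model checking techniques covered later.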


In This Course

• How to find bugs? (Week 11)

– Data race detection, atomicity violation detection, systematic testing and model checking

• How to prevent bugs? (Week 12)

– Deterministic runtimes


Project

• Available Resources

– BeagleBone Black x1, Odroid-XU-E x1, Samsung ARM Chromebook x1, Nexus 7 (1st gen) x1

– I can buy up to $300 of equipment for your project


Project Ideas

• DRAM Bank-aware user-level malloc library

– Goal: control memory allocations over DRAM banks

• Can save energy by packing all your allocations into one bank

• Can reduce worst-case latency by dedicating banks to latency-critical applications

– Read

• ULCC: A User-Level Facility for Optimizing Shared Cache Performance on Multicores

• PALLOC, Jantz’s VEE paper

– I can provide simple malloc source code to start from
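
A rough sketch of the core idea, assuming the physical address of each free page is known and that the bank index sits in address bits 13 to 15. Both are assumptions: the real bit positions depend on the memory controller, and `BANK_SHIFT`, `BANK_BITS`, `bank_of`, and `BankAwarePool` are hypothetical names, not part of any existing library.

```python
PAGE_SIZE = 4096

# Hypothetical layout: bank index in physical address bits 13-15.
# Real positions depend on the memory controller and must be looked up
# or reverse engineered for the target platform.
BANK_SHIFT = 13
BANK_BITS = 3


def bank_of(phys_addr):
    """DRAM bank index of a physical address under the assumed layout."""
    return (phys_addr >> BANK_SHIFT) & ((1 << BANK_BITS) - 1)


class BankAwarePool:
    """Free physical pages grouped by the DRAM bank they map to."""

    def __init__(self, free_page_addrs):
        self.by_bank = {}
        for addr in free_page_addrs:
            self.by_bank.setdefault(bank_of(addr), []).append(addr)

    def alloc_page(self, allowed_banks):
        """Return a free page from the first allowed bank that has one."""
        for bank in allowed_banks:
            pages = self.by_bank.get(bank)
            if pages:
                return pages.pop()
        raise MemoryError("no free page in the allowed banks")


pool = BankAwarePool(range(0, 64 * PAGE_SIZE, PAGE_SIZE))
# Pack allocations into bank 0, or dedicate bank 2 to a latency-critical task.
packed = pool.alloc_page([0])
dedicated = pool.alloc_page([2])
print(hex(packed), bank_of(packed), hex(dedicated), bank_of(dedicated))
```

A malloc library built on this would carve small objects out of pages returned by `alloc_page`, so that the bank policy is decided once per page rather than once per object.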


Project Ideas

• Page recoloring kernel patch (& tool)

– Re-map already allocated pages of certain colors (or DRAM banks) to different ones

– Read

• Gaining insights into multicore cache partitioning: Bridging the gap between simulation and real systems
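
A sketch of how a recoloring tool might pick the pages to re-map, assuming a physically indexed shared cache whose set index is a plain slice of the address bits (no hashing). The cache geometry below (2 MiB, 16-way) and the function names are illustrative assumptions.

```python
PAGE_SIZE = 4096


def num_colors(cache_size, associativity, page_size=PAGE_SIZE):
    """Number of page colors: cache bytes per way divided by the page size."""
    return cache_size // (associativity * page_size)


def page_color(phys_addr, cache_size, associativity, page_size=PAGE_SIZE):
    """Color of the page containing `phys_addr`."""
    frame = phys_addr // page_size
    return frame % num_colors(cache_size, associativity, page_size)


def pages_to_recolor(page_addrs, banned_colors, cache_size, associativity):
    """Pages whose current color is banned and that therefore need re-mapping."""
    return [addr for addr in page_addrs
            if page_color(addr, cache_size, associativity) in banned_colors]


# Hypothetical cache: 2 MiB, 16-way, so 32 colors with 4 KiB pages.
CACHE_SIZE = 2 * 1024 * 1024
WAYS = 16
pages = [i * PAGE_SIZE for i in range(64)]
victims = pages_to_recolor(pages, {0, 1}, CACHE_SIZE, WAYS)
print(num_colors(CACHE_SIZE, WAYS), [hex(a) for a in victims])
```

The kernel patch would then allocate a frame of an allowed color for each victim, copy the contents, and update the page table entry; the same selection logic applies to DRAM banks with a bank function in place of `page_color`.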


Project Ideas

• Comprehensive power/performance analysis

– Investigate the performance impact of CPU DVFS, memory DVFS, and DPM on ARM development boards or recent Intel CPUs with energy counters

– Develop a better measurement tool

– Read

• An undergraduate individual study

• http://web.eece.maine.edu/~vweaver/projects/rapl/

• MultiDVFS paper (ECRTS'10)
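
A starting point for the measurement tool on Intel machines, assuming Linux exposes the RAPL energy counters through the powercap sysfs directory used below. The exact path varies between systems and reading it may require root, so treat `RAPL_DIR` as an assumption; `measure` and `energy_delta_uj` are illustrative names.

```python
import time
from pathlib import Path

# Package-0 RAPL domain in the Linux powercap sysfs tree (path may differ).
RAPL_DIR = Path("/sys/class/powercap/intel-rapl:0")


def read_uj(name):
    """Read one integer counter file (values are in microjoules)."""
    return int((RAPL_DIR / name).read_text())


def energy_delta_uj(before, after, max_range):
    """Energy between two readings, allowing for one counter wraparound."""
    if after >= before:
        return after - before
    return after + max_range - before


def measure(workload):
    """Run `workload()` and return (energy in joules, elapsed seconds)."""
    max_range = read_uj("max_energy_range_uj")
    e0, t0 = read_uj("energy_uj"), time.perf_counter()
    workload()
    e1, t1 = read_uj("energy_uj"), time.perf_counter()
    return energy_delta_uj(e0, e1, max_range) / 1e6, t1 - t0


if __name__ == "__main__":
    try:
        joules, seconds = measure(lambda: sum(i * i for i in range(10**6)))
        print(f"{joules:.3f} J in {seconds:.3f} s ({joules / seconds:.2f} W)")
    except OSError as err:
        print(f"RAPL counters unavailable: {err}")
```

Repeating the measurement under different CPU frequency settings gives the power/performance curves the project asks for; on ARM boards without such counters an external power meter would be needed instead.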
