
Effect of Context Aware Scheduler on TLB


Page 1: Effect of Context Aware Scheduler on TLB

3rd Joint Workshop on Embedded and Ubiquitous Computing


Effect of Context Aware Scheduler on TLB

Satoshi Yamada

PhD Candidate

Kusakabe Laboratory

Page 2: Effect of Context Aware Scheduler on TLB


Contents

• Introduction
• Overhead of Context Switch
• Context Aware Scheduler
• Benchmark Applications and Measurement Environment
• Result
• Related Works
• Conclusion

Page 3: Effect of Context Aware Scheduler on TLB


Widespread Multithreading

• Multithreading hides the latency of disk I/O and network access

• Threads in many languages, such as Java, Perl, and Python, correspond to OS threads

*  More context switches happen today
*  The process scheduler in the OS is more responsible for system performance

Page 4: Effect of Context Aware Scheduler on TLB


Context Switch and Caches

• The overhead of a context switch
  – includes loading the working set of the next process
  – is deeply related to the utilization of caches

• Agarwal et al., "Cache Performance of Operating System and Multiprogramming Workloads" (1988)

• Mogul et al., "The Effect of Context Switches on Cache Performance" (1991)

[Figure: switching between Process A and Process B; the combined working sets overflow the cache, leaving regions used by A only, by B only, and by both A and B]

Page 5: Effect of Context Aware Scheduler on TLB


Advantage of Sibling Threads

[Figure: fork() creates a PROCESS — the child's task_struct gets its own copies of mm_struct, signal_struct, open files, etc.; clone() creates a THREAD — the child's task_struct shares the parent's mm_struct, signal_struct, open files, etc. (sibling threads)]

• The OS does not have to switch memory address spaces when switching between sibling threads
• We can therefore expect a reduction in the overhead of context switches
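As a minimal user-space illustration of this difference (a hedged sketch, not taken from the slides): pthread_create() ultimately calls clone() with CLONE_VM, so a sibling thread writes into the same address space, whereas a fork()ed child only modifies its own copy.

    /* Sketch: creating a sibling thread vs. a separate process.
     * pthread_create() ends up calling clone() with CLONE_VM (shared mm_struct),
     * while fork() gives the child its own copy of the address space. */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/wait.h>

    static int shared = 0;

    static void *sibling(void *arg)
    {
        (void)arg;
        shared = 42;              /* visible to the parent: same mm_struct */
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, sibling, NULL);     /* clone(): THREAD */
        pthread_join(tid, NULL);
        printf("after thread: shared = %d\n", shared); /* prints 42 */

        shared = 0;
        pid_t pid = fork();                            /* fork(): PROCESS */
        if (pid == 0) {
            shared = 42;          /* modifies only the child's private copy */
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        printf("after fork:   shared = %d\n", shared); /* still 0 */
        return 0;
    }

Compiled with -pthread, the thread's write is seen by the parent while the forked child's write is not, which is exactly the sharing that makes switches between sibling threads cheap.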

Page 6: Effect of Context Aware Scheduler on TLB


Contents

• Introduction
• Overhead of Context Switch and TLB
• Context Aware Scheduler
• Benchmark Applications and Measurement Environment
• Result
• Related Works
• Conclusion

Page 7: Effect of Context Aware Scheduler on TLB


TLB flush in Context Switch

• The TLB is a cache that stores translations from virtual addresses to physical addresses
  – TLB translation latency: 1 ns
  – TLB miss overhead: several accesses to memory

• On x86 processors, most TLB entries are invalidated (flushed) on every context switch because the memory address space is changed

A TLB flush does not happen on a context switch between sibling threads
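A simplified sketch of why that is, loosely modeled on the kernel's context-switch path (the structures and function names below are illustrative, not the actual Linux source): the page-table base register (CR3) is only reloaded when the address space actually changes, and it is that reload which flushes the TLB.

    /* Simplified sketch (not the real kernel code): switching between
     * sibling threads keeps the same mm, so CR3 is not rewritten and the
     * TLB contents survive the context switch. */
    struct mm_struct { unsigned long pgd_phys; /* page directory address */ };
    struct task      { struct mm_struct *mm;   };

    static void load_cr3(unsigned long pgd_phys)
    {
        (void)pgd_phys;
        /* on real x86 hardware, writing CR3 invalidates (flushes)
         * all non-global TLB entries */
    }

    void switch_mm_sketch(struct task *prev, struct task *next)
    {
        if (prev->mm == next->mm)
            return;                       /* sibling threads: no TLB flush */
        load_cr3(next->mm->pgd_phys);     /* new address space: TLB is flushed */
    }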

Page 8: Effect of Context Aware Scheduler on TLB


Overhead Due to a Context Switch

[Figure: context-switch overhead measured with lat_ctx from LMbench]

Page 9: Effect of Context Aware Scheduler on TLB


Contents

• Introduction
• Overhead of Context Switch and TLB
• Context Aware Scheduler
• Benchmark Applications and Measurement Environment
• Result
• Related Works
• Conclusion

Page 10: Effect of Context Aware Scheduler on TLB


O(1) Scheduler in Linux

• The O(1) scheduler runqueue has
  – an active queue and an expired queue
  – a priority bitmap and an array of linked lists of threads

• The O(1) scheduler
  – searches the priority bitmap
  – chooses a thread with the highest priority

Scheduling overhead is independent of the number of threads
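A schematic sketch of that constant-time pick-next step (data structures are simplified for illustration, not the real kernel code): the cost is one bitmap scan plus one list-head lookup, regardless of how many threads are runnable.

    /* Sketch of the O(1) pick-next step: find the first set bit in the
     * priority bitmap (highest priority with runnable threads), then take
     * the head of that priority's list. */
    #include <stddef.h>

    #define MAX_PRIO 140
    #define BITMAP_WORDS ((MAX_PRIO + 63) / 64)

    struct list_head { struct list_head *next, *prev; };

    struct prio_array {
        unsigned long bitmap[BITMAP_WORDS];  /* bit set => queue non-empty */
        struct list_head queue[MAX_PRIO];    /* one FIFO list per priority */
    };

    /* index of the first set bit, i.e. the highest priority that has
     * runnable threads (lower index = higher priority) */
    static int find_first_set(const unsigned long *bitmap)
    {
        for (int w = 0; w < BITMAP_WORDS; w++)
            if (bitmap[w])
                return w * 64 + __builtin_ctzl(bitmap[w]);
        return MAX_PRIO;                     /* nothing runnable */
    }

    struct list_head *pick_next(struct prio_array *active)
    {
        int prio = find_first_set(active->bitmap);
        if (prio == MAX_PRIO)
            return NULL;
        return active->queue[prio].next;     /* head of the best-priority list */
    }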

Page 11: Effect of Context Aware Scheduler on TLB


Context Aware (CA) Scheduler

• The CA scheduler creates an auxiliary runqueue per group of sibling threads

• The CA scheduler compares Preg and Paux
  – Preg: the highest priority in the regular O(1) scheduler runqueue
  – Paux: the highest priority in the auxiliary runqueue

• If Preg - Paux <= threshold, the thread with priority Paux is chosen (see the sketch below)
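A hedged sketch of that decision (the names and structures are illustrative, not the authors' implementation): when the best sibling thread is within the threshold of the best thread overall, the sibling is preferred so that the address space, and with it the TLB contents, can be reused.

    /* Illustrative sketch of the CA scheduler's pick decision.  The
     * condition mirrors the slide: Preg - Paux <= threshold. */
    struct thread;                        /* opaque for this sketch */

    struct runqueue_view {
        int best_prio;                    /* highest priority present */
        struct thread *best_thread;       /* thread holding that priority */
    };

    struct thread *ca_pick_next(struct runqueue_view *regular,
                                struct runqueue_view *aux,
                                int threshold)
    {
        int p_reg = regular->best_prio;   /* best in the O(1) runqueue */
        int p_aux = aux->best_prio;       /* best in the sibling-group runqueue */

        if (aux->best_thread && p_reg - p_aux <= threshold)
            return aux->best_thread;      /* stay within the sibling group */
        return regular->best_thread;      /* fall back to the O(1) choice */
    }

With threshold 0 the CA scheduler only picks a sibling when it is at least as good as the regular choice; larger thresholds trade a little priority strictness for fewer address-space switches.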

Page 12: Effect of Context Aware Scheduler on TLB


Context Aware Scheduler

[Figure: the same five threads A–E scheduled by the Linux O(1) scheduler and by the CA scheduler, which aggregates sibling threads]

• Linux O(1) scheduler: 3 context switches between processes
• CA scheduler: 1 context switch between processes

Page 13: Effect of Context Aware Scheduler on TLB


Fairness

• The O(1) scheduler keeps fairness by epochs
  – cycles of the active queue and the expired queue

• The CA scheduler also follows epochs
  – it guarantees the same level of fairness as the O(1) scheduler (see the sketch below)
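A minimal sketch of the epoch mechanism (illustrative code, not the kernel's): a thread that exhausts its timeslice is re-enqueued on the expired array, and once the active array is empty the two arrays are swapped, starting a new epoch in which every runnable thread gets to run again.

    /* Sketch of epoch-based fairness: active/expired arrays are swapped
     * when the active array empties, so no thread is starved across epochs. */
    struct prio_array {
        int nr_running;          /* runnable threads queued here */
        /* priority bitmap and per-priority lists omitted for brevity */
    };

    struct runqueue {
        struct prio_array arrays[2];
        struct prio_array *active;
        struct prio_array *expired;
    };

    /* called when the current thread has used up its timeslice */
    void requeue_expired_thread(struct runqueue *rq)
    {
        rq->active->nr_running--;
        rq->expired->nr_running++;   /* it runs again only in the next epoch */
    }

    /* called before picking the next thread to run */
    void maybe_start_new_epoch(struct runqueue *rq)
    {
        if (rq->active->nr_running == 0) {
            struct prio_array *tmp = rq->active;
            rq->active = rq->expired;    /* expired threads become runnable again */
            rq->expired = tmp;           /* a new epoch begins */
        }
    }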

Page 14: Effect of Context Aware Scheduler on TLB


Contents

• Introduction
• Overhead of Context Switch
• Context Aware Scheduler
• Benchmarks and Measurement Environment
• Result
• Related Works
• Conclusion

Page 15: Effect of Context Aware Scheduler on TLB


Benchmarks

• Java
  – Volano Benchmark (Volano)
  – lusearch program in the DaCapo benchmark suite (DaCapo)

• C
  – Chat benchmark (Chat)
  – memory program in the SysBench benchmark suite (SysBench)

[Table: information on each benchmark application]

Page 16: Effect of Context Aware Scheduler on TLB


Measurement Environment

• Hardware

• Sun's J2SE 5.0

• Threshold of the context aware scheduler
  – 1 and 10

• Perfctr to count TLB misses
• GNU time command to measure total system performance

Page 17: Effect of Context Aware Scheduler on TLB


Contents

• Introduction
• Overhead of Context Switch
• Context Aware Scheduler
• Benchmarks and Measurement Environment
• Result
• Related Works
• Conclusion

Page 18: Effect of Context Aware Scheduler on TLB


Effect on TLB

• The CA scheduler significantly reduces TLB misses
• A bigger threshold is more effective
  – priorities change frequently due to dynamic priority, especially in DaCapo and Volano

[Table: results of TLB misses (in millions)]

Page 19: Effect of Context Aware Scheduler on TLB


Effect on System Performance

[Table: results by the time command (seconds)]

[Table: results of the counters in each application (seconds)]

The CA scheduler
• enhances the throughput of every application
• reduces the total elapsed time by 43%

Page 20: Effect of Context Aware Scheduler on TLB


Contents

• Introduction
• Overhead of Context Switch
• Context Aware Scheduler
• Benchmarks and Measurement Environment
• Result
• Related Works
• Conclusion

Page 21: Effect of Context Aware Scheduler on TLB


Sujay Parekh et al., "Thread-Sensitive Scheduling for SMT Processors" (2000)

• Parekh's scheduler
  – runs groups of threads in parallel and samples information about
    • IPC
    • TLB misses
    • L2 cache misses, etc.
  – schedules based on the sampled information

[Figure: execution alternates between sampling phases and scheduling phases]

Page 22: Effect of Context Aware Scheduler on TLB


Pranay Koka et al., "Opportunities for Cache Friendly Process Scheduling" (2005)

• Koka's scheduler
  – traces the execution of each thread
  – focuses on the memory space shared between threads
  – schedules based on the information above

[Figure: execution alternates between tracing phases and scheduling phases]

Page 23: Effect of Context Aware Scheduler on TLB


Conclusion

• Conclusion
  – The CA scheduler is effective in reducing TLB misses
  – The CA scheduler enhances the throughput of every application

• Future Work
  – Evaluation on other platforms
  – Investigation of fairness within an epoch
    • comparison with the Completely Fair Scheduler (Linux 2.6.23)

Page 24: Effect of Context Aware Scheduler on TLB


Widespread Multithreading

• Multithreading hides the latency of disk I/O and network access

• Threads in many languages, such as Java, Perl, and Python, correspond to OS threads

[Figure: ThreadA and ThreadB interleaved around a disk access; ThreadB waits]

*  More context switches happen today
*  The process scheduler in the OS is more responsible for system performance

Page 25: Effect of Context Aware Scheduler on TLB


Context Aware (CA) Scheduler

[Figure: the same five threads A–E scheduled by the Linux O(1) scheduler and by the CA scheduler]

• Linux O(1) scheduler: 3 context switches between processes
• CA scheduler: 1 context switch between processes

Our CA scheduler aggregates sibling threads

Page 26: Effect of Context Aware Scheduler on TLB


Results of Context Switch

[Figure: context-switch overhead (microseconds) for Process A, Process B, and Process C with cache footprints of 0, 1MB, and 2MB; L2 cache size: 2MB]