Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Tuning SCHED_ULE on FreeBSD
George Neville-Neil
May 7, 2009
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 1 / 29
Introduction
Outline
I BSD Scheduler HistoryI SCHED_ULEI Tuning HooksI Testing MethodologyI Effects
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 2 / 29
Introduction
BSD Scheduler History
I BSD written for uni-processor machinesI No SMPI No HTTI No multicoresI Up through FreeBSD 5 only modified not wholesale rewritten
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 3 / 29
Introduction
Why SCHED_ULE?
I SMP and multi-coreI SMP is NOT multi-coreI Cache effects
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 4 / 29
Introduction
Why keep SCHED_BSD?
I One size does not fit allI There are still uniprocessorsI Embedded systemsI A baseline to compare against
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 5 / 29
Introduction
Scheduler Responsibilities and Goals
I Arbitrate amongst competing processesI Adhere to the will of the administratorI Stay out of the way
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 6 / 29
Introduction
Why tune the scheduler?
I Can change overall performance of the systemI Favor one type of job over anotherI Not all workloads are interactive
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 7 / 29
Introduction
Don’t Panic
I The scheduler is one of the most important components of thekernel
I You (probably) cannot destroy your system via scheduler tuningI Proceed with cautionI Measure, modify, measure, modifyI All of the tunables can simply turned off if they cause trouble
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 8 / 29
Tuning Hooks
Interactivity Tunables
name Name of scheduler, ULE or 4BSDinteract Interactivity score threshold
slice Time slice for timeshare threads (100ms)
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 9 / 29
Tuning Hooks
SCHED_ULE Tuning Hooks
steal_thresh Minimum load on a remote CPU before we’ll steal work.steal_idle Attempt to steal idle work from other CPUs before this
CPU goes idle.steal_htt Steals work from another core on idle.
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 10 / 29
Tuning Hooks
Stealing
I Stealing in SCHED_ULE can be virtuousI Cores can steal work from each otherI It is a way of balancing work in an SMP/multi-core system
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 11 / 29
Tuning Hooks
SCHED_ULE Tuning Hooks
balance Enable the long term load balancer.balance_interval Average frequency in stathz ticks to run the long term
load balancer (below).affinity Number of ticks to keep a thread from changing CPU.
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 12 / 29
Tuning Hooks
SCHED_ULE Tuning Hooks
idlespinthresh Threshold before idle spinning can occuridlespins Number of times the idle thread will spin waiting for new
workstatic_boost Assign static priorities to sleeping threadspreepmt_thresh Minimum priority for preemption, lower priorities are
more likely to be picked.
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 13 / 29
Testing
Testing Methodology
I We introduce a dummy load on the systemI Read data from another processI Do some math in a loopI Should have few or no voluntary context switchesI Wish to reduce involuntary context switches
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 14 / 29
Testing
Context Switching
I Changing the process which is executing on a coreVoluntary Process takes an action that blocks or calls
sched_yield()Involuntary with Preemption On exiting a critical section or
interrupt service routine a process may bepre-empted.
Involuntary without Preemption
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 15 / 29
Testing
The output of top(1)
1 l a s t p id : 1023; load averages : 0.96 , 0.53 , 0.25 up 0+00:08:21 14:40:282 100 processes : 10 running , 58 sleeping , 32 wa i t i ng3 CPU: 12.5% user , 0.0% nice , 0.0% system , 0.0% i n t e r r u p t , 87.5% i d l e4 Mem: 17M Act ive , 9848K Inac t , 106M Wired , 68K Cache , 16M Buf , 7785M Free5 Swap : 8192M Tota l , 8192M Free67 PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND8 1019 gnn 0 21 0 0 0 0 0.00% dummy29 982 gnn 0 0 0 0 0 0 0.00% tcsh
10 1015 gnn 0 0 0 0 0 0 0.00% dummy111 1011 gnn 0 0 0 0 0 0 0.00% usdlogd
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 16 / 29
Testing
Tuning Tests
I Turn off balancingI Change the time sliceI Test system has eight cores totalI Each test was run for 15 minutes while observing top.
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 17 / 29
Testing
Turn off Balancing
I The CPU balancer runs every 133 ticksI In a system that is being hand tuned why run the balancer?I What’s the effect of turning off the balancer
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 18 / 29
Testing
With Balancing
1 l a s t p id : 1023; load averages : 0.96 , 0.53 , 0.25 up 0+00:08:21 14:40:282 100 processes : 10 running , 58 sleeping , 32 wa i t i ng3 CPU: 12.5% user , 0.0% nice , 0.0% system , 0.0% i n t e r r u p t , 87.5% i d l e4 Mem: 17M Act ive , 9848K Inac t , 106M Wired , 68K Cache , 16M Buf , 7785M Free5 Swap : 8192M Tota l , 8192M Free67 PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND8 1019 gnn 0 21 0 0 0 0 0.00% dummy29 982 gnn 0 0 0 0 0 0 0.00% tcsh
10 1015 gnn 0 0 0 0 0 0 0.00% dummy111 1011 gnn 0 0 0 0 0 0 0.00% usdlogd
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 19 / 29
Testing
Without Balancing
1 l a s t p id : 1024; load averages : 0.98 , 0.61 , 0.30 up 0+00:09:21 14:41:282 100 processes : 10 running , 58 sleeping , 32 wa i t i ng3 CPU: 12.4% user , 0.0% nice , 0.1% system , 0.0% i n t e r r u p t , 87.5% i d l e4 Mem: 17M Act ive , 9852K Inac t , 106M Wired , 68K Cache , 16M Buf , 7785M Free5 Swap : 8192M Tota l , 8192M Free67 PID USERNAME VCSW IVCSW READ WRITE FAULT TOTAL PERCENT COMMAND8 1019 gnn 0 20 0 0 0 0 0.00% dummy29 982 gnn 0 0 0 0 0 0 0.00% tcsh
10 1015 gnn 0 0 0 0 0 0 0.00% dummy111 1011 gnn 0 0 0 0 0 0 0.00% usdlogd
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 20 / 29
Testing
Balancing Results
I A slight increase in load average (0.96 to 0.99)I The load average remains slightly higherI The number if involuntary context switches does not change
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 21 / 29
Testing
Time Slice
I The default time slice is 13 ticksI Increase the time slice to 64, 128, and 256 ticksI At each level run for 15 minutes
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 22 / 29
Testing
Time Slice Evaluation
2800 20 40 60 80 100 120 140 160 180 200 220 240 260
22
0
2
4
6
8
10
12
14
16
18
20
Time Slice (ticks)
IVCS
W
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 23 / 29
Testing
How long does a switch take?
I A set of scheduler stats are availableI Need to build the kernel with SCHED_STATSI Locally added calls to rdtsc to mi_switchI Store the difference between these values on each switchI Crude but effectiveI Reading the sysctl every 3 seconds
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 24 / 29
Testing
Switch Timing Results
160 2 4 6 8 10 12 14
28,000
0
2000
4000
6000
8000
10,000
12,000
14,000
16,000
18,000
20,000
22,000
24,000
26,000
Time (Samples)
Cycl
es
Using cpuset
Slice 256
Slice 13 (Default)
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 25 / 29
Concluding Remarks
Scheduler Statistics
preempt Pre emptions anywhere in the systemowepreempt Were in a critical section and should have pre-empted
turnstile Switches due to mutex contentionsleepq Switches due to sleep
relinquish Called a yield functionneedresched Pre emption of user processes on exit from the kernel
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 26 / 29
Concluding Remarks
Turning All This Off
I Sometimes you know what must be doneI Assigning processes to cores is also possibleI See cpuset(4) man pageI See also Brooks Davis’ presentation
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 27 / 29
Concluding Remarks
Further Reading
I /usr/src/sys/kern/sched_ule.cI /usr/src/sys/kern/sched_switch.cI “ULE: A Modern Scheduler for FreeBSD”, by Jeff RobersonI “The Design and Implementation of the FreeBSD Operating
System”, by McKusick and Neville-NeilI R. Jain, “The Art of Computer Systems Performance Analysis:
Techniques for Experimental Design, Measurement, Simulation,and Modeling,”
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 28 / 29
Concluding Remarks
Questions?
George Neville-Neil ([email protected]) Tuning SCHED_ULE on FreeBSD May 7, 2009 29 / 29