Soft Timers: Efficient Microsecond Software Timer Support for Network Processing
MOHIT ARON and PETER DRUSCHEL
Rice University
Published in ACM Transactions on Computer Systems, vol. 18(3), pp. 197-228, 2000.
Presented By Glenn Diviney

What’s wrong with “Hard” timers?
- Polling vs. Interrupts
  - Interrupts have high overhead and low latency
  - Polling has high latency and low overhead
- Interrupts are expensive
  - The CPU pipeline gets disrupted, and the cache and TLB get polluted
  - Generally not significant, so long as interrupts arrive at millisecond intervals
  - Example: network interrupts can arrive at intervals of tens of microseconds
  - Gigabit Ethernet requires a packet transmission every 12 µs (1500 bytes each)!
  - This amounts to a significant burden on the system if a context switch is involved each time
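
The 12 µs figure follows directly from the packet size and the link rate. A minimal sketch of that arithmetic, ignoring framing overhead such as the preamble and inter-frame gap (the function name is illustrative, not from the paper):

```python
def transmit_time_us(packet_bytes: int, link_bits_per_second: float) -> float:
    """Time to put one packet on the wire, in microseconds (payload bits only)."""
    return packet_bytes * 8 * 1e6 / link_bits_per_second


# 1500-byte packets, as on the slide
gigabit_us = transmit_time_us(1500, 1e9)          # Gigabit Ethernet
fast_ethernet_us = transmit_time_us(1500, 100e6)  # 100 Mbps Ethernet

print(f"1 Gbps:   {gigabit_us:.0f} us per packet")
print(f"100 Mbps: {fast_ethernet_us:.0f} us per packet")
```

The same calculation gives the 120 µs per packet quoted later for the 100 Mbps test network.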

Interrupts
- Device interrupts have low latency but high overhead due to the added context switching
  - The executing thread gets preempted
  - Can occur at inopportune times, which slows down other work due to cache pollution, TLB pollution, and the pipeline purge, resulting in high indirect costs

Polling
- Polling has low overhead, but can have high latency, depending on the frequency of the poll:
  - The OS’s timer granularity depends directly on the frequency of the timer interrupts, as well as on the overhead incurred by each interrupt
  - The cache, TLB, and pipeline costs can be avoided if the polling is done at the right time

What’s a Soft Timer?
- “An operating system facility that allows efficient scheduling of software events at microsecond granularity.”
- Takes advantage of states where handlers can be invoked at low cost: “Trigger States”
  - For example, when the system has already switched into the kernel… why not see if other work can be done “while you’re in there?”
- Schedules future events probabilistically

How Soft Timers work: hardware
- Pentiums are usually shipped with a programmable timer, which can be told how often to interrupt the CPU.
  - These interrupts are usually assigned the highest priority in the OS, which can lead to TLB and cache misses
  - Testing indicated a total cost of 4.45 µs per interrupt on a 300 MHz web server, which is insignificant at ms intervals but terrible at 20 µs intervals
  - The timer chip is programmed to interrupt at ms intervals
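
To see why 4.45 µs per interrupt is "insignificant" at millisecond intervals but "terrible" at 20 µs intervals, it helps to express the cost as a share of CPU time. This is a rough sketch that assumes the per-interrupt cost is constant and simply adds up; the helper name is made up for illustration:

```python
def timer_cpu_fraction(cost_us: float, interval_us: float) -> float:
    """Fraction of CPU time spent servicing a periodic timer interrupt."""
    return cost_us / interval_us


COST_US = 4.45  # measured per-interrupt cost quoted on the slide

at_1ms = timer_cpu_fraction(COST_US, 1000.0)
at_20us = timer_cpu_fraction(COST_US, 20.0)

print(f"1 ms interval:  {at_1ms:.2%} of the CPU")
print(f"20 us interval: {at_20us:.2%} of the CPU")
```

Under these assumptions the timer costs well under 1% of the CPU at a 1 ms interval and more than a fifth of it at a 20 µs interval.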

How Soft Timers work: software
- At unpredictable intervals, the system will arrive at “trigger states”
  - End of a system call
  - End of an exception handler
  - End of an interrupt handler
  - CPU idle
- In these states, invoking an event handler is just a function call’s worth of overhead
  - TLB and cache are already “disturbed” due to the triggering event, so no additional cost should be incurred
- In these states, the OS’s Soft Timer facility checks for any pending events without incurring the cost of the hardware timer
  - Checks the clock (usually a CPU register) and compares it to the scheduled time of the earliest soft timer event.

The catch
- Events might get delayed past their scheduled time
  - Only the hardware interrupt is guaranteed to happen (providing an upper bound on when an event executes)
  - Other trigger states appear as random events to the system, or may not happen at all between hardware interrupts

Implementation
- Soft timers provide the following operations
  - measure_resolution(): returns a 64-bit value which represents the clock resolution in hertz
  - measure_time(): returns a 64-bit value representing the current time, whose resolution is given by measure_resolution()
  - schedule_soft_event(T, handler): schedules “handler” to run “T” ticks in the future
  - interrupt_clock_resolution(): provides the minimal resolution, which is that of the hardware interrupt timer
- When invoked, the Soft Timer facility executes every handler whose scheduled time has been exceeded by the value of measure_time() by at least 1 tick.
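
The four operations and the trigger-state check can be modelled in a few lines. The following is a user-space toy, not the FreeBSD implementation: the class name, the `trigger_state()` hook, the min-heap, and the default resolutions are all assumptions made for illustration, and Python integers stand in for the 64-bit values.

```python
import heapq
import itertools
import time


class SoftTimers:
    """Toy model of the soft timer operations plus a trigger-state hook."""

    def __init__(self, clock=time.perf_counter_ns, clock_hz=1_000_000_000,
                 interrupt_hz=1_000):
        self._clock = clock                # returns the current tick count
        self._clock_hz = clock_hz          # ticks per second of that clock
        self._interrupt_hz = interrupt_hz  # rate of the backup hardware interrupt
        self._events = []                  # min-heap of (due_tick, order, handler)
        self._order = itertools.count()    # tie-breaker so handlers are never compared

    def measure_resolution(self):
        return self._clock_hz

    def measure_time(self):
        return self._clock()

    def interrupt_clock_resolution(self):
        return self._interrupt_hz

    def schedule_soft_event(self, ticks, handler):
        due = self.measure_time() + ticks
        heapq.heappush(self._events, (due, next(self._order), handler))

    def trigger_state(self):
        """Call at the end of a syscall, exception or interrupt handler, or when idle."""
        now = self.measure_time()
        fired = 0
        # Fire every event whose due time is exceeded by at least one tick.
        while self._events and self._events[0][0] < now:
            _, _, handler = heapq.heappop(self._events)
            handler()
            fired += 1
        return fired


# Usage with a fake clock so the behaviour is deterministic.
fake_now = [0]
timers = SoftTimers(clock=lambda: fake_now[0], clock_hz=1_000_000)
log = []
timers.schedule_soft_event(50, lambda: log.append("sent"))

fake_now[0] = 50
fired_at_50 = timers.trigger_state()  # due time reached but not yet exceeded
fake_now[0] = 51
fired_at_51 = timers.trigger_state()  # exceeded by one tick: handler runs
print(fired_at_50, fired_at_51, log)
```

Keeping the earliest event at the top of a heap means the common case, where nothing is due, costs one clock read and one comparison, which matches the "function call's worth of overhead" argument.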

Implementation (cont)
- If X is the period of the hardware timer interrupt, measured in clock ticks, the events will be bounded by:
  T < Actual Event Time < T + X + 1
  - Just a reassurance that the event will happen eventually
- Generally, the assumption is that the event will happen as:
  Actual Event Time = T + d
  - “d” is the “random” delay until the next non-hardware trigger state
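
The bound can be checked with a small simulation. This sketch assumes integer clock ticks, a hardware interrupt at every multiple of X ticks, and randomly placed trigger states; the numeric ranges are arbitrary test values, not measurements from the paper.

```python
import random


def actual_event_time(start, T, X, trigger_times):
    """Ticks from scheduling until the event fires.

    The event is due at start + T and fires at the first check that happens
    strictly after that: either a trigger state or the next hardware interrupt.
    """
    due = start + T
    next_hw_interrupt = (due // X + 1) * X       # first multiple of X after due
    later_triggers = [t for t in trigger_times if t > due]
    fire_time = min(later_triggers + [next_hw_interrupt])
    return fire_time - start


rng = random.Random(0)
X = 1000  # hardware interrupt period in ticks (assumed)
samples = []
for _ in range(10_000):
    start = rng.randrange(0, 100_000)
    T = rng.randrange(1, 500)
    # Zero to four trigger states at random times after scheduling.
    triggers = [rng.randrange(start, start + 3000)
                for _ in range(rng.randrange(0, 5))]
    samples.append((T, actual_event_time(start, T, X, triggers)))

all_within_bound = all(T < actual < T + X + 1 for T, actual in samples)
print("bound holds for all samples:", all_within_bound)
```

When a trigger state happens soon after the due time, the event fires at T + d with a small d; when none happens, the hardware interrupt caps the delay at X ticks.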

Applications
- Rate-based clocking
  - Recall the 12 µs interrupt interval for gigabit Ethernet
  - Transmission rate becomes variable, but the protocol could maintain an average “actual” rate and adjust the scheduling accordingly to achieve a target rate
- Network polling
  - Pure polling reduces interrupts and the impact of memory access, but it can also induce latencies by delaying packet processing
  - Soft Timers are a good alternative to pure polling or to a hybrid hardware approach with a network poll timer
  - Soft Timers show a latency close to that of interrupt-driven processing in the common case

Base overhead test setup
- FreeBSD was extended to include the Soft Timer facilities
  - They also added support for the on-chip APIC timer in addition to the already-supported 8253 off-chip timer
- Connected “a number” of 300 to 500 MHz machines to a 100 Mbps network
  - One acted as a web server
  - Others repeatedly requested a 6 KB file to the point where the web server was saturated

Base overhead test results
- Used a “null handler” to measure the per-timer event costs
- Of note:
  - The results suggest that the overhead does not scale with processor speed
  - Soft Timers caused no observable cost

Base overhead test results (cont)
- What about TLB and cache misses?
  - The handler touched 50 data cache lines
  - It touched 2 instruction cache lines on 2 separate pages
  - All lines touched were different each time, and events were scheduled at 10 µs and then 20 µs intervals
  - Results for events scheduled every 10 µs could not be obtained for 8253-based timers due to the high overhead of that facility

Base overhead test results (cont)
- The prior reasoning that Soft Timers reduce TLB and cache misses is confirmed
  - Data cache misses reduced by 20-31%
  - Instruction cache misses not reduced
    - The authors assume this is because only 2 lines are touched
  - TLB misses reduced by 7-13%

Different workload test setup
- Intended to induce variation in when the trigger states occur, which is the Achilles heel of Soft Timers
  - Measured the distribution of times between successive trigger states for various workloads on a 300 MHz PII machine
  - Mean granularity in the tens of µs, with less than 6% of intervals over 100 µs

Rate-Based Clocking: Timer Overhead
- Web server TCP implementation using Soft Timers vs. hardware timers
  - At 100 Mbps, a 1500-byte packet takes 120 µs to transmit, so rate-based clocking has no observable impact on the network
  - Therefore, the metric to isolate is the timer overhead, but possible benefits of rate-based clocking are not exposed
- Cache/TLB pollution is 4-8% better
- Average time between transmissions only slightly higher with Soft Timers
- Huge reduction in overhead

TCP: targeting average transmission interval
- It was suggested that TCP could control transmission intervals by comparing the average time between transmissions with the requested transmission interval and adjusting the next Soft Timer interval accordingly.
- Two tests on a busy Apache web server (300 MHz PII): one with a target of 40 µs, the other with a target of 60 µs
- In most cases, the target rate was hit, although with more deviation than the same rate with the hardware timers.
- At a line speed of 12 µs:
  - For the 60 µs target, the Soft Timer transmit interval was 60 µs with a std dev of 35.9 µs vs. the hardware timer at 63 µs with a std dev of 27.7 µs
  - For the 40 µs target, the Soft Timer transmit interval was 40 µs with a std dev of 34.5 µs vs. the hardware timer at 43.6 µs with a std dev of 26.8 µs
  - The hardware timer’s overshoot of the target interval is attributed to interrupt disabling in FreeBSD
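
One way to read "adjusting the next Soft Timer interval accordingly" is to keep an ideal schedule and shorten the next interval whenever transmissions have fallen behind it. The sketch below is an illustration of that idea, not the authors' algorithm: the function names, the 12 µs floor taken from the line speed above, and the exponentially distributed trigger delay with a 15 µs mean are all assumptions.

```python
import random


def next_interval_us(target_us, packets_sent, elapsed_us, min_interval_us=12.0):
    """Interval to request so the running average returns to the target."""
    ideal_elapsed = (packets_sent + 1) * target_us
    return max(min_interval_us, ideal_elapsed - elapsed_us)


def average_interval_us(target_us, packets, rng, mean_trigger_delay_us=15.0):
    """Simulate transmissions where each soft timer event fires late by a random delay."""
    now = 0.0
    for sent in range(packets):
        wait = next_interval_us(target_us, sent, now)
        delay = rng.expovariate(1.0 / mean_trigger_delay_us)  # hypothetical delay model
        now += wait + delay
    return now / packets


average_60 = average_interval_us(60.0, 20_000, random.Random(1))
average_40 = average_interval_us(40.0, 20_000, random.Random(2))
print(f"target 60 us -> average {average_60:.2f} us")
print(f"target 40 us -> average {average_40:.2f} us")
```

Individual intervals still vary with the trigger delay, which is consistent with the slide's observation that Soft Timers hit the target average with a larger standard deviation than hardware timers.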

Network Performance
- Substantial improvements in response time and throughput with rate-based clocking

Network polling
- Significant improvements across the board with Soft Timers

Using the on-chip Timer (APIC) … defeats “the catch”
- Used to shorten the tail of the event-time distribution
  - This timer can be scheduled and cancelled at a very low cost
  - It is armed when a deadline is specified while scheduling the next Soft Timer event.
  - This provides an upper bound on when the event executes, at low overhead, because the timer gets cancelled when the Soft Timer “beats it to the punch”
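
The deadline mechanism can be sketched as a soft event paired with a one-shot timer that is cancelled if a trigger state gets there first. `OneShotTimer` below is a stand-in with a made-up interface, not the APIC programming model, and the tick values are arbitrary.

```python
class OneShotTimer:
    """Stand-in for an on-chip one-shot timer (hypothetical interface)."""

    def __init__(self):
        self.armed_for = None
        self.interrupts_taken = 0

    def arm(self, when):
        self.armed_for = when

    def cancel(self):
        self.armed_for = None


class DeadlineEvent:
    """Soft timer event that falls back to a hardware interrupt at its deadline."""

    def __init__(self, due, deadline, handler, hw_timer):
        self.due = due
        self.handler = handler
        self.hw_timer = hw_timer
        self.done = False
        hw_timer.arm(deadline)  # backup in case no trigger state occurs in time

    def _fire(self):
        self.done = True
        self.hw_timer.cancel()  # no further interrupt is needed for this event
        self.handler()

    def on_trigger_state(self, now):
        if not self.done and now > self.due:
            self._fire()

    def on_hardware_interrupt(self, now):
        armed = self.hw_timer.armed_for
        if not self.done and armed is not None and now >= armed:
            self.hw_timer.interrupts_taken += 1
            self._fire()


log = []

# Case A: a trigger state arrives before the deadline, so the timer is cancelled.
timer_a = OneShotTimer()
event_a = DeadlineEvent(due=20, deadline=100, handler=lambda: log.append("A"), hw_timer=timer_a)
event_a.on_trigger_state(now=30)
event_a.on_hardware_interrupt(now=100)  # nothing left to do

# Case B: no trigger state occurs, so the hardware interrupt fires the event.
timer_b = OneShotTimer()
event_b = DeadlineEvent(due=20, deadline=100, handler=lambda: log.append("B"), hw_timer=timer_b)
event_b.on_hardware_interrupt(now=100)

print(log, timer_a.interrupts_taken, timer_b.interrupts_taken)
```

In case A the interrupt cost is never paid, which is why the deadline guarantee is cheap whenever trigger states are frequent.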

Conclusions
- Soft timers allow for fine granularity and low overhead when compared to hardware timers
  - But their useful range lies between the Soft Timer trigger interval and the finest granularity of the hardware timer (~10 µs to ~100 µs on 300 to 500 MHz CPUs)
  - The useful range appears to widen approximately linearly as CPUs get faster.
  - Should be used for events requiring this kind of granularity, assuming they can tolerate probabilistic delays
  - Can be integrated with the on-chip APIC to provide fine-grained events with tight deadlines and low overhead
  - When restricted to the appropriate class of problems, they always seem to improve things