Soft Timers: Efficient Microsecond Software Timer Support for Network Processing

MOHIT ARON and PETER DRUSCHEL

Rice University

Published in ACM Transactions on Computer Systems, vol. 18(3), pp. 197-228, 2000.

Presented By Glenn Diviney

What’s wrong with “Hard” timers? Polling vs. Interrupts

- Interrupts have high overhead and low latency
- Polling has high latency and low overhead
- Interruption is expensive
  - The CPU pipeline is disrupted, and the cache and TLB are polluted
  - Generally not significant, as long as interrupts arrive at millisecond intervals
  - Example: network interrupts can occur at intervals of tens of microseconds
  - Gigabit Ethernet requires a packet transmission every 12 µs (1500 bytes each)!
  - This amounts to a significant burden on the system if a context switch is involved each time
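The 12 µs figure follows from the packet size and the link rate. A minimal sketch of the arithmetic, assuming the packet is exactly 1500 bytes and ignoring framing overhead such as the preamble and inter-frame gap (the function name is made up for illustration):

```python
def packet_time_us(packet_bytes, link_bits_per_second):
    """Time needed to put one packet on the wire, in microseconds."""
    return packet_bytes * 8 * 1_000_000 / link_bits_per_second

# 1500-byte packets on Gigabit Ethernet and on 100 Mbps Ethernet.
gigabit_us = packet_time_us(1500, 1_000_000_000)
fast_ethernet_us = packet_time_us(1500, 100_000_000)

print(gigabit_us, fast_ethernet_us)
```

The same calculation gives the 120 µs per packet quoted later for the 100 Mbps test network.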

Interrupts

- Device interrupts have low latency but high overhead due to the added context switching
  - The executing thread gets preempted
  - Can occur at inopportune times, which slows down other work through cache pollution, TLB pollution, and pipeline purges, resulting in high indirect costs

Polling

- Polling has low overhead, but can have high latency due to the frequency of the poll:
  - The OS’s timer granularity depends directly on the frequency of the timer interrupts, as well as the overhead incurred by the interrupt
  - The cache, TLB, and pipeline costs can be avoided if the polling is done at the right time

What’s a Soft Timer?

- “An operating system facility that allows efficient scheduling of software events at microsecond granularity.”
- Takes advantage of states where handlers can be invoked at low cost: “Trigger States”
  - As in the case when the system is already context-switched to the kernel… why not see if other work can be done “while you’re in there?”
- Schedule future events probabilistically

How Soft Timers work: hardware

- Pentiums are usually shipped with a programmable timer, which can be told how often to interrupt the CPU.
  - These interrupts are usually assigned the highest priority in the OS, which can lead to TLB and cache misses
  - Testing indicated a total cost of 4.45 µs per interrupt on a 300 MHz web server, which is insignificant at ms intervals but terrible at 20 µs intervals
- Timer chip programmed to interrupt at ms intervals
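To see why the same per-interrupt cost is harmless at millisecond intervals and costly at 20 µs intervals, divide the cost by the interval. This is a minimal sketch that assumes the 4.45 µs cost stays constant regardless of interrupt rate; the function name is made up for illustration:

```python
def interrupt_overhead_fraction(cost_us, interval_us):
    """Fraction of CPU time spent handling timer interrupts."""
    return cost_us / interval_us

at_1ms = interrupt_overhead_fraction(4.45, 1000.0)   # well under 1% of the CPU
at_20us = interrupt_overhead_fraction(4.45, 20.0)    # more than a fifth of the CPU

print(f"{at_1ms:.2%} at 1 ms, {at_20us:.2%} at 20 us")
```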

How Soft Timers work: software

- At unpredictable intervals, the system will arrive at “trigger states”
  - End of a system call
  - End of an exception handler
  - End of an interrupt handler
  - CPU idle
- In these states, invoking an event handler is just a function call’s worth of overhead
- TLB and cache are already “disturbed” due to the triggering event, so no additional cost should be incurred
- In these states, the OS’s Soft Timer facility checks for any pending events without incurring the cost of the hardware timer
  - Checks the clock (usually a CPU register) and compares it to the scheduled time of the earliest soft timer event.

The catch

- Events might get delayed past a scheduled time
  - Only the hardware interrupt is guaranteed to happen (providing an upper bound on execution)
  - Other trigger states appear as random events to the system, or may not happen at all between hardware interrupts

Implementation

- Soft timers provide the following operations:
  - measure_resolution(): returns a 64-bit value representing the clock resolution in hertz
  - measure_time(): returns a 64-bit value representing the current time, at the resolution given by measure_resolution()
  - schedule_soft_event(T, handler): schedules “handler” to run “T” ticks in the future
  - interrupt_clock_resolution(): provides the minimal resolution, which is that of the periodic hardware timer interrupt
- When invoked, the Soft Timer facility executes every handler whose scheduled time is earlier than the value returned by measure_time() by at least 1 tick.
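The four operations and the firing rule can be sketched as a small user-space model. This is only an illustration under stated assumptions, not the FreeBSD implementation: the class name, the on_trigger_state() method, the injected clock callable, and the example clock rates are all hypothetical, and a min-heap is one plausible way to keep the earliest event cheap to check.

```python
import heapq
import itertools

class SoftTimerFacility:
    """Toy model of the soft timer operations listed above (names are illustrative)."""

    def __init__(self, clock, measure_hz, interrupt_hz):
        self._clock = clock                # callable returning the current tick count
        self._measure_hz = measure_hz
        self._interrupt_hz = interrupt_hz
        self._events = []                  # min-heap of (due_tick, seq, handler)
        self._seq = itertools.count()      # tie-breaker so handlers are never compared

    def measure_resolution(self):
        return self._measure_hz

    def measure_time(self):
        return self._clock()

    def interrupt_clock_resolution(self):
        return self._interrupt_hz

    def schedule_soft_event(self, T, handler):
        due = self.measure_time() + T
        heapq.heappush(self._events, (due, next(self._seq), handler))

    def on_trigger_state(self):
        """Run every handler whose due tick is strictly earlier than the current time."""
        now = self.measure_time()
        fired = 0
        while self._events and self._events[0][0] < now:
            _, _, handler = heapq.heappop(self._events)
            handler()
            fired += 1
        return fired

# Usage with a fake clock that the example advances by hand.
ticks = [0]
timers = SoftTimerFacility(lambda: ticks[0], measure_hz=300_000_000, interrupt_hz=1000)
log = []
timers.schedule_soft_event(100, lambda: log.append("sent"))
ticks[0] = 100
too_early = timers.on_trigger_state()   # the clock has not yet passed the due tick
ticks[0] = 101
on_time = timers.on_trigger_state()     # the handler runs here
```

The check at each trigger state costs one clock read and one comparison against the head of the heap when nothing is due, which is the property the slides rely on.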

Implementation (cont)

- If X is the resolution of the periodic hardware timer interrupt (in ticks of the measurement clock), the events will be bounded by:

  T < Actual Event Time < T + X + 1

  - Just a reassurance that the event will happen eventually
- Generally, the assumption is that the event will happen as:

  Actual Event Time = T + d

  - “d” is the “random” time between non-hardware triggers
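The bound can be checked with a small simulation. The sketch below assumes integer ticks, a backup hardware interrupt at every multiple of X, and a made-up workload in which trigger states fall at uniformly random ticks; none of these modelling choices come from the paper.

```python
import random

def firing_delay(scheduled_at, T, trigger_times, X):
    """Delay in ticks before an event scheduled at `scheduled_at` actually fires."""
    earliest = scheduled_at + T + 1        # the clock must have passed T ticks
    # Backup hardware interrupt: first multiple of X at or after `earliest`.
    hw = -(-earliest // X) * X
    # First soft trigger state that is late enough, if any occurs at all.
    soft = min((t for t in trigger_times if t >= earliest), default=hw)
    return min(soft, hw) - scheduled_at

def simulate(runs=1000, T=50, X=1000, seed=0):
    rng = random.Random(seed)
    delays = []
    for _ in range(runs):
        start = rng.randrange(0, 10 * X)
        # Hypothetical workload: 20 trigger states at random ticks after `start`.
        triggers = [rng.randrange(start, start + 3 * X) for _ in range(20)]
        delays.append(firing_delay(start, T, triggers, X))
    return delays

delays = simulate()
print(min(delays), max(delays))
```

Every simulated delay lies strictly between T and T + X + 1, and most are much closer to T because a trigger state usually arrives well before the next hardware interrupt.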

Applications

- Rate-based clocking
  - Recall the 12 µs transmission interval for Gigabit Ethernet
  - Transmission rate becomes variable, but the protocol could maintain an average “actual” rate and adjust the scheduling accordingly to achieve a target rate
- Network polling
  - Pure polling reduces interrupts and the impact of memory access, but it also can induce latencies by delaying packet processing
  - Soft Timers are an alternative to pure polling or a hybrid hardware approach with a network poll timer
  - Soft Timers show a latency close to interrupt-driven processing in the common case

Base overhead test setup

- FreeBSD was extended to include the Soft Timer facilities
  - They also added support for the on-chip APIC timer in addition to the already-supported 8253 off-chip timer
- Connected “a number” of 300 to 500 MHz machines to a 100 Mbps network
  - One acted as a web server
  - Others repeatedly requested a 6 KB file to the point where the web server was saturated

Base overhead test results

- Used a “null handler” to measure the per-timer event costs
- Of note:
  - The results suggest that the overhead does not scale with processor speed
  - Soft Timers caused no observable cost


What about TLB and Cache misses?

- Touched 50 data cache lines
- Touched 2 instruction cache lines on 2 separate pages
- All lines touched were different each time, and events occurred at 10 µs and then 20 µs intervals
- Results for events scheduled every 10 µs could not be obtained for 8253-based timers due to the high overhead of that facility


- Prior reasoning about Soft Timers reducing TLB and cache misses is confirmed
  - Data cache misses reduced by 20-31%
  - Instruction cache misses not reduced
    - The authors assume this is due to only 2 lines being touched
  - TLB misses reduced by 7-13%

Different workload test setup

- Intended to induce variation in when the trigger events occur, which is the Achilles heel of Soft Timers
- Measured the distribution of times between successive trigger states for various workloads on a 300 MHz PII machine
- Mean granularity in the tens of µs, with less than 6% over 100 µs

Stats on the distributions

Rate-Based Clocking: Timer Overhead

- Web server TCP implementation using Soft Timers vs. hardware timers
- At 100 Mbps, a 1500-byte packet takes 120 µs, so rate-based clocking has no observable impact on the network
- Therefore, the metric to isolate is the timer overhead, but possible benefits of rate-based clocking are not exposed

- Cache/TLB pollution is 4-8% better

- Average time between transmissions only slightly higher with Soft Timers

- Huge reduction in overhead

TCP: targeting average transmission interval

- It was suggested that TCP could control transmission intervals by noting the average time since transmitting vs. the requested transmission interval and adjusting the next Soft Timer interval accordingly.
- Two tests on a busy Apache web server (300 MHz PII): one with a target of 40 µs, the other with a target of 60 µs
- In most cases, the target rate was hit, although with more deviation than the same rate with the hardware timers.
- At a line speed of 12 µs:
  - For the 60 µs target, the ST transmit interval was 60 µs with a std dev of 35.9 vs. the hardware at 63 µs with a std dev of 27.7
  - For the 40 µs target, the ST transmit interval was 40 µs with a std dev of 34.5 vs. the hardware at 43.6 µs with a std dev of 26.8
  - The overshoot in the hardware timer interval is attributed to interrupt disabling in FreeBSD
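The adjustment described on this slide can be sketched as follows. This is an assumed scheme for illustration, not the code evaluated in the paper: next_wait() shortens the next soft timer interval by however far the sender has fallen behind the target schedule, never going below a 12 µs minimum, and the soft timer's lateness is modelled as an exponentially distributed delay with a made-up mean.

```python
import random

def next_wait(target_us, sent, elapsed_us, min_wait_us=12.0):
    """Delay to request before the next packet so the average interval tracks the target."""
    due = sent * target_us                 # ideal send time of the next packet
    return max(min_wait_us, due - elapsed_us)

def simulate(target_us=40.0, packets=10_000, mean_delay_us=10.0, seed=1):
    rng = random.Random(seed)
    elapsed = 0.0                          # time of the most recent transmission
    for sent in range(1, packets):
        wait = next_wait(target_us, sent, elapsed)
        # Assumed model: the soft timer fires late by a random amount.
        elapsed += wait + rng.expovariate(1.0 / mean_delay_us)
    return elapsed / (packets - 1)         # average interval between transmissions

average_40 = simulate(target_us=40.0)
average_60 = simulate(target_us=60.0)
print(round(average_40, 2), round(average_60, 2))
```

Individual intervals still vary with the soft timer delay, but because each late transmission shortens the following wait, the long-run average stays close to the target.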

Network Performance

- Substantial improvements in response time and throughput with rate-based clocking

Network polling

- Significant improvements across the board with Soft Timers

Using the on-chip Timer (APIC) … defeats “the catch”

- Used to shorten the tail on the event-time distribution
  - This timer can be scheduled and cancelled at a very low cost
  - Invoked when a deadline is specified while scheduling the next Soft Timer event.
  - This is used to provide an upper bound on execution with low overhead because it gets cancelled when the Soft Timer “beats it to the punch”
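The interplay between the soft timer and the backup timer can be sketched as a small state machine. The class and method names are hypothetical, and the hw_armed flag merely stands in for arming and cancelling a one-shot on-chip timer; no real APIC programming is shown.

```python
class DeadlineEvent:
    """One soft timer event backed by a one-shot hardware timer set at its deadline."""

    def __init__(self, due, deadline, handler):
        self.due = due
        self.deadline = deadline
        self.handler = handler
        self.hw_armed = True       # stands in for arming the on-chip one-shot timer
        self.fired_by = None

    def on_trigger_state(self, now):
        """Cheap path: a trigger state arrived after the event became due."""
        if self.fired_by is None and now > self.due:
            self.hw_armed = False  # cancel the backup: the soft timer got there first
            self.fired_by = "soft"
            self.handler()

    def on_hw_timer(self, now):
        """Expensive path: the backup interrupt fires only if it was never cancelled."""
        if self.hw_armed and self.fired_by is None and now >= self.deadline:
            self.hw_armed = False
            self.fired_by = "hardware"
            self.handler()

log = []
early = DeadlineEvent(due=100, deadline=150, handler=lambda: log.append("early"))
early.on_trigger_state(now=120)    # a trigger state arrives before the deadline
early.on_hw_timer(now=150)         # backup was cancelled, so nothing happens

late = DeadlineEvent(due=100, deadline=150, handler=lambda: log.append("late"))
late.on_hw_timer(now=150)          # no trigger state in time: the backup fires
late.on_trigger_state(now=200)     # already handled, so nothing happens
```

In the common case the hardware interrupt never fires, so its cost is paid only for the events in the tail of the distribution.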

Conclusions

- Soft timers allow for fine granularity and low overhead when compared to hardware timers
  - Their useful range lies between the finest granularity of the hardware timer and the Soft Timer trigger interval (roughly 10 µs to 100 µs on 300 to 500 MHz CPUs)
  - The useful range appears to widen approximately linearly as CPUs get faster.
  - Should be used for events requiring this kind of granularity, assuming they can tolerate probabilistic delays
  - Can be integrated with the on-chip APIC to provide fine-grained events with tight deadlines and low overhead
  - When restricted to the appropriate class of problems, they always seem to improve things

Q/A
